Saturday, 25 July 2015

Cool about concurrency

In this short article I will show how trying to keep state consistency in the face of concurrent access may be detrimental while allowing a little bit of laxity on the matter may be beneficial, with no adverse effects.

The problem

Apache Commons has a nice little class for lazy initialisation aptly named LazyInitializer. We can clothe it in Java 8 garments in the following simple manner:


/**
* Type of {@link LazyInitializer} that gets a {@link Supplier} to initialise
* with.
* @param <T> type of lazily enumerated entity.
*/

public final class Lazy<T> extends LazyInitializer<T> {


   
private Supplier<T> supplier;

   
/**
     * Constructs a {@link Lazy} instance.
     * @param supplier {@link Supplier} instance providing the lazily
     * initialised element.
     */

   
public Lazy(Supplier<T> supplier) {

       
this.supplier = supplier;

   
}


   
@Override

   
public T get() {

       
try {

           
return super.get();

       
} catch(ConcurrentException ex) {

           
throw new UnsupportedOperationException(ex.getCause());

       
}

   
}

   
@Override

   
protected T initialize() throws ConcurrentException {

       
try {

           
final T result = supplier.get();

           
supplier = null;

           
return result;

       
} catch(LazyException ex) {

           
throw new ConcurrentException(ex.getCause());

       
}

   
}

}

There may be scenarios where the function behind supplier may be very costly so we might want a way to know in advance whether the object has been initialized or not. If it’s been initialized, we call Lazy<T>.get() because it is cheap. If it hasn’t been initialized, we give up thus avoiding the expensive call.

Alas, LazyInitializer<T> doesn’t help much because all it has is the get() method. This means the extra work must go inside of Lazy<T>. This means we need a method isInitialized():


/**
* Gets whether this lazy instance has been initialised.
* @return {@code True} if the value has been initialised, {@code false}
* otherwise.
*/

public boolean isInitialized() {

   
/* ... code here ... */

}

Solution 1 (bad): lock on ‘this’

The source code of LazyInitializer<T> shows that it uses a two-step lock on this to ensure state consistency on concurrency so we may be tempted to use a lock on this as well:


   
private boolean initialized;
   
.... code here ....

   
public boolean isInitialized() {

      
synchronized(this) {

           
return initialized;

       
}

   
}

   
.... code here ....

   
@Override

   
protected T initialize() throws ConcurrentException {

       
try {

           
final T result = supplier.get();

           
supplier = null;

           
initialized = true;

           
return result;

       
} catch(LazyException ex) {

           
throw new ConcurrentException(ex.getCause());

       
}

   
}

This solution is inefficient simply because it induces context switches more than necessary.

Solution 2 (better): use AtomicBoolean

A solution slightly better is to declare initialized as an AtomicBoolean which is more efficient than locking this:


   
private AtomicBoolean initialized;
   
.... code here ....

   
public boolean isInitialized() {

       
return initialized.get();

   
}

   
.... code here ....
 
   
@Override

   
protected T initialize() throws ConcurrentException {

       
try {

           
final T result = supplier.get();

           
supplier = null;

           
initialized.set(true);

           
return result;

       
} catch(LazyException ex) {

           
throw new ConcurrentException(ex.getCause());

       
}

   
}

But the best solution is not to use synchronization at all …

Solution 3 (best): use nothing

The best solution is simply to declare initialized as a volatile boolean and leave things to the mercy of the JVM:


  
private volatile boolean initialized;
   
.... code here ....

   
public boolean isInitialized() {

       
return initialized;

   
}

   
.... code here ....

   
@Override

   
protected T initialize() throws ConcurrentException {

       
try {

           
final T result = supplier.get();

           
supplier = null;

           
initialized = true;

           
return result;

       
} catch(LazyException ex) {

           
throw new ConcurrentException(ex.getCause());

       
}

   
}

This is the most efficient solution because it induces no context switch, not spin lock wait and it doesn’t get memory for an extra embedded object.

Why it works

In absence of concurrency, the code obviously works. Let’s think what may go “wrong” when concurrency is involved:

  1. some  thread calls Lazy<T>.isInitialized()
  2. some other thread calls Lazy<T>.get()
    1. this in turn calls Lazy<T>.initialize()
      1. this in turn sets initialized = true
  3. meanwhile the first thread loads the value of initialized to return it as a boolean. The value may be before or after initialized = true has completed, depending on how fast supplier is.
  4. this means that the first thread may get a response of false from isInitialized() even though a value may have been fetched just around that time.

In other words, the result may be reversed during an extremely tiny window, depending on the outcome of a small race condition that may take place only once, only if concurrent calls take place when the lazy value gets initialized the first time.

The question is: does it matter that the result may get reversed during this tiny time window?

The answer is absolutely not, at least not for any decent concurrent code. As for multi-threaded code which relies for its decisions on the outcome of a race between threads, such an anomaly is fundamentally flawed and must be avoided in all forms or shapes.

Why (apparently) it doesn’t work and how we (don’t) fix it

So easy to overlook, but there may be another tiny time window in which the things may seem to go wrong:

  1. some thread calls Lazy<T>.get() which in turn calls Lazy<T>.initialize() which in turn sets initialized = true
  2. some other thread calls Lazy<T>.isInitialized() and harvests initialized, after the first threat set it to true but before the LazyInitializer<T>.get() had a chance to finish
  3. also, before the first thread has a chance to finish, the second thread calls Lazy<T>.get() expecting a non-null value which may lead to bad things such as NullPointerException

Fortunately, the second thread has no chance of getting a null value simply because Lazy<T>.get() locks on this. This means that a second get() attempt will be stuck until the first get() completes, thus aligning initialized and the lazy value to the same consistent state. The alignment happens always as it is guaranteed by the very nature of the solution.

In other words, by exploiting the nature of the problem, we introduce an important flag which is not guarded at all but which ends up consistent simply because the very solution to the problem never lets it susceptible to damaging inconsistencies.

Isn’t concurrent programming fun?

The simplest of all: make the alignment explicit

Because Java methods are virtual by default, the simplest solution is to simply move initialized = true inside of Lazy<T>.get() thus making the alignment explicit:


   
.... code ...
   
@Override

   
public T get() {

       
try {

           
final T result = super.get();

           
initialized = true;

           
return result;

       
} catch(ConcurrentException ex) {

           
throw new UnsupportedOperationException(ex.getCause());

       
}

   
}

   
@Override

   
protected T initialize() throws ConcurrentException {

       
try {

           
final T result = supplier.get();

           
supplier = null;

           
return result;

       
} catch(LazyException ex) {

           
throw new ConcurrentException(ex.getCause());

       
}

   
}

} // class Lazy<T>

Conclusion

State consistency in face of concurrency is a difficult problem that, when not thought of carefully, may hurt either performance, or quality or both. Yet, sometimes relaxing the dogmatic requirements of concurrent state consistency not only simplifies the problem but also improves performance with no undesired side effects.

No comments:

Post a Comment