Friday 31 July 2015

Randomization is good

In this article I will show how I used random testing to check, with a high level of confidence, that the behaviour of EnumJ enumerators adheres to the expected behaviour embodied by Java 8 streams (which enumerators emulate).

I wrote about this important step towards releasing EnumJ 1.0 in a previous post, and I finished it just yesterday. But before jumping into the details, let's look at a little bit of background.

The problem

There are many ways to look at testing, its methods and its goals. In this case I look at it as follows:

  • an enumerator (or stream) is an entity with internal state
  • applying an operator to it alters that state
  • the outcome of the entity is a collection determined by the state the entity has upon enumeration

So, we are testing two state machines: the one we trust (Java 8 streams) versus the one we check (EnumJ enumerators), and we expect the latter to follow the former. The test procedure is obvious:

  • put both entities (enumerator and stream) in the same initial state
  • apply the same operations to both – which supposedly induce the same state transitions
  • extract the outcomes of both and compare them. If the outcomes match, then the final state of the enumerator is correct, and from this we can infer that all the state transitions in between were correct, too
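For illustration, the procedure can be sketched with plain JDK types, using a hand-written eager pipeline as a stand-in for the state machine under test (the real tests run EnumJ enumerators, whose API is not shown here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

public class StateMachineCheck {
    // The trusted state machine: a Java 8 stream pipeline.
    static List<Integer> trusted(List<Integer> input) {
        return input.stream()
                .distinct()
                .filter(x -> x > 2)
                .map(x -> x * 10)
                .sorted()
                .collect(Collectors.toList());
    }

    // The checked state machine: a stand-in that applies the same
    // operations step by step (in the real tests, an EnumJ enumerator).
    static List<Integer> checked(List<Integer> input) {
        List<Integer> out = new ArrayList<>(new LinkedHashSet<>(input)); // distinct
        out.removeIf(x -> x <= 2);                                       // filter
        out.replaceAll(x -> x * 10);                                     // map
        Collections.sort(out);                                           // sorted
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 3, 8, 1, 8, 3, 9);
        // Matching outcomes imply matching final states.
        if (!trusted(input).equals(checked(input))) {
            throw new AssertionError("state machines diverged");
        }
        System.out.println(trusted(input)); // [30, 50, 80, 90]
    }
}
```

If the two lists ever differ, the failing sequence of operations pinpoints the transition where the checked machine went astray.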

The bigger problem

As I mentioned earlier, Java 8 streams don’t scale well under composition. They cope with somewhere between 1,000 and 10,000 compositions (depending on the operation); beyond 10,000 the risk of the stream overflowing the stack is high. For this test we assume up to 1024 compositions.

Now let’s evaluate the size of the state space. We use 9 operations which yield the same type of object and are common to both streams and enumerators: distinct, filter, limit, map, peek, skip and sorted, as well as concat and flatMap. This gives us 9^1024 possible final states and, if we consider all the lengths from 0 to 1024, a total of (9^1025 - 1)/8. If we take into account that the last two operations also make use of sub-streams (or sub-enumerators), which have their own sub-state spaces, then the total number of states grows beyond imagination.
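The count over all lengths is a plain geometric series, easy to double-check with BigInteger:

```java
import java.math.BigInteger;

public class StateSpace {
    // Number of composition sequences of length exactly k over 9 operations
    // is 9^k; summing the series for k = 0..1024 gives (9^1025 - 1) / 8.
    static BigInteger bySumming() {
        BigInteger nine = BigInteger.valueOf(9);
        BigInteger sum = BigInteger.ZERO;
        for (int k = 0; k <= 1024; k++) {
            sum = sum.add(nine.pow(k));
        }
        return sum;
    }

    static BigInteger byFormula() {
        return BigInteger.valueOf(9).pow(1025)
                .subtract(BigInteger.ONE)
                .divide(BigInteger.valueOf(8));
    }

    public static void main(String[] args) {
        System.out.println(bySumming().equals(byFormula())); // true
        // The number needs well over 3000 bits to represent at all.
        System.out.println(bySumming().bitLength());
    }
}
```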

How can we ensure that we traverse a significant sub-part of this enormous space, which is far larger than the number of atoms in the Universe itself? The answer lies in random sampling, an approach I hinted at when I wrote about timed warranties in software.

Building the random sample

Firstly, we need a controlled randomizer. Java’s Random class fits the bill.
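A seeded Random makes the whole sampling reproducible: the same seed always selects the same sequence of operations, so any failing pipeline can be replayed at will. A minimal sketch (the operation set is trimmed to four for brevity):

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RandomPipeline {
    // Build a random but reproducible pipeline: the seed fully determines
    // which operation is composed at every step.
    static List<Integer> run(long seed, int depth) {
        Random rnd = new Random(seed);
        Stream<Integer> s = Stream.of(4, 2, 7, 2, 9, 1);
        for (int i = 0; i < depth; i++) {
            switch (rnd.nextInt(4)) {
                case 0:  s = s.distinct();              break;
                case 1:  s = s.map(x -> x + 1);         break;
                case 2:  s = s.filter(x -> x % 2 == 0); break;
                default: s = s.sorted();                break;
            }
        }
        return s.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same seed, same pipeline, same outcome.
        System.out.println(run(42L, 16).equals(run(42L, 16))); // true
    }
}
```

In the real test the same seeded choices are applied to an enumerator and to a stream alike, and the two outcomes are compared.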

Thursday 30 July 2015

The power of documentation

This blog post is motivated by the fact that I’ve just published the latest draft of the JavaDoc documentation for the EnumJ library.

Why documenting?

An old piece of wisdom in software says that we should document what we write because we will still want to understand it six months later.

Very true: even the best-written software can slip from memory, and well-written documentation helps bring back to the surface the original intent of the code we might be staring at now. When it comes to documentation, Java really shines: the JavaDoc tool is very easy to use and it is nicely integrated with Maven and the major IDEs.

But there is one more reason for documenting thoroughly and that is … code review.

Self code reviews

The power of code reviews is well known – especially when accompanied by adequate tools.

Nevertheless, they still suffer from unequal knowledge of the code being reviewed: one developer is the author, the other one isn’t. It may happen that the suggestions of the less-informed reviewer are off the mark and, sometimes, of lesser value. Perhaps the best form of review is the self code review, where developers review their own code – but in that case the whole process suffers from self-validation bias.

This is when documentation comes into play. If the developer documents critically what he/she has written, then the whole documentation process also becomes a code review – and a very powerful one.

For illustration purposes, I’ll show a few significant improvements I’ve implemented quickly over the past few days – all of them being a result of reflecting one more time upon the code while documenting.

OBS: this may require consulting the documentation mentioned above. If the next three sections seem too foreign, just read the titles and jump to Conclusions below.

Better ShareableEnumerator<E>

This is a really significant improvement. Previously, SharingEnumerator<E> was a public class iterating over a list of SharedElementWrapper<E>. The whole thing was complex and error-prone.

Now SharedElementWrapper<E> no longer exists and SharingEnumerator<E> is package-private. Moreover, ShareableEnumerator<E> internally uses a CachedEnumerable<E>, which not only provides a simple mechanism for sharing (the shared buffer is the caching buffer) but also allows the spawned sharing enumerators to act concurrently – something that was not possible before.
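The caching idea can be conveyed with a toy class – the names below are purely illustrative, not EnumJ’s actual types. Every spawned iterator holds only a cursor into one lazily filled, synchronised buffer, which is what lets the sharing iterators advance independently and concurrently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy stand-in for sharing through caching: one shared buffer filled
// lazily from the source, each "sharing" iterator keeping its own cursor.
class CachingSource<E> {
    private final Iterator<E> source;
    private final List<E> buffer = new ArrayList<>();

    CachingSource(Iterator<E> source) { this.source = source; }

    private synchronized boolean fillTo(int i) {
        while (buffer.size() <= i && source.hasNext()) {
            buffer.add(source.next());      // pull only on demand
        }
        return i < buffer.size();
    }

    private synchronized E get(int i) {
        if (!fillTo(i)) throw new NoSuchElementException();
        return buffer.get(i);
    }

    Iterator<E> share() {
        return new Iterator<E>() {
            private int cursor = 0;
            public boolean hasNext() { return fillTo(cursor); }
            public E next()          { return get(cursor++); }
        };
    }

    public static void main(String[] args) {
        CachingSource<Integer> src =
                new CachingSource<>(Arrays.asList(10, 20, 30).iterator());
        Iterator<Integer> a = src.share();
        Iterator<Integer> b = src.share();
        // The two iterators advance independently over the same buffer.
        System.out.println(a.next() + " " + a.next() + " " + b.next()); // 10 20 10
    }
}
```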

Saturday 25 July 2015

Cool about concurrency

In this short article I will show how trying to keep state consistency in the face of concurrent access may be detrimental, while allowing a little bit of laxity on the matter may be beneficial, with no adverse effects.

The problem

Apache Commons has a nice little class for lazy initialisation, aptly named LazyInitializer. We can clothe it in Java 8 garments in the following simple manner:


import java.util.function.Supplier;

import org.apache.commons.lang3.concurrent.ConcurrentException;
import org.apache.commons.lang3.concurrent.LazyInitializer;

/**
 * Type of {@link LazyInitializer} that gets a {@link Supplier} to initialise
 * with.
 * @param <T> type of lazily enumerated entity.
 */
public final class Lazy<T> extends LazyInitializer<T> {

    private Supplier<T> supplier;

    /**
     * Constructs a {@link Lazy} instance.
     * @param supplier {@link Supplier} instance providing the lazily
     * initialised element.
     */
    public Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    @Override
    public T get() {
        try {
            return super.get();
        } catch(ConcurrentException ex) {
            throw new UnsupportedOperationException(ex.getCause());
        }
    }

    @Override
    protected T initialize() throws ConcurrentException {
        try {
            final T result = supplier.get();
            supplier = null; // release the supplier once the value exists
            return result;
        } catch(LazyException ex) {
            throw new ConcurrentException(ex.getCause());
        }
    }
}

There may be scenarios where the function behind supplier is very costly, so we might want a way to know in advance whether the object has been initialised or not. If it has been initialised, we call Lazy<T>.get() because it is cheap. If it hasn’t, we give up, thus avoiding the expensive call.
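One simple way to expose such a check – a sketch only, not how EnumJ or Commons Lang actually do it – is to pair the lazy value with a volatile flag that is cheap to read:

```java
import java.util.function.Supplier;

// Sketch: a lazy holder with a cheap isInitialized() probe, so callers
// can skip the expensive computation when it has not happened yet.
// (Illustrative only; not EnumJ's actual implementation.)
public final class CheckedLazy<T> {
    private volatile boolean initialized;
    private Supplier<T> supplier;
    private T value;

    public CheckedLazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    // Cheap, non-blocking probe: a single volatile read.
    public boolean isInitialized() {
        return initialized;
    }

    public synchronized T get() {
        if (!initialized) {
            value = supplier.get();
            supplier = null;    // let the supplier be collected
            initialized = true; // publish only after the value is set
        }
        return value;
    }
}
```

A caller can then write `if (lazy.isInitialized()) use(lazy.get());` and otherwise walk away without ever paying for the expensive computation.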

Sunday 5 July 2015

Combiners

While working hard to make the EnumJ library neat and clean for its first release, I started another project: a library called CombineJ which is meant to be both a demonstration and the first production-ready, industrial-strength application of the power of EnumJ.

Quality is in the (boring) details

There is a difference between good and of high quality. There are good things everywhere, but things of high quality are rarer, as high quality requires patience, discipline and lots and lots of attention to detail. And I want EnumJ to be of high quality.

I truly believe that quality lies in the details, so high quality is hard to achieve – partly because chasing those details is boring. Necessary, but boring nevertheless.

Out of the boredom of walking EnumJ out through the door, the new CombineJ library was born.

What is CombineJ?

It is a library that intends to show that, by combining enumerators via clever compositions applied in large numbers, we can obtain all kinds of interesting effects. I hinted at this when I wrote about enumerables.

CombineJ will be a library of enumerator combiners: units of enumerator composition with a certain purpose. People from the functional land might recognise them as monadic compositions, but I try to avoid that terminology because the intent is not to turn Java into a functional language.

I can foresee the following domains of application for combiners:

  • logic programming: where combiners underpin the solution sets defined by logic clauses in a manner similar to Prolog
  • constraint programming: same as above, but the solution sets are defined by constraint expressions instead of logic clauses
  • expert systems: where the whole job is to find a set of solutions to a problem. Backward chaining is quite natural to do, forward chaining might be easier than I can envision now
  • random walks, with their applications in finance and other domains: perhaps quite soon I’ll try my hand at constructing an exotic option pricer via such combiners – in a way similar to how people have been doing it in Haskell and F# for a while
  • probabilistic programming: if back-propagation of beliefs is not involved, this actually might be easier than it sounds

For the moment, I have started something in the direction of logic programming and, so far so good, it looks rather promising. It’s not going to be any threat to SWI-Prolog or SICStus – especially since I don’t have a unifier yet – but it’s a way to gauge how these enumerators might be useful for a purpose other than their own sake.
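To give the flavour of combiner-style search with plain JDK streams (this is not CombineJ’s API): each flatMap introduces a “logic variable” ranging over a finite domain, and a filter plays the role of the constraint – here, enumerating Pythagorean triples:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TriplesDemo {
    // Generate-and-test in monadic style: nested flatMaps enumerate the
    // candidate (a, b, c) triples, the filter keeps the solutions.
    static List<int[]> pythagorean(int max) {
        return IntStream.rangeClosed(1, max).boxed()
            .flatMap(a -> IntStream.rangeClosed(a, max).boxed()
                .flatMap(b -> IntStream.rangeClosed(b, max).boxed()
                    .filter(c -> a * a + b * b == c * c)
                    .map(c -> new int[] { a, b, c })))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (int[] t : pythagorean(20)) {
            System.out.println(t[0] + "," + t[1] + "," + t[2]);
        }
        // prints 3,4,5  5,12,13  6,8,10  8,15,17  9,12,15  12,16,20
    }
}
```

A Prolog-style solver built on enumerators follows the same shape, except that the domains and constraints come from clauses rather than being hard-coded.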