In this article I will show how I used random testing to check with a high level of confidence that the behaviour of EnumJ enumerators adhere to the expected behaviour as embodied by Java 8 streams (which enumerators emulate).
About this important step towards releasing EnumJ 1.0 I wrote in a previous post. I’ve just finished this step yesterday. But before jumping into details, let’s see a little bit of background.
The problem
There are many ways to look at testing, its ways and its goals. In this case I look at it in the following way:
- an enumerator (or stream) is an entity with internal state
- applying an operator on it alters its state
- the outcome of the entity is a collection determined by the state that the entity has upon enumeration
So, we are testing two state machines: the one we trust (Java 8 streams) versus the one we check (EnumJ enumerators) and we expect the latter to follow the former. The test procedure is obvious:
- put both entities (enumerator and stream) in the same initial state
- apply on both the same operations – which supposedly will induce the same state transitions
- extract the outcomes of both and compare. If the outcomes match then the final state of the enumerator is the correct one and from this we can infer that all the state transitions in between are correct, too
The bigger problem
As I mentioned earlier, Java 8 streams don’t scale well under compositions. They go up to between 1000 and 10000 compositions (depending on composition), beyond 10000 the risk of streams overflowing the stack is high. For this test we assume up to 1024 compositions.
Now let’s evaluate the size of the state space. We use 9 operations which yield the same type of object and are common to both streams and enumerators: distinct, filter, limit, map, peek, skip, sorted as well as concat and flatMap. This gives us 91024 possible final states and if we consider all the lengths from 0 to 1024 then it’s (91025-1)/8. If we take into account that the last two operations also make use of sub-streams (or sub-enumerators) which have their own sub-state spaces, then the total number of states grows beyond imagination.
How can we ensure that we traverse any significant sub-part of this enormous space which is perhaps larger than the number of atoms in the Universe itself? The answer lies in random sampling, and I hinted to this approach when I wrote about timed warranties in software.
Building the random sample
Firstly, we need a controlled randomizer. The Random Java class fits the bill.