Friday 31 July 2015

Randomization is good

In this article I will show how I used random testing to check, with a high level of confidence, that the behaviour of EnumJ enumerators adheres to the expected behaviour embodied by Java 8 streams (which enumerators emulate).

I wrote about this important step towards releasing EnumJ 1.0 in a previous post, and I finished it just yesterday. But before jumping into the details, let’s look at a little background.

The problem

There are many ways to look at testing, its methods and its goals. In this case I look at it in the following way:

  • an enumerator (or stream) is an entity with internal state
  • applying an operator on it alters its state
  • the outcome of the entity is a collection determined by the state that the entity has upon enumeration

So we are testing two state machines: the one we trust (Java 8 streams) versus the one we check (EnumJ enumerators), and we expect the latter to follow the former. The test procedure is obvious:

  • put both entities (enumerator and stream) in the same initial state
  • apply the same operations on both – which should induce the same state transitions
  • extract the outcomes of both and compare: if the outcomes match, then the final state of the enumerator is correct, and from this we can infer that all the state transitions in between were correct, too (a concrete sketch of this procedure follows below)
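
To make the procedure concrete, here is a minimal, self-contained sketch of the idea. An imperative pipeline over a List stands in for the implementation under test; the real tests, of course, put an EnumJ enumerator in that role:

    import java.util.*;
    import java.util.stream.*;

    public class StateMachineComparisonSketch {
        public static void main(String[] args) {
            final List<Integer> input = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);

            // The trusted state machine: a Java 8 stream pipeline.
            final List<Integer> expected = input.stream()
                    .distinct()
                    .filter(x -> x % 2 == 1)
                    .sorted()
                    .collect(Collectors.toList());

            // The checked state machine: the same transitions, applied by hand.
            final List<Integer> actual =
                    new ArrayList<>(new LinkedHashSet<>(input)); // distinct
            actual.removeIf(x -> x % 2 == 0);                    // filter
            Collections.sort(actual);                            // sorted

            // Matching outcomes let us infer matching state transitions.
            if (!expected.equals(actual)) {
                throw new AssertionError(expected + " != " + actual);
            }
            System.out.println("outcomes match: " + actual);
        }
    }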

The bigger problem

As I mentioned earlier, Java 8 streams don’t scale well under composition. They go up to somewhere between 1,000 and 10,000 compositions (depending on the operations being composed); beyond 10,000 the risk of the stream overflowing the stack is high. For this test we assume up to 1024 compositions.
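
The scaling problem is easy to reproduce. A quick sketch (the exact depth at which the stack overflows varies with the JVM and its stack size, so treat 100,000 as an illustrative value):

    import java.util.stream.Stream;

    public class DeepCompositionDemo {
        public static void main(String[] args) {
            Stream<Integer> s = Stream.of(1, 2, 3);
            // Each map() wraps the previous pipeline in a new stage; when the
            // stream is finally consumed, every element travels through one
            // stack frame per stage, so deep pipelines blow the stack.
            for (int i = 0; i < 100_000; ++i) {
                s = s.map(x -> x + 1);
            }
            try {
                s.forEach(x -> { });
                System.out.println("survived the traversal");
            } catch (StackOverflowError e) {
                System.out.println("stack overflow after deep composition");
            }
        }
    }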

Now let’s evaluate the size of the state space. We use 9 operations which yield the same type of object and are common to both streams and enumerators: distinct, filter, limit, map, peek, skip and sorted, as well as concat and flatMap. This gives us 9^1024 possible final states; if we consider all the lengths from 0 to 1024, the count becomes the geometric sum 9^0 + 9^1 + … + 9^1024 = (9^1025 - 1)/8. If we take into account that the last two operations also make use of sub-streams (or sub-enumerators) which have their own sub-state spaces, then the total number of states grows beyond imagination.

How can we ensure that we traverse any significant sub-part of this enormous space, which is perhaps larger than the number of atoms in the Universe itself? The answer lies in random sampling, and I hinted at this approach when I wrote about timed warranties in software.

Building the random sample

Firstly, we need a controlled randomizer. The Random Java class fits the bill.
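
A quick illustration of why Random qualifies as a controlled randomizer: two instances seeded identically produce identical sequences, so any failing random test can be replayed exactly (the seed below is arbitrary):

    import java.util.Random;

    public class SeededRandomDemo {
        public static void main(String[] args) {
            // Same seed, same sequence: this is what makes random tests repeatable.
            final Random a = new Random(42);
            final Random b = new Random(42);
            for (int i = 0; i < 5; ++i) {
                if (a.nextInt(9) != b.nextInt(9)) {
                    throw new AssertionError("sequences diverged");
                }
            }
            System.out.println("same seed, same sequence");
        }
    }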

Secondly, we need a length and a depth. Length is how many compositions we apply; we have length <= 1024. Depth is how many levels of sub-streaming we allow (only concat and flatMap induce sub-streaming). I chose 0, 1, 2, 4 and 8 for depth. Depth = 0 means no nesting, i.e. concat and flatMap are avoided.

Thirdly, we need a way to get pairs of stream/enumerator transformations at random. In other words, we need:


   
    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getFuns() {
        final int limit = 7 + (depth > 0 ? 2 : 0);
        switch(rnd.nextInt(limit)) {
            case 0: return getDistinctFuns();
            case 1: return getFilterFuns();
            case 2: return getLimitFuns();
            case 3: return getMapFuns();
            case 4: return getPeekFuns();
            case 5: return getSkipFuns();
            case 6: return getSortedFuns();
            /* deep */
            case 7: return getConcatFuns();
            case 8: return getFlatMapFuns();
            default: assert false; return null;
        }
    }
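
Note how the deep operations (concat and flatMap) become eligible only when depth > 0: the switch limit is 7 for flat comparators and 9 otherwise, so a depth of 0 guarantees a flat pipeline.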

Based on this method, we can build our stream and enumerator with the build() function:


     
    public void build() {
        if (built) {
            return;
        }

        for(int i=0; i<length; ++i) {
            final Pair<Function<Stream<T>,Stream<T>>,
                       Function<E,E>> transf = getFuns();

            lhs = transf.getLeft().apply(lhs);
            rhs = transf.getRight().apply(rhs);
        }

        built = true;
    }

Note: rhs stores the enumerator or the Enumerable<T> (here of type E), while lhs holds the Stream<T> instance.

With that in hand, testing is easy via the test(…, …) function:


   
    public void test(BiConsumer<T,Optional<T>> elemAssert,
                     Runnable                  lengthMismatchAsserter) {
        if (!built) {
            build();
        }

        lhs.forEach(x ->
            elemAssert.accept(x, Optional.ofNullable(rhsHasNext()
                                                     ? rhsNext()
                                                     : null))
        );
        if (rhsHasNext()) {
            lengthMismatchAsserter.run();
        }
    }

    protected abstract boolean rhsHasNext();
    protected abstract T       rhsNext();

There are other pieces related to sub-stream creation in pairs, specific composition methods, statistics and other details, but I will now jump directly to what the tests look like. The full source code of StreamComparator<T,E> is in the Appendix below.
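
One of those pieces, the statistics collector, is referenced but not listed in the Appendix. Purely as a sketch inferred from the calls StreamComparator makes on it (one counter per operation, plus a print() that matches the output shown later), it might look like this; the actual EnumJ class may well differ:

    import java.util.EnumMap;
    import java.util.Map;

    public class StreamComparatorStatistics {
        // One counter per operation, keyed by an internal enum.
        private enum Op {
            CONCAT("concat"), DISTINCT("distinct"), FILTER("filter"),
            FLAT_MAP("flatMap"), LIMIT("limit"), MAP("map"),
            PEEK("peek"), SKIP("skip"), SORTED("sorted");

            final String label;
            Op(String label) { this.label = label; }
        }

        private final Map<Op,Long> counts = new EnumMap<>(Op.class);

        private void bump(Op op) { counts.merge(op, 1L, Long::sum); }

        public void concat()   { bump(Op.CONCAT); }
        public void distinct() { bump(Op.DISTINCT); }
        public void filter()   { bump(Op.FILTER); }
        public void flatMap()  { bump(Op.FLAT_MAP); }
        public void limit()    { bump(Op.LIMIT); }
        public void map()      { bump(Op.MAP); }
        public void peek()     { bump(Op.PEEK); }
        public void skip()     { bump(Op.SKIP); }
        public void sorted()   { bump(Op.SORTED); }

        public void print() {
            System.out.println("Statistics:");
            for (Op op : Op.values()) {
                System.out.printf("  # %-8s = %d%n",
                                  op.label, counts.getOrDefault(op, 0L));
            }
        }
    }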

How to test with random samples

Assuming we have specialized our comparator class for Enumerator<Long>, we first need a utility function:

    private void testLong(int maxDepth, int maxLength) {
        final LongEnumeratorComparator cmp =
                new LongEnumeratorComparator(
                        rnd,
                        maxDepth,
                        maxLength);
        cmp.statistics = statistics;
        cmp.test((x,y) -> {
                     assertTrue(y.isPresent());
                     assertEquals(x, y.get());
                 },
                 () -> assertTrue(false));
    }

Here both statistics and rnd get created on test startup and are initialized in a predictable manner. If we want to repeat the experiment, we can:


   
    private void testLongRepeatedly(int repeats,
                                    int maxDepth,
                                    int maxLength) {
        for(int i=0; i<repeats; ++i) {
            testLong(maxDepth, maxLength);
        }
    }
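
As for the predictable initialization of rnd and statistics mentioned above, a plausible fixture sketch is below; the fixed seed value and the StreamComparatorStatistics constructor are my assumptions, not the actual EnumJ test code:

    import java.util.Random;

    import org.junit.Before;

    public abstract class RandomComparisonTestBase {
        // Rebuilt before each test with a fixed seed, so every run samples
        // exactly the same sequence of pipelines and failures are replayable.
        protected Random rnd;
        protected StreamComparatorStatistics statistics;

        @Before
        public void setUp() {
            rnd = new Random(9691L);                        // assumed seed
            statistics = new StreamComparatorStatistics();  // assumed constructor
        }
    }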

The tests are now nearly one-liners:

   
    @Test
    public void testLongFlat() {
        System.out.println("testLongFlat");
        testLong(0, 1024);
        statistics.print();
    }

    @Test
    public void testLongFlatRepeatedly() {
        System.out.println("testLongFlatRepeatedly");
        testLongRepeatedly(512, 0, 1024);
        statistics.print();
    }

    @Test
    public void testLongDeep1() {
        System.out.println("testLongDeep1");
        testLong(1, 512);
        statistics.print();
    }

    @Test
    public void testLongDeep1Repeatedly() {
        System.out.println("testLongDeep1Repeatedly");
        testLongRepeatedly(4, 1, 512);
        statistics.print();
    }

    @Test
    public void testLongDeep2() {
        System.out.println("testLongDeep2");
        testLong(2, 128);
        statistics.print();
    }
 
  … and so on …

The outputs

The question is: how much did we exercise the operations this way? The statistics shed some light:

Concurrency config is parallel='none', perCoreThreadCount=true, threadCount=2, useUnlimitedThreads=false
Running enumj.EnumerableStreamComparisonTest
testLongFlatRepeatedly
Statistics:
  # concat   = 0
  # distinct = 74969
  # filter   = 74507
  # flatMap  = 0
  # limit    = 75145
  # map      = 74858
  # peek     = 74718
  # skip     = 75092
  # sorted   = 74999
testLongDeep1Repeatedly
Statistics:
  # concat   = 234
  # distinct = 119043
  # filter   = 107137
  # flatMap  = 209
  # limit    = 108908
  # map      = 108628
  # peek     = 106569
  # skip     = 116748
  # sorted   = 110252
testLongDeep2Repeatedly
Statistics:
  # concat   = 2285
  # distinct = 303256
  # filter   = 318668
  # flatMap  = 2157
  # limit    = 295077
  # map      = 241539
  # peek     = 349283
  # skip     = 277918
  # sorted   = 269977
testLongDeep4Repeatedly
Statistics:
  # concat   = 17611
  # distinct = 199150
  # filter   = 189327
  # flatMap  = 14727
  # limit    = 183808
  # map      = 194885
  # peek     = 210561
  # skip     = 201681
  # sorted   = 245546
testLongDeep8Repeatedly
Statistics:
  # concat   = 28479
  # distinct = 110248
  # filter   = 64899
  # flatMap  = 21019
  # limit    = 76764
  # map      = 114720
  # peek     = 97656
  # skip     = 136625
  # sorted   = 89070
testLongFlat
Statistics:
  # concat   = 0
  # distinct = 153
  # filter   = 129
  # flatMap  = 0
  # limit    = 158
  # map      = 147
  # peek     = 146
  # skip     = 150
  # sorted   = 141
testLongDeep1
Statistics:
  # concat   = 53
  # distinct = 21733
  # filter   = 19511
  # flatMap  = 49
  # limit    = 17211
  # map      = 16477
  # peek     = 17366
  # skip     = 18157
  # sorted   = 19491
testLongDeep2
Statistics:
  # concat   = 754
  # distinct = 135450
  # filter   = 125020
  # flatMap  = 989
  # limit    = 115464
  # map      = 95005
  # peek     = 142130
  # skip     = 115542
  # sorted   = 129422
testLongDeep4
Statistics:
  # concat   = 391
  # distinct = 7643
  # filter   = 5609
  # flatMap  = 579
  # limit    = 6110
  # map      = 6607
  # peek     = 4456
  # skip     = 9257
  # sorted   = 5636
testLongDeep8
Statistics:
  # concat   = 1816
  # distinct = 5263
  # filter   = 2211
  # flatMap  = 2092
  # limit    = 9316
  # map      = 10091
  # peek     = 12688
  # skip     = 13547
  # sorted   = 7528
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.427 sec

In other words, these simple tests generated combinations of thousands, tens of thousands or hundreds of thousands of composed operations. It is an insignificant portion of the total state space, but it is highly relevant for ensuring that Enumerator<T> reliably follows Stream<T>.

Why it works

If these tests, which compose operations in the tens of thousands and pass 100%, cover just an insignificant bit of the total state space, how can we say that we know with a high degree of confidence that Enumerator<T> follows Stream<T>?

The answer lies in the redundancy of the code-states relationship. Any piece of code, even a small one, has an astronomical number of possible states. This means, by necessity, that the code-state relationship is highly redundant on the side of states.

We have to keep in mind that we test running code, not states. Because of the redundancy mentioned above, a very small (but far less redundant) sample will do the job equally well. How do we choose the sample?

It’s obvious that the sample has to be random and uniformly distributed. The ideal distribution would follow the topology of the code (so that it has equal chances of reaching any behaviour of any part of the code, no matter how small and obscure), but because this is extremely hard to achieve (it would equate to building a graph of inter-dependent random variables driven by the code structure), the next best approximation is to sample uniformly relative to the input space.

This is exactly the method we used.

Conclusion

When the behaviour of the system under test is clear (albeit complex), other techniques are better suited than random testing.

But when we tackle state transitions that explore state spaces of astronomical size, the random approach is a better bet: it delivers a high degree of confidence in the face of gigantic search spaces, at little cost in testing time and development effort.

Appendix: StreamComparator<T,E>


public abstract class StreamComparator<T,E> {

    private final Random rnd;
    private final int    depth;
    private final int    length;

    public  Stream<T> lhs;
    public  E         rhs;
    private boolean   built;

    protected StreamComparator(Random rnd,
                               int    maxDepth,
                               int    maxLength) {
        assert rnd != null;
        assert maxDepth >= 0;
        assert maxLength > 0;

        this.rnd = rnd;
        this.depth = maxDepth;
        this.length = maxLength;

        // lhs and rhs start out null; the test code seeds them with the
        // initial stream and enumerator before calling build().
        this.lhs = null;
        this.rhs = null;
    }

    protected void rndSpin() {
        rndSpin(depth + length);
    }
    protected void rndSpin(int times) {
        assert times > 0;
        times = rnd.nextInt(times);
        while(times > 0) {
            rnd.nextInt();
            --times;
        }
    }

    // ---------------------------------------------------------------------- //

    public StreamComparatorStatistics statistics;

    public void build() {
        if (built) {
            return;
        }

        for(int i=0; i<length; ++i) {
            final Pair<Function<Stream<T>,Stream<T>>,
                       Function<E,E>> transf = getFuns();

            lhs = transf.getLeft().apply(lhs);
            rhs = transf.getRight().apply(rhs);
        }

        built = true;
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getFuns() {
        final int limit = 7 + (depth > 0 ? 2 : 0);
        switch(rnd.nextInt(limit)) {
            case 0: return getDistinctFuns();
            case 1: return getFilterFuns();
            case 2: return getLimitFuns();
            case 3: return getMapFuns();
            case 4: return getPeekFuns();
            case 5: return getSkipFuns();
            case 6: return getSortedFuns();
            /* deep */
            case 7: return getConcatFuns();
            case 8: return getFlatMapFuns();
            default: assert false; return null;
        }
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getDistinctFuns() {
        final Function<Stream<T>,Stream<T>> lhs = s -> s.distinct();
        final Function<E,E> rhs = e -> distinct(e);
        if (statistics != null) {
            statistics.distinct();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getFilterFuns() {
        final int seed = rnd.nextInt();
        final Predicate<T> filter = predicateOfSeed.apply(seed);
        final Function<Stream<T>,Stream<T>> lhs = s -> s.filter(filter);
        final Function<E,E> rhs = e -> filter(e, filter);
        if (statistics != null) {
            statistics.filter();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getLimitFuns() {
        final int seed = rnd.nextInt();
        final long limit = limitOfSeed.apply(seed);
        final Function<Stream<T>,Stream<T>> lhs = s -> s.limit(limit);
        final Function<E,E> rhs = e -> limit(e, limit);
        if (statistics != null) {
            statistics.limit();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getMapFuns() {
        final int seed = rnd.nextInt();
        final Function<T,T> mapper = mapperOfSeed.apply(seed);
        final Function<Stream<T>,Stream<T>> lhs = s -> s.map(mapper);
        final Function<E,E> rhs = e -> map(e, mapper);
        if (statistics != null) {
            statistics.map();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getPeekFuns() {
        final int seed = rnd.nextInt();
        final Consumer<T> peeker = peekConsumerOfSeed.apply(seed);
        final Function<Stream<T>,Stream<T>> lhs = s -> s.peek(peeker);
        final Function<E,E> rhs = e -> peek(e, peeker);
        if (statistics != null) {
            statistics.peek();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getSkipFuns() {
        final int seed = rnd.nextInt();
        final long n = this.skipLimitOfSeed.apply(seed);
        final Function<Stream<T>,Stream<T>> lhs = s -> s.skip(n);
        final Function<E,E> rhs = e -> skip(e, n);
        if (statistics != null) {
            statistics.skip();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getSortedFuns() {
        final Function<Stream<T>,Stream<T>> lhs = s -> s.sorted();
        final Function<E,E> rhs = e -> sorted(e);
        if (statistics != null) {
            statistics.sorted();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getConcatFuns() {
        rndSpin();
        final StreamComparator<T,E> sub =
                subComparator(new Random(rnd.nextInt()), depth-1, length);
        sub.statistics = statistics;
        sub.build();
        rndSpin();

        final Function<Stream<T>,Stream<T>> lhs = s ->
                Stream.concat(s.limit(length/2), sub.lhs.skip(length/2));
        final Function<E,E> rhs = e ->
                concat(limit(e, length/2), skip(sub.rhs, length/2));
        if (statistics != null) {
            statistics.concat();
        }
        return Pair.of(lhs, rhs);
    }

    private Pair<Function<Stream<T>,Stream<T>>,
                 Function<E,E>> getFlatMapFuns() {
        rndSpin();
        final int seed = rnd.nextInt();
        final Function<Stream<T>,Stream<T>> lhs = s -> s.flatMap(
                x -> {
                    final StreamComparator<T,E> sub =
                            subComparator(new Random(seed),
                                          depth-1,
                                          length);
                    sub.statistics = statistics;
                    sub.build();
                    return sub.lhs.limit(2);
                });
        final Function<E,E> rhs = e -> flatMap(e,
                x -> {
                    final StreamComparator<T,E> sub =
                            subComparator(new Random(seed),
                                          depth-1,
                                          length);
                    sub.build();
                    return limit(sub.rhs, 2);
                });
        rndSpin();
        if (statistics != null) {
            statistics.flatMap();
        }
        return Pair.of(lhs, rhs);
    }

    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  //

    public void test(BiConsumer<T,Optional<T>> elemAssert,
                     Runnable                  lengthMismatchAsserter) {
        if (!built) {
            build();
        }

        lhs.forEach(x ->
            elemAssert.accept(x, Optional.ofNullable(rhsHasNext()
                                                     ? rhsNext()
                                                     : null))
        );
        if (rhsHasNext()) {
            lengthMismatchAsserter.run();
        }
    }

    protected abstract boolean rhsHasNext();
    protected abstract T       rhsNext();

    // ---------------------------------------------------------------------- //

    public IntFunction<Predicate<T>>  predicateOfSeed;
    public IntFunction<Long>          limitOfSeed;
    public IntFunction<Function<T,T>> mapperOfSeed;
    public IntFunction<Consumer<T>>   peekConsumerOfSeed;
    public IntFunction<Long>          skipLimitOfSeed;

    protected abstract StreamComparator<T,E> subComparator(
            Random rnd,
            int maxDepth,
            int maxLength);

    protected abstract E concat(E first, E second);
    protected abstract E distinct(E en);
    protected abstract E filter(E en, Predicate<T> filter);
    protected abstract E flatMap(E en, Function<T,E> flatMap);
    protected abstract E limit(E en, long limit);
    protected abstract E map(E en, Function<T,T> mapper);
    protected abstract E peek(E en, Consumer<T> peekConsumer);
    protected abstract E skip(E en, long n);
    protected abstract E sorted(E en);
}
