Map performance: wide and square

June 30, 2015

Map performance: wide and square

In my previous post I wrote about performance and scalability of EnumJ deep map which consists in a long and narrow pipeline processing a single element that undergoes many operations: up to 10,000 when comparing against Java 8 streams and up to 10,000,000 when gauging scalability.

Deep map is very illustrative as it shows that the EnumJ library, designed with high composability in mind, delivers well in that respect:

it is comparable to or faster than Java 8 streams for up to 10,000 compositions
its performance degrades linearly when going beyond what Java 8 streams can do, showing that the inner algorithms of EnumJ map() are O(N)

Deep map is not a realistic example, though: we rarely have a pipeline that narrow. A more realistic test is when the width and length of the pipeline are comparable. In other words, a square map is what we are looking for.

Before square map, let’s see how EnumJ map() fares on a wide pipeline.

Wide map

EnumJ wide map consists in an extremely wide and short pipeline: the number of elements processed is very large (up to 100,000,000) while the elements undergo a single map() operation.

Just like with concat() and deep map(), we have three kinds of tests:

creation tests – where we measure the time spent creating the pipeline
consumption tests – where we measure the times spent on consuming the pipeline
combined tests – where we measure creation and consumption as a single operation

The chart at the top shows how EnumJ wide map creation fares against the Java 8 wide map creation: they are very comparable, with a small advantage for streams. Wide map consumption offers a different picture: here streams are at clear advantage while the overhead of enumerator wide map may be 64 times higher than that of streams (see image below-left). Combined creation+consumption also gives advantage to streams: the overhead incurred by enumerators may be 54 times higher than that of Java 8 streams (see image below-right).

Square map

Wide map is still not a realistic example: we rarely have a performance-wise important pipeline with just one operation. We are interested to see what happens when the number of operations is somewhat correlated to the number of elements being processed. We take a square pipeline as benchmark; a square pipeline has a width equalling its length, meaning that the number of elements to process equals the number of operations that each element undergoes.

As Java 8 streams cannot bear more that 10,000 map() operations, the observation points will be 1², 10², 100², 1000², 2000², …, 9000² and 10,000² individual transformations. The tests remain the same: creation, consumption and combined creation+consumption.

Creation delivers almost identical performance on enumerators and streams (see chart to the right) especially beyond 100 elements or operations (i.e. 10,000 individual transformations).

Consumption gives a slight advantage to enumerators, especially beyond 200 elements and operations (i.e. 40,000 individual transformations). The chart to the bottom-left shows the comparison on consumption.

Finally, creation + consumption combined yield the most realistic picture. This test gives a small advantage to enumerators, in a more uniform manner. The chart to the bottom-right shows how enumerators and streams compare to each other on creation + consumption combined.

Square Map Enumerator-Stream Consumption Speed

Square Map Enumerator-Stream Combined Speed

Conclusion

The metrics of the last two posts showed several things about EnumJ enumerators, in what the map operation is concerned:

enumerators are comparable or slightly better in most of the scenarios
enumerators get more economical as the number of compositions grows
enumerators’ massive compositional scalability (well beyond what Java 8 streams can bear) keeps performance in the O(N) realm

For most scenarios, enumerators are a better option than sequential streams. Here are some situations when using Stream.map() is a better option:

the number of elements is far larger than the number of compositions. However, in such situations, sequential streams are a bad option anyhow – the programmer perhaps should consider using parallel streams
the latency must be so low that every nanosecond counts. In such situations, using streams is a bad idea to start with; these extreme cases need specially tailored and carefully profiled solutions – streams are too general to deliver properly when performance matters at nanosecond level

Search This Blog

My IT blog

Map performance: wide and square

Wide map

Square map

Conclusion

Comments

Post a Comment

Popular Posts

PeachOS

100%