Map performance: wide and square
In my previous post I wrote about performance and scalability of EnumJ deep map which consists in a long and narrow pipeline processing a single element that undergoes many operations: up to 10,000 when comparing against Java 8 streams and up to 10,000,000 when gauging scalability.
Deep map is very illustrative as it shows that the EnumJ library, designed with high composability in mind, delivers well in that respect:
- it is comparable to or faster than Java 8 streams for up to 10,000 compositions
- its performance degrades linearly when going beyond what Java 8 streams can do, showing that the inner algorithms of EnumJ map() are O(N)
Deep map is not a realistic example, though: we rarely have a pipeline that narrow. A more realistic test is when the width and length of the pipeline are comparable. In other words, a square map is what we are looking for.
Before square map, let’s see how EnumJ map() fares on a wide pipeline.
Wide map
EnumJ wide map consists in an extremely wide and short pipeline: the number of elements processed is very large (up to 100,000,000) while the elements undergo a single map() operation.
Just like with concat() and deep map(), we have three kinds of tests:
- creation tests – where we measure the time spent creating the pipeline
- consumption tests – where we measure the times spent on consuming the pipeline
- combined tests – where we measure creation and consumption as a single operation
The chart at the top shows how EnumJ wide map creation fares against the Java 8 wide map creation: they are very comparable, with a small advantage for streams. Wide map consumption offers a different picture: here streams are at clear advantage while the overhead of enumerator wide map may be 64 times higher than that of streams (see image below-left). Combined creation+consumption also gives advantage to streams: the overhead incurred by enumerators may be 54 times higher than that of Java 8 streams (see image below-right).
Square map
Wide map is still not a realistic example: we rarely have a performance-wise important pipeline with just one operation. We are interested to see what happens when the number of operations is somewhat correlated to the number of elements being processed. We take a square pipeline as benchmark; a square pipeline has a width equalling its length, meaning that the number of elements to process equals the number of operations that each element undergoes.
As Java 8 streams cannot bear more that 10,000 map() operations, the observation points will be 12, 102, 1002, 10002, 20002, …, 90002 and 10,0002 individual transformations. The tests remain the same: creation, consumption and combined creation+consumption.
Creation delivers almost identical performance on enumerators and streams (see chart to the right) especially beyond 100 elements or operations (i.e. 10,000 individual transformations).
Consumption gives a slight advantage to enumerators, especially beyond 200 elements and operations (i.e. 40,000 individual transformations). The chart to the bottom-left shows the comparison on consumption.
Finally, creation + consumption combined yield the most realistic picture. This test gives a small advantage to enumerators, in a more uniform manner. The chart to the bottom-right shows how enumerators and streams compare to each other on creation + consumption combined.
Conclusion
The metrics of the last two posts showed several things about EnumJ enumerators, in what the map operation is concerned:
- enumerators are comparable or slightly better in most of the scenarios
- enumerators get more economical as the number of compositions grows
- enumerators’ massive compositional scalability (well beyond what Java 8 streams can bear) keeps performance in the O(N) realm
For most scenarios, enumerators are a better option than sequential streams. Here are some situations when using Stream.map() is a better option:
- the number of elements is far larger than the number of compositions. However, in such situations, sequential streams are a bad option anyhow – the programmer perhaps should consider using parallel streams
- the latency must be so low that every nanosecond counts. In such situations, using streams is a bad idea to start with; these extreme cases need specially tailored and carefully profiled solutions – streams are too general to deliver properly when performance matters at nanosecond level
Comments
Post a Comment