Tuesday 30 June 2015

Map performance: wide and square

Wide Map Enumerator-Stream Creation Speed

In my previous post I wrote about the performance and scalability of EnumJ deep map, which consists of a long and narrow pipeline: a single element undergoes many operations, up to 10,000 when comparing against Java 8 streams and up to 10,000,000 when gauging scalability.

Deep map is very illustrative as it shows that the EnumJ library, designed with high composability in mind, delivers well in that respect:

  • it is comparable to or faster than Java 8 streams for up to 10,000 compositions
  • its performance degrades linearly when going beyond what Java 8 streams can do, showing that the inner algorithms of EnumJ map() are O(N)

Deep map is not a realistic example, though: we rarely have a pipeline that narrow. A more realistic test is when the width and length of the pipeline are comparable. In other words, a square map is what we are looking for.

Before square map, let’s see how EnumJ map() fares on a wide pipeline.

Wide map

EnumJ wide map consists of an extremely wide and short pipeline: the number of elements processed is very large (up to 100,000,000) while each element undergoes a single map() operation.
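To make the shape concrete, here is a minimal sketch of a wide map pipeline written with plain Java 8 streams; the enumerator benchmark is assumed to build the same shape through EnumJ's map(), whose factory methods are not shown here.

    import java.util.stream.IntStream;

    // A wide map pipeline: a very large number of elements,
    // each passing through a single map() stage.
    public class WideMapSketch {
        public static void main(String[] args) {
            final int width = 100_000_000;   // pipeline width (number of elements)

            IntStream.range(0, width)        // creation: a very wide source
                     .map(x -> x + 1)        // the single map() composition
                     .forEach(x -> { });     // consumption
        }
    }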

Just like with concat() and deep map(), we have three kinds of tests:

  • creation tests – where we measure the time spent creating the pipeline
  • consumption tests – where we measure the time spent consuming the pipeline
  • combined tests – where we measure creation and consumption as a single operation (a rough timing sketch follows the list)
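The timing sketch below shows how the creation and consumption measurements can be separated; it is illustrative only, not the harness used for the actual measurements, and a real benchmark would add warm-up runs and repetitions.

    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    // Illustrative timing of creation vs. consumption; not the EnumJ harness.
    public class WideMapTimingSketch {
        public static void main(String[] args) {
            final int width = 10_000_000;

            long t0 = System.nanoTime();
            Stream<Integer> pipeline = IntStream.range(0, width)
                                                .boxed()
                                                .map(x -> x + 1);  // creation only (lazy, nothing runs yet)
            long creation = System.nanoTime() - t0;

            long t1 = System.nanoTime();
            pipeline.forEach(x -> { });                            // consumption only
            long consumption = System.nanoTime() - t1;

            System.out.println("creation:    " + creation + " ns");
            System.out.println("consumption: " + consumption + " ns");
            // The combined test times both steps as one block on a freshly built pipeline.
        }
    }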

The chart at the top shows how EnumJ wide map creation fares against Java 8 wide map creation: they are very comparable, with a small advantage for streams. Wide map consumption offers a different picture: here streams are at a clear advantage, as the overhead of enumerator wide map may be 64 times higher than that of streams (see image below-left). Combined creation+consumption also gives the advantage to streams: the overhead incurred by enumerators may be 54 times higher than that of Java 8 streams (see image below-right).

Wide Map Enumerator-Stream Combined Speed

Wide Map Enumerator-Stream Consumption Speed


Square map

Square Map Enumerator-Stream Creation Speed

Wide map is still not a realistic example: we rarely have a performance-critical pipeline with just one operation. We are interested in what happens when the number of operations is somewhat correlated to the number of elements being processed. We take a square pipeline as benchmark; a square pipeline has a width equal to its length, meaning that the number of elements to process equals the number of operations each element undergoes.
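As a rough sketch of the square shape, again with plain Java 8 streams standing in for both sides, the pipeline below processes n elements through n composed map() stages, for n × n individual transformations.

    import java.util.function.UnaryOperator;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    // A square pipeline: n elements and n composed map() stages,
    // i.e. n * n individual transformations in total.
    public class SquareMapSketch {
        public static void main(String[] args) {
            final int n = 1_000;                     // width == length

            Stream<Integer> pipeline = IntStream.range(0, n).boxed();
            UnaryOperator<Integer> inc = x -> x + 1;
            for (int i = 0; i < n; ++i) {
                pipeline = pipeline.map(inc);        // n map() compositions
            }
            pipeline.forEach(x -> { });              // runs n * n transformations
        }
    }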

As Java 8 streams cannot bear more than 10,000 map() operations, the observation points will be 1², 10², 100², 1,000², 2,000², …, 9,000² and 10,000² individual transformations. The tests remain the same: creation, consumption and combined creation+consumption.

Creation delivers almost identical performance on enumerators and streams (see chart to the right) especially beyond 100 elements or operations (i.e. 10,000 individual transformations).

Consumption gives a slight advantage to enumerators, especially beyond 200 elements and operations (i.e. 40,000 individual transformations). The chart to the bottom-left shows the comparison on consumption.

Finally, creation + consumption combined yields the most realistic picture. This test gives a small advantage to enumerators, in a more uniform manner. The chart to the bottom-right shows how enumerators and streams compare to each other on creation + consumption combined.

Square Map Enumerator-Stream Consumption Speed

Square Map Enumerator-Stream Combined Speed

Conclusion

The metrics of the last two posts showed several things about EnumJ enumerators, as far as the map operation is concerned:

  • enumerators are comparable to or slightly better than streams in most scenarios
  • enumerators get more economical as the number of compositions grows
  • enumerators’ massive compositional scalability (well beyond what Java 8 streams can bear) keeps performance in the O(N) realm

For most scenarios, enumerators are a better option than sequential streams. Here are some situations in which Stream.map() is preferable:

  • the number of elements is far larger than the number of compositions. However, in such situations sequential streams are a bad option anyhow – the programmer should perhaps consider using parallel streams instead (see the sketch after this list)
  • the latency must be so low that every nanosecond counts. In such situations, using streams is a bad idea to start with; these extreme cases need specially tailored and carefully profiled solutions – streams are too general to deliver properly when performance matters at the nanosecond level
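For the first of these cases, a parallel-stream pipeline of the kind sketched below is the alternative meant above: a huge number of elements, only a couple of map() compositions, and the work spread across cores.

    import java.util.stream.LongStream;

    // Many elements, few compositions: a job for parallel streams.
    public class ParallelWideMapSketch {
        public static void main(String[] args) {
            long sum = LongStream.range(0, 100_000_000L)
                                 .parallel()         // spread the work across cores
                                 .map(x -> x * 2)
                                 .map(x -> x + 1)
                                 .sum();
            System.out.println(sum);
        }
    }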

Tuesday 23 June 2015

Performance and scalability: deep map

Deep Map Enumerator-Stream Creation Speed

The performance measurements inaugurated in the previous post continue. Now is the time to check the performance and scalability of deep map().

What is deep map?

Deep map looks at the performance of map() composed many, many times (millions of compositions) while keeping the source very small (a single element).

Needless to say, Java 8 streams cannot handle deep map. While one can compose many map() operations on a stream, trying to extract elements from it will end in a stack overflow if the number of compositions exceeds 10,000 or so.
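The stream-side limitation is easy to reproduce with a small sketch like the one below; the exact depth at which it breaks depends on the JVM stack size, so 10,000 is only a rough figure.

    import java.util.stream.Stream;

    // Deep map with Java 8 streams: one element, very many composed map() stages.
    // Building the pipeline is fine; consuming it pushes the element through a
    // chain of nested calls, one per stage, and blows the stack at large depths.
    public class DeepMapSketch {
        public static void main(String[] args) {
            final int depth = 100_000;                // far beyond what streams tolerate

            Stream<String> pipeline = Stream.of("x"); // a single element
            for (int i = 0; i < depth; ++i) {
                pipeline = pipeline.map(s -> s);      // deep composition
            }
            try {
                pipeline.forEach(System.out::println); // consumption triggers the nested calls
            } catch (StackOverflowError e) {
                System.out.println("stack overflow, as expected");
            }
        }
    }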

Comparison to Java 8 streams

Enumerators fare quite well when compared to Java 8 streams. While performance can be much worse or much better, in general it matches the speed of streams within a factor of 100-130%. The image at the top shows the comparison on pipeline creation – streams and enumerators behave very comparably, with a slight advantage for streams.

Deep Map Enumerator-Stream Combined Speed

Deep Map Enumerator-Stream Consumption Speed

Here are the charts for element consumption and for creation + element consumption.

Again, the performance of the two is quite similar, with a slight advantage for enumerators when it comes to combined speed (creation of the pipeline + extraction of elements).


Scalability

Deep Map Enumerator Scalable Creation and Consumption Speeds

Enumerators truly shine when scalability comes into play. The left chart shows the behaviour on creation and consumption when the number of map() compositions grows into the millions: performance per composition stays flat or varies within limits, with the result that the total time stays flat or grows linearly.

Deep Map Enumerator Scalable Combined Speed

The stability of the enumerators’ behaviour is clearer when creation and consumption are measured as a single operation (image to the right).

Here the performance per composition stays relatively flat, which makes the total time grow approximately linearly.

Conclusion

Just as with concatenations, map() over enumerators shows two characteristics:

  • performance comparable to or better than that of streams for fewer than 10,000 compositions
  • performance growing linearly for large-scale compositions – a domain inaccessible to Java 8 streams

This makes map() an operation suitable for constructing large, complex computations that operate over stream-like flows of data.

Sunday 21 June 2015

Performance and scalability: concat

Concat Enumerator-Stream Creation Speed

I started working on the performance of the EnumJ library, as I wrote in my previous post. The term of comparison is Java 8 sequential streams, despite the fact that Java 8 streams are not highly composable.

What the performance tests look like

The test pattern is simple:

  1. create a simple enumerator/stream
  2. modify the enumerator/stream by applying the same operation repeatedly – in our case concat() 
  3. consume the elements from the enumerator/stream via forEach()

I took great care to choose only extremely simple operations so that the statistics do not get distorted by extraneous work: the original enumerator/stream contains only one element, as do the concatenated enumerators/streams, and no allocation takes place because the enumerated element is always the same constant string.
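For reference, here is a minimal sketch of the stream side of that pattern; the enumerator side is assumed to follow the same three steps with EnumJ's concat(), which is not shown here.

    import java.util.stream.Stream;

    // The concat test pattern, stream side: one-element source, repeated
    // concat() of one-element streams holding the same constant string,
    // then forEach() to consume everything.
    public class ConcatSketch {
        private static final String ELEMENT = "x";   // no per-element allocation

        public static void main(String[] args) {
            final int concatenations = 1_000;         // streams break down near 10,000

            Stream<String> pipeline = Stream.of(ELEMENT);                 // step 1: create
            for (int i = 0; i < concatenations; ++i) {
                pipeline = Stream.concat(pipeline, Stream.of(ELEMENT));   // step 2: apply concat() repeatedly
            }
            pipeline.forEach(s -> { });                                   // step 3: consume via forEach()
        }
    }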

Considering the above, there are three types of tests derived from these steps:

  1. construction tests – they measure only steps 1) and 2) from above
  2. consumption tests – they apply steps 1) and 2) and then measure only 3)
  3. combined (both) tests – they measure 1), 2) and 3) as a single operation

Regarding the number of operations applied, the tests fall in two categories:

  1. comparison tests – fewer than 10,000 concatenations, where enumerators can be compared against Java 8 streams
  2. scalability tests – where enumerators go beyond what streams can handle (from 10,000 to 10 million concatenations)

Creation speed

Concat Enumerator Scalable Creation Speed

When comparing the speed of creating the pipeline, enumerators are clearly the winner: their creation time relative to Java 8 streams ranges from under 50% down to 0.2% (see the chart above). Moreover, their performance improves as the number of concatenations grows – as expected, since they have been designed with scalability in mind.

Going beyond the limit of 10,000 concatenations, which Java 8 streams cannot handle, the statistics look good too: the creation time stays low and flat up to 5 million concatenations and then degrades somewhat linearly (clearly not exponentially) – see the image to the right.

Tuesday 9 June 2015

100%

The EnumJ library has 100% code coverage again (according to the JaCoCo tool for Java 8). I used to be there in February, but because of many useful changes (like this, this, this, this and this and this and this) the percentage decreased along the way. It feels good to be back.

Still, not enough

Having 100% coverage does not mean the software is perfect. In fact, there is no such thing as perfect software, as a “perfect program” would theoretically require an infinite succession of error blocks. Nevertheless, even for practical purposes 100% code coverage is not enough:

  • code coverage usually means block coverage. Arc coverage, path coverage, condition coverage and method coverage are important metrics, too.
  • code coverage doesn’t tell us anything about missing features
  • code coverage doesn’t tell us anything about robustness, long-haul behaviour or scalability

The last point is very important for EnumJ.

What is next?