Sunday, 13 December 2015

Freedcamp

I started using Freedcamp. It’s not JIRA (by any means) but it provides enough features for a good start.

Sunday, 27 September 2015

Test-Driven Development: why it works and why it doesn’t

A few weeks ago I took part in a small hands-on workshop introducing test-driven development. The workshop consisted of a short presentation followed by a practical exercise: implementing a small function the TDD way.

There was a laptop and we (the participants) took turns at it, each advancing one test-code-refactor cycle at a time. Some sort of pair programming was in place.

The ultimate goal was to implement a function which, given a number, returns the list of its prime factors:

0 -> [], 1 -> [], 2 -> [2], 3 -> [3], 4 -> [2, 2], 5 -> [5], 6 -> [2, 3], 7 -> [7], 8 -> [2, 2, 2], 9 -> [3, 3], …, 12 -> [2, 2, 3], …

Disclaimer

In this article I assume basic knowledge of TDD – especially the tenet that "TDD is not the same as test-first".

The first glitch: choosing the test cases

It's not hard to see that not all test cases are created equal. The fewer prime factors a number has, the less information it reveals, and hence the less useful that number is to the TDD approach. If we tried only prime numbers as test cases – 2, 3, 5, 7, 11, 13, … – then our TDD-derived function would take the following glorious but utterly useless form (I write the code here in Java; we used C# in the workshop):

List<Integer> factors(int n) {
    final List<Integer> result = new LinkedList<>();
    if (n > 1) {
        result.add(n);
    }
    return result;
}

Obviously, the workshop stayed away from such "oddities" and conveniently chose a set of relevant test cases: 2, 3, 4, 6, 10 and 12 (if I recollect correctly). The question is: why choose those convenient test cases?
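For reference, here is the implementation that the prime-factors kata usually converges to once such test cases are in place – my reconstruction, not the workshop's exact code:

List<Integer> factors(int n) {
    final List<Integer> result = new LinkedList<>();
    // divide out each divisor completely; by the time we reach a divisor,
    // all smaller primes are gone, so only primes ever divide n
    for (int divisor = 2; n > 1; ++divisor) {
        while (n % divisor == 0) {
            result.add(divisor);
            n /= divisor;
        }
    }
    return result;
}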

The second glitch: the immanence of “intelligent design”

Sunday, 6 September 2015

EnumJ is open source

I've just open-sourced the EnumJ library to GitHub. It is still a SNAPSHOT version, as I am in the process of publishing the binary to Maven Central.

The JavaDocs are in the usual place.

Feedback is welcome.

Friday, 31 July 2015

Randomization is good

In this article I will show how I used random testing to check, with a high level of confidence, that the behaviour of EnumJ enumerators adheres to the expected behaviour embodied by Java 8 streams (which enumerators emulate).

I wrote about this important step towards releasing EnumJ 1.0 in a previous post; I finished it just yesterday. But before jumping into details, let's see a little bit of background.

The problem

There are many ways to look at testing, its methods and its goals. In this case I look at it in the following way:

  • an enumerator (or stream) is an entity with internal state
  • applying an operator on it alters its state
  • the outcome of the entity is a collection determined by the state that the entity has upon enumeration

So, we are testing two state machines: the one we trust (Java 8 streams) versus the one we check (EnumJ enumerators) and we expect the latter to follow the former. The test procedure is obvious:

  • put both entities (enumerator and stream) in the same initial state
  • apply on both the same operations – which supposedly will induce the same state transitions
  • extract the outcomes of both and compare: if the outcomes match, the final state of the enumerator is correct, and from that we can infer that all the state transitions in between were correct, too (a sketch of this comparison follows the list)
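A minimal sketch of one such check (illustrative code only, not the actual EnumJ test suite; it assumes the Enumerator.of() factories and the stream-like fluent operations described elsewhere on this blog):

// same initial state for both entities, same operations applied to both
List<Integer> expected = Stream.of(1, 2, 3, 4)
        .map(x -> x * 2)
        .filter(x -> x > 2)
        .collect(Collectors.toList());
List<Integer> actual = new ArrayList<>();
Enumerator.of(Arrays.asList(1, 2, 3, 4).iterator())
        .map(x -> x * 2)
        .filter(x -> x > 2)
        .forEach(actual::add);
// matching outcomes => the state transitions in between were correct
if (!expected.equals(actual)) {
    throw new AssertionError("enumerator diverged from stream");
}

In the real test the chain of operations is generated randomly, as described below.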

The bigger problem

As I mentioned earlier, Java 8 streams don't scale well under composition. They go up to somewhere between 1,000 and 10,000 compositions (depending on the composition); beyond 10,000 the risk of streams overflowing the stack is high. For this test we assume up to 1,024 compositions.

Now let's evaluate the size of the state space. We use 9 operations which yield the same type of object and are common to both streams and enumerators: distinct, filter, limit, map, peek, skip, sorted, as well as concat and flatMap. This gives us 9^1024 possible final states for chains of exactly 1,024 operations and, if we consider all the lengths from 0 to 1024, the total is (9^1025 - 1)/8. If we take into account that the last two operations also make use of sub-streams (or sub-enumerators) which have their own sub-state spaces, then the total number of states grows beyond imagination.

How can we ensure that we traverse any significant sub-part of this enormous space, which is perhaps larger than the number of atoms in the Universe itself? The answer lies in random sampling, and I hinted at this approach when I wrote about timed warranties in software.

Building the random sample

Firstly, we need a controlled randomizer. The Java Random class fits the bill.
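Here is a sketch of how a seeded Random can drive the sampling (illustrative only – the operation names match the nine listed above, but the surrounding test scaffolding is assumed):

Random rnd = new Random(42); // fixed seed => reproducible test runs
int length = rnd.nextInt(1025); // chain length in [0, 1024]
for (int i = 0; i < length; ++i) {
    switch (rnd.nextInt(9)) { // uniform choice among the 9 operations
        case 0: /* apply distinct to both stream and enumerator */ break;
        case 1: /* apply filter with a randomly chosen predicate */ break;
        case 2: /* apply limit with a randomly chosen bound */ break;
        // ... likewise for map, peek, skip, sorted, concat and flatMap
    }
}

Because the seed is fixed, any failing sample can be replayed exactly, which turns a random walk through the state space into a repeatable test.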

Thursday, 30 July 2015

The power of documentation

This blog post is motivated by the fact that I’ve just published the latest draft of the JavaDoc documentation for the EnumJ library.

Why documenting?

Old wisdom in software says that we should document what we write because we will still want to understand it six months later.

Very true: even the best-written software can escape memory, and well-written documentation helps bring back to the surface the original intent of the software we might be staring at now. Where documentation is concerned, Java really shines: the JavaDoc utility tool is very easy to use and it is nicely integrated with Maven and the major IDEs.

But there is one more reason for documenting thoroughly and that is … code review.

Self code reviews

The power of code reviews is well known – especially when accompanied by adequate tools.

Nevertheless, they still suffer from unequal knowledge of the code being reviewed: one developer is the author, the other one isn't. It may happen that the suggestions of the less-informed reviewer are uninformed and, sometimes, of lesser value. Perhaps the best form of review is the self code review, where developers review their own code – but then the whole process suffers from self-validating bias.

This is when documentation comes into play. If the developer documents critically what he/she has written, then the whole documentation process also becomes a code review – and a very powerful one.

For illustration purposes, I’ll show a few significant improvements I’ve implemented quickly over the past few days – all of them being a result of reflecting one more time upon the code while documenting.

OBS: this may require consulting the documentation mentioned above. If the next three sections seem too foreign, just read the titles and jump to Conclusions below.

Better ShareableEnumerator<E>

This is a really significant improvement. Before, SharingEnumerator<E> was a public class iterating over a list of SharedElementWrapper<E>. The whole thing was complex and error-prone.

Now SharedElementWrapper<E> is gone and SharingEnumerator<E> is package-private. Moreover, ShareableEnumerator<E> internally uses a CachedEnumerable<E>, which not only provides a simple mechanism for sharing (the shared buffer is the caching buffer) but also lets the spawned sharing enumerators act concurrently – something that was not possible before.

Saturday, 25 July 2015

Cool about concurrency

In this short article I will show how insisting on strict state consistency in the face of concurrent access can be detrimental, while allowing a little bit of laxity on the matter can be beneficial, with no adverse effects.

The problem

Apache Commons has a nice little class for lazy initialisation aptly named LazyInitializer. We can clothe it in Java 8 garments in the following simple manner:


import java.util.function.Supplier;
import org.apache.commons.lang3.concurrent.ConcurrentException;
import org.apache.commons.lang3.concurrent.LazyInitializer;

/**
 * Type of {@link LazyInitializer} that gets a {@link Supplier} to initialise
 * with.
 * @param <T> type of lazily enumerated entity.
 */
public final class Lazy<T> extends LazyInitializer<T> {

    private Supplier<T> supplier;

    /**
     * Constructs a {@link Lazy} instance.
     * @param supplier {@link Supplier} instance providing the lazily
     * initialised element.
     */
    public Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    @Override
    public T get() {
        try {
            return super.get();
        } catch(ConcurrentException ex) {
            throw new UnsupportedOperationException(ex.getCause());
        }
    }

    @Override
    protected T initialize() throws ConcurrentException {
        try {
            final T result = supplier.get();
            supplier = null; // release the supplier once the value exists
            return result;
        } catch(LazyException ex) {
            throw new ConcurrentException(ex.getCause());
        }
    }
}

There may be scenarios where the function behind supplier is very costly, so we might want a way to know in advance whether the object has been initialised or not. If it has, we call Lazy<T>.get() because it is cheap. If it hasn't, we give up, thus avoiding the expensive call.
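One way to realise this (a sketch of mine, not necessarily what the post had in mind) is to wrap Lazy<T> with a volatile flag that may briefly lag behind the true state – exactly the kind of harmless laxity this post is about:

public final class CheckedLazy<T> {

    private final Lazy<T> lazy;
    private volatile boolean initialized;

    public CheckedLazy(Supplier<T> supplier) {
        this.lazy = new Lazy<>(supplier);
    }

    // cheap check: may under-report for a short while, never over-reports
    public boolean isInitialized() {
        return initialized;
    }

    public T get() {
        final T value = lazy.get();
        initialized = true; // benign race: a reader may see 'false' a little longer
        return value;
    }
}

A stale false only makes us skip a call that has, in the meantime, become cheap – no harm done.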

Sunday, 5 July 2015

Combiners

While working hard to make the EnumJ library neat and clean for its first release, I started another project: a library called CombineJ, which is meant to be both a demonstration and the first production-ready, industrial-strength application of the power of EnumJ.

Quality is in the (boring) details

There is a difference between good and of high quality. There are good things everywhere, but things of high quality are rarer, as high quality requires patience, discipline and lots and lots of attention to detail. And I want EnumJ to be of high quality.

I truly believe that quality lies in the details, so high quality is hard to achieve – partly because chasing those details is boring. Necessary, but boring nevertheless.

Out of boredom of walking EnumJ out through the door, the new CombineJ library was born.

What is CombineJ?

It is a library that intends to show that, by combining enumerators via clever compositions administered in large numbers, we can obtain all kinds of interesting effects. I hinted at this when I wrote about enumerables.

CombineJ will be a library of enumerator combiners: units of enumerator composition with a certain purpose. People from the functional land might recognize those as monadic compositions, but I try to avoid that terminology because the intent is not to transform Java into a functional language.
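To make the idea a little more concrete, here is a purely speculative sketch of the shape a combiner could take (the real CombineJ API is not shown in this post; it assumes java.util.function.Function and EnumJ's Enumerator):

// speculative sketch: a combiner as a named unit of enumerator composition
@FunctionalInterface
interface Combiner<T, R> extends Function<Enumerator<T>, Enumerator<R>> {
    // chain two combiners into a larger one
    default <S> Combiner<T, S> then(Combiner<R, S> next) {
        return enumerator -> next.apply(this.apply(enumerator));
    }
}

// usage: Combiner<Integer, Integer> evens = e -> e.filter(x -> x % 2 == 0);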

I can foresee the following domains of application for combiners:

  • logic programming: where combiners underpin the solution sets defined by logic clauses in a manner similar to Prolog
  • constraint programming: same as above, but the solution sets are defined by constraint expressions instead of logic clauses
  • expert systems: where the whole job is to find a set of solutions to a problem. Backward chaining is quite natural to do, forward chaining might be easier than I can envision now
  • random walks with their applications in Finance and other domains: perhaps quite soon I’ll try my hand at constructing an exotic option pricer via such combiners – in a way similar to how people have been doing the same in Haskell and F# for some while
  • probabilistic programming: if back-propagation of beliefs is not involved, this actually might be easier than it sounds

For the moment, I have started something in the direction of logic programming and so far, so good: it is shaping up rather well. It's not going to be any threat to SWI-Prolog or SICStus – especially as I don't have a unifier yet – but it's a way to gauge how these enumerators might serve a purpose other than their own sake.

Tuesday, 30 June 2015

Map performance: wide and square

[Chart: Wide Map Enumerator-Stream Creation Speed]

In my previous post I wrote about the performance and scalability of EnumJ deep map, which consists of a long and narrow pipeline processing a single element that undergoes many operations: up to 10,000 when comparing against Java 8 streams and up to 10,000,000 when gauging scalability.

Deep map is very illustrative as it shows that the EnumJ library, designed with high composability in mind, delivers well in that respect:

  • it is comparable to or faster than Java 8 streams for up to 10,000 compositions
  • its performance degrades linearly when going beyond what Java 8 streams can do, showing that the inner algorithms of EnumJ map() are O(N)

Deep map is not a realistic example, though: we rarely have a pipeline that narrow. A more realistic test is when the width and length of the pipeline are comparable. In other words, a square map is what we are looking for.

Before square map, let’s see how EnumJ map() fares on a wide pipeline.

Wide map

EnumJ wide map consists of an extremely wide and short pipeline: the number of elements processed is very large (up to 100,000,000) while each element undergoes a single map() operation.

Just like with concat() and deep map(), we have three kinds of tests:

  • creation tests – where we measure the time spent creating the pipeline
  • consumption tests – where we measure the time spent consuming the pipeline
  • combined tests – where we measure creation and consumption as a single operation

The chart at the top shows how EnumJ wide map creation fares against Java 8 wide map creation: they are very comparable, with a small advantage for streams. Wide map consumption offers a different picture: here streams are at a clear advantage, as the overhead of enumerator wide map may be up to 64 times higher than that of streams (see image below-left). Combined creation+consumption also gives the advantage to streams: the overhead incurred by enumerators may be up to 54 times higher than that of Java 8 streams (see image below-right).

[Chart: Wide Map Enumerator-Stream Combined Speed]

[Chart: Wide Map Enumerator-Stream Consumption Speed]


Square map

[Chart: Square Map Enumerator-Stream Creation Speed]

Wide map is still not a realistic example: we rarely have a performance-wise important pipeline with just one operation. We are interested in what happens when the number of operations is somewhat correlated to the number of elements being processed. We take a square pipeline as benchmark; a square pipeline has a width equalling its length, meaning that the number of elements to process equals the number of operations that each element undergoes.

As Java 8 streams cannot bear more than 10,000 map() operations, the observation points will be 1², 10², 100², 1,000², 2,000², …, 9,000² and 10,000² individual transformations. The tests remain the same: creation, consumption and combined creation+consumption.

Creation delivers almost identical performance on enumerators and streams (see chart to the right) especially beyond 100 elements or operations (i.e. 10,000 individual transformations).

Consumption gives a slight advantage to enumerators, especially beyond 200 elements and operations (i.e. 40,000 individual transformations). The chart to the bottom-left shows the comparison on consumption.

Finally, creation + consumption combined yield the most realistic picture. This test gives a small advantage to enumerators, in a more uniform manner. The chart to the bottom-right shows how enumerators and streams compare to each other on creation + consumption combined.

[Chart: Square Map Enumerator-Stream Consumption Speed]

[Chart: Square Map Enumerator-Stream Combined Speed]

Conclusion

The metrics of the last two posts showed several things about EnumJ enumerators, as far as the map() operation is concerned:

  • enumerators are comparable to, or slightly better than, streams in most scenarios
  • enumerators get more economical as the number of compositions grows
  • enumerators’ massive compositional scalability (well beyond what Java 8 streams can bear) keeps performance in the O(N) realm

For most scenarios, enumerators are a better option than sequential streams. Here are some situations when using Stream.map() is a better option:

  • the number of elements is far larger than the number of compositions. However, in such situations, sequential streams are a bad option anyhow – the programmer perhaps should consider using parallel streams
  • the latency must be so low that every nanosecond counts. In such situations, using streams is a bad idea to start with; these extreme cases need specially tailored and carefully profiled solutions – streams are too general to deliver properly when performance matters at nanosecond level

Tuesday, 23 June 2015

Performance and scalability: deep map

[Chart: Deep Map Enumerator-Stream Creation Speed]

The performance measurements inaugurated in the previous post continue. Now it is time to check the performance and scalability of deep map().

What is deep map?

It means looking at the performance of map() composed many, many times (millions of compositions) while keeping the source very small (a single element).

Needless to say, Java 8 streams do not support deep map. While one can compose many map() operations on a stream, trying to extract elements from it will yield a stack overflow if the number of compositions exceeds 10,000 or so.
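A quick sketch of the failure mode (illustrative, easy to try at home):

Stream<Integer> s = Stream.of(1);
for (int i = 0; i < 100_000; ++i) {
    s = s.map(x -> x + 1); // composing is fine ...
}
s.forEach(System.out::println); // ... but consumption may throw StackOverflowError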

Comparison to Java 8 streams

Enumerators fare quite well when compared to Java 8 streams. While individual measurements can be much worse or much better, in general enumerators match the speed of streams within a factor of 100-130%. The image at the top shows the comparison on pipeline creation – streams and enumerators behave very comparably, with a slight advantage for streams.

[Chart: Deep Map Enumerator-Stream Combined Speed]

[Chart: Deep Map Enumerator-Stream Consumption Speed]

Here are the charts for element consumption and for creation + element consumption.

Again, the performance of the two is quite similar, with a slight advantage for enumerators when it comes to combined speed (creation of the pipeline + extraction of elements).


Scalability

[Chart: Deep Map Enumerator Scalable Creation and Consumption Speeds]

Enumerators truly shine when scalability comes into play. The left chart shows the behaviour on creation and consumption when the number of map() compositions grows into the millions: performance per composition stays flat or varies within limits, with the result that the total time stays flat or grows linearly.

[Chart: Deep Map Enumerator Scalable Combined Speed]

The stability of the enumerators' behaviour is clearer when creation and consumption are measured as a single operation (image to the right).

Here the performance per composition stays relatively flat, which makes the total time grow approximately linearly.

Conclusion

Just as with concatenations, map() over enumerators shows two characteristics:

  • performance comparable to or better than that of streams for fewer than 10,000 compositions
  • performance growing linearly for large-scale compositions – a domain inaccessible to Java 8 streams

This makes map() an operation suitable for constructing large, complex computations that operate over stream-like flows of data.

Sunday, 21 June 2015

Performance and scalability: concat

[Chart: Concat Enumerator-Stream Creation Speed]

I started working on the performance of the EnumJ library, as I wrote in my previous post. The term of comparison is Java 8 sequential streams, despite the fact that Java 8 streams are not highly composable.

What the performance tests look like

The test pattern is simple:

  1. create a simple enumerator/stream
  2. modify the enumerator/stream by applying the same operation repeatedly – in our case concat() 
  3. consume the elements from the enumerator/stream via forEach()

I took great care to choose only extremely simple operations so that the statistics don't get distorted by extraneous work: the original enumerator/stream contains one element only, the same goes for the concatenated enumerators/streams, and no allocation takes place: the enumerated element is always the same constant string.
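The stream side of the pattern looks roughly like this (a sketch of the shape of the test, not the benchmark code itself; the enumerator side mirrors it with EnumJ's concat()):

Stream<String> s = Stream.of("x"); // step 1: one constant element
for (int i = 0; i < 1_000; ++i) { // stays below the streams' ~10,000 limit
    s = Stream.concat(s, Stream.of("x")); // step 2: repeated concat()
}
s.forEach(x -> {}); // step 3: consumption via forEach()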

Considering the steps above, there are three types of tests:

  1. construction tests – they measure only steps 1) and 2) from above
  2. consumption tests – they apply steps 1) and 2) and then measure only 3)
  3. combined (both) tests – they measure 1), 2) and 3) as a single operation

Regarding the number of operations applied, the tests fall in two categories:

  1. comparison tests – fewer than 10,000 concatenations, where enumerators can be compared against Java 8 streams
  2. scalability tests – where enumerators go beyond what streams can handle (from 10000 to 10 million concatenations)

Creation speed

[Chart: Concat Enumerator Scalable Creation Speed]

When comparing the speed of creating the pipeline, enumerators are clearly the winner: their creation time relative to Java 8 streams ranges from under 50% down to 0.2% (see the chart above). Moreover, their relative performance improves as the number of concatenations grows – as expected, since they have been designed with scalability in mind.

Going beyond the limit of 10,000 concatenations, which Java 8 streams cannot handle, the statistics look good too: performance stays low and flat up to 5 million concatenations and then degrades roughly linearly (clearly non-exponentially) – see the image to the right.

Tuesday, 9 June 2015

100%

The EnumJ library has 100% code coverage again (according to the JaCoCo tool for Java 8). I used to be there in February, but because of many useful changes (like this, this, this, this and this and this and this) the percentage decreased along the way. It feels good to be back.

Still, not enough

Having 100% coverage does not mean the software is perfect. In fact, there is no such thing as perfect software, as a "perfect program" would theoretically require an infinite succession of error-handling blocks. Nevertheless, even for practical purposes 100% code coverage is not enough:

  • code coverage usually means block coverage. Arc coverage, path coverage, condition coverage and method coverage are important metrics, too.
  • code coverage doesn’t tell anything about missing features
  • code coverage doesn’t tell anything about robustness, long-haul or scalability

The last point is very important for EnumJ.

What is next?

Friday, 29 May 2015

NetBeans

I’ve just returned from the first NetBeans Day conference in London and I have some impressions to share.

A little bit of background

I am a happy user of NetBeans and have been for quite a while. I use it routinely to develop my personal projects (EnumJ is one; there is another that I don't disclose yet) and this last conference opened new doors.
Disclaimer: the dispute over "which editor is best" is a type of religious war that rarely brings any good, so my intent here is not to start one. I do not believe in relativism; I don't consider that all things are the same or that all views are of equal value. However, the choice of editor really does not make a difference: the best editor or IDE for a developer is the one that makes him or her happy and productive.
That being said, I'll tell you why NetBeans makes me happy.

Why NetBeans

My love affair with IDEs started with Turbo Pascal 2.0, more than two decades ago. Over the years I've used both IntelliJ and Eclipse for Java. For Python I used Eclipse and the most charming PyCharm.

I also love Vim and I've never used Emacs. I was, for more than a decade, an adherent of Visual SlickEdit (popular within Microsoft at one time and allegedly Dave Cutler's favourite). I purchased (as a Microsoft alumnus) almost all the editions of Visual Studio over the years. I tried SharpDevelop a couple of times but found it immature, and I've never used MonoDevelop.

All the editors and IDEs that I used have their merits (including IntelliJ and Eclipse, competitors to NetBeans), but NetBeans makes me tick because:
  • It is free. This is slightly better than IntelliJ. OBS: I don’t imply that paying for IntelliJ is a waste of money; I just find it more compelling to program in a free IDE on a free platform like Java.
  • It is simple. It does everything I need without unnecessary complexity.
  • It is standard. This is a very important reason for me. The fact that it has strong support for the latest Java out of the box made me stick with it.
  • It has unparalleled support for Maven (very important to me) and Java Enterprise (less important for the moment, but this may change).
  • It is a platform in itself and this is related to the content of the conference I’ve just attended.
No wonder I was quite looking forward to the very first NetBeans Day conference in London.

NetBeans Day

It was a whole-day conference, kindly hosted on the University of Greenwich's campus. Food was provided (nothing gourmet, to be sure, in keeping with the well-established tradition of geeky events like OJC, for example). The talks were of unequal level, mainly because some were not very related to NetBeans or to NetBeans' extensibility – arguably people's most important reason to attend.

However, three highlights make me warmly recommend the event:
  • opening talk by Geertjan Wielenga which showed what NetBeans can do for you – nice even though I am not a fan of shortcuts and editing tricks (after years of assembly in kdbg1), I want to rejoice that mice exist)
  • the talk on the evolution from BlueJ and Greenfoot to NetBeans, with emphasis on education – important for a father with children, like me
  • closing talk by Geertjan Wielenga on how to program solutions on NetBeans platform – important for a “lazy” GUI developer like me
The last point hit a nerve: although all the IDEs I know exhibit extensibility in one fashion or another, it seems to me that NetBeans has a more general type of extensibility – one not confined to adding new programming languages or programming-productivity features. It looks as if in NetBeans you can program anything – Geertjan even showed us an application for farming!

Conclusion

Although not a perfect event (understandable for a first edition), NetBeans Day in London is definitely not to be missed. I am looking forward to the next edition which, according to the organizers, may happen later this year.

1) The Windows Kernel Debugger, part of the DDK. Not to be confused with KDE GNU Debugger.

Saturday, 16 May 2015

Design Pattern: fast concurrency with detached state

The basic equation of multi-threading can be summarised as follows:
Concurrency + Mutability = Performance Penalty + Error Proneness
which is a direct result of context switches and the explicit management of state consistency in the presence of concurrency.
In this article I’ll show a design pattern where the equation changes into:
Concurrency + Mutability = Efficiency + Simplicity
Disclaimer: this pattern almost surely exists somewhere else. Like many good things in life, it can bear the burden of many parents. I mention it here because it is related to my previous article.

Example

Let's consider a class named Thing with properties volume and mass and a third property, density, derived from the previous two. Because it's very hard to calculate density from mass and volume (tongue-in-cheek), its value needs to be cached so that we don't repeat the complex calculation density = mass / volume every single time.
In other words, the property values are correlated and any object of type Thing must stay consistent. The "canonical" way to maintain consistency in the presence of concurrency is to enclose state changes in synchronized blocks:
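The post's original listing is not reproduced here; a minimal reconstruction of that synchronized version might look like this:

public class Thing {

    private double volume;
    private double mass;
    private Double density; // cached; null means "recompute on next read"

    public synchronized double getVolume() { return volume; }
    public synchronized void setVolume(double volume) {
        this.volume = volume;
        this.density = null; // invalidate the cached derived value
    }

    public synchronized double getMass() { return mass; }
    public synchronized void setMass(double mass) {
        this.mass = mass;
        this.density = null;
    }

    public synchronized double getDensity() {
        if (density == null) {
            density = mass / volume; // the "very hard" calculation
        }
        return density;
    }
}

Every reader and writer takes the same lock, so objects stay consistent – at the price the equation above describes.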

Tuesday, 12 May 2015

Caching enumerations: the internals


In the previous post I wrote about the correct implementation for caching enumerables over large sets that involve a large number of compositions.
In this article I will give implementation details.

Two ways of being lazy

Lazy evaluation is in the bone and marrow of the EnumJ library. Caching enumeration is no exception: the internal caching buffer grows lazily, as the consuming enumerators require.
EnumJ has two classes that help lazy evaluation:

Lazy<T>

Lazy<T> is an implementation of LazyInitializer<T> that takes a supplier and overrides initialize() to call the supplier when initialising the object. The value is initialized only once, even under multi-threading, and retrieval is fast once initialization has been carried out.
In addition, Lazy<T> ensures that the supplier gets released upon initialization. The code is quite simple:

Saturday, 9 May 2015

Caching enumerations


Enumerators are particularly well suited to specify sets of elements via composing constraints, generators or any combination thereof. Enumerators have a declarative flavour that almost matches Mathematics. For example, a set definition like this one:
A = { x in N | 0 <= x <= 10 and x is even }
can nicely be expressed as:
Enumerator<Integer> e = Enumerator.rangeInt(0, 11).filter(x -> 0 == x % 2);
or, in Enumerable parlance:
Enumerable<Integer> e = Enumerable.rangeInt(0, 11).filter(x -> 0 == x % 2);
the difference being that an Enumerable<E> can participate in for(each) statements and can be enumerated more than once.

The problem

All is great for a few compositions, but enumerators are designed to support large numbers of compositions, in the range of hundreds of thousands or for as long as memory lasts. The sheer existence of that many operations implies that a single original element undergoes hundreds of thousands of transformations before coming out as an enumerated result.
There is nothing we can do: if the operations are really needed then they must be carried out, period.
The problem arises when we have to enumerate repeatedly over a sequence that doesn't change: if the transformations have no side effects, then re-computing the same result means that hundreds of thousands of transformations are applied uselessly over and over. Such a waste!
The solution comes in the form of the age-old method of caching.

The solution

Caching, when done carefully, saves a huge deal of time by:
  • storing locally what is remote
  • storing in memory what is on slow media
  • storing for fast retrieval what is computed too slowly
Our case falls in the third category. Before showing implementation details, let us first overview a couple of sub-optimal solutions.

Saturday, 2 May 2015

Shareability

The idleness of the past holiday weeks brought me a much-needed insight: how important scalable shareability is for Enumerable<T> (if such an interface is to be of any use in high-compositionality scenarios).

What is shareability?

It is the ability to be shared as separate "images", with each shared image retaining the properties of the original. In the case of Enumerable<T> this means participating in multiple operational compositions and still doing the expected job efficiently, in terms of computation as well as memory and stack consumption.
Here is a simple example in which an enumerable named e is shared between two independent compositions:

Enumerable<T> e = ... some code ...
Enumerable<U> uE = e.map(t2uMapper);
Enumerable<V> vE = e.map(t2vMapper);

for (U u : uE) {
    System.out.println("My Us: " + u + " ...");
}
for (V v : vE) {
    System.out.println("My Vs: " + v + " ...");
}
The goal is to ensure that both loops work properly and, more importantly, that the whole thing is scalable: if we have a chain of hundreds of thousands of operations peppered with thousands of points where enumerables are shared between independent pipelines, all the resulting enumerables (which may come and go at runtime and number in the millions or more) must still work properly, work efficiently and not overflow the stack (assuming we have enough heap).

What is the support for it?

Before discussing how to implement this in the EnumJ library, let us see what support we have out there for such a thing.

Saturday, 28 March 2015

Enumerables

The EnumJ library needs one small addition before going public: an Enumerable<E> interface.

Enumerator<E> versus Enumerable<E>

As I mentioned in a previous post, Enumerator<E> is like Iterator<E> but endowed with many methods that make enumerators look a lot like Java 8 streams. In addition and unlike streams, enumerators exhibit high composability, shareability and fault tolerance.
All is great, but with one caveat: they don't repeat (streams behave the same). Once you finish enumerating, there is no way to reuse the enumerator; it's dead forever.
True, it is possible to re-generate the enumerator from the same source and it is also possible to rebuild the pipeline, but what if both the source and the pipeline are unknown? What if all we get is a single Enumerator<E> object? In that case, it is not possible to re-enumerate the same sequence.
Enumerable<E> solves this conundrum. This interface extends Iterable<E> and defines the iterator() method in terms of a new method called enumerator():
public interface Enumerable<E> extends Iterable<E> {

    @Override
    public default Iterator<E> iterator() {
        return enumerator();
    }

    public Enumerator<E> enumerator();
}
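Since enumerator() is the single abstract method, a quick usage sketch can even use a lambda (illustrative only, assuming the interface stays as sketched above and relying on the Enumerator.of() factories):

Enumerable<String> e = () -> Enumerator.of(Arrays.asList("a", "b").iterator());
for (String s : e) { System.out.println(s); } // first enumeration
for (String s : e) { System.out.println(s); } // works again: each pass gets a fresh Enumerator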

Implementation details

In addition to replacing iterator() with enumerator(), the Enumerable<E> interface will feature the same composability that Enumerator<E> enjoys: map(), filter() and the like will be part of its portfolio of methods, with the same degree of composability. This means that, for example, applying one hundred thousand concatenations will not overflow the stack when the iterator is returned.

Saturday, 21 March 2015

Late binding

I've just implemented support for late binding in the EnumJ library. This paves the way for enumerator serialization which, in turn, will make remote and/or distributed enumerations possible.

The problem

Usually, when we enumerate, we know what we enumerate upon. This is materialized in the EnumJ library by the Enumerator.of() kind of methods. They receive an iterator, iterable, enumeration, stream or spliterator as parameter.

But what if, at the time of pipeline construction, we don't have the source ready? In that case we have the option of providing the source of enumeration lazily: the methods Enumerator.ofLazyIterator(), Enumerator.ofLazyIterable(), … etc. give us some form of late binding.

It gets more complicated: what if the source of information is completely unknown when we build up the pipeline? Lazy inputs don’t help because the input supplier must know what iterator to supply lazily.

Here’s where late binding comes into play.

The solution

The EnumJ library now exposes a LateBindingEnumerator<E> class which implements Enumerator<E> and has two more methods:

Sunday, 15 March 2015

Fourteen minutes

This past week I gave two short talks on the subject of Enumerators in Java 8 at two meetups organised by the London Java Community. Each talk was meant to be about seven minutes long; I largely stayed within that limit.

You can watch the video of the first talk here (a free SkillsMatter account may be required). The slides of the second talk are here.

Saturday, 7 March 2015

EnumJ Doc

I wrote a previous post about my library for enumerators. I explained that enumerators are highly compose-able iterators exhibiting properties that Java 8 streams currently do not have.

In a later post I showed how to use enumerators to generate large amounts of test data, with such a degree of variety that we can come close to guaranteeing our software for a given period of time.

Today I've made a small leap forward: I've finished the first draft of the EnumJ javadocs, which I've placed on BitBucket. I do not disclose the source code yet; because of late changes, I am not at 100% code coverage anymore.

But the library works, and it works rather well. It will find its place quite soon on GitHub (like any proper open source project) and its binary will (hopefully) find its way to Maven Central.

Till then, enjoy the EnumJ documentation.

Sunday, 22 February 2015

Software warranty with Java 8

Can we give assurances that a piece of software will run virtually flawlessly for a period of time (say, one year)? Can we give timed warranties in software products, just as other engineering industries have been doing since the dawn of the industrial revolution?

The question seems absurd because, unlike physical parts, software components don’t change with time. Yet, software performance does degrade with time (because its execution environment changes) and more often than not it is exactly time that makes difficult and nasty bugs surface, often with bad consequences.

In an absolute sense, it is impossible to guarantee software. Software is way too complex and non-standard to sustain rigorous and legally-binding warranties – not to mention that bug-free software is not only a practical impossibility but also a mathematical absurdity1).

Yet, given the right testing methodologies, it is possible to say with enough confidence that a certain software component will run virtually flawlessly for a period of time, under specified conditions. The bugs will be there, some of them will even manifest during the warranty period, but they will be few and far between and their effects will be under control.

I’ll show in this article how to do it, via a simple example. Disclaimer: the illustrative code below is in Java 8. Nevertheless, these principles apply to any language that is capable of generating values lazily - which extends the range to pretty much everything (given the right effort).

Tuesday, 17 February 2015

Rejoice with GS

I always rejoice when I see things well done.

Through the London Java Community channel I’ve learned about the Goldman Sachs Collections library, a state of the art piece of software in active development since 2005.

The Goldman Sachs Collections library has reached version 6.0.0 and has been brought in line with Java 8.

The maturity and the level of conceptual consistency this library exhibits are really impressive. Along with Apache Commons, it will be yet another point of reusability that I trust.

If you use collections heavily (and who doesn't?), this library is not to be missed.

Saturday, 7 February 2015

Enumerators in Java 8: when streams aren’t enough

I decided to write a small library for enumerators in Java 8. What I call an enumerator is, in fact, a special kind of iterator endowed with certain properties, such as compose-ability – which Java streams and .NET LINQ operators already have.

I think of enumerators as a specialized type of generators which, in turn, are a watered down version of coroutines.

The intent is obvious: with enumerators, generating potentially infinite sequences of potentially infinite data structures, over a potentially unlimited set of topologies defined entirely at runtime, becomes a reality. This is a luxury that programmers in high-level functional or logic languages have been enjoying for decades.

Aren’t Java 8 streams enough?

No, they aren’t. Streams are a great addition to Java brought over by Java 8 and they blend wonderfully with Java 8 lambda expressions. Moreover, they have strong parallel support which makes them ideal for exploiting multi-core architectures with minimal effort.

However, there are scenarios where they are either too strong or insufficient:

  • They have been designed with parallelism in mind. While this is a good thing in general, parallelism support makes them quite heavy for very simple tasks where parallelism is not needed.
  • Their parallel nature makes them lack certain operations which are quintessentially sequential:
    • Stream<T>.takeWhile(Predicate<T>) which takes elements while a predicate evaluates to true (a minimal sketch of such an operation appears after this list)

      OBS: we may also call this Stream<T>.limitWhile(Predicate<T>) if we are to stay in line with Stream<T>.limit(long)

    • Stream<T>.zip(Stream<T>) which binds together two streams and produces a stream of pairs
    • Stream<T>.map(Function2<Integer,T,R>) which models Select<T,R>(this IEnumerable<T>, Func<T,int,R>) from .NET LINQ

  • They lack pipelining, which makes them unsuitable for compositions in large numbers.

    In the case of Stream.concat(Stream<T>, Stream<T>), for example, the JDK documentation warns the user against repeated concatenation as it may lead to deep call chains or even stack overflows. This is caused by the lack of pipelining, which is itself a result of the parallel nature of streams. I'll give details on pipelining in a section below.

  • They are not lazy in their arguments. For example, we have Stream.concat(Stream<T>, Stream<T>) but there is no Stream.concat(Supplier<Stream<T>>, Supplier<Stream<T>>). As I explained in a previous post, being super lazy is sometimes necessary. Java 8 streams, as they are implemented today, are lazy in the values they yield but not in their input parameters.
  • They are rather hard to customize. Because of their parallel nature, they don't rely on iterators (which are strictly sequential) but on spliterators (the name is a contraction of "split iterators"). Spliterators offer support for splitting the iterated sequence to allow parallelism, but customizing them is not the easiest thing to do.
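As promised above, here is a minimal sketch of a sequential takeWhile built on a plain Iterator<T> – exactly the kind of operation that is natural for iterators and awkward for parallel streams (illustrative code, not EnumJ's implementation; it uses java.util.Iterator, java.util.NoSuchElementException and java.util.function.Predicate):

static <T> Iterator<T> takeWhile(Iterator<T> source, Predicate<T> predicate) {
    return new Iterator<T>() {
        private T next;
        private boolean hasNext;
        private boolean done;

        private void advance() {
            if (!done && !hasNext && source.hasNext()) {
                final T candidate = source.next();
                if (predicate.test(candidate)) {
                    next = candidate;
                    hasNext = true;
                } else {
                    done = true; // the first failing element ends the sequence
                }
            }
        }

        @Override public boolean hasNext() {
            advance();
            return hasNext;
        }

        @Override public T next() {
            advance();
            if (!hasNext) throw new NoSuchElementException();
            hasNext = false;
            return next;
        }
    };
}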

Saturday, 17 January 2015

Putting functions to good use

In my previous post I wrote that Java 8 has a new, functional face – and a pretty one. Programming in a functional style is not a theoretical whim of computer scientists; it has immediate practical consequences. Thinking in terms of function composition produces shorter code that is easier to read and, more importantly, more robust.

In this article I will show a few functional little tricks that yield significant benefits.

Efficient assertions

Assertions are a powerful way to create self-verifying code. A programmer who routinely uses assertions writes better code simply because he has to think twice about the invariants of a program.

However, assertions do incur a performance penalty if the asserted expression is complex. This can be addressed in Java 8 by using lazy evaluation via an anonymous function returning a boolean:

Assert.must(() -> 3 == myComplexFunctionThatHasToReturnThree(...));

instead of using the classic assert keyword:

assert 3 == myComplexFunctionThatHasToReturnThree(...);

To do this trick we need a class Assert that contains a method must() which does all the work. Firstly, Assert should be able to detect that assertions are turned on:

public static final boolean ON = true;
private static boolean isEnabled = false;

static {
    assert isEnabled = true; // side effects intended
}

public static boolean enabled() { return isEnabled; }

The ON flag allows the programmer to remove the asserting code anywhere in his program by leveraging the optimizing capabilities of the Java compiler (the so-called conditional compilation idiom in Java). The isEnabled internal flag is set to true whenever assertions are globally turned on. The enabled() method allows read-only access to isEnabled from anywhere in the program.

The code for the must() method is deceptively trivial:


public static void must(Supplier<Boolean> assertion) {
    if (ON) {
        if (enabled()) {
            assert assertion.get();
        }
    }
}

The ON flag eliminates the method altogether when false, and the call to enabled() makes sure the evaluation of assertion takes place only when assertions are turned on. The Appendix contains the full code for the Assert class.

Saturday, 10 January 2015

Being super lazy is (sometimes) good

Iterators are great. Although many times they are derived from a collection whose elements are known upfront, they can model a potentially infinite collection and, with the help of additional operations, they offer the possibility of working with collections almost declaratively.

Clarifications: what I mean by iterator is not just an instance of a Java interface. It’s more of an abstraction with various incarnations: IEnumerable<T> in C#, iterators in Python, streams in Java 8 and Scala or lazy collections in Haskell.

Iterators have an important quality: they are lazy. They do work only when necessary and they return the results one by one, when asked for. Let’s see an example.

Note: the examples here are in C#. The Java 8 equivalent is summarized at the end (without full implementation).

Iterating over binary trees

Let’s consider a simple Tree data structure:

public class Tree<T>
{
    public readonly T Data;
    public readonly Tree<T> Left;
    public readonly Tree<T> Right;

    public Tree(T data, Tree<T> left, Tree<T> right)
    {
        this.Data = data;
        this.Left = left;
        this.Right = right;
    }
}

It is trivial to traverse a tree recursively in order: traverse the left child, then do work with the data, then traverse the right child:

public static void TraverseInOrder<T>(Tree<T> tree)
{
    if (tree != null)
    {
        TraverseInOrder(tree.Left);
        Console.WriteLine(tree.Data);
        TraverseInOrder(tree.Right);
    }
}

There are two problems with this code:

  • The work done upon each node of the tree (in this case a Console.WriteLine operation) cannot be changed; it is embedded in the traversing code
  • There is no way to interrupt execution in the middle, to stop on a certain condition or when a certain number of nodes have been processed

The first problem can be resolved in various ways:

  • Replace the call to Console.WriteLine() with a call to an abstract method of the Tree<T> class
  • Externalize the operation and transform the algorithm according to the Visitor pattern
  • Pass the operation as a parameter to the traversing method. In C# that would be an Action<T> instance, in Java 8 the Consumer<T> interface would fit the bill.

The second problem requires a more involved solution and this is where iterators come to the rescue.
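The post's C# solution is not shown here; as a sketch of where it is heading, the Java 8 equivalent of a lazy, interruptible in-order traversal can be built with an explicit stack (assuming a Java Tree<T> with data, left and right fields mirroring the C# class above):

static <T> Iterator<T> inOrder(Tree<T> root) {
    final Deque<Tree<T>> stack = new ArrayDeque<>();
    return new Iterator<T>() {
        private Tree<T> current = root;

        @Override public boolean hasNext() {
            return current != null || !stack.isEmpty();
        }

        @Override public T next() {
            while (current != null) { // descend to the leftmost pending node
                stack.push(current);
                current = current.left;
            }
            final Tree<T> node = stack.pop(); // throws NoSuchElementException when exhausted
            current = node.right;
            return node.data;
        }
    };
}

Each call to next() does just enough work to produce one element, so the caller can stop on any condition – which solves the second problem.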

Sunday, 4 January 2015

The Inexistent Resolutions

I have never made resolutions. Ever.

I have never made a list of things I plan to do by the next 31st of December. I certainly have a list of things I have not done since the last 31st of December and, sad to say, this list grows by the year; its tail gets lost along the way as its tasks become irrelevant with time.

But, hey, there is also a smaller (albeit no less important) list of things I have done. So, I will focus on this list – it's shorter and easier to write about.

Visiting Java

I write “visiting Java” as if it were a place (and it is, in Indonesia) but obviously I am talking about the Java eco-system and – more recently discovered – the community that stands behind it. Yes, I’ve revived my Java blood-stream over the past year, to my great pleasure.

Event #1 : Meet-A-Mentor

Well, this is not really a Java event, but it is organized by the special folks at RecWorks, one of the most interesting recruiting companies on the banks of the Thames – folks who also organize the London Java Community. Clear, right?

I've taken part in one of the sessions and I am anticipating another in January. A great, uplifting event. It is always refreshing to be in contact with young, bright minds, and a pleasure to liaise a little with fellow mentors.

One interesting detail: there seems to be a lot of emphasis on data science out there in campus land – I think half the students I talked to were expecting jobs as data scientists. This was quite surprising to me. The interest was so keen that one student asked us "which tool is better for X type of data analysis, Y or Z?". How would you answer that?

Event #2 : Open Java Conference

This past November brought OJC in London, at the IBM Centre, close to the London Eye and the world-famous Tower Bridge. An awesome event – not free, but with a very affordable fee. A wide variety of presentations, from the easy and light ("how to fix Enterprise Software?"1)) to the more hard-core (IBM Watson architecture, optimizing Java by sniffing into the workings of the HotSpot compiler, etc.).

Food was included in the price and mediocre, as is expected at such a great geeky event. A little too much socialising for my taste – the long breaks (for obligatory networking) could have allowed for at least one more round of sessions. But the people were nice, friendly and knowledgeable about what they were presenting, and the event was very well organized – definitely something not to miss. Ironically, in 2013 I had purchased tickets for this conference but didn't go. I won't repeat that mistake.

Not to be forgotten: during OJC London 2014 there was quite an emphasis on the Adopt OpenJDK initiative, which originated in London (or at least LJC is an active part of it). This leads to the 3rd event …