Iterators are great. Although many times they are derived from a collection with elements known upfront, they can be used to model a potentially infinite collection and, with the help of additional operations, they offer the possibility to work with collections almost declaratively.
Clarifications: what I mean by iterator is not just an instance of a Java interface. It’s more of an abstraction with various incarnations: IEnumerable<T> in C#, iterators in Python, streams in Java 8 and Scala or lazy collections in Haskell.
Iterators have an important quality: they are lazy. They do work only when necessary and they return the results one by one, when asked for. Let’s see an example.
Note: the examples here are in C#. The Java 8 equivalent is summarized at the end (without full implementation).
Iterating over binary trees
Let’s consider a simple Tree data structure:
{
public readonly T Data;
public readonly Tree<T> Left;
public readonly Tree<T> Right;
public Tree(T data, Tree<T> left, Tree<T> right)
{
this.Data = data;
this.Left = left;
this.Right = right;
}
}
It is trivial to traverse a tree recursively, it means to traverse the left child, then do work with the data, then traverse the right child:
public static void TraverseInOrder<T>(Tree<T> tree)
{
if (tree != null)
{
TraverseInOrder(tree.Left);
Console.WriteLine(tree.Data);
TraverseInOrder(tree.Right);
}
}
There are two problems with this code:
- The work being done upon each node of the tree (in this case a Console.WriteLine operation) cannot be changed, it is imbedded in the traversing code
- There is no way to interrupt execution in the middle, to stop on a certain condition or when a certain number of nodes have been processed
The first problem can be resolved in various ways:
- Replace the call to Console.WriteLine() with a call to an abstract method of the Tree<T> class
- Externalize the operation and transform the algorithm according to the Visitor pattern
- Pass the operation as a parameter to the traversing method. In C# that would be an Action<T> instance, in Java 8 the Consumer<T> interface would fit the bill.
The second problem requires a more involved solution and this is where iterators come to the rescue.
To Linq or not to Linq, that is the question
Let’s consider a Nodes property to the Tree<T> class that returns the iterator which yields the nodes of the tree:
{
get { throw new NotImplementedException(); }
}
Before implementing the body of the property, we must say that the implementation is best done recursively: to iterate over the elements of the tree means to iterate over the elements of the left child followed by “iterating” over the element and then to iterate over the elements of the right child.
There are two ways to do it. The first way is to simply iterate explicitly within the body of the property and use the yield keyword to provide the element at each step:
{
get
{
if (this.Left != null)
{
foreach(var elem in this.Left.Nodes) { yield return elem; }
}
yield return this.Data;
if (this.Right != null)
{
foreach(var elem in this.Right.Nodes) { yield return elem; }
}
}
}
Oh, boy, this is ugly! Not only it’s long but it misses the point since it iterates robotically over the elements of the children tree as if iterators are for … iterating! That’s an under-use of iterators since iterators are (or should be) abstract representations of collections and iterating should be postponed as much as possible.
So, we are not going to iterate explicitly over the elements of the children. We are going to use the power of Linq combined with the ubiquitous ?: operator:
{
get
{
return (this.Left != null ? this.Left.Nodes : Enumerable.Empty<T>())
.Concat(this.Right != null ? this.Right.Nodes : Enumerable.Empty<T>());
}
}
Much nicer, isn't it? By using the Concat operator from Linq, we concatenate three iterators: the one yielding the elements of the left child (empty on null), an iterator yielding the current node (that Enumerable.Repeat(this.Data, 1) expression) and finally the iterator for the right child.
As elegant as this solution is, it has a major flaw: it is not lazy. Why? Because even though the elements are returned lazily, the iterators are produced eagerly since the Concat() operator needs all its participants ready before working. In other words, for each node all its sub-node iterators must be ready before the current iterator may be constructed.
Many times this is not a problem but here it is: given that for each node we have a piece of data and an iterator, producing iterators eagerly is just as bad as generating data eagerly.
What can we do? Do we have to go back to the ugly, imperative approach shown right under Mark Henry? No way! Because we have … promises.
Keep your promises (when it makes sense)
A promise is a concept widely used in reactive programming and, to some extent, in functional programming as well. A promise is nothing else than a value that is … promised to be delivered in the future but which can be worked upon now.
In reactive programming, the caller gets a promise when he issues an asynchronous operation. Even though the promise will be honoured in the future it can be worked upon now with all the changes being triggered when the value is available. There is some extra terminology regarding promises and futures, but this is the basic idea.
In functional programming, a parameter-less function returning a value can be thought of as a promise. As functions are first-class objects in FP, they can be worked upon producing new functions. That is, new promises.
What about iterators? An iterator is, in itself, a promise since it promises to deliver the elements when activated. We’ll use the power of promises (that is, iterators) to solve the problem of eagerly producing iterators beforehand.
To do this, we add another property to the Tree<T> class named PromisedNodes and we … promise that this property will not create iterators eagerly. Before implementing it we need a way to find out what promises a node can make. We can do that by adding an aptly named NodePromises property to the Tree<T> class:
{
get
{
if (this.Left != null) { yield return this.Left.PromisedNodes; }
yield return Enumerable.Repeat(this.Data, 1);
if (this.Right != null) { yield return this.Right.PromisedNodes; }
}
}
This property tells us that the node makes tree promises: to yield the nodes of the left, to yield the current element and to yield the nodes on the right. With that in hand, we can implement the PromisedNodes property in one line:
{
get { return NodePromises.SelectMany(it => it); }
}
Why is this lazy and Nodes is not? Because, unlike Concat(), the SelectMany() operator is lazy in both terms: elements and participating iterators. In other words, it is a super-lazy operator and we exploit that to our advantage.
A little bit of complexity
Let’s do a little bit of complexity calculus. If the tree has a height H and is complete (each node has both children or none) then we have 2H nodes. If we visit the nodes without iterators, we get an execution of exponential complexity O(2H), without the possibility of stopping early.
If we visit the nodes with iterators but without promises, then we can stop early but we have 2H iterators produced upfront so the complexity is still O(2H), as bad as the non-iterated solution.
If we visit the nodes with promises, then the number of promises active at any point in time equals the length of the path between the root and the current node. As this path cannot be longer than H, then we obtain a linear complexity of O(H) which is a huge gain.
Proving it works
The C# program shown in Appendix below clearly shows that iterators are created eagerly whereas promises get created lazily:
2
1
4
5
Iterator for 1!
Iterator for 2!
Iterator for 3!
Iterator for 4!
Iterator for 5!
3
2
1
4
5
Promise for 1!
Promise for 2!
Promise for 3!
3
2
1
Promise for 4!
4
Promise for 5!
5
What about Java?
Java 8 lacks the yield keyword but, incidentally, it is possible to write the getPromisedNodes() property in one shot:
return IntStream.of(1, 2, 3)
.mapToObj(i -> {
switch (i) {
case 1: return this.left.map(t -> t.getPromisedNodes())
.orElse(Stream.empty());
case 2: return Stream.of(this.data);
case 3: return this.right.map(t -> t.getPromisedNodes())
.orElse(Stream.empty());
default: assert false; return Stream.empty();
}
.flatMap(it -> it);
}
Conclusion
This example shows that when using iterators as a way to process information lazily we should pay careful attention whether the iterators themselves can (or should) be processed lazily, too. In other words, we shouldn’t be just lazy. We should be super lazy.
Appendix
This C# source code compiles and runs properly on Microsoft Visual Studio Community 2013.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace PromisedIterators
{
public class Tree<T>
{
public readonly T Data;
public readonly Tree<T> Left;
public readonly Tree<T> Right;
public Tree(T data, Tree<T> left, Tree<T> right)
{
this.Data = data;
this.Left = left;
this.Right = right;
}
public IEnumerable<T> Nodes
{
get
{
Console.WriteLine("Iterator for " + this.Data + "!");
return (this.Left != null ?
this.Left.Nodes :
Enumerable.Empty<T>()).Concat(Enumerable.Repeat(this.Data, 1))
.Concat(this.Right != null ?
this.Right.Nodes :
Enumerable.Empty<T>());
}
}
public IEnumerable<T> PromisedNodes
{
get
{
Console.WriteLine("Promise for " + this.Data + "!");
return NodePromises.SelectMany(it => it);
}
}
private IEnumerable<IEnumerable<T>> NodePromises
{
get
{
if (this.Left != null) { yield return this.Left.PromisedNodes; }
yield return Enumerable.Repeat(this.Data, 1);
if (this.Right != null) { yield return this.Right.PromisedNodes; }
}
}
}
public static class Program
{
public static void Main(string[] args)
{
var t = Build12345Tree();
TraverseInOrder(t);
foreach(var elem in t.Nodes)
{
Console.WriteLine(elem);
}
foreach(var elem in t.PromisedNodes)
{
Console.WriteLine(elem);
}
}
public static void TraverseInOrder<T>(Tree<T> tree)
{
if (tree != null)
{
TraverseInOrder(tree.Left);
Console.WriteLine(tree.Data);
TraverseInOrder(tree.Right);
}
}
public static Tree<int> Build12345Tree()
{
return new Tree<int>(1,
new Tree<int>(2,
new Tree<int>(3, null, null),
null),
new Tree<int>(4,
null,
new Tree<int>(5, null, null)));
}
}
}
No comments:
Post a Comment