Coding in the small with Google Collections: AbstractIterator

Part 17 in a Series.

I really like the Java Collections API. So much so, that I use 'em when I'm doing work that isn't particularly collectioney. For example, I recently wrote a quick-n-dirty app that rewrote some files line-by-line. Instead of using a Reader as input, I used an Iterator<String>. The easiest way to create such an iterator is to load the entire file into memory first.

Before:

  public Iterator<String> linesIterator(Reader reader) {
    BufferedReader buffered = new BufferedReader(reader);
    List<String> lines = new ArrayList<String>();

    try {
      for (String line; (line = buffered.readLine()) != null; ) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }

    return lines.iterator();
  }

That code is simple but inefficient, and it won't work if the file doesn't fit into memory. A better approach is to implement Iterator and read through the file on demand as the lines are requested. Google Collections' AbstractIterator makes this easy: whenever a new line is requested, its computeNext() method gets called back to read it from the stream.

After:

  public Iterator<String> linesIterator(Reader reader) {
    final BufferedReader buffered = new BufferedReader(reader);

    return new AbstractIterator<String>() {
      protected String computeNext() {
        try {
          String line = buffered.readLine();
          return line != null ? line : endOfData();
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
  }

This class really takes the fuss out of custom iterators. Now it's not difficult to create iterators that compute a series, process a data stream, or even compose other iterators.
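
Here's a rough sketch of the "compute a series" case. So that it compiles with nothing but the JDK, it uses a hypothetical, simplified stand-in called SimpleAbstractIterator that only mimics the computeNext()/endOfData() contract; it is not the library's actual implementation. With Google Collections on the classpath you would extend AbstractIterator directly. The powersOfTwo method and its limit parameter are likewise made up for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class SeriesIteratorSketch {

  /** Hypothetical, simplified stand-in for Google Collections' AbstractIterator. */
  abstract static class SimpleAbstractIterator<T> implements Iterator<T> {
    private T next;
    private boolean ready;  // true when 'next' holds a computed, unconsumed element
    private boolean done;   // true once endOfData() has been called

    /** Returns the next element, or the result of endOfData() when exhausted. */
    protected abstract T computeNext();

    /** Marks the iteration as finished; the returned value is ignored. */
    protected final T endOfData() {
      done = true;
      return null;
    }

    @Override public final boolean hasNext() {
      if (!ready && !done) {
        next = computeNext();
        ready = !done;
      }
      return ready;
    }

    @Override public final T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      ready = false;
      T result = next;
      next = null;
      return result;
    }

    @Override public final void remove() {
      throw new UnsupportedOperationException();
    }
  }

  /** Lazily yields 1, 2, 4, ... for as long as the value does not exceed limit. */
  static Iterator<Integer> powersOfTwo(final int limit) {
    return new SimpleAbstractIterator<Integer>() {
      private long current = 1;  // long, so doubling cannot overflow an int

      @Override protected Integer computeNext() {
        if (current > limit) {
          return endOfData();
        }
        int result = (int) current;
        current *= 2;
        return result;
      }
    };
  }

  public static void main(String[] args) {
    // Each value is computed only when the loop asks for it.
    for (Iterator<Integer> it = powersOfTwo(20); it.hasNext(); ) {
      System.out.println(it.next());
    }
  }
}
```

Just like the lines example, the only logic you write is "what comes next, or are we done?"; the hasNext()/next() bookkeeping lives in the base class.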

5 comments:

Madhat said...

The only problem I have with this code is that it converts any typed exception thrown to an untyped one. It makes you handle it explicitly in the code that uses this iterator. I think this can lead to more coding errors.
But then, as the saying goes, with power comes responsibility... :)

Sam Beran said...

This is very cool, I love this series.

One thing that is a bit cumbersome for me is that AbstractIterator does not implement Iterable. So I often find myself having to manually add implements Iterable<String> to the signature, as well as implementing the iterator() method (returning this).

I do this when I am doing something like

for(String line : new LinesIterator(in)) {
//process line...
}

Maybe it is a bad practice in general, to make an iterator iterable. Any better solutions for this situation?

edward said...

Cool stuff, again. Congratulations.

During the (great) Google Collections talk, Kevin talks about reading a stream from Bigtable through Iterables. I wondered if he was just giving an example of a possible use of Iterables or if you really have that kind of Java interface to Bigtable.

mk said...

I know that you prefaced this with "quick-n-dirty", but in the long run, wouldn't this approach leak open files? (Ditto for any iterable source that needs to be bracketed by calls to open/close.)

If your source needs to be closed after iteration, any thoughts about use cases that don't finish iterating, like...

String firstLine = linesIterator(fileReader).next();

...and how to plug those leaks?

swankjesse said...

mk, yeah it's definitely not what you want in a long-running application. Finalization might be a reasonable option here - when the iterator goes out-of-scope, make sure the file is closed.