I really like the Java Collections API. So much so that I use 'em when I'm doing work that isn't particularly collectioney. For example, I recently wrote a quick-n-dirty app that rewrote some files line-by-line. Instead of using a Reader as input, I used an Iterator<String>. The easiest way to create such an iterator is to load the entire file into memory first.

Before:
public Iterator<String> linesIterator(Reader reader) {
  BufferedReader buffered = new BufferedReader(reader);
  List<String> lines = new ArrayList<String>();
  try {
    for (String line; (line = buffered.readLine()) != null; ) {
      lines.add(line);
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
  return lines.iterator();
}

That code is simple, but inefficient. And it won't work if the file doesn't fit into memory. A better approach is to implement Iterator and to read through the file on demand as the lines are requested. Google Collections' AbstractIterator makes this easy. Whenever a new line is requested, its computeNext() method gets called back to read that line from the stream.

After:
public Iterator<String> linesIterator(Reader reader) {
  final BufferedReader buffered = new BufferedReader(reader);
  return new AbstractIterator<String>() {
    protected String computeNext() {
      try {
        String line = buffered.readLine();
        return line != null ? line : endOfData();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };
}

This class really takes the fuss out of custom iterators. Now it's not difficult to create iterators that compute a series, process a data stream, or even compose other iterators.
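
To make the callback idea concrete, here is a minimal sketch of how an AbstractIterator-style base class could work, plus an iterator that computes a series on top of it. It assumes Java 8 or later. SketchIterator is a simplified stand-in written for illustration, not the library's actual source, and SeriesSketch and powersOfTwo are hypothetical names.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Simplified stand-in for an AbstractIterator-style base class (an
 * illustration only). Subclasses implement computeNext() and call
 * endOfData() once there is nothing left to return.
 */
abstract class SketchIterator<T> implements Iterator<T> {
    private enum State { NOT_READY, READY, DONE }

    private State state = State.NOT_READY;
    private T next;

    /** Returns the next element, or the result of endOfData() when finished. */
    protected abstract T computeNext();

    /** Marks the iteration as finished; the returned null is never handed out. */
    protected final T endOfData() {
        state = State.DONE;
        return null;
    }

    @Override
    public final boolean hasNext() {
        if (state == State.NOT_READY) {
            next = computeNext();            // call back into the subclass
            if (state != State.DONE) {
                state = State.READY;
            }
        }
        return state == State.READY;
    }

    @Override
    public final T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        state = State.NOT_READY;             // force a fresh computeNext() next time
        T result = next;
        next = null;
        return result;
    }
}

final class SeriesSketch {
    /** Hypothetical helper: yields 1, 2, 4, ... while the value is at most limit. */
    static Iterator<Long> powersOfTwo(final long limit) {
        return new SketchIterator<Long>() {
            private long current = 1;

            @Override
            protected Long computeNext() {
                // current <= 0 guards against overflow past Long.MAX_VALUE
                if (current > limit || current <= 0) {
                    return endOfData();
                }
                long result = current;
                current *= 2;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Long> it = powersOfTwo(20);
        while (it.hasNext()) {
            System.out.println(it.next());   // prints 1, 2, 4, 8, 16 on separate lines
        }
    }
}
```

Nothing is computed until hasNext() or next() is called, which is the same on-demand behavior the file-reading version relies on.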

5 comments:
The only problem I have with this code is that it converts any typed exception thrown to an untyped one. It makes you handle it explicitly in the code that uses this iterator. I think this can lead to more coding errors.
But then, as the saying goes, with power comes responsibility... :)
This is very cool, I love this series.
One thing that is a bit cumbersome for me is that AbstractIterator does not implement Iterable. So I often find myself having to manually add implements Iterable<String> to the signature, as well as implementing the iterator() method (returning this).
I do this when I am doing something like
for(String line : new LinesIterator(in)) {
  // process line...
}
Maybe it is a bad practice in general, to make an iterator iterable. Any better solutions for this situation?
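
One possible answer, sketched here under the assumption of Java 8 or later: leave the iterator alone and wrap it in a small single-use Iterable for the for-each loop. OneShot and once are hypothetical names, not part of any library. The adapter refuses a second iterator() call, since an iterator backed by a stream cannot be restarted.

```java
import java.util.Arrays;
import java.util.Iterator;

final class OneShot {
    /** Hypothetical adapter: exposes an Iterator to a for-each loop exactly once. */
    static <T> Iterable<T> once(final Iterator<T> iterator) {
        return new Iterable<T>() {
            private boolean used = false;

            @Override
            public Iterator<T> iterator() {
                if (used) {
                    // A second pass would silently see an exhausted iterator.
                    throw new IllegalStateException("already iterated");
                }
                used = true;
                return iterator;
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for a line iterator; any Iterator<String> works here.
        Iterator<String> lines = Arrays.asList("first", "second").iterator();
        for (String line : once(lines)) {
            System.out.println(line);
        }
    }
}
```

This keeps the Iterator and Iterable roles separate, at the cost of one extra call at the loop site.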
Cool stuff, again. Congratulations.
During the (great) Google Collections talk, Kevin talks about reading a stream from Bigtable through Iterables. I wondered whether he was just illustrating a possible use of Iterables or whether you really have that kind of Java interface to Bigtable.
I know that you prefaced this with "quick-n-dirty", but in the long run, wouldn't this approach leak open files? (Ditto for any iterable source that needs to be bracketed by calls to open/close.)
If your source needs to be closed after iteration, any thoughts about use cases that don't finish iterating, like...
String firstLine = linesIterator(fileReader).next();
...and how to plug those leaks?
mk, yeah it's definitely not what you want in a long-running application. Finalization might be a reasonable option here - when the iterator goes out-of-scope, make sure the file is closed.
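
As an alternative to finalization, here is a rough sketch of an iterator that is also Closeable, so a caller who stops early can release the file deterministically. It assumes Java 8 or later (try-with-resources needs Java 7). ClosingLineIterator, ClosingDemo, and firstLine are hypothetical names made up for this example.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical class: a line iterator that owns its Reader and can be closed. */
final class ClosingLineIterator implements Iterator<String>, Closeable {
    private final BufferedReader buffered;
    private String nextLine;     // line read ahead, or null if none is buffered
    private boolean finished;    // true once the stream is exhausted or closed

    ClosingLineIterator(Reader reader) {
        this.buffered = new BufferedReader(reader);
    }

    @Override
    public boolean hasNext() {
        if (nextLine == null && !finished) {
            try {
                nextLine = buffered.readLine();
                if (nextLine == null) {
                    close();     // end of data: release the reader right away
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        return nextLine != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = nextLine;
        nextLine = null;
        return result;
    }

    @Override
    public void close() throws IOException {
        finished = true;
        nextLine = null;         // drop any read-ahead line
        buffered.close();        // closing twice is harmless
    }
}

final class ClosingDemo {
    /** Reads only the first line, then closes the reader even though lines remain. */
    static String firstLine(Reader reader) throws IOException {
        try (ClosingLineIterator lines = new ClosingLineIterator(reader)) {
            return lines.hasNext() ? lines.next() : null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine(new StringReader("alpha\nbeta\n")));  // prints "alpha"
    }
}
```

The trade-off is that the caller now has to treat the iterator as a resource. A caller that neither finishes iterating nor calls close() still leaks the file.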