Caliper confirms reality: LinkedList vs ArrayList

I've been exercising Caliper, our new JUnit-like microbenchmarking framework. I'm using it to explore and to confirm my assumptions about the JVM and its libraries.

At what size is LinkedList faster than ArrayList for queue-like access?

What I want to measure is first-in-first-out: add elements at the end, and remove 'em from the front. First I recall how these data structures work...
  • ArrayList limits allocations. It's backed by a contiguous array so add-at-end is generally cheap: just assign a value to the nth cell. But remove-at-front is expensive: move the array's entire contents over by one cell.
  • LinkedList limits copies. It's backed by a sequence of nodes so add-at-end requires an allocation immediately and a deallocation later. But remove-at-front is cheap: change a few pointers to assign a new head node.
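The asymmetry in those two bullets can be seen in a toy sketch. The class and method names below are mine, for illustration only; the point is that the same `List` code does a shift per removal on an ArrayList but only a head relink on a LinkedList:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class FifoSketch {
  // Treat any List as a FIFO queue: add at the end, remove from the front.
  // For ArrayList, each remove(0) shifts the remaining elements left one cell;
  // for LinkedList, remove(0) just relinks the head node.
  static int pushThenDrain(List<String> list, int n) {
    for (int i = 0; i < n; i++) {
      list.add("A");        // add at the end
    }
    int removed = 0;
    while (!list.isEmpty()) {
      list.remove(0);       // remove from the front
      removed++;
    }
    return removed;
  }

  public static void main(String[] args) {
    System.out.println(pushThenDrain(new ArrayList<String>(), 1000));  // 1000
    System.out.println(pushThenDrain(new LinkedList<String>(), 1000)); // 1000
  }
}
```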

I'm assuming that ArrayList is faster for empty lists, because removing from the front of an empty-ish list moves almost nothing and no node allocation is required. I'll also assume that ArrayList is slower for large lists, since each removal will move a large number of elements.

I ask Caliper to answer my question in the form of a benchmark:
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;

import com.google.caliper.Param;
import com.google.caliper.SimpleBenchmark;

public final class ListAsQueueBenchmark extends SimpleBenchmark {

  @Param({"0", "10", "100", "1000"}) int size;

  private final ArrayList<String> arrayList = new ArrayList<String>();
  private final LinkedList<String> linkedList = new LinkedList<String>();
  private final Deque<String> arrayDeque = new ArrayDeque<String>();

  @Override protected void setUp() throws Exception {
    // Pre-fill each collection to the target size before timing.
    for (int i = 0; i < size; i++) {
      arrayList.add("A");
      linkedList.add("A");
      arrayDeque.add("A");
    }
  }

  public void timeArrayList(int reps) {
    for (int i = 0; i < reps; i++) {
      arrayList.add("A");     // cheap: assign the next cell
      arrayList.remove(0);    // expensive: shift everything left
    }
  }

  public void timeLinkedList(int reps) {
    for (int i = 0; i < reps; i++) {
      linkedList.add("A");    // allocates a node
      linkedList.remove(0);   // cheap: relink the head
    }
  }

  public void timeArrayDeque(int reps) {
    for (int i = 0; i < reps; i++) {
      arrayDeque.addLast("A");    // no shift, no node allocation
      arrayDeque.removeFirst();
    }
  }
}
I ran the benchmark and the results surprised me: HotSpot's allocator and concurrent GC are fast! It takes only 10 elements for the LinkedList to pay for its extra allocations. And at 1000 elements, LinkedList is 20x faster than ArrayList for queue-like access.

Of course, ArrayDeque is really the champion in this case. It performs neither allocations nor moves and beats both ArrayList and LinkedList for all inputs.
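A minimal sketch of why that holds: ArrayDeque keeps head and tail indices into a circular backing array, so steady-state queue traffic neither shifts elements nor allocates per-element nodes. The class below is my illustration, not part of the benchmark:

```java
import java.util.ArrayDeque;

public class DequeSketch {
  // Push and pop reps elements through an ArrayDeque. Once the backing
  // array has grown large enough, addLast/removeFirst each just write one
  // slot and bump an index: no shifting, no node objects.
  static int churn(int reps) {
    ArrayDeque<String> deque = new ArrayDeque<String>();
    for (int i = 0; i < reps; i++) {
      deque.addLast("A");
      deque.removeFirst();
    }
    return deque.size();  // every add was matched by a remove
  }

  public static void main(String[] args) {
    System.out.println(churn(1000000)); // prints 0
  }
}
```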


Kevin Bourrillion and I will be presenting Caliper at JavaOne 2010 on Thursday, September 23.

6 comments:

Nabeel Ali Memon said...

Looking at a nice micro-benchmarking framework for Java is cool. But don't you think it could've been much cooler if you had designed it to be annotation-based rather than JUnit-3-like conventions-based?

JodaStephen said...

I'm intrigued by the result. I benchmarked these two and Apache Commons TreeList years ago and got a very different answer. Details in the javadoc http://commons.apache.org/collections/apidocs/org/apache/commons/collections/list/TreeList.html . Perhaps the JDK has just improved a lot? Perhaps you could test TreeList now?!

swankjesse said...

TreeList performs quite poorly when used as a queue:
http://microbenchmarks.appspot.com/run/limpbizkit@gmail.com/com.publicobject.blog.TreeListBenchmark

Noel Grandin said...

This kind of benchmark can be deceiving.

GC allocation/free is a delayed cost that you won't really notice until you're trying to get the most out of a machine.

Rob said...

I'm liking the look of Caliper, Jesse! That's definitely something I'll use in Trove and other projects.

As for list performance... yeah, what Noel said. It depends what people mean when they say "performance". But, yes, ArrayList as a queue is a bad idea. 'Course, now you should test iteration. ;-)

kevinb said...

I believe Noel is right, and so one should never really compare "time A" vs. "time B", but rather "time A, bytes allocated A" to "time B, bytes allocated B". There is not going to be any one simple formula for producing an aggregate value of the two that will work for everyone; you just have to consider both.

And because this is necessary anyway, I believe the framework should take steps to exclude GC activity from its timing measurements (for example by using non-concurrent GC and setting parameters such that only some timing intervals get interrupted by GC, and excluding those intervals from the average). It's been my plan to do this for caliper ... for a while :(
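For what it's worth, HotSpot already exposes a per-thread allocation counter that could feed such a "time plus bytes" report. A rough sketch using the com.sun.management extension of ThreadMXBean; the harness class is mine, and this API is HotSpot-specific, so treat it as an assumption rather than something portable:

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
  // HotSpot-specific: cast the standard ThreadMXBean to the com.sun.management
  // subtype to read how many bytes the current thread has allocated.
  static final com.sun.management.ThreadMXBean BEAN =
      (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

  static long bytesAllocated(Runnable task) {
    long id = Thread.currentThread().getId();
    long before = BEAN.getThreadAllocatedBytes(id);
    task.run();
    return BEAN.getThreadAllocatedBytes(id) - before;
  }

  public static void main(String[] args) {
    long bytes = bytesAllocated(new Runnable() {
      public void run() {
        java.util.LinkedList<String> list = new java.util.LinkedList<String>();
        for (int i = 0; i < 10000; i++) {
          list.add("A");      // each add allocates a node
          list.remove(0);
        }
      }
    });
    System.out.println("LinkedList churn allocated " + bytes + " bytes");
  }
}
```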