This page is intended for recording steps we have taken over time to improve RAMCloud performance, along with measurements of the resulting performance gains. Add new entries at the beginning of the page, so that the entries are in reverse chronological order.
Buffer rewrite (June 2014, John Ousterhout)
Rewrote Buffer.cc and Buffer.h from scratch to streamline and simplify, in the hopes of speeding up basic operations.
- One overall approach was to eliminate layers within the Buffer class. For example, allocating a new chunk used to pass through many levels of method calls, with many of the methods doing nothing except passing their arguments to the next method in the chain. In the new version, the most common operations are completely collapsed into a single method.
- The Buffer::Iterator class was simplified by moving almost all the computation to the next() method and handling special cases related to the first chunk in the constructor. In the old version, there was significant complexity in each of next(), getData(), and getLength(), with significant duplication, and extra code to deal with the first chunk that had to be executed for every single chunk. In the new version, getData() and getLength() are in-line methods that do nothing except return precomputed values.
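The iterator structure described above can be illustrated with a minimal sketch. This is not the actual RAMCloud code: SketchBuffer, Chunk, appendChunk, isDone, and totalLength are hypothetical names chosen for illustration, and the chunk storage (a std::vector) is an assumption made to keep the example short. It only shows the general shape: the constructor handles the first chunk once, next() does the per-chunk work, and the getters return cached values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one chunk of a buffer (not RAMCloud's type).
struct Chunk {
    const char* data;
    uint32_t length;
};

// Hypothetical chunked buffer; storage in a std::vector is an assumption.
class SketchBuffer {
    std::vector<Chunk> chunks;

  public:
    void appendChunk(const char* data, uint32_t length) {
        assert(data != nullptr && length > 0);
        chunks.push_back({data, length});
    }

    class Iterator {
      public:
        // First-chunk handling happens exactly once, here, so next()
        // needs no special case for it.
        explicit Iterator(const SketchBuffer& buffer)
            : chunks(&buffer.chunks), index(0),
              currentData(nullptr), currentLength(0)
        {
            if (!chunks->empty()) {
                currentData = (*chunks)[0].data;
                currentLength = (*chunks)[0].length;
            }
        }

        bool isDone() const { return index >= chunks->size(); }

        // All per-chunk computation lives in next(): advance, then cache
        // the values that the getters will return.
        void next() {
            if (isDone())
                return;
            ++index;
            if (isDone()) {
                currentData = nullptr;
                currentLength = 0;
            } else {
                currentData = (*chunks)[index].data;
                currentLength = (*chunks)[index].length;
            }
        }

        // In-line getters that only return precomputed values.
        const char* getData() const { return currentData; }
        uint32_t getLength() const { return currentLength; }

      private:
        const std::vector<Chunk>* chunks;
        size_t index;
        const char* currentData;
        uint32_t currentLength;
    };
};

// Example traversal: sum the lengths of all chunks in a buffer.
uint32_t totalLength(const SketchBuffer& buffer) {
    uint32_t total = 0;
    for (SketchBuffer::Iterator it(buffer); !it.isDone(); it.next())
        total += it.getLength();
    return total;
}
```

With this layout, the loop body in totalLength touches only cached fields, and the cost of each step is concentrated in a single call to next().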
Performance comparisons:
| Operation | Old | New |
|---|---|---|
| create, append 1 chunk, delete | 17.0ns | 12.3ns |
| create, alloc 1 chunk, delete | 22.3ns | 15.6ns |
| create, copy in 1 chunk, delete | 25.4ns | 13.2ns |
| extend existing chunk (alloc only) | 9.3ns | 5.7ns |
| copy 2 small chunks out of buffer | 19.2ns | 19.2ns |
| iterate over buffer with 5 chunks | 51.0ns | 22.6ns |
The median read time in "clusterperf readDist" dropped about 40ns as a result of these changes (from 4.95µs to 4.91µs).