This page is intended for recording steps we have taken over time to improve RAMCloud performance, along with measurements of the resulting performance gains. Add new entries at the beginning of the page, so that the entries are in reverse chronological order.
Rewrote Buffer.cc and Buffer.h from scratch to streamline and simplify, in the hopes of speeding up basic operations.
The new code simplifies the next method and handles special cases related to the first chunk in the constructor. In the old version, there was significant complexity in each of next, getData, and getLength, with significant duplication, and extra code to deal with the first chunk that had to be executed for every single chunk. In the new version, getData and getLength are in-line methods that do nothing except return precomputed values.

Performance comparisons:
| Operation | Old | New |
| --- | --- | --- |
| create, append 1 chunk, delete | 17.0ns | 12.3ns |
| create, alloc 1 chunk, delete | 22.3ns | 15.6ns |
| create, copy in 1 chunk, delete | 25.4ns | 13.2ns |
| extend existing chunk (alloc only) | 9.3ns | 5.7ns |
| copy 2 small chunks out of buffer | 19.2ns | 19.2ns |
| iterate over buffer with 5 chunks | 51.0ns | 22.6ns |
The median read time in "clusterperf readDist" dropped about 40ns as a result of these changes (from 4.95µs to 4.91µs).
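To make the iterator change described above concrete, here is a minimal sketch of the general idea. It is not the RAMCloud code: SketchBuffer, Chunk, append, collect, and the iterator's (offset, length) constructor arguments are all hypothetical names and signatures chosen for illustration. The assumption is that the "special cases related to the first chunk" arise because iteration may begin partway through a chunk. The sketch resolves that once in the constructor, so next only has to advance and trim, and getData and getLength just return cached fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified chunked buffer (illustration only, not RAMCloud's Buffer).
class SketchBuffer {
  public:
    struct Chunk {
        const char* data;
        uint32_t length;
        Chunk* next;
    };

    SketchBuffer() : first(nullptr), last(nullptr), totalLength(0) {}
    ~SketchBuffer() {
        while (first != nullptr) {
            Chunk* c = first;
            first = c->next;
            delete c;
        }
    }
    SketchBuffer(const SketchBuffer&) = delete;
    SketchBuffer& operator=(const SketchBuffer&) = delete;

    // Append a chunk that refers to caller-owned memory (no copy is made).
    void append(const char* data, uint32_t length) {
        assert(data != nullptr && length > 0);
        Chunk* c = new Chunk{data, length, nullptr};
        if (last != nullptr) last->next = c; else first = c;
        last = c;
        totalLength += length;
    }

    class Iterator {
      public:
        // All first-chunk special cases (empty range, skipping leading chunks,
        // starting partway into a chunk) are resolved once, here.
        Iterator(const SketchBuffer& b, uint32_t offset, uint32_t length)
            : current(b.first), currentData(nullptr), currentLength(0),
              bytesLeft(0) {
            if (offset >= b.totalLength || length == 0) {
                current = nullptr;
                return;
            }
            if (length > b.totalLength - offset) length = b.totalLength - offset;
            while (offset >= current->length) {
                offset -= current->length;
                current = current->next;
            }
            currentData = current->data + offset;
            currentLength = current->length - offset;
            if (currentLength > length) currentLength = length;
            bytesLeft = length - currentLength;
        }

        bool isDone() const { return current == nullptr; }

        // In-line accessors: nothing but a return of precomputed values.
        const char* getData() const { return currentData; }
        uint32_t getLength() const { return currentLength; }

        // Advance to the next chunk, trimming the final chunk to the range.
        void next() {
            if (bytesLeft == 0) {
                current = nullptr;
                currentData = nullptr;
                currentLength = 0;
                return;
            }
            current = current->next;
            currentData = current->data;
            currentLength =
                current->length < bytesLeft ? current->length : bytesLeft;
            bytesLeft -= currentLength;
        }

      private:
        const Chunk* current;
        const char* currentData;
        uint32_t currentLength;
        uint32_t bytesLeft;  // Bytes of the range that lie after this chunk.
    };

  private:
    Chunk* first;
    Chunk* last;
    uint32_t totalLength;
};

// Hypothetical helper: concatenate the bytes visited by an iterator.
std::string collect(const SketchBuffer& b, uint32_t offset, uint32_t length) {
    std::string out;
    for (SketchBuffer::Iterator it(b, offset, length); !it.isDone(); it.next())
        out.append(it.getData(), it.getLength());
    return out;
}
```

In this arrangement the per-chunk path in next is one pointer step, one comparison, and one subtraction. That is the kind of saving the "iterate over buffer with 5 chunks" measurement would reflect, though the actual RAMCloud implementation may differ in its details.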