Page Comparison

This page is intended for recording steps we have taken over time to improve RAMCloud performance, along with measurements of the resulting performance gains. Add new entries at the beginning of the page, so that the entries are in reverse chronological order.

Fetch multiple completions at once in InfRcTransport (August 2 014, John Ousterhout)

InfRcTransport used to retrieve completions (both from serverRxCq and clientRxCq) one-at-a-time. This optimization changed the code to retrieve many at once, if there are several available. This improved "clusterperf readThroughput" from 875 kreads/sec to 948 kreads/sec.

Reclaim multiple transmit buffers at once in InfRcTransport (August 2014, John Ousterhout)

InfRcTransport used to call reapTxBuffers automatically at the end of the poll method. As a result, it typically only reclaimed one buffer at a time. This optimization changed the code so that it only calls readTxBuffers when transmit buffers are running low, so it can generally reclaim several at a time. This improved "clusterperf readThroughput" from 812 kreads/sec to 875 kreads/sec (in this benchmark, the dispatch thread is the bottleneck).

Optimizing class Object to treat contiguous Buffers as normal byte buffers (July 2014, Henry Qin)

...

Versions Compared

Old Version 9

New Version 10

Key

Fetch multiple completions at once in InfRcTransport (August 2 014, John Ousterhout)

Reclaim multiple transmit buffers at once in InfRcTransport (August 2014, John Ousterhout)

Optimizing class Object to treat contiguous Buffers as normal byte buffers (July 2014, Henry Qin)