Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 20 Next »

RAMCloud Benchmarks

A single backup operation (ClusterPerf with 100-byte writes, 1 master, 3 backups). For each operation, a timeline of events was logged. Not all timelines had the same "shape", as not all backup operations are handled by the same sequence of events. Thus, the most common timeline "shape" was chosen, and the timelines below represent the average of the most common timeline shape. This procedure was done for both the backup and the master.

On the Master

Averaged over 1912 same-shape timelines.

 

Timeline on a Master
0 us --- Begin backup (BackupManager::sync())
| 
|
1.8 us --- First write RPC sent out
| 
|
3.2 us --- Second write RPC sent out
| 
|
4.4 us --- Third write RPC sent out
| 
|
| [~ 4 us "dead time"]
|
|
8.2 us --- First write RPC completes (duration: 6.4 us)
|
9.8 us --- Second write RPC completes (duration: 6.6 us)
|
10.7 us --- Third write RPC completes (duration: 6.3 us)
10.8 us --- End backup

 

Major time sinks in issue path

  • Acquiring Dispatch::Lock in TransportManager::WorkerSession::clientSend for every write RPC
    • Cost: 3 x ~300ns
  • InfRcTransport<Infiniband>::getTransmitBuffer(): waiting for free tx buffer for every write RPC
    • Cost: 3 x ~200ns (first write RPC more expensive than 2nd and 3rd)
  • Calling into Infiniband transport: postSendZeroCopy (unavoidable?)
    • Cost: 3 x ~400ns (first write RPC more expensive than 2nd and 3rd)

More detailed timeline

Below is a more detailed version of the timeline presented above. Each event in the timeline is associated with a (cumulative) distribution of "completion" times. The "completion" time for a given event is the time delta between that event and its preceding event; hence, the cdf for the first event is quite boring. (Note: the x-axis in the cdf's below is in seconds.)

Event Invoking sync() to backups


Event Invoking Send 'write segment' RPC


Event Invoking client.send()


Event Invoking session->clientSend()


Event Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend


Event Invoking Instantiate new ClientRpc


Event Completed Instantiate new ClientRpc


Event Invoking rpc->sendOrQueue


Event Invoking postSendZeroCopy


Event Completed postSendZeroCopy


Event Completed rpc->sendOrQueue


Event Completed session->clientSend()


Event Completed client.send()


Event Completed Send 'write segment' RPC


Event Invoking Send 'write segment' RPC


Event Invoking client.send()


Event Invoking session->clientSend()


Event Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend


Event Invoking Instantiate new ClientRpc


Event Completed Instantiate new ClientRpc


Event Invoking rpc->sendOrQueue


Event Invoking postSendZeroCopy


Event Completed postSendZeroCopy


Event Completed rpc->sendOrQueue


Event Completed session->clientSend()


Event Completed client.send()


Event Completed Send 'write segment' RPC


Event Invoking Send 'write segment' RPC


Event Invoking client.send()


Event Invoking session->clientSend()


Event Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend


Event Invoking Instantiate new ClientRpc


Event Completed Instantiate new ClientRpc


Event Invoking rpc->sendOrQueue


Event Invoking postSendZeroCopy


Event Completed postSendZeroCopy


Event Completed rpc->sendOrQueue


Event Completed session->clientSend()


Event Completed client.send()


Event Completed Send 'write segment' RPC


Event Reaped outstanding RPC (non-close)


Event Reaped outstanding RPC (non-close)


Event Reaped outstanding RPC (non-close)


Event Completed sync() to backups

On the the Backup

Average over 9584 same-shape timelines.

 

Timeline on Backup
0 us ---- InfRcTransport Poller picks ups incoming RPC [dispatch thread]
|
325 ns -- Invoke service.handleRpc()                   [worker thread]
|
885 ns -- Completed service.handleRpc()                [worker thread]
|
1 us -- Begin sending reply                          [dispatch thread]
|
|
|
1.8 us -- Completed sending reply                      [dispatch thread]

More detailed timeline

Just as we did for the master, we have the cdf's for the events on the backup server:

Event Invoking appendToBuffer() [dispatch]


Event Completed appendToBuffer() [dispatch]


Event Invoking serviceManager->handleRpc() [dispatch]


Event Completed serviceManager->handleRpc() [dispatch]


Event Invoking service.handleRpc() [worker]


Event Invoking service.dispatch() [worker]


Event Invoking callHandler() [worker]


Event Invoking SegmentInfo->write() [worker]


Event Completed SegmentInfo->write() [worker]


Event Completed callHandler() [worker]


Event Completed service.dispatch() [worker]


Event Completed service.handleRpc() [worker]


Event Invoking worker->rpc->sendReply() [dispatch]


Event Invoking postSend() [dispatch]


Event Completed postSend() [dispatch]


Event Completed worker->rpc->sendReply() [dispatch]


Benchmark IB Send vs. RDMA

Simple program to benchmark 56-byte write.

Averaged over 100 samples.

One-way (with completion on sender)

Using IB send: 2.753 us

Using RDMA: 2.50495 us

RTT (RPC-style)

Using IB send: 4.969 us (explains write RPC latency seen in RAMCloud: 5 + 1 = 6 us)

Using RDMA: 4.866 us

We see that a one-way RDMA easily beats the round-trip IB send's currently used RAMCloud RPC.

  • No labels