...
- Acquiring Dispatch::Lock in TransportManager::WorkerSession::clientSend for every write RPC
- Cost: 3 x ~300ns
- InfRcTransport<Infiniband>::getTransmitBuffer(): waiting for free tx buffer for every write RPC
- Cost: 3 x ~200ns (first write RPC more expensive than 2nd and 3rd)
- Calling into Infiniband transport: postSendZeroCopy (unavoidable?)
- Cost: 3 x ~400ns (first write RPC more expensive than 2nd and 3rd)
More detailed timeline
Below is a more detailed version of the timeline above which presents information about each and every recorded event in the code path:
Code Block | ||||
---|---|---|---|---|
| ||||
( Cumulative ) | Delta | [ Std. Dev. ] Event Description ( Min Mean Average Max ) | Min Mean Average Max | [ ] ( ) | | [ ] ( 0.0 ns 0.0 ns 0.0 ns ) | 0.0 ns 0.0 ns 0.0 ns | [ 0.0 ns ] Invoking sync() to backups ( 90.0 ns 139.0 ns 531.5 ns ) | 90.0 ns 139.0 ns 531.5 ns | [ 35.0 ns ] Invoking Send 'write segment' RPC ( 202.5 ns 280.2 ns 783.4 ns ) | 112.5 ns 141.1 ns 251.9 ns | [ 12.7 ns ] Invoking client.send() ( 218.5 ns 296.7 ns 802.8 ns ) | 16.0 ns 16.5 ns 19.4 ns | [ 0.8 ns ] Invoking session->clientSend() ( 441.4 ns 605.4 ns 5.4 us ) | 222.9 ns 308.8 ns 4.6 us | [ 119.6 ns ] Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend ( 497.0 ns 708.3 ns 5.6 us ) | 55.6 ns 102.8 ns 183.4 ns | [ 16.1 ns ] Invoking Instantiate new ClientRpc ( 566.9 ns 841.6 ns 16.9 us ) | 69.9 ns 133.4 ns 11.3 us | [ 256.0 ns ] Completed Instantiate new ClientRpc ( 583.9 ns 862.9 ns 16.9 us ) | 17.0 ns 21.3 ns 45.0 ns | [ 2.3 ns ] Invoking rpc->sendOrQueue ( 891.4 ns 1.3 us 21.4 us ) | 307.5 ns 387.9 ns 4.5 us | [ 97.0 ns ] Invoking postSendZeroCopy ( 1.3 us 1.8 us 22.6 us ) | 455.4 ns 587.0 ns 1.2 us | [ 47.2 ns ] Completed postSendZeroCopy ( 1.5 us 2.0 us 31.6 us ) | 127.5 ns 196.4 ns 9.0 us | [ 202.4 ns ] Completed rpc->sendOrQueue ( 1.5 us 2.1 us 31.7 us ) | 54.5 ns 73.2 ns 94.4 ns | [ 2.8 ns ] Completed session->clientSend() ( 1.5 us 2.1 us 31.7 us ) | 16.0 ns 16.9 ns 20.5 ns | [ 0.8 ns ] Completed client.send() ( 1.6 us 2.1 us 31.8 us ) | 16.0 ns 21.8 ns 35.5 ns | [ 2.3 ns ] Completed Send 'write segment' RPC ( 1.6 us 2.2 us 32.0 us ) | 37.5 ns 50.6 ns 224.0 ns | [ 15.5 ns ] Invoking Send 'write segment' RPC ( 1.7 us 2.3 us 32.2 us ) | 75.0 ns 98.3 ns 148.0 ns | [ 9.2 ns ] Invoking client.send() ( 1.7 us 2.3 us 32.2 us ) | 16.0 ns 16.6 ns 19.4 ns | [ 0.8 ns ] Invoking session->clientSend() ( 1.9 us 2.6 us 35.7 us ) | 175.9 ns 307.6 ns 3.5 us | [ 143.2 ns ] Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend ( 1.9 us 2.7 us 35.7 us ) | 33.4 ns 45.3 ns 70.9 ns | [ 1.3 ns ] Invoking Instantiate new ClientRpc ( 1.9 us 2.7 us 35.8 us ) | 45.0 ns 49.1 ns 78.4 ns | [ 4.3 ns ] Completed Instantiate new ClientRpc ( 2.0 us 2.7 us 35.8 us ) | 16.0 ns 16.6 ns 20.5 ns | [ 0.8 ns ] Invoking rpc->sendOrQueue ( 2.1 us 2.8 us 36.1 us ) | 95.1 ns 114.6 ns 270.0 ns | [ 9.4 ns ] Invoking postSendZeroCopy ( 2.3 us 3.2 us 40.9 us ) | 288.1 ns 334.2 ns 4.8 us | [ 104.3 ns ] Completed postSendZeroCopy ( 2.4 us 3.2 us 41.0 us ) | 53.5 ns 58.3 ns 107.4 ns | [ 4.6 ns ] Completed rpc->sendOrQueue ( 2.4 us 3.3 us 41.1 us ) | 52.5 ns 71.7 ns 90.0 ns | [ 3.0 ns ] Completed session->clientSend() ( 2.5 us 3.3 us 41.1 us ) | 16.0 ns 16.3 ns 30.0 ns | [ 0.6 ns ] Completed client.send() ( 2.5 us 3.3 us 41.2 us ) | 16.0 ns 22.5 ns 64.1 ns | [ 3.7 ns ] Completed Send 'write segment' RPC ( 2.5 us 3.4 us 41.3 us ) | 29.0 ns 44.6 ns 69.5 ns | [ 6.5 ns ] Invoking Send 'write segment' RPC ( 2.6 us 3.5 us 41.4 us ) | 69.5 ns 88.6 ns 148.0 ns | [ 10.8 ns ] Invoking client.send() ( 2.6 us 3.5 us 41.5 us ) | 16.0 ns 19.2 ns 35.5 ns | [ 3.3 ns ] Invoking session->clientSend() ( 2.8 us 3.8 us 45.5 us ) | 167.0 ns 302.3 ns 4.0 us | [ 177.7 ns ] Acquired Dispatch::Lock in TransportManager::WorkerSession::clientSend ( 2.8 us 3.8 us 45.5 us ) | 21.5 ns 30.0 ns 68.5 ns | [ 5.0 ns ] Invoking Instantiate new ClientRpc ( 2.8 us 3.9 us 45.6 us ) | 42.6 ns 52.9 ns 109.4 ns | [ 6.1 ns ] Completed Instantiate new ClientRpc ( 2.8 us 3.9 us 45.7 us ) | 16.0 ns 16.5 ns 62.0 ns | [ 1.3 ns ] Invoking rpc->sendOrQueue ( 3.0 us 4.0 us 46.0 us ) | 119.0 ns 146.3 ns 259.4 ns | [ 9.2 ns ] Invoking postSendZeroCopy ( 3.3 us 4.4 us 50.9 us ) | 311.9 ns 368.3 ns 4.9 us | [ 147.6 ns ] Completed postSendZeroCopy ( 3.3 us 4.5 us 59.8 us ) | 53.5 ns 60.1 ns 8.9 us | [ 202.2 ns ] Completed rpc->sendOrQueue ( 3.4 us 4.5 us 59.9 us ) | 52.5 ns 56.5 ns 101.9 ns | [ 4.9 ns ] Completed session->clientSend() ( 3.4 us 4.5 us 59.9 us ) | 16.0 ns 16.7 ns 39.5 ns | [ 1.0 ns ] Completed client.send() ( 3.4 us 4.6 us 59.9 us ) | 16.0 ns 34.4 ns 43.0 ns | [ 8.4 ns ] Completed Send 'write segment' RPC ( 4.0 us 8.2 us 68.4 us ) | 581.9 ns 3.6 us 8.5 us | [ 714.2 ns ] Reaped outstanding RPC (close) ( 4.3 us 9.8 us 116.3 us ) | 287.0 ns 1.6 us 47.9 us | [ 1.4 us ] Reaped outstanding RPC (close) ( 4.6 us 10.7 us 121.5 us ) | 276.5 ns 959.9 ns 5.2 us | [ 646.2 ns ] Reaped outstanding RPC (close) ( 4.6 us 10.8 us 126.1 us ) | 48.1 ns 60.2 ns 4.6 us | [ 104.0 ns ] Completed sync() to backups |
Even more detail
...
Cumulative distributions for each event
Each event in the timeline is associated with a (cumulative) distribution of "completion" times. The "completion" time for a given event is the time delta between that event and its preceding event; hence, the cdf for the first event is quite boring. The major time sinks described above are labeled by their corresponding number. (Note: the x-axis in the cdf's below is in seconds.)
...
Event Completed sync() to backups
On the the Backup
Average Averaged over 3515 same-shape timelines.
...
Code Block | ||
---|---|---|
| ||
0 us --- InfRcTransport Poller picks ups incoming RPC [dispatch thread] | 325 ns --- Invoke service.handleRpc() [worker thread] | 885 ns --- Completed service.handleRpc() [worker thread] | 1 us --- Begin sending reply [dispatch thread] | | | 1.8 us --- Completed sending reply [dispatch thread] |
More detailed timeline
Just as we did for the master, we have a more detailed timeline for the events on the backup server:
Code Block | ||||
---|---|---|---|---|
| ||||
( 0.0 ns 0.0 ns 0.0 ns ) | 0.0 ns 0.0 ns 0.0 ns | [ 0.0 ns ] Invoking append to buffer [dispatch]
( 34.4 ns 76.2 ns 2.8 us ) | 34.4 ns 76.2 ns 2.8 us | [ 47.6 ns ] Completed append to buffer [dispatch]
( 51.4 ns 102.1 ns 2.9 us ) | 17.0 ns 25.8 ns 56.9 ns | [ 8.0 ns ] Invoking serviceManager->handleRpc() [dispatch]
( 146.5 ns 226.0 ns 5.8 us ) | 95.1 ns 124.0 ns 2.9 us | [ 102.8 ns ] Completed serviceManager->handleRpc() [dispatch]
( 170.0 ns 325.7 ns 68.3 us ) | 23.5 ns 99.7 ns 62.5 us | [ 1.3 us ] Invoking service.handleRpc() [worker]
( 275.0 ns 458.8 ns 68.6 us ) | 105.0 ns 133.1 ns 374.7 ns | [ 29.3 ns ] Invoking service.dispatch() [worker]
( 295.1 ns 484.3 ns 68.8 us ) | 20.1 ns 25.5 ns 212.1 ns | [ 7.0 ns ] Invoking callHandler() [worker]
( 378.3 ns 589.2 ns 73.9 us ) | 83.2 ns 104.9 ns 5.1 us | [ 107.2 ns ] Invoking SegmentInfo->write() [worker]
( 492.5 ns 774.0 ns 79.1 us ) | 114.2 ns 184.8 ns 5.2 us | [ 115.6 ns ] Completed SegmentInfo->write() [worker]
( 523.5 ns 823.0 ns 79.3 us ) | 31.0 ns 49.0 ns 116.9 ns | [ 3.0 ns ] Completed callHandler() [worker]
( 553.5 ns 854.3 ns 79.4 us ) | 30.0 ns 31.2 ns 90.0 ns | [ 1.6 ns ] Completed service.dispatch() [worker]
( 576.7 ns 885.0 ns 79.4 us ) | 23.2 ns 30.7 ns 93.1 ns | [ 7.3 ns ] Completed service.handleRpc() [worker]
( 675.9 ns 1.0 us 83.1 us ) | 99.2 ns 153.7 ns 3.7 us | [ 109.4 ns ] Invoking worker->rpc->sendReply() [dispatch]
( 858.0 ns 1.3 us 86.2 us ) | 182.1 ns 224.8 ns 3.1 us | [ 70.2 ns ] Invoking postSend() [dispatch]
( 1.1 us 1.6 us 90.7 us ) | 290.1 ns 330.6 ns 4.5 us | [ 113.1 ns ] Completed postSend() [dispatch]
( 1.3 us 1.8 us 93.8 us ) | 169.1 ns 216.5 ns 3.1 us | [ 98.8 ns ] Completed worker->rpc->sendReply() [dispatch]
|
Cumulative distributions for each event
Just as we did for the master, we have the cdf's for the events on the backup server:
...