Goal
Measure and understand the performance of Infiniband under load.
Parameters of the experiments
- Read operations performed by clients. A single table/object is read
over and over.
- Cluster used - cluster hardware info is at Cluster+Configuration
rc02 - server (master)
rc03 - client (queen)
rc04-31 - client (worker) - multiple if required.
- Call stack in code being measured
InfRcTransport<Infiniband>::ServerRpc::sendReply()
InfRcTransport<Infiniband>::getTransmitBuffer()
infiniband->pollCompletionQueue(commonTxCq,
MAX_TX_QUEUE_DEPTH,
retArray);
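The retry behaviour implied by this call stack - getTransmitBuffer() polling the completion queue until a completed send returns a buffer to the pool - can be sketched with a simulated pool. All names below are illustrative stand-ins, not the actual RAMCloud/Infiniband API:

```cpp
#include <deque>

// Hypothetical stand-in for a transmit buffer descriptor.
struct BufferDescriptor { int id; };

// Simulated free list plus completion queue: completed sends return
// their buffers to the pool for reuse.
struct FakeTransport {
    std::deque<BufferDescriptor*> freeList;   // buffers ready for reuse
    std::deque<BufferDescriptor*> inFlight;   // sends awaiting completion

    // Mimics pollCompletionQueue(): reap up to maxEntries completed
    // sends, returning their buffers to the free list. Returns the
    // number reaped (0 when nothing has completed yet).
    int pollCompletionQueue(int maxEntries) {
        int reaped = 0;
        while (reaped < maxEntries && !inFlight.empty()) {
            freeList.push_back(inFlight.front());
            inFlight.pop_front();
            ++reaped;
        }
        return reaped;
    }

    // Mimics getTransmitBuffer(): spin on the completion queue until
    // at least one free buffer exists. Under load, this loop is where
    // the measured time accumulates, since many polls return 0.
    BufferDescriptor* getTransmitBuffer(int maxTxQueueDepth) {
        while (freeList.empty()) {
            pollCompletionQueue(maxTxQueueDepth);  // may return 0
        }
        BufferDescriptor* bd = freeList.front();
        freeList.pop_front();
        return bd;
    }
};
```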
Results/Graphs
Reference Graph - Throughput of the system for 100 byte object
reads using different Transmit Buffer Pool sizes
Analysis of throughput curves.
- Summary - throughput drops to 50% under high load.
- The throughput of the system is measured here against increasing
load. The load is in terms of read operations on 100 byte objects.
- We notice that the throughput of the system drops by a factor of 2
at high loads, even though we are nowhere near the network limits at
that point. The measured outgoing throughput is 390217 ops/sec, i.e.
about 39M bytes/sec or 312M bits/sec of payload, which is well under
the expected 32Gbps limit.
- The red, blue and green lines were measured with 24 RX buffers and
8, 32 and 64 TX buffers respectively.
- The violet line was measured with 48 RX buffers and 8 TX
buffers. Notice that adding buffers to the pool on the receive side
allows the transmit side to see a higher throughput - I do not
understand the reasons for this.
- A set of further measurements were taken during the same experiment
and are plotted on separate graphs to aid understanding.
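As a sanity check of the outgoing-throughput figure quoted above, the arithmetic counts payload bytes only (100-byte objects) and ignores RPC headers and protocol overhead:

```cpp
// Sanity check of the throughput arithmetic: 390,217 reads/sec of
// 100-byte objects, compared against a 32 Gbit/s link.
constexpr double opsPerSec      = 390217.0;
constexpr double bytesPerOp     = 100.0;               // payload only
constexpr double bytesPerSec    = opsPerSec * bytesPerOp;  // ~39 MB/s
constexpr double bitsPerSec     = bytesPerSec * 8.0;       // ~312 Mbit/s
constexpr double linkBitsPerSec = 32e9;                    // 32 Gbit/s
constexpr double utilization    = bitsPerSec / linkBitsPerSec;  // < 1%
```

So even at peak measured load the link carries under 1% of its nominal payload capacity, supporting the claim that the network is not the bottleneck.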
Latency Graph - Time spent in pollCQ per read (average) across
different Transmit Buffer Pool sizes
Analysis
- Summary - pollCompletionQueue() (and hence getTransmitBuffer()) take
longer to run with increasing load. Note that pollCompletionQueue()
is called repeatedly until it returns successfully, i.e. yields a
free buffer from the pool.
- Red/Blue lines represent 24 RX buffers and 8 TX buffers
- Green/Violet lines represent 24 RX buffers and 32 TX buffers
- Orange/Pink lines represent 24 RX buffers and 64 TX buffers
- This is a plot of the times taken by the different functions
during the experiments.
- Total time spent in pollCQ was tracked and then divided by the
number of read calls to calculate the average.
- This tracks the curve of time spent within the getTransmitBuffer
call well. The difference between the two needs to be explained.
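The averaging described above - total time in pollCQ divided by the number of reads - can be sketched as follows; the struct and member names are illustrative, not the actual instrumentation code:

```cpp
#include <cstdint>

// Sketch of the averaging used for the latency graphs: accumulate the
// total time spent in pollCompletionQueue() across an experiment, then
// divide by the number of read RPCs served.
struct PollCqStats {
    uint64_t totalPollTicks = 0;   // summed over every pollCQ call
    uint64_t pollCalls      = 0;   // includes calls that returned 0
    uint64_t readsServed    = 0;

    void recordPoll(uint64_t ticks) { totalPollTicks += ticks; ++pollCalls; }
    void recordRead() { ++readsServed; }

    // Average pollCQ time *per read*, the quantity plotted. Note this
    // grows under load even if each individual poll stays cheap,
    // because more polls are needed per read.
    double avgPollPerRead() const {
        return readsServed ? double(totalPollTicks) / double(readsServed)
                           : 0.0;
    }
};
```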
Latency Graph - Time spent in pollCQ per read (average) - fixed
pool of buffers - comparing time taken by successful calls against
calls that return 0
Analysis
- This is the same latency curve as above restricted to the case where
the size of the buffer pool for TX buffers is 8.
- The Red line represents the average time taken by the getTransmitBuffer() call.
- The Blue line represents the average time taken across all calls to
pollCompletionQueue().
- The Green line represents the average time taken by calls to
pollCompletionQueue() that returned zero buffers.
- The Violet line represents the average time taken by calls to
pollCompletionQueue() that returned at least one free buffer.
- Note that the time taken per successful call increased only slightly
with load. The number of calls, however, increased with load, so the
overall time spent in getTransmitBuffer() increased.
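The decomposition behind this observation is: time per read in getTransmitBuffer() equals (time per pollCQ call) times (calls per read). The numbers below are illustrative only, not measurements:

```cpp
// Illustrative decomposition: even if each pollCQ call gets only
// slightly slower under load, a 10x increase in calls per read
// multiplies the per-read cost.
constexpr double nsPerPollLow   = 50.0;   // light load, per-call cost
constexpr double callsPerReadLow  = 1.0;
constexpr double nsPerPollHigh  = 60.0;   // heavy load, per-call cost
constexpr double callsPerReadHigh = 10.0;

constexpr double perReadLow  = nsPerPollLow  * callsPerReadLow;   // 50 ns
constexpr double perReadHigh = nsPerPollHigh * callsPerReadHigh;  // 600 ns
```

With these assumed numbers the per-call cost rises only 20%, yet the per-read cost rises 12x, matching the shape of the curves described above.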
Latency Graph - Average number of buffers returned by pollCQ
across different Buffer Pool sizes
Analysis
- The red, blue and green lines were measured with 24 RX buffers and
8, 32 and 64 TX buffers respectively.
- The violet line was measured with 48 RX buffers and 8 TX buffers.
- An interesting trend that appears to be independent of the number of
buffers in the pool: the drop in the average occurs at the same load
irrespective of buffer-pool size.
- Why does doubling the number of receive buffers affect the number of
free transmit buffers returned? Compare Red against Violet.