clusterperf January 2015
clusterperf output as measured on January 30, 2015 (rc1-rc20)
Recent changes that may have affected performance:
- Added DispatchExec, which saves 1us on writes
- Reduced Multi-op scheduler batch size
- Indexing benchmarks are now included
basic.read100          4.8 us    read random 100B object (30B key) median
basic.read100.min      4.4 us    read random 100B object (30B key) minimum
basic.read100.9        5.5 us    read random 100B object (30B key) 90%
basic.read100.99       7.0 us    read random 100B object (30B key) 99%
basic.read100.999      8.8 us    read random 100B object (30B key) 99.9%
basic.readBw100       19.1 MB/s  bandwidth reading 100B objects (30B key)
basic.read1K           6.9 us    read random 1KB object (30B key) median
basic.read1K.min       6.0 us    read random 1KB object (30B key) minimum
basic.read1K.9         7.9 us    read random 1KB object (30B key) 90%
basic.read1K.99        8.8 us    read random 1KB object (30B key) 99%
basic.read1K.999      11.7 us    read random 1KB object (30B key) 99.9%
basic.readBw1K       132.5 MB/s  bandwidth reading 1KB objects (30B key)
basic.read10K         10.1 us    read random 10KB object (30B key) median
basic.read10K.min      9.0 us    read random 10KB object (30B key) minimum
basic.read10K.9       11.1 us    read random 10KB object (30B key) 90%
basic.read10K.99      12.2 us    read random 10KB object (30B key) 99%
basic.read10K.999     34.6 us    read random 10KB object (30B key) 99.9%
basic.readBw10K      917.6 MB/s  bandwidth reading 10KB objects (30B key)
basic.read100K        42.9 us    read random 100KB object (30B key) median
basic.read100K.min    36.7 us    read random 100KB object (30B key) minimum
basic.read100K.9      44.2 us    read random 100KB object (30B key) 90%
basic.read100K.99     51.4 us    read random 100KB object (30B key) 99%
basic.read100K.999    92.9 us    read random 100KB object (30B key) 99.9%
basic.readBw100K       2.2 GB/s  bandwidth reading 100KB objects (30B key)
basic.read1M         357.7 us    read random 1MB object (30B key) median
basic.read1M.min     322.6 us    read random 1MB object (30B key) minimum
basic.read1M.9       363.4 us    read random 1MB object (30B key) 90%
basic.read1M.99      366.4 us    read random 1MB object (30B key) 99%
basic.read1M.999     411.0 us    read random 1MB object (30B key) 99.9%
basic.readBw1M         2.6 GB/s  bandwidth reading 1MB objects (30B key)
basic.write100        14.0 us    write random 100B object (30B key) median
basic.write100.min    12.5 us    write random 100B object (30B key) minimum
basic.write100.9      15.1 us    write random 100B object (30B key) 90%
basic.write100.99     22.9 us    write random 100B object (30B key) 99%
basic.write100.999   106.6 us    write random 100B object (30B key) 99.9%
basic.writeBw100       6.5 MB/s  bandwidth writing 100B objects (30B key)
basic.write1K         18.5 us    write random 1KB object (30B key) median
basic.write1K.min     16.7 us    write random 1KB object (30B key) minimum
basic.write1K.9       20.2 us    write random 1KB object (30B key) 90%
basic.write1K.99      84.8 us    write random 1KB object (30B key) 99%
basic.write1K.999    159.0 us    write random 1KB object (30B key) 99.9%
basic.writeBw1K       46.2 MB/s  bandwidth writing 1KB objects (30B key)
basic.write10K        34.6 us    write random 10KB object (30B key) median
basic.write10K.min    31.6 us    write random 10KB object (30B key) minimum
basic.write10K.9      37.2 us    write random 10KB object (30B key) 90%
basic.write10K.99    133.0 us    write random 10KB object (30B key) 99%
basic.write10K.999   273.0 us    write random 10KB object (30B key) 99.9%
basic.writeBw10K     247.9 MB/s  bandwidth writing 10KB objects (30B key)
basic.write100K      227.7 us    write random 100KB object (30B key) median
basic.write100K.min  209.8 us    write random 100KB object (30B key) minimum
basic.write100K.9    261.4 us    write random 100KB object (30B key) 90%
basic.write100K.99   391.6 us    write random 100KB object (30B key) 99%
basic.write100K.999  475.8 us    write random 100KB object (30B key) 99.9%
basic.writeBw100K    401.9 MB/s  bandwidth writing 100KB objects (30B key)
basic.write1M          2.2 ms    write random 1MB object (30B key) median
basic.write1M.min      2.1 ms    write random 1MB object (30B key) minimum
basic.write1M.9        2.3 ms    write random 1MB object (30B key) 90%
basic.write1M.99       2.4 ms    write random 1MB object (30B key) 99%
basic.write1M.999      2.6 ms    write random 1MB object (30B key) 99.9%
basic.writeBw1M      428.1 MB/s  bandwidth writing 1MB objects (30B key)

# RAMCloud multiRead performance for 100 B objects with 30 byte keys
# located on a single master.
# Generated by 'clusterperf.py multiRead_oneMaster'
#
# Num Objs  Num Masters  Objs/Master  Latency (us)  Latency/Obj (us)
#----------------------------------------------------------------------------
         1            1            1           5.4              5.42
         2            1            2           6.8              3.41
         3            1            3           8.0              2.67
         4            1            4           9.4              2.35
         5            1            5           9.9              1.98
         6            1            6          11.3              1.88
         7            1            7          11.5              1.64
         8            1            8          13.0              1.62
         9            1            9          13.0              1.45
        10            1           10          13.5              1.35
        20            1           20          20.0              1.00
        30            1           30          22.0              0.73
        40            1           40          25.9              0.65
        50            1           50          28.9              0.58
        60            1           60          30.2              0.50
        70            1           70          31.8              0.45
        80            1           80          34.1              0.43
        90            1           90          36.1              0.40
       100            1          100          39.2              0.39
       200            1          200          68.3              0.34
       300            1          300          87.2              0.29
       400            1          400         111.1              0.28
       500            1          500         141.9              0.28
       600            1          600         169.8              0.28
       700            1          700         200.9              0.29
       800            1          800         228.1              0.29
       900            1          900         255.7              0.28
      1000            1         1000         271.8              0.27
      2000            1         2000         526.3              0.26
      3000            1         3000         781.7              0.26
      4000            1         4000        1059.6              0.26
      5000            1         5000        1327.9              0.27

# RAMCloud multiRead performance for 100 B objects with 30 byte keys
# with one object located on each master.
# Generated by 'clusterperf.py multiRead_oneObjectPerMaster'
#
# Num Objs  Num Masters  Objs/Master  Latency (us)  Latency/Obj (us)
#----------------------------------------------------------------------------
         1            1            1           5.6              5.63
         2            2            1           6.6              3.28
         3            3            1           8.0              2.66
         4            4            1           8.7              2.19
         5            5            1           9.8              1.96
         6            6            1          12.8              2.13
         7            7            1          12.5              1.79
         8            8            1          13.4              1.68
         9            9            1          15.2              1.69
        10           10            1          15.7              1.57
        11           11            1          20.1              1.82
        12           12            1          24.0              2.00
        13           13            1          25.0              1.92
        14           14            1          24.9              1.78
        15           15            1          25.7              1.71
        16           16            1          28.0              1.75
        17           17            1          29.5              1.73
        18           18            1          31.5              1.75
        19           19            1          32.9              1.73

# RAMCloud multi-read throughput of a single server with a
# varying number of clients issuing 80-object multi-reads on
# randomly-chosen 100-byte objects with 30-byte keys
# Generated by 'clusterperf.py multiReadThroughput'
#
# numClients   throughput   worker utiliz.
#             (kreads/sec)
#-------------------------------------------
           1         2335            1.283
           2         3854            2.539
           3         4591            2.850
           4         4806            2.845
           5         4789            2.902
           6         4688            2.860
           7         4731            2.902
           8         4446            2.838
           9         4811            2.805
          10         4503            2.819
          11         4228            2.816
          12         4217            2.800
          13         3436            2.457
          14         4453            2.763

# RAMCloud multiWrite performance for 100 B objects with 30 byte keys
# located on a single master.
# Generated by 'clusterperf.py multiWrite_oneMaster'
#
# Num Objs  Num Masters  Objs/Master  Latency (us)  Latency/Obj (us)
#----------------------------------------------------------------------------
         1            1            1          14.4             14.45
         2            1            2          18.3              9.13
         3            1            3          20.0              6.65
         4            1            4          21.5              5.38
         5            1            5          22.9              4.59
         6            1            6          26.0              4.34
         7            1            7          28.6              4.08
         8            1            8          28.1              3.51
         9            1            9          31.0              3.44
        10            1           10          30.4              3.04
        20            1           20          46.0              2.30
        30            1           30          57.3              1.91
        40            1           40          75.4              1.88
        50            1           50          77.4              1.55
        60            1           60        1035.3             17.25
        70            1           70         102.5              1.46
        80            1           80         124.0              1.55
        90            1           90         140.6              1.56
       100            1          100         541.1              5.41
       200            1          200         859.5              4.30
       300            1          300         330.8              1.10
       400            1          400        3308.4              8.27
       500            1          500         532.4              1.06
       600            1          600        4463.5              7.44
       700            1          700         822.9              1.18
       800            1          800        4147.5              5.18
       900            1          900        1068.8              1.19
      1000            1         1000        7225.7              7.23
      2000            1         2000        1985.7              0.99
      3000            1         3000       45247.7             15.08
      4000            1         4000        3883.8              0.97
      5000            1         5000       92639.4             18.53

# Cumulative distribution of time for a single client to read a
# random 100-byte object from a single server. Each line indicates
# that a given fraction of all reads took at most a given time
# to complete.
# Generated by 'clusterperf.py readDist'
#
# Time (usec)   Cum. Fraction
#---------------------------
         0.00           0.000
         4.42           0.000
         4.56           0.010
         4.57           0.020
         4.58           0.030
         4.58           0.040
         4.59           0.050
         4.59           0.060
         4.60           0.070
         4.60           0.080
         4.61           0.090
         4.61           0.100
         4.62           0.110
         4.62           0.120
         4.63           0.130
         4.63           0.140
         4.63           0.150
         4.64           0.160
         4.64           0.170
         4.64           0.180
         4.64           0.190
         4.65           0.200
         4.65           0.210
         4.65           0.220
         4.65           0.230
         4.65           0.240
         4.66           0.250
         4.66           0.260
         4.66           0.270
         4.66           0.280
         4.66           0.290
         4.67           0.300
         4.67           0.310
         4.67           0.320
         4.67           0.330
         4.67           0.340
         4.68           0.350
         4.68           0.360
         4.68           0.370
         4.68           0.380
         4.68           0.390
         4.69           0.400
         4.69           0.410
         4.69           0.420
         4.69           0.430
         4.70           0.440
         4.70           0.450
         4.71           0.460
         4.71           0.470
         4.71           0.480
         4.72           0.490
         4.72           0.500
         4.72           0.510
         4.73           0.520
         4.73           0.530
         4.74           0.540
         4.74           0.550
         4.74           0.560
         4.75           0.570
         4.75           0.580
         4.76           0.590
         4.77           0.600
         4.77           0.610
         4.78           0.620
         4.79           0.630
         4.79           0.640
         4.80           0.650
         4.81           0.660
         4.82           0.670
         4.83           0.680
         4.83           0.690
         4.84           0.700
         4.86           0.710
         4.87           0.720
         4.88           0.730
         4.89           0.740
         4.91           0.750
         4.93           0.760
         4.95           0.770
         4.99           0.780
         5.06           0.790
         5.17           0.800
         5.22           0.810
         5.25           0.820
         5.28           0.830
         5.30           0.840
         5.32           0.850
         5.34           0.860
         5.36           0.870
         5.38           0.880
         5.40           0.890
         5.43           0.900
         5.48           0.910
         5.55           0.920
         5.71           0.930
         5.82           0.940
         5.87           0.950
         5.91           0.960
         5.98           0.970
         6.11           0.980
         7.08           0.990
        10.45           0.999
        84.21           0.9999
       103.29           1.000

# RAMCloud read throughput of a single server with a varying
# number of clients issuing individual reads on randomly
# chosen 100-byte objects with 30-byte keys
# Generated by 'clusterperf.py readThroughput'
#
# numClients   throughput   worker utiliz.
#             (kreads/sec)
#-------------------------------------------
           1          203            0.162
           2          386            0.342
           3          550            0.526
           4          680            0.654
           5          723            0.670
           6          844            0.845
           7          942            0.959
           8          978            1.020
           9          859            0.794
          10          903            0.844
          11          902            0.832
          12          905            0.841
          13          902            0.832
          14          902            0.836

# RAMCloud read performance for 100 B objects
# with keys of various lengths.
# Generated by 'clusterperf.py readVaryingKeyLength'
#
# Key Length   Latency (us)   Bandwidth (MB/s)
#----------------------------------------------------------------------------
           1            4.4               21.8
           5            4.4               22.5
          10            4.5               23.4
          15            4.5               24.4
          20            4.5               25.4
          25            4.5               26.4
          30            4.5               27.8
          35            4.6               28.2
          40            4.6               29.2
          45            4.6               30.2
          50            4.6               31.2
          55            4.6               32.1
          60            4.6               33.1
          65            4.6               34.2
          70            4.6               35.1
          75            4.7               35.8
          80            4.7               36.9
          85            4.7               37.7
          90            4.7               38.7
          95            4.7               39.4
         100            4.7               40.4
         200            5.5               52.0
         300            5.7               66.8
         400            6.1               78.2
         500            6.3               91.4
         600            6.3              105.2
         700            6.5              117.8
         800            6.6              129.6
         900            6.8              141.2
        1000            6.9              152.3
        2000            7.6              264.7
        3000            8.2              359.1
        4000            8.9              438.6
        5000            9.7              501.8
        6000           10.4              561.4
        7000           11.0              616.3
        8000           11.7              660.4
        9000           12.5              694.4
       10000           13.3              723.5
       20000           21.8              880.2
       30000           29.8              962.9
       40000           40.6              942.7
       50000           49.0              975.2
       60000           57.3             1001.0

# RAMCloud write performance for 100 B objects
# with keys of various lengths.
# Generated by 'clusterperf.py writeVaryingKeyLength'
#
# Key Length   Latency (us)   Bandwidth (MB/s)
#----------------------------------------------------------------------------
           1           12.9                7.5
           5           12.9                7.8
          10           12.9                8.1
          15           12.9                8.5
          20           13.0                8.8
          25           13.1                9.1
          30           13.2                9.4
          35           13.1                9.8
          40           13.2               10.1
          45           13.2               10.4
          50           13.3               10.8
          55           13.4               11.1
          60           13.4               11.4
          65           13.4               11.7
          70           13.4               12.1
          75           13.4               12.5
          80           13.4               12.8
          85           14.0               12.6
          90           14.0               12.9
          95           14.4               12.9
         100           14.4               13.3
         200           14.9               19.2
         300           15.7               24.3
         400           16.4               29.1
         500           17.1               33.4
         600           17.6               38.0
         700           18.1               42.2
         800           18.9               45.4
         900           19.6               48.6
        1000           20.0               52.4
        2000           23.6               85.0
        3000           27.4              107.9
        4000           31.4              124.7
        5000           35.2              138.1
        6000           39.0              149.3
        7000           42.9              157.7
        8000           47.2              163.8
        9000           51.4              168.8
       10000           55.4              173.8
       20000          104.0              184.3
       30000          151.1              190.0
       40000          198.1              193.1
       50000          238.3              200.5
       60000          283.4              202.2
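
The Bandwidth column in the two key-length tables above looks consistent with (key bytes + value bytes) divided by latency. The Python sketch below checks that reading against three rows of the read table. The formula, the interpretation of "MB" as 2**20 bytes, and the function name `bandwidth_mb_per_s` are assumptions made for illustration; they are not taken from the clusterperf source.

```python
# Assumption: the report's "MB" means 2**20 bytes.
MIB = 1024 * 1024


def bandwidth_mb_per_s(key_bytes, value_bytes, latency_us):
    """Bytes moved per operation divided by its latency, in MB/s (assumed formula)."""
    return (key_bytes + value_bytes) / (latency_us * 1e-6) / MIB


# (key length in bytes, read latency in us, reported MB/s), copied from the
# readVaryingKeyLength table; the object value is 100 bytes in every row.
samples = [(1, 4.4, 21.8), (1000, 6.9, 152.3), (60000, 57.3, 1001.0)]

for key_len, latency_us, reported in samples:
    estimate = bandwidth_mb_per_s(key_len, 100, latency_us)
    print(f"key={key_len:>5} B  estimated={estimate:7.1f} MB/s  reported={reported:7.1f} MB/s")
```

Small differences between the estimate and the reported value are expected, since the latency column is rounded to 0.1 us.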
# RAMCloud index write, overwrite, lookup+readHashes, and IndexLookup class
# performance with varying number of objects. 1000 samples per operation
# taken after 1 warmup.
# All keys are 30 bytes and the value of the object is fixed to be 100 bytes.
# Write and overwrite latencies are measured for the 'nth' object insertion
# where the size of the table is 'n-1'.
# Lookup, readHashes, and IndexLookup latencies are measured by reading a
# single object when the size of the index is 'n'.
# All latency measurements are printed as 10th percentile/ median/ 90th percentile.
#
# Generated by 'clusterperf.py indexBasic'
#
#       n   write latency (us)   overwrite latency (us)   hash lookup (us)   lookup+read (us)   IndexLookup (us)   IndexLookup overhead
#--------------------------------------------------------------------------------------------------------------------------------------------------------
        1   26.6/ 27.6/ 33.6     28.5/ 29.5/ 32.1          4.9/  4.9/  6.1    9.3/  9.4/ 10.5    9.6/  9.8/ 10.9    0.34/  0.40/  0.39
       10   29.3/ 30.5/ 47.6     29.7/ 30.8/ 34.0          5.3/  5.4/  6.6    9.7/  9.8/ 11.0   10.0/ 10.2/ 11.2    0.35/  0.34/  0.16
      100   30.5/ 31.7/ 36.2     42.2/ 43.8/ 47.8          6.0/  6.3/  7.7   10.4/ 11.2/ 12.5   10.8/ 11.0/ 12.4    0.36/ -0.24/ -0.07
     1000   31.1/ 32.4/ 43.8     49.3/ 50.7/ 56.4          6.4/  6.5/  7.7   10.8/ 11.5/ 12.2   11.2/ 11.3/ 12.3    0.34/ -0.23/  0.08
    10000   32.5/ 34.1/ 42.1     55.9/ 57.6/ 71.9          7.7/  9.0/ 10.4   12.1/ 13.8/ 14.9   12.4/ 13.3/ 14.8    0.33/ -0.41/ -0.10
   100000   34.1/ 35.5/ 70.6     62.2/ 64.1/ 70.8          7.9/  8.4/  9.1   12.3/ 13.0/ 13.9   12.6/ 12.8/ 13.3    0.29/ -0.20/ -0.64
  1000000   34.3/ 35.7/ 74.6     65.6/ 67.6/ 97.2          8.7/  8.9/ 10.2   13.3/ 14.0/ 15.2   13.5/ 14.2/ 15.0    0.25/  0.24/ -0.11

# RAMCloud write/overwrite performance for 1000th object insertion with
# varying number of index keys.
# The size of the table is 999 objects and is constant for this experiment.
# The latency measurements are printed as 10th percentile/ median/ 90th percentile.
# Generated by 'clusterperf.py indexMultiple'
#
# Num secondary keys/obj   write latency (us)   overwrite latency (us)
#---------------------------------------------------------------------------------
                       0   11.6/ 12.4/ 35.1     12.5/ 13.1/ 14.4
                       1   31.0/ 32.4/ 39.8     31.1/ 32.2/ 34.9
                       2   33.1/ 35.4/ 71.9     32.8/ 34.5/ 39.7
                       3   35.4/ 38.0/ 68.8     35.3/ 37.3/ 66.9
                       4   35.8/ 37.6/ 83.0     35.6/ 37.0/ 62.8
                       5   38.5/ 41.1/ 98.4     38.5/ 40.2/ 53.6
                       6   39.3/ 41.7/ 80.4     38.6/ 40.6/ 64.0
                       7   39.8/ 42.5/ 98.4     39.5/ 41.5/ 74.1
                       8   43.9/ 47.7/ 98.2     43.4/ 46.8/ 83.9
                       9   44.2/ 46.5/ 74.7     43.4/ 45.4/ 59.3
                      10   47.0/ 49.3/ 83.0     47.5/ 49.2/ 73.8
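
The index tables pack three values into each cell, in the order 10th percentile/ median/ 90th percentile. The Python sketch below shows one way to split such a row into numeric tuples for further analysis. The helper `parse_row` and the regular expression are hypothetical and written for this page only; they are not part of clusterperf.py.

```python
import re

# One "p10/ median/ p90" cell: three decimal numbers separated by slashes,
# with optional whitespace after each slash and an optional minus sign.
NUMBER = r"-?\d+(?:\.\d+)?"
TRIPLE = re.compile(rf"({NUMBER})/\s*({NUMBER})/\s*({NUMBER})")


def parse_row(line):
    """Return (leading integer, list of (p10, median, p90) float tuples)."""
    first, rest = line.split(None, 1)
    triples = [tuple(float(x) for x in m.groups()) for m in TRIPLE.finditer(rest)]
    return int(first), triples


# Row for one secondary key per object, copied from the indexMultiple table.
row = "1 31.0/ 32.4/ 39.8 31.1/ 32.2/ 34.9"
num_keys, (write, overwrite) = parse_row(row)
print(f"{num_keys} secondary key(s): median write {write[1]} us, "
      f"median overwrite {overwrite[1]} us")
```

The same helper handles the indexBasic rows, where it returns six tuples per row, including the negative values in the overhead column.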