clusterperf June 19, 2014
clusterperf output as measured on June 19, 2014 (rc1-rc20)
Recent changes that may have affected performance:
- Complete rewrite of Buffer; should be significantly faster
- ObjectFinder now caches sessions, eliminating calls to TransportManager and an additional hash table lookup there.
- Object representation has changed to support multiple keys; this probably introduces additional overhead.
- Basic index operations are now supported, and there are some new tests for those.
basic.read100          5.0 us     read single 100B object with 30B key
basic.readBw100       19.0 MB/s   bandwidth reading 100B object with 30B key
basic.read1K           7.1 us     read single 1KB object with 30B key
basic.readBw1K       135.1 MB/s   bandwidth reading 1KB object with 30B key
basic.read10K         10.5 us     read single 10KB object with 30B key
basic.readBw10K      912.0 MB/s   bandwidth reading 10KB object with 30B key
basic.read100K        47.8 us     read single 100KB object with 30B key
basic.readBw100K       1.9 GB/s   bandwidth reading 100KB object with 30B key
basic.read1M         421.2 us     read single 1MB object with 30B key
basic.readBw1M         2.2 GB/s   bandwidth reading 1MB object with 30B key
basic.write100        15.9 us     write single 100B object with 30B key
basic.writeBw100       6.0 MB/s   bandwidth writing 100B object with 30B key
basic.write1K         20.3 us     write single 1KB object with 30B key
basic.writeBw1K       47.0 MB/s   bandwidth writing 1KB object with 30B key
basic.write10K        37.4 us     write single 10KB object with 30B key
basic.writeBw10K     254.8 MB/s   bandwidth writing 10KB object with 30B key
basic.write100K      229.2 us     write single 100KB object with 30B key
basic.writeBw100K    416.1 MB/s   bandwidth writing 100KB object with 30B key
basic.write1M          2.2 ms     write single 1MB object with 30B key
basic.writeBw1M      431.7 MB/s   bandwidth writing 1MB object with 30B key
broadcast            142.0 us     broadcast message to 9 slaves
readNotFound          13.6 us     read object that doesn't exist

# RAMCloud index write, overwrite, lookup and read performance with varying number of objects.
# All keys are 30 bytes and the value of the object is fixed to be 100 bytes.
# Write and overwrite latencies are measured for the 'nth' object insertion where the size of the
# table is 'n-1'. Lookup and indexedRead latencies are measured when the size of the index is 'n'.
# All latency measurements are printed as 10 percentile/ median/ 90 percentile.
# Generated by 'clusterperf.py indexBasic'
#
#       n     write latency(us)   overwrite latency(us)    lookup latency(us)   lookup+read latency(us)
#----------------------------------------------------------------------------------------------------------------------
        1    31.6/  32.9/  49.0      33.4/  34.6/  61.2     5.0/   5.1/   5.8        10.0/  10.1/  11.3
       10    33.7/  35.1/  61.2      34.2/  35.8/  72.1     5.6/   5.6/   6.3        10.5/  10.6/  11.8
      100    34.7/  36.2/  46.0      40.0/  41.8/  71.7     6.2/   6.3/   6.9        11.2/  11.3/  12.5
     1000    35.9/  37.0/  46.5      36.7/  37.9/  63.9     7.0/   7.0/   7.6        11.9/  12.1/  13.3
    10000    36.9/  38.4/  48.1      42.1/  43.7/  61.0     7.7/   7.7/   8.4        12.6/  12.7/  13.9
   100000    36.7/  38.0/  46.9      37.4/  38.9/  77.8     8.2/   8.3/   8.9        13.1/  13.2/  14.4
  1000000    37.8/  39.1/  52.9      38.3/  39.8/  65.0     9.2/   9.3/   9.6        14.2/  14.2/  15.4

# RAMCloud write/overwrite performance for 1000th object insertion with varying number of index keys.
# The size of the table is 999 objects and is constant for this experiment. The latency measurements
# are printed as 10 percentile/ median/ 90 percentile
# Generated by 'clusterperf.py indexMultiple'
#
# Num secondary keys/obj     write latency (us)    overwrite latency (us)
#---------------------------------------------------------------------------------
                       0     13.7/  14.3/  16.3        14.4/  15.0/  16.9
                       1     36.5/  37.6/  44.6        36.4/  37.9/  47.2
                       2     38.4/  40.1/  77.4        38.4/  40.1/  67.9
                       3     42.0/  43.8/  84.6        42.0/  43.8/  89.8
                       4     44.8/  46.9/  89.1        44.7/  46.4/  79.4
                       5     47.5/  49.8/  98.7        46.9/  49.2/  95.3
                       6     48.2/  51.8/  89.9        47.9/  51.4/  93.0
                       7     49.9/  52.4/ 107.5        50.0/  52.2/  97.1
                       8     52.5/  55.1/  86.3        52.2/  54.8/  88.4
                       9     56.0/  61.2/ 101.3        55.4/  60.7/ 104.7
                      10     54.8/  58.7/ 116.0        53.9/  57.4/ 117.5

# RAMCloud index scalability when 1 or more clients lookup/read
# 100-byte objects with 30-byte keys chosen at random from
# 1 indexlets.
# Generated by 'clusterperf.py indexScalability'
#
# numClients throughput(klookups/sec)
#-------------------------------------
           1                      144
           2                      271
           3                      373
           4                      373
           5                      366
           6                      364
           7                      363
           8                      360
           9                      360
          10                      298

# RAMCloud multiRead performance for an approximately fixed number
# of 100 B objects with 30 byte keys
# distributed evenly across varying number of masters.
# Generated by 'clusterperf.py multiRead_general'
#
# Num Objs    Num Masters    Objs/Master    Latency (us)    Latency/Obj (us)
#----------------------------------------------------------------------------
      5000              1           5000          2590.4                0.52
      5000              2           2500          4241.6                0.85
      4998              3           1666          3222.8                0.64
      5000              4           1250          2315.6                0.46
      5000              5           1000          2462.4                0.49
      4998              6            833          1989.2                0.40
      4998              7            714          3210.0                0.64
      5000              8            625          1755.2                0.35
      4995              9            555          2800.8                0.56
      5000             10            500          1825.0                0.37
      4994             11            454          1681.5                0.34
      4992             12            416          1878.5                0.38
      4992             13            384          1966.6                0.39
      4998             14            357          1623.9                0.32
      4995             15            333          2159.7                0.43
      4992             16            312          2451.7                0.49
      4998             17            294          2023.1                0.40
      4986             18            277          1918.3                0.38
      4997             19            263          1929.3                0.39

# RAMCloud multiRead performance for an approximately fixed number
# of 100 B objects with 30 byte keys
# distributed evenly across varying number of masters.
# Requests are issued in a random order.
# Generated by 'clusterperf.py multiRead_generalRandom'
#
# Num Objs    Num Masters    Objs/Master    Latency (us)    Latency/Obj (us)
#----------------------------------------------------------------------------
      5000              1           5000          3649.6                0.73
      5000              2           2500          1653.2                0.33
      4998              3           1666          1531.4                0.31
      5000              4           1250          1502.9                0.30
      5000              5           1000          1580.2                0.32
      4998              6            833          2137.4                0.43
      4998              7            714          1614.1                0.32
      5000              8            625          1757.6                0.35
      4995              9            555          1727.2                0.35
      5000             10            500          1780.2                0.36
      4994             11            454          1824.5                0.37
      4992             12            416          1835.7                0.37
      4992             13            384          2017.5                0.40
      4998             14            357          1854.5                0.37
      4995             15            333          2024.4                0.41
      4992             16            312          2206.4                0.44
      4998             17            294          3338.2                0.67
      4986             18            277          3575.3                0.72
      4997             19            263          3579.4                0.72

# RAMCloud multiWrite performance for 100 B objects with 30 byte keys
# located on a single master.
# Generated by 'clusterperf.py multiWrite_oneMaster'
#
# Num Objs    Num Masters    Objs/Master    Latency (us)    Latency/Obj (us)
#----------------------------------------------------------------------------
         1              1              1            16.7               16.70
         2              1              2            20.0               10.00
         3              1              3            22.1                7.36
         4              1              4            22.7                5.66
         5              1              5            24.6                4.92
         6              1              6            26.5                4.41
         7              1              7            26.8                3.83
         8              1              8            28.6                3.57
         9              1              9            32.0                3.56
        10              1             10            33.9                3.39
        20              1             20            49.3                2.46
        30              1             30            63.0                2.10
        40              1             40            80.2                2.00
        50              1             50            90.5                1.81
        60              1             60           107.0                1.78
        70              1             70           121.7                1.74
        80              1             80           135.1                1.69
        90              1             90           136.6                1.52
       100              1            100           138.3                1.38
       200              1            200           217.8                1.09
       300              1            300           348.2                1.16
       400              1            400           504.5                1.26
       500              1            500           600.5                1.20
       600              1            600           680.6                1.13
       700              1            700           783.2                1.12
       800              1            800           922.0                1.15
       900              1            900          1026.7                1.14
      1000              1           1000          1098.5                1.10
      2000              1           2000          2188.2                1.09
      3000              1           3000          3326.1                1.11
      4000              1           4000          4280.9                1.07
      5000              1           5000          5343.8                1.07

# RAMCloud multiRead performance for 100 B objects with 30 byte keys
# located on a single master.
# Generated by 'clusterperf.py multiRead_oneMaster'
#
# Num Objs    Num Masters    Objs/Master    Latency (us)    Latency/Obj (us)
#----------------------------------------------------------------------------
         1              1              1             5.3                5.28
         2              1              2             6.9                3.43
         3              1              3             7.9                2.64
         4              1              4             8.7                2.18
         5              1              5             9.4                1.88
         6              1              6            10.6                1.77
         7              1              7            11.3                1.61
         8              1              8            11.8                1.48
         9              1              9            12.6                1.40
        10              1             10            13.2                1.32
        20              1             20            18.9                0.95
        30              1             30            25.9                0.86
        40              1             40            31.5                0.79
        50              1             50            37.7                0.75
        60              1             60            43.1                0.72
        70              1             70            49.1                0.70
        80              1             80            54.4                0.68
        90              1             90            55.8                0.62
       100              1            100            58.5                0.59
       200              1            200           101.3                0.51
       300              1            300           170.7                0.57
       400              1            400           245.2                0.61
       500              1            500           298.3                0.60
       600              1            600           346.4                0.58
       700              1            700           418.0                0.60
       800              1            800           478.0                0.60
       900              1            900           533.4                0.59
      1000              1           1000           616.9                0.62
      2000              1           2000          1291.8                0.65
      3000              1           3000          1771.6                0.59
      4000              1           4000          2363.9                0.59
      5000              1           5000          2943.4                0.59

# RAMCloud multiRead performance for 100 B objects with 30 byte keys
# with one object located on each master.
# Generated by 'clusterperf.py multiRead_oneObjectPerMaster'
#
# Num Objs    Num Masters    Objs/Master    Latency (us)    Latency/Obj (us)
#----------------------------------------------------------------------------
         1              1              1             5.6                5.58
         2              2              1             6.3                3.15
         3              3              1             7.3                2.43
         4              4              1             8.3                2.08
         5              5              1            11.8                2.36
         6              6              1            10.6                1.77
         7              7              1            11.4                1.62
         8              8              1            12.7                1.59
         9              9              1            14.0                1.55
        10             10              1            15.0                1.50
        11             11              1            19.2                1.75
        12             12              1            23.3                1.94
        13             13              1            20.1                1.54
        14             14              1            20.9                1.49
        15             15              1            25.1                1.67
        16             16              1            30.2                1.89
        17             17              1            30.0                1.77
        18             18              1            35.4                1.96
        19             19              1            29.6                1.56

# Cumulative distribution of time for a single client to read a
# single 100-byte object from a single server. Each line indicates
# that a given fraction of all reads took at most a given time
# to complete.
# Generated by 'clusterperf.py readDist'
#
# Time (usec)  Cum. Fraction
#---------------------------
         0.00          0.000
         4.56          0.000
         4.60          0.010
         4.60          0.020
         4.61          0.030
         4.61          0.040
         4.62          0.050
         4.62          0.060
         4.62          0.070
         4.63          0.080
         4.63          0.090
         4.64          0.100
         4.64          0.110
         4.64          0.120
         4.65          0.130
         4.65          0.140
         4.65          0.150
         4.66          0.160
         4.67          0.170
         4.67          0.180
         4.68          0.190
         4.68          0.200
         4.68          0.210
         4.68          0.220
         4.68          0.230
         4.69          0.240
         4.69          0.250
         4.69          0.260
         4.69          0.270
         4.69          0.280
         4.69          0.290
         4.69          0.300
         4.69          0.310
         4.69          0.320
         4.69          0.330
         4.69          0.340
         4.69          0.350
         4.70          0.360
         4.70          0.370
         4.70          0.380
         4.70          0.390
         4.70          0.400
         4.70          0.410
         4.70          0.420
         4.70          0.430
         4.70          0.440
         4.70          0.450
         4.70          0.460
         4.70          0.470
         4.70          0.480
         4.70          0.490
         4.70          0.500
         4.71          0.510
         4.71          0.520
         4.71          0.530
         4.71          0.540
         4.71          0.550
         4.71          0.560
         4.71          0.570
         4.71          0.580
         4.71          0.590
         4.71          0.600
         4.71          0.610
         4.71          0.620
         4.71          0.630
         4.71          0.640
         4.72          0.650
         4.72          0.660
         4.72          0.670
         4.72          0.680
         4.72          0.690
         4.72          0.700
         4.72          0.710
         4.73          0.720
         4.73          0.730
         4.73          0.740
         4.74          0.750
         4.75          0.760
         4.76          0.770
         4.77          0.780
         4.78          0.790
         4.79          0.800
         4.81          0.810
         4.83          0.820
         4.85          0.830
         4.87          0.840
         4.91          0.850
         5.16          0.860
         5.35          0.870
         5.37          0.880
         5.38          0.890
         5.39          0.900
         5.40          0.910
         5.50          0.920
         5.85          0.930
         5.90          0.940
         5.91          0.950
         5.92          0.960
         5.94          0.970
         5.99          0.980
         6.41          0.990
        66.84          0.999
       102.11          0.9999
       112.25          1.000

# RAMCloud read performance as a function of load (1 or more
# clients all reading a single 100-byte object with 30-byte key
# repeatedly).
# Generated by 'clusterperf.py readLoaded'
#
# numClients  readLatency(us)  throughput(total kreads/sec)
#----------------------------------------------------------
           1              5.1                           195
           2              5.3                           377
           3              5.4                           553
           4              7.7                           523
           5              9.0                           557
           6              8.4                           711
           7              9.8                           715
           8             11.2                           714
           9             12.8                           702
          10             14.2                           704
          11             15.7                           699
          12             16.8                           714
          13             18.6                           700
          14             19.8                           707
          15             21.3                           703
          16             22.6                           706
          17             53.6                           317
          18             26.3                           684
          19             27.8                           683
          20             29.2                           685

# RAMCloud read performance when 1 or more clients read
# 100-byte objects with 30-byte keys chosen at random from
# 1 servers.
# Generated by 'clusterperf.py readRandom'
#
# numClients  throughput(total kreads/sec)  slowest(ms)  reads > 10us
#--------------------------------------------------------------------
           1                           175         6.45          0.2%
           2                           378         0.03          0.1%
           3                           506         0.05          0.4%
           4                           635         0.09          0.2%
           5                           679         0.09          0.6%
           6                           696         0.09          3.3%
           7                           698         0.31         69.2%
           8                           713         0.30         87.0%
           9                           716         0.11         88.5%
          10                           722         0.11         91.8%
          11                           824         0.10         90.4%
          12                           987         0.12         83.9%
          13                           947         0.11         88.6%
          14                           929         0.11         92.6%
          15                           928         0.11         93.1%
          16                           936         0.30         93.3%

# RAMCloud read performance for 100 B objects
# with keys of various lengths.
# Generated by 'clusterperf.py readVaryingKeyLength'
#
# Key Length    Latency (us)    Bandwidth (MB/s)
#----------------------------------------------------------------------------
           1             4.9                 0.2
           5             4.8                 1.0
          10             4.8                 2.0
          15             4.9                 2.9
          20             4.9                 3.9
          25             4.9                 4.8
          30             5.0                 5.7
          35             5.1                 6.5
          40             5.1                 7.5
          45             5.0                 8.5
          50             5.1                 9.3
          55             4.9                10.7
          60             4.9                11.7
          65             4.9                12.6
          70             5.0                13.3
          75             5.2                13.8
          80             5.1                14.9
          85             5.2                15.6
          90             4.9                17.3
          95             5.1                17.7
         100             5.2                18.4
         200             6.0                31.7
         300             6.3                45.7
         400             6.5                58.6
         500             6.8                69.7
         600             7.0                81.2
         700             7.1                93.8
         800             7.2               106.0
         900             7.1               121.7
        1000             7.2               132.4
        2000             8.0               238.2
        3000             8.8               326.6
        4000             9.3               410.8
        5000            10.0               475.4
        6000            10.6               540.0
        7000            11.3               589.1
        8000            11.9               640.2
        9000            12.8               672.0
       10000            13.5               706.3
       20000            21.4               890.5
       30000            29.2               981.3
       40000            39.7               961.8
       50000            49.5               962.9
       60000            57.4               996.5

# Gauges impact of asynchronous writes on synchronous writes.
# Write two values. The size of the first varies over trials
# (its size is given as 'firstObjectSize'). The first write is
# either synchronous (if firstWriteIsSync is 1) or asynchronous
# (if firstWriteIsSync is 0). The response time of the first
# write is given by 'firstWriteLatency'. The second write is
# a 100 B object which is always written synchronously (its
# response time is given by 'syncWriteLatency'
# Both writes use a 30 B key.
# Generated by 'clusterperf.py writeAsyncSync'
#
# firstWriteIsSync firstObjectSize firstWriteLatency(us) syncWriteLatency(us)
#----------------------------------------------------------------------------
                 0             100                  16.3                 16.4
                 0            1000                  20.3                 16.3
                 0           10000                  38.9                 15.5
                 0          100000                 230.9                 18.4
                 0         1000000                2133.7                 23.7
                 1             100                  15.9                 15.1
                 1            1000                  19.1                 15.1
                 1           10000                  36.6                 15.4
                 1          100000                 224.9                 19.0
                 1         1000000                2149.8                 24.0

# RAMCloud write performance for 100 B objects
# with keys of various lengths.
# Generated by 'clusterperf.py writeVaryingKeyLength'
#
# Key Length    Latency (us)    Bandwidth (MB/s)
#----------------------------------------------------------------------------
           1            16.1                 0.1
           5            15.8                 0.3
          10            15.6                 0.6
          15            15.3                 0.9
          20            15.4                 1.2
          25            15.5                 1.5
          30            16.3                 1.8
          35            16.4                 2.0
          40            16.5                 2.3
          45            16.7                 2.6
          50            16.2                 2.9
          55            16.0                 3.3
          60            15.6                 3.7
          65            15.7                 4.0
          70            15.4                 4.3
          75            16.1                 4.4
          80            15.6                 4.9
          85            16.0                 5.1
          90            16.4                 5.2
          95            16.8                 5.4
         100            16.5                 5.8
         200            17.5                10.9
         300            18.4                15.6
         400            19.1                20.0
         500            19.8                24.1
         600            19.9                28.7
         700            21.3                31.3
         800            21.9                34.8
         900            22.6                37.9
        1000            24.5                38.9
        2000            29.3                65.2
        3000            33.3                85.9
        4000            37.8               101.0
        5000            42.9               111.1
        6000            46.4               123.4
        7000            50.0               133.5
        8000            52.3               145.9
        9000            56.8               151.2
       10000            61.3               155.6
       20000           107.7               177.1
       30000           155.9               183.5
       40000           202.4               188.5
       50000           252.4               188.9
       60000           294.2               194.5
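
Note on derived columns: the bandwidth and Latency/Obj figures in this output appear to follow directly from the latency columns. The sketch below is one way to cross-check them. It rests on two assumptions inferred from the numbers and not stated anywhere in the output: that "MB" means 2**20 bytes, and that bandwidth is bytes transferred divided by latency (object size for the basic.* rows, key length for the varying-key-length tables). The function names are hypothetical and not part of clusterperf.py.

```python
# Assumption (inferred from the figures, not documented in the output):
# "MB" is 2**20 bytes and bandwidth = bytes transferred / latency.
MIB = 1 << 20


def bandwidth_mb_per_s(num_bytes, latency_us):
    """Bytes moved per second, expressed in units of 2**20 bytes."""
    return num_bytes / (latency_us * 1e-6) / MIB


def latency_per_object_us(total_latency_us, num_objects):
    """Average per-object latency of one multi-object request."""
    return total_latency_us / num_objects


# basic.read100: 100 B in 5.0 us; the report lists 19.0 MB/s.
print(f"{bandwidth_mb_per_s(100, 5.0):.1f} MB/s")

# readVaryingKeyLength, 30-byte key: 5.0 us; the report lists 5.7 MB/s.
print(f"{bandwidth_mb_per_s(30, 5.0):.1f} MB/s")

# multiRead_general, 1 master: 5000 objects in 2590.4 us; the report lists 0.52.
print(f"{latency_per_object_us(2590.4, 5000):.2f} us/object")
```

Recomputed values can differ from the reported ones in the last digit, because the latencies printed above are already rounded to one decimal place.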