Metrics Files
Here are some representative metrics files to consult and compare against as the system evolves.
SOSP '11 Paper Experiments
Note that all data files (full logs, not just metrics) are in the paper repo.
Figure 8 (Proportional Scaling, 1 to 20 masters, 6 backup processes per master)
Note that the paper results were averages of 5 runs.
1 Master, 6 backups:
=== Summary === Recovery time: 0.985 s Masters: 1 Backups: 6 Replicas: 3 Objects per master: 592415 Object size: 1024 bytes Total live objects: 592415 Total recovery segment entries: 592415 Total live object space: 578.53 MB Total recovery segment space w/ overhead: 600.00 MB Storage type: disk Log directory: /home/rumble/ramcloud/recovery/20110826170432 === Coordinator Time === Total: 982.4 ms / 99.75% of total recovery Starting recovery on backups: 0.7 ms / 0.07% of total recovery Starting recovery on masters: 1.8 ms / 0.18% of total recovery Tablets recovered: 0.3 ms / 0.03% of total recovery Completing recovery on backups: 0.1 ms / 0.01% of total recovery Set will: 0.0 ms / 0.00% of total recovery Get tablet map: 19.9 ms / 2.02% of total recovery Other: 959.8 ms / 97.45% of total recovery Receiving in transport: 13.2 ms / 1.34% of total recovery === Master Time === Total: 0.0 ms / 0.00% of total recovery Waiting for incoming segments: 457.4 ms / 46.44% of total recovery Inside recoverSegment: 504.7 ms / 51.24% of total recovery Backup.proceed: 8.5 ms / 0.86% of total recovery Verify checksum: 0.0 ms / 0.00% of total recovery Segment append: 322.2 ms / 32.72% of total recovery Segment append copy: 51.4 ms / 5.21% of total recovery Segment append checksum: 147.3 ms / 14.96% of total recovery Other (HT, etc.): 174.0 ms / 17.66% of total recovery (other) Final log sync: 9.7 ms / 0.98% of total recovery Removing tombstones: 0.0 ms / 0.00% of total recovery Other: -971.8 ms / -98.67% of total recovery Receiving in transport: 3.7 ms / 0.38% of total recovery Transmitting in transport: 29.2 ms / 2.97% of total recovery Replicating one segment: 1.2 ms During replay: 1.2 ms During log sync: 0.8 ms RPC latency replicating one segment: 2.1 ms (for R-th replica) During replay: 2.1 ms (for R-th replica) During log sync: 1.2 ms (for R-th replica) Replication: 847.2 ms / 86.01% of total recovery Client RPCs Active: 873.7 ms / 88.71% of total recovery Average GRD completion time: 42.0 ms Task iterations: 2789253 === Backup Time === RPC service time: 87.7 ms avg / max 111.3 ms / 8.90% avg of total recovery startReadingData: 0.1 ms avg / max 0.1 ms / 0.01% avg of total recovery Open/write segment: 82.5 ms avg / max 105.9 ms / 8.38% avg of total recovery Open segment memset: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Copy: 82.1 ms avg / max 105.3 ms / 8.34% avg of total recovery Other: 0.4 ms avg / max 0.5 ms / 0.04% avg of total recovery getRecoveryData: 2.3 ms avg / max 4.4 ms / 0.23% avg of total recovery Other: 2.8 ms avg / max 3.2 ms / 0.29% avg of total recovery Transmitting in transport: 24.1 ms avg / max 29.2 ms / 2.45% avg of total recovery Filtering segments: 54.0 ms avg / max 60.7 ms / 5.48% avg of total recovery Reading segments: 866.8 ms avg / max 957.4 ms / 88.01% avg of total recovery Using disk: 864.0 ms avg / max 954.7 ms / 87.72% avg of total recovery getRecoveryData completions: 13 avg / max 14 getRecoveryData retry fraction: 0.798 avg / max 0.959 === Efficiency === recoverSegment CPU: 6.64 ms avg (per filtered segment) Writing a segment: 0.49 ms avg (backup RPC thread) Filtering a segment: 4.32 ms avg === Network Utilization === Aggregate: 34.15 Gb/s / 17.79% of network capacity (overall) Master in: 7.42 Gb/s (overall) Master out: 15.10 Gb/s (overall) Master out during replication: 17.56 Gb/s (overall) Master out during log sync: 19.49 Gb/s (overall) Backup in: 3.17 Gb/s avg / min 2.03 Gb/s / 19.04 Gb/s total (overall) Backup out: 3.17 Gb/s avg / min 0.70 Gb/s / 19.04 Gb/s total (overall) === Disk Utilization === Effective bandwidth: 116.42 MB/s avg / min 105.59 MB/s / 698.54 MB/s total Active bandwidth: 122.46 MB/s avg / min 113.82 MB/s / 734.74 MB/s total Reading: 115.51 MB/s avg / min 107.07 MB/s / 693.07 MB/s total Writing: 169.53 MB/s avg / min 141.75 MB/s / 678.11 MB/s total Disk active: 95.13% avg / max 98.87% (of total recovery) Reading: 87.72% avg / max 96.93% (of total recovery) Writing: 7.41% avg / max 17.19% (of total recovery) === Backup Events === Segments read: 12.5 avg / max 14.0 Primary segments loaded: 12.7 avg / max 14.0 Secondary segments loaded: 0.0 avg / max 0.0 === Slowest Servers === Backup opens, writes: server.rc01 (284.5 ms) Stalled reading segs from backups: server.rc01 (457.4 ms) Reading from disk: backup.rc03 (107.1 MB/s) Writing to disk: backup.rc01 (141.7 MB/s)
20 Masters, 120 Backups:
=== Summary === Recovery time: 1.118 s Masters: 20 Backups: 120 Replicas: 3 Objects per master: 592415.0 avg / max 592415.0 Object size: 1024 bytes avg / max 1024 bytes Total live objects: 11848300 Total recovery segment entries: 11848300 Total live object space: 11570.61 MB Total recovery segment space w/ overhead: 11999.98 MB Storage type: disk Log directory: /home/rumble/ramcloud/recovery/20110826174510 === Coordinator Time === Total: 1115.9 ms / 99.80% of total recovery Starting recovery on backups: 11.7 ms / 1.05% of total recovery Starting recovery on masters: 7.0 ms / 0.63% of total recovery Tablets recovered: 2.3 ms / 0.21% of total recovery Completing recovery on backups: 0.8 ms / 0.07% of total recovery Set will: 0.0 ms / 0.00% of total recovery Get tablet map: 145.3 ms / 13.00% of total recovery Other: 949.6 ms / 84.92% of total recovery Receiving in transport: 6.4 ms / 0.57% of total recovery === Master Time === Total: 1041.1 ms avg / max 1101.7 ms / 93.11% avg of total recovery Waiting for incoming segments: 531.6 ms avg / max 552.4 ms / 47.54% avg of total recovery Inside recoverSegment: 522.1 ms avg / max 540.8 ms / 46.69% avg of total recovery Backup.proceed: 7.9 ms avg / max 8.9 ms / 0.70% avg of total recovery Verify checksum: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Segment append: 333.8 ms avg / max 349.2 ms / 29.85% avg of total recovery Segment append copy: 51.3 ms avg / max 56.9 ms / 4.59% avg of total recovery Segment append checksum: 153.6 ms avg / max 166.0 ms / 13.74% avg of total recovery Other (HT, etc.): 180.4 ms avg / max 185.7 ms / 16.13% avg of total recovery (other) Final log sync: 12.6 ms avg / max 16.7 ms / 1.12% avg of total recovery Removing tombstones: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Other: -25.2 ms avg / max 39.4 ms / -2.25% avg of total recovery Receiving in transport: 6.5 ms avg / max 10.9 ms / 0.58% avg of total recovery Transmitting in transport: 38.2 ms avg / max 42.2 ms / 3.41% avg of total recovery Replicating one segment: 1.3 ms avg / max 1.3 ms During replay: 1.3 ms avg / max 1.3 ms During log sync: 1.1 ms avg / max 1.4 ms RPC latency replicating one segment: 2.7 ms avg / max 2.8 ms (for R-th replica) During replay: 2.7 ms avg / max 2.8 ms (for R-th replica) During log sync: 1.5 ms avg / max 2.1 ms (for R-th replica) Replication: 942.2 ms avg / max 955.5 ms / 84.26% avg of total recovery Client RPCs Active: 985.7 ms avg / max 998.0 ms / 88.15% avg of total recovery Average GRD completion time: 2.4 ms avg / max 2.4 ms Task iterations: 3193584 avg / max 3519448 === Backup Time === RPC service time: 80.0 ms avg / max 263.2 ms / 7.15% avg of total recovery startReadingData: 0.1 ms avg / max 0.2 ms / 0.01% avg of total recovery Open/write segment: 75.1 ms avg / max 131.8 ms / 6.72% avg of total recovery Open segment memset: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Copy: 74.7 ms avg / max 131.1 ms / 6.68% avg of total recovery Other: 0.4 ms avg / max 0.7 ms / 0.04% avg of total recovery getRecoveryData: 2.2 ms avg / max 158.2 ms / 0.20% avg of total recovery Other: 2.6 ms avg / max 43.2 ms / 0.23% avg of total recovery Transmitting in transport: 27.7 ms avg / max 42.2 ms / 2.48% avg of total recovery Filtering segments: 68.5 ms avg / max 91.0 ms / 6.13% avg of total recovery Reading segments: 845.2 ms avg / max 1087.4 ms / 75.59% avg of total recovery Using disk: 842.1 ms avg / max 1083.0 ms / 75.31% avg of total recovery getRecoveryData completions: 250 avg / max 300 getRecoveryData retry fraction: 0.063 avg / max 0.992 === Efficiency === recoverSegment CPU: 0.35 ms avg (per filtered segment) Writing a segment: 0.44 ms avg (backup RPC thread) Filtering a segment: 5.48 ms avg === Network Utilization === Aggregate: 601.92 Gb/s / 15.67% of network capacity (overall) Master in: 6.47 Gb/s avg / min 5.20 Gb/s / 129.34 Gb/s total (overall) Master out: 13.30 Gb/s avg / min 13.08 Gb/s / 265.98 Gb/s total (overall) Master out during replication: 15.78 Gb/s avg / min 15.57 Gb/s / 315.70 Gb/s total (overall) Master out during log sync: 15.83 Gb/s avg / min 11.34 Gb/s / 316.68 Gb/s total (overall) Backup in: 2.80 Gb/s avg / min 1.29 Gb/s / 335.45 Gb/s total (overall) Backup out: 2.80 Gb/s avg / min 0.56 Gb/s / 335.42 Gb/s total (overall) === Disk Utilization === Effective bandwidth: 116.92 MB/s avg / min 78.70 MB/s / 14030.17 MB/s total Active bandwidth: 122.00 MB/s avg / min 81.26 MB/s / 14640.46 MB/s total Reading: 118.93 MB/s avg / min 73.87 MB/s / 14271.83 MB/s total Writing: 135.00 MB/s avg / min 83.26 MB/s / 16064.48 MB/s total Disk active: 95.84% avg / max 99.50% (of total recovery) Reading: 75.31% avg / max 96.85% (of total recovery) Writing: 20.53% avg / max 30.30% (of total recovery) === Backup Events === Segments read: 12.5 avg / max 15.0 Primary segments loaded: 250.3 avg / max 300.0 Secondary segments loaded: 0.0 avg / max 0.0 === Slowest Servers === Backup opens, writes: server.rc09 (351.9 ms) Stalled reading segs from backups: server.rc09 (552.4 ms) Reading from disk: backup.rc22 (73.9 MB/s) Writing to disk: backup.rc14 (83.3 MB/s)
Figure 9 (Proportional Scaling, 2 to 60 masters, 2 backup processes per master)
After rerunning a few of the data points from this graph it looks like we (unfortunately) scaled the cluster by adding processes on two machines at a time before starting to double up master/backup processes with backup processes on the same machine. Doubling up processes right away doesn't yield the same results as the paper even with the code from SOSP'11.
2 Masters, 4 Backups:
=== Summary === Recovery time: 1.290 s Masters: 2 Backups: 4 Replicas: 3 Objects per master: 592415.0 avg / max 592415.0 Object size: 1024 bytes avg / max 1024 bytes Total live objects: 1184830 Total recovery segment entries: 1184830 Total live object space: 1157.06 MB Total recovery segment space w/ overhead: 1200.00 MB Storage type: disk Log directory: /home/rumble/ramcloud/recovery/20110830182651 === Coordinator Time === Total: 1287.5 ms / 99.78% of total recovery Starting recovery on backups: 3.5 ms / 0.27% of total recovery Starting recovery on masters: 0.5 ms / 0.04% of total recovery Tablets recovered: 0.3 ms / 0.03% of total recovery Completing recovery on backups: 0.1 ms / 0.01% of total recovery Set will: 0.0 ms / 0.00% of total recovery Get tablet map: 26.1 ms / 2.02% of total recovery Other: 1257.0 ms / 97.42% of total recovery Receiving in transport: 14.7 ms / 1.14% of total recovery === Master Time === Total: 639.0 ms avg / max 1278.0 ms / 49.52% avg of total recovery Waiting for incoming segments: 558.8 ms avg / max 572.9 ms / 43.31% avg of total recovery Inside recoverSegment: 633.2 ms avg / max 647.1 ms / 49.07% avg of total recovery Backup.proceed: 13.2 ms avg / max 13.6 ms / 1.02% avg of total recovery Verify checksum: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Segment append: 412.3 ms avg / max 419.6 ms / 31.95% avg of total recovery Segment append copy: 56.9 ms avg / max 59.4 ms / 4.41% avg of total recovery Segment append checksum: 189.3 ms avg / max 194.3 ms / 14.67% avg of total recovery Other (HT, etc.): 207.7 ms avg / max 213.8 ms / 16.10% avg of total recovery (other) Final log sync: 77.3 ms avg / max 80.8 ms / 5.99% avg of total recovery Removing tombstones: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Other: -630.3 ms avg / max 12.1 ms / -48.85% avg of total recovery Receiving in transport: 8.3 ms avg / max 8.5 ms / 0.64% avg of total recovery Transmitting in transport: 123.8 ms avg / max 126.3 ms / 9.59% avg of total recovery Replicating one segment: 1.6 ms avg / max 1.6 ms During replay: 1.6 ms avg / max 1.6 ms During log sync: 1.2 ms avg / max 1.2 ms RPC latency replicating one segment: 3.4 ms avg / max 3.4 ms (for R-th replica) During replay: 3.5 ms avg / max 3.5 ms (for R-th replica) During log sync: 2.4 ms avg / max 2.5 ms (for R-th replica) Replication: 1227.0 ms avg / max 1230.7 ms / 95.09% avg of total recovery Client RPCs Active: 1249.5 ms avg / max 1251.7 ms / 96.84% avg of total recovery Average GRD completion time: 26.7 ms avg / max 26.8 ms Task iterations: 1950173 avg / max 1971082 === Backup Time === RPC service time: 407.2 ms avg / max 440.4 ms / 31.56% avg of total recovery startReadingData: 0.1 ms avg / max 0.1 ms / 0.01% avg of total recovery Open/write segment: 397.9 ms avg / max 433.3 ms / 30.84% avg of total recovery Open segment memset: 0.1 ms avg / max 0.1 ms / 0.00% avg of total recovery Copy: 395.8 ms avg / max 431.2 ms / 30.68% avg of total recovery Other: 2.0 ms avg / max 2.1 ms / 0.16% avg of total recovery getRecoveryData: 6.0 ms avg / max 7.8 ms / 0.46% avg of total recovery Other: 3.3 ms avg / max 3.5 ms / 0.26% avg of total recovery Transmitting in transport: 119.2 ms avg / max 126.3 ms / 9.24% avg of total recovery Filtering segments: 345.2 ms avg / max 370.2 ms / 26.75% avg of total recovery Reading segments: 1131.8 ms avg / max 1182.3 ms / 87.72% avg of total recovery Using disk: 1116.9 ms avg / max 1132.1 ms / 86.56% avg of total recovery getRecoveryData completions: 76 avg / max 78 getRecoveryData retry fraction: 0.774 avg / max 0.823 === Efficiency === recoverSegment CPU: 4.19 ms avg (per filtered segment) Writing a segment: 0.78 ms avg (backup RPC thread) Filtering a segment: 9.21 ms avg === Network Utilization === Aggregate: 54.51 Gb/s / 42.58% of network capacity (overall) Master in: 9.03 Gb/s avg / min 8.82 Gb/s / 18.07 Gb/s total (overall) Master out: 12.72 Gb/s avg / min 12.69 Gb/s / 25.43 Gb/s total (overall) Master out during replication: 13.32 Gb/s avg / min 13.31 Gb/s / 26.64 Gb/s total (overall) Master out during log sync: 16.86 Gb/s avg / min 16.54 Gb/s / 33.72 Gb/s total (overall) Backup in: 7.27 Gb/s avg / min 5.38 Gb/s / 29.07 Gb/s total (overall) Backup out: 7.27 Gb/s avg / min 1.79 Gb/s / 29.07 Gb/s total (overall) Master Infiniband TX Active: 1.2 s avg / min 1.2 s / 2.4 s total === Disk Utilization === Effective bandwidth: 257.30 MB/s avg / min 254.20 MB/s / 1029.20 MB/s total Active bandwidth: 261.67 MB/s avg / min 259.48 MB/s / 1046.66 MB/s total Reading: 268.59 MB/s avg / min 268.36 MB/s / 1074.38 MB/s total Writing: 210.99 MB/s avg / min 198.68 MB/s / 843.94 MB/s total Disk active: 98.33% avg / max 99.47% (of total recovery) Reading: 86.56% avg / max 87.73% (of total recovery) Writing: 11.77% avg / max 12.48% (of total recovery) === Backup Events === Segments read: 37.5 avg / max 38.0 Primary segments loaded: 75.5 avg / max 78.0 Secondary segments loaded: 0.0 avg / max 0.0 === Slowest Servers === Backup opens, writes: server.rc02 (362.5 ms) Stalled reading segs from backups: server.rc02 (572.9 ms) Reading from disk: backup.rc02 (268.4 MB/s) Writing to disk: backup.rc02 (198.7 MB/s)
10 Masters, 20 Backups:
=== Summary === Recovery time: 1.410 s Masters: 10 Backups: 20 Total nodes: 20 Replicas: 3 Objects per master: 592415.0 avg / max 592415.0 Object size: 1024 bytes avg / max 1024 bytes Total live objects: 5924150 Total recovery segment entries: 5924150 Total live object space: 5785.30 MB Total recovery segment space w/ overhead: 5999.99 MB Storage type: disk Log directory: /home/stutsman/src/ramcloud-sosp/recovery/20121018083553 === Coordinator Time === Total: 1407.7 ms / 99.85% of total recovery Starting recovery on backups: 4.5 ms / 0.32% of total recovery Starting recovery on masters: 2.2 ms / 0.16% of total recovery Tablets recovered: 1.0 ms / 0.07% of total recovery Completing recovery on backups: 0.2 ms / 0.01% of total recovery Set will: 0.0 ms / 0.00% of total recovery Get tablet map: 75.4 ms / 5.35% of total recovery Other: 1324.7 ms / 93.96% of total recovery Receiving in transport: 8.9 ms / 0.63% of total recovery === Master Time === Total: 1398.7 ms avg / max 1404.1 ms / 99.21% avg of total recovery Waiting for incoming segments: 562.7 ms avg / max 585.6 ms / 39.91% avg of total recovery Inside recoverSegment: 784.0 ms avg / max 827.1 ms / 55.61% avg of total recovery Backup.proceed: 21.1 ms avg / max 28.3 ms / 1.50% avg of total recovery Verify checksum: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Segment append: 464.6 ms avg / max 487.4 ms / 32.96% avg of total recovery Segment append copy: 49.6 ms avg / max 55.2 ms / 3.52% avg of total recovery Segment append checksum: 224.4 ms avg / max 235.0 ms / 15.92% avg of total recovery Other (HT, etc.): 298.3 ms avg / max 317.9 ms / 21.16% avg of total recovery (other) Final log sync: 13.0 ms avg / max 15.4 ms / 0.92% avg of total recovery Removing tombstones: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Other: 38.9 ms avg / max 53.2 ms / 2.76% avg of total recovery Receiving in transport: 14.0 ms avg / max 24.8 ms / 0.99% avg of total recovery Transmitting in transport: 152.7 ms avg / max 164.8 ms / 10.83% avg of total recovery Replicating one segment: 1.6 ms avg / max 1.7 ms During replay: 1.6 ms avg / max 1.7 ms During log sync: 1.1 ms avg / max 1.3 ms RPC latency replicating one segment: 2.7 ms avg / max 2.9 ms (for R-th replica) During replay: 2.7 ms avg / max 2.9 ms (for R-th replica) During log sync: 1.5 ms avg / max 1.9 ms (for R-th replica) Replication: 1291.3 ms avg / max 1313.0 ms / 91.59% avg of total recovery Client RPCs Active: 1363.4 ms avg / max 1370.1 ms / 96.71% avg of total recovery Average GRD completion time: 6.3 ms avg / max 6.3 ms Task iterations: 2984825 avg / max 3355883 === Backup Time === RPC service time: 379.0 ms avg / max 545.4 ms / 26.88% avg of total recovery startReadingData: 0.2 ms avg / max 0.2 ms / 0.01% avg of total recovery Open/write segment: 369.5 ms avg / max 535.5 ms / 26.21% avg of total recovery Open segment memset: 0.1 ms avg / max 0.1 ms / 0.00% avg of total recovery Copy: 367.8 ms avg / max 532.5 ms / 26.09% avg of total recovery Other: 1.7 ms avg / max 2.8 ms / 0.12% avg of total recovery getRecoveryData: 6.1 ms avg / max 27.4 ms / 0.43% avg of total recovery Other: 3.2 ms avg / max 7.9 ms / 0.23% avg of total recovery Transmitting in transport: 130.5 ms avg / max 164.8 ms / 9.25% avg of total recovery Filtering segments: 248.6 ms avg / max 316.2 ms / 17.64% avg of total recovery Reading segments: 1310.0 ms avg / max 1372.7 ms / 92.92% avg of total recovery Using disk: 1304.0 ms avg / max 1370.3 ms / 92.49% avg of total recovery getRecoveryData completions: 376 avg / max 390 getRecoveryData retry fraction: 0.365 avg / max 0.898 === Efficiency === recoverSegment CPU: 1.04 ms avg (per filtered segment) Writing a segment: 0.73 ms avg (backup RPC thread) Filtering a segment: 6.63 ms avg Memory bandwidth (backup copies): 2.51 GB/s avg / min 1.85 GB/s === Network Utilization === Aggregate: 249.47 Gb/s / 49.89% of network capacity (overall) Master in: 8.50 Gb/s avg / min 7.98 Gb/s / 84.95 Gb/s total (overall) Master out: 11.63 Gb/s avg / min 11.62 Gb/s / 116.29 Gb/s total (overall) Master out during replication: 12.70 Gb/s avg / min 12.47 Gb/s / 127.00 Gb/s total (overall) Master out during log sync: 15.15 Gb/s avg / min 12.29 Gb/s / 151.52 Gb/s total (overall) Backup in: 6.65 Gb/s avg / min 4.39 Gb/s / 133.01 Gb/s total (overall) Backup out: 6.65 Gb/s avg / min 1.64 Gb/s / 133.01 Gb/s total (overall) === Disk Utilization === Effective bandwidth: 226.41 MB/s avg / min 221.30 MB/s / 4528.14 MB/s total Active bandwidth: 230.77 MB/s avg / min 222.62 MB/s / 4615.49 MB/s total Reading: 230.25 MB/s avg / min 219.16 MB/s / 4604.97 MB/s total Writing: 257.32 MB/s avg / min 181.80 MB/s / 4889.07 MB/s total Disk active: 98.12% avg / max 99.54% (of total recovery) Reading: 92.49% avg / max 97.20% (of total recovery) Writing: 5.63% avg / max 10.77% (of total recovery) === Backup Events === Segments read: 37.5 avg / max 39.0 Primary segments loaded: 375.5 avg / max 390.0 Secondary segments loaded: 0.0 avg / max 0.0 === Slowest Servers === Backup opens, writes: server.rc03 (366.2 ms) Stalled reading segs from backups: server.rc03 (585.6 ms) Reading from disk: server.rc02 (219.2 MB/s) Writing to disk: server.rc11 (181.8 MB/s)
60 Masters, 120 Backups:
=== Summary === Recovery time: 1.624 s Masters: 60 Backups: 120 Replicas: 3 Objects per master: 592415.0 avg / max 592415.0 Object size: 1024 bytes avg / max 1024 bytes Total live objects: 35544900 Total recovery segment entries: 35544900 Total live object space: 34711.82 MB Total recovery segment space w/ overhead: 35999.95 MB Storage type: disk Log directory: /home/rumble/ramcloud/recovery/20110830204624 === Coordinator Time === Total: 1619.4 ms / 99.74% of total recovery Starting recovery on backups: 35.4 ms / 2.18% of total recovery Starting recovery on masters: 57.0 ms / 3.51% of total recovery Tablets recovered: 9.8 ms / 0.60% of total recovery Completing recovery on backups: 2.2 ms / 0.14% of total recovery Set will: 0.0 ms / 0.00% of total recovery Get tablet map: 442.9 ms / 27.28% of total recovery Other: 1074.4 ms / 66.17% of total recovery Receiving in transport: 4.5 ms / 0.28% of total recovery === Master Time === Total: 1494.6 ms avg / max 1557.2 ms / 92.05% avg of total recovery Waiting for incoming segments: 850.4 ms avg / max 893.9 ms / 52.38% avg of total recovery Inside recoverSegment: 596.6 ms avg / max 655.7 ms / 36.75% avg of total recovery Backup.proceed: 6.8 ms avg / max 10.4 ms / 0.42% avg of total recovery Verify checksum: 0.0 ms avg / max 0.0 ms / 0.00% avg of total recovery Segment append: 389.0 ms avg / max 436.7 ms / 23.96% avg of total recovery Segment append copy: 57.6 ms avg / max 68.5 ms / 3.54% avg of total recovery Segment append checksum: 183.0 ms avg / max 221.6 ms / 11.27% avg of total recovery Other (HT, etc.): 200.8 ms avg / max 218.6 ms / 12.36% avg of total recovery (other) Final log sync: 22.1 ms avg / max 37.1 ms / 1.36% avg of total recovery Removing tombstones: 0.0 ms avg / max 0.1 ms / 0.00% avg of total recovery Other: 25.4 ms avg / max 65.2 ms / 1.57% avg of total recovery Receiving in transport: 16.4 ms avg / max 21.7 ms / 1.01% avg of total recovery Transmitting in transport: 115.4 ms avg / max 138.6 ms / 7.11% avg of total recovery Replicating one segment: 1.9 ms avg / max 2.0 ms During replay: 1.9 ms avg / max 2.0 ms During log sync: 1.4 ms avg / max 2.1 ms RPC latency replicating one segment: 3.9 ms avg / max 4.4 ms (for R-th replica) During replay: 3.9 ms avg / max 4.5 ms (for R-th replica) During log sync: 2.5 ms avg / max 4.6 ms (for R-th replica) Replication: 1494.0 ms avg / max 1536.1 ms / 92.02% avg of total recovery Client RPCs Active: 1508.1 ms avg / max 1545.9 ms / 92.88% avg of total recovery Average GRD completion time: 1.2 ms avg / max 1.2 ms Task iterations: 2965835 avg / max 3258522 === Backup Time === RPC service time: 309.5 ms avg / max 398.2 ms / 19.06% avg of total recovery startReadingData: 0.6 ms avg / max 0.9 ms / 0.04% avg of total recovery Open/write segment: 300.9 ms avg / max 389.7 ms / 18.54% avg of total recovery Open segment memset: 0.1 ms avg / max 1.1 ms / 0.00% avg of total recovery Copy: 299.1 ms avg / max 387.4 ms / 18.42% avg of total recovery Other: 1.8 ms avg / max 8.1 ms / 0.11% avg of total recovery getRecoveryData: 5.3 ms avg / max 10.5 ms / 0.33% avg of total recovery Other: 2.6 ms avg / max 6.4 ms / 0.16% avg of total recovery Transmitting in transport: 112.2 ms avg / max 138.6 ms / 6.91% avg of total recovery Filtering segments: 325.9 ms avg / max 369.2 ms / 20.07% avg of total recovery Reading segments: 1121.6 ms avg / max 1151.5 ms / 69.08% avg of total recovery Using disk: 1118.3 ms avg / max 1133.8 ms / 68.88% avg of total recovery getRecoveryData completions: 2256 avg / max 2280 getRecoveryData retry fraction: 0.001 avg / max 0.079 === Efficiency === recoverSegment CPU: 0.13 ms avg (per filtered segment) Writing a segment: 0.59 ms avg (backup RPC thread) Filtering a segment: 8.67 ms avg === Network Utilization === Aggregate: 1300.67 Gb/s / 33.87% of network capacity (overall) Master in: 7.24 Gb/s avg / min 6.43 Gb/s / 434.55 Gb/s total (overall) Master out: 10.11 Gb/s avg / min 10.08 Gb/s / 606.39 Gb/s total (overall) Master out during replication: 10.96 Gb/s avg / min 10.66 Gb/s / 657.46 Gb/s total (overall) Master out during log sync: 14.22 Gb/s avg / min 9.31 Gb/s / 852.95 Gb/s total (overall) Backup in: 5.78 Gb/s avg / min 3.54 Gb/s / 693.24 Gb/s total (overall) Backup out: 5.78 Gb/s avg / min 1.42 Gb/s / 693.01 Gb/s total (overall) Master Infiniband TX Active: 1.5 s avg / min 1.4 s / 89.0 s total === Disk Utilization === Effective bandwidth: 237.33 MB/s avg / min 187.23 MB/s / 28479.37 MB/s total Active bandwidth: 246.69 MB/s avg / min 237.17 MB/s / 29602.49 MB/s total Reading: 268.85 MB/s avg / min 263.45 MB/s / 32262.50 MB/s total Writing: 191.45 MB/s avg / min 165.05 MB/s / 22400.18 MB/s total Disk active: 96.35% avg / max 99.30% (of total recovery) Reading: 68.88% avg / max 69.83% (of total recovery) Writing: 27.48% avg / max 31.45% (of total recovery) === Backup Events === Segments read: 37.6 avg / max 38.0 Primary segments loaded: 2255.5 avg / max 2280.0 Secondary segments loaded: 0.0 avg / max 0.0 === Slowest Servers === Backup opens, writes: server.rc48 (537.5 ms) Stalled reading segs from backups: server.rc59 (893.9 ms) Reading from disk: server.rc21 (263.5 MB/s) Writing to disk: backup.rc38 (165.1 MB/s)