Recap

Assumptions

Failures

Possible Solution

9/30 Discussion Notes

Data Durability

Write Bandwidth Limitations

Questions

10/7 Discussion (Logging/Cleaning on Backups)

Knobs/Tradeoffs

Cluster writes in chunks ("shardlets") to be batch written to segments on disk

Question remains: Log compaction or checkpointing to keep replay time from growing indefinitely

No-read LFS-style cleaning

Back of the envelope

Master cleans shardlets whose utilization falls below a threshold

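| Shardlet Util. | Disk/Master Memory Util. (Median Shardlet Util.) | TTR Off Optimal | Remaining Disk BW Off Optimal | Combined WE with 90% approach from earlier |
|----------------|--------------------------------------------------|-----------------|-------------------------------|--------------------------------------------|
| 50%            | 75%                                              | 1.33x slower    | ~50%                          | 45%                                        |
| 10%            | 55%                                              | 1.82x slower    | ~88%                          | 80%                                        |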
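The back-of-the-envelope figures above can be reproduced with a short sketch. The formulas are inferred from the numbers and are not stated in the notes. The median is assumed to be the midpoint between the cleaning threshold and 100%. TTR is assumed to scale as the inverse of median utilization. The combined figure is assumed to be the remaining disk bandwidth times the earlier 90% figure. Remaining disk bandwidth is taken from the table as an input, not derived. All function names are hypothetical.

```python
# Back-of-the-envelope check of the shardlet cleaning numbers.
# All formulas below are assumptions inferred from the table, not stated rules.

def median_shardlet_util(clean_threshold):
    """Assumed: live shardlets are spread evenly between the cleaning
    threshold and 100% utilization, so the median is the midpoint."""
    return (clean_threshold + 1.0) / 2.0


def ttr_slowdown(median_util):
    """Assumed: recovery reads whole shardlets, so time to recover grows
    as 1 / utilization relative to fully packed shardlets."""
    return 1.0 / median_util


def combined_write_efficiency(remaining_disk_bw, earlier_efficiency=0.90):
    """Assumed: the two effects multiply. The 0.90 default is the
    '90% approach from earlier' named in the table header."""
    return remaining_disk_bw * earlier_efficiency


if __name__ == "__main__":
    # (cleaning threshold, remaining disk bandwidth read off the table)
    for threshold, remaining_bw in [(0.50, 0.50), (0.10, 0.88)]:
        median = median_shardlet_util(threshold)
        print(
            f"threshold={threshold:.0%} "
            f"median_util={median:.0%} "
            f"ttr={ttr_slowdown(median):.2f}x "
            f"combined_we={combined_write_efficiency(remaining_bw):.0%}"
        )
```

Under these assumptions the 10% row gives 0.90 x 0.88 = 0.792 for the combined figure, which the table lists as 80%.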

Fast (fine-grained, shard interleaved) logging + shardlets

10/8 Discussion ("Paging" to yield higher utilization on masters)