Recap

Assumptions

  • Must be just as durable as disk storage
  • Durable when write request returns
  • Almost always in a state of crash recovery
  • Recovery instantaneous and transparent
  • Records of a few thousand bytes, perhaps partial updates

Failures

  • Bit errors
  • Single machine
  • Group of machines
  • Power failure
  • Loss of datacenter
  • Network partitions
  • Byzantine failures
  • Thermal emergency

Possible Solution

  • Approach #4: temporarily commit to another server's RAM for speed, eventually to disk
    • some form of logging in RAM and batching writes to disk + checkpointing
    • need to "shard" data for each server so that many servers serve as a backup for a single master to speed recovery time
      • likely, backup shards will need to be able to temporarily become masters for the data while rebuilding the master
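The write path in approach #4 can be sketched as follows. This is a minimal simulation, not RAMCloud's actual protocol; all names (`Master`, `Backup`, `batch_size`) are illustrative, and disk writes are simulated with a list.

```python
# Sketch of approach #4: a write counts as durable once it sits in the
# RAM of the backups; disk writes happen later, in large batches.

class Backup:
    def __init__(self):
        self.ram_log = []   # volatile copy of recent log entries
        self.disk = []      # simulated durable storage

    def replicate(self, entry):
        self.ram_log.append(entry)      # fast: RAM write plus a network hop

    def flush(self):
        self.disk.extend(self.ram_log)  # one large sequential write
        self.ram_log.clear()

class Master:
    def __init__(self, backups, batch_size=4):
        self.backups = backups
        self.batch_size = batch_size
        self.pending = 0
        self.store = {}     # in-RAM primary copy

    def write(self, key, value):
        entry = (key, value)
        for b in self.backups:
            b.replicate(entry)          # "durable in RAM" when this returns
        self.store[key] = value
        self.pending += 1
        if self.pending >= self.batch_size:
            for b in self.backups:
                b.flush()               # batched, seek-free disk write
            self.pending = 0

backups = [Backup(), Backup(), Backup()]
m = Master(backups)
for i in range(5):
    m.write(f"k{i}", i)
# after 5 writes with batch_size=4, the first 4 entries reached disk
# and the 5th is still only in backup RAM
```

The key property is that the write request returns after the RAM replication loop, so durability does not wait on the batched disk flush.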

Issues

  • Disk write bandwidth
  • Best approach to achieve good write bandwidth on disk: dedicate at least 3 disks
    • one for write logging
    • one for archiving the most recent log data
    • one for compaction
  • This scheme completely eliminates seeks, which should give us about 100 MB/s of sequential write bandwidth
  • Unfortunately, we'll need RAID as well, so the total is more than 3 disks just to achieve the write bandwidth of a single disk
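Rough arithmetic shows why eliminating seeks is worth dedicating disks. The figures below are assumptions (roughly 100 MB/s sequential bandwidth, ~10 ms per seek, records of about 1 KB, per the assumptions above), not measurements:

```python
# Back-of-envelope: sequential logging vs. seek-per-record writes.
SEQ_BW = 100e6      # bytes/s of sequential write bandwidth (assumed)
SEEK_TIME = 0.01    # seconds per random seek (assumed)
RECORD = 1024       # bytes per record (a few KB per the assumptions)

sequential_writes_per_s = SEQ_BW / RECORD   # ~100,000 records/s
random_writes_per_s = 1 / SEEK_TIME         # ~100 records/s, seek-dominated

print(sequential_writes_per_s / random_writes_per_s)  # roughly a 1000x gap
```

So a single seek-free logging disk can absorb on the order of a thousand times more small writes than one doing a seek per record, which is the motivation for separating logging, archiving, and compaction onto different spindles.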

Alternative

Recovery

  • Keeping everything as soft state fixes a lot of consistency issues on recovery
  • Compare the backups after a new master is elected to check consistency
    • Version numbers make this possible: only the most recently written value can be inconsistent, so we only need to compare k values for k backup shards
    • This is much cheaper than running an agreement protocol among the backups
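The recovery check above can be sketched as a comparison of each backup's most recent entry. The function name and the `(version, value)` representation are hypothetical, assuming only the versioned-writes property stated above:

```python
# Sketch of the post-election consistency check: since only the most
# recently written value can differ across backups, comparing each
# backup's tail entry is enough -- no agreement protocol needed.

def reconcile(backup_tails):
    """backup_tails: list of (version, value) pairs, the last entry
    seen by each of the k backup shards. The entry with the highest
    version is the most recent write that reached any backup; shards
    with a lower version simply missed the final write."""
    return max(backup_tails, key=lambda t: t[0])

# backups 0 and 1 saw the write with version 8; backup 2 missed it
tails = [(8, "new"), (8, "new"), (7, "old")]
print(reconcile(tails))  # → (8, 'new')
```

This is O(k) work for k backup shards, which is why it is much cheaper than a general agreement protocol.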