One lesson from Paxos Made Live is that it is useful to be able to compare replicas' state and verify that it is identical. Doing this periodically helped the authors discover bugs before those bugs were exposed to applications. Ideally, each LogCabin replica could produce a single checksum value covering the entire contents of its state: log entries, tombstones, and client responses (kept for linearizability). The same checksums could also guard against disk corruption, by checking a server's consistency before it is re-admitted to the cluster.

Problem 1: Independent Cleaning

If the LogCabin servers clean their state independently, the sets of tombstones and client responses on each replica may differ. In that case, I think only the log entries could be covered by a checksum.

John and Diego discussed this on 2012-03-26. We concluded that checksumming only the live entries (no tombstones or client responses) would provide most of the benefit, and that we shouldn't introduce much more complexity to increase checksum coverage for now.
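
A minimal sketch of what a live-entries-only checksum might look like, written in Python for brevity rather than as LogCabin code. Everything here is an assumption made for illustration: the `Record` type, its `kind` and `invalidates` fields, the idea that a tombstone names the entry it deletes, and the choice of SHA-256. The point it shows is that two replicas that have cleaned different amounts of state can still agree on the checksum.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record type; not LogCabin's actual on-disk format."""
    record_id: int
    kind: str                          # "entry", "tombstone", or "client_response"
    payload: bytes = b""
    invalidates: Optional[int] = None  # tombstones only: id of the deleted entry


def live_checksum(records: Iterable[Record]) -> str:
    """Hash only the live entries, in record_id order.

    Tombstones and client responses are skipped, and so is any entry that a
    tombstone still present on this replica has deleted. The result therefore
    does not depend on how far the replica has progressed with cleaning.
    """
    records = list(records)
    dead = {r.invalidates for r in records if r.kind == "tombstone"}
    live = sorted(
        (r for r in records if r.kind == "entry" and r.record_id not in dead),
        key=lambda r: r.record_id,
    )
    digest = hashlib.sha256()
    for r in live:
        # Prefix each payload with its id and length so entry boundaries
        # cannot be confused with payload bytes.
        digest.update(struct.pack(">QI", r.record_id, len(r.payload)))
        digest.update(r.payload)
    return digest.hexdigest()


# Replica A has not cleaned yet; replica B has already dropped the dead
# entry, its tombstone, and the client response.
replica_a = [
    Record(1, "entry", b"x"),
    Record(2, "entry", b"y"),
    Record(3, "tombstone", invalidates=1),
    Record(4, "client_response", b"ok"),
]
replica_b = [Record(2, "entry", b"y")]
print(live_checksum(replica_a) == live_checksum(replica_b))  # True
```

The sketch assumes a replica never cleans a tombstone while the entry it deletes is still present; if that could happen, the two replicas above would disagree even though neither is corrupt.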

Problem 2: When checksums don't match

Supposing checksums are verified periodically, what should happen when checksums don't match?
