Comparing Replicas

Warning: these are design notes from the initial stages of LogCabin and are probably no longer relevant.

One lesson from Paxos Made Live is that it's useful to have a way to compare the state of replicas to verify that they are identical. Doing this periodically helped them discover bugs before the bugs were exposed to applications. Ideally, LogCabin replicas could produce a single checksum value covering the entire contents of their state, which includes log entries, tombstones, and client responses (kept for linearizability). The same checksums could also guard against disk corruption, by checking the consistency of servers before they are re-admitted to the cluster.

Problem 1: Independent Cleaning

If each LogCabin server cleans its tombstones and client responses independently, the set of tombstones and client responses retained on each replica may differ. In that case, I think only the log entries could be covered by a checksum.

John and Diego discussed this on 2012-03-26. We concluded that checksumming only the live log entries (excluding tombstones and client responses) would provide most of the benefit, and that we shouldn't introduce much more complexity to increase checksum coverage for now.
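
A minimal sketch of the "live entries only" checksum might look like the following. It is illustrative, not LogCabin's implementation: the LogEntry and ReplicaState classes, their fields, and the live_entries_checksum function are hypothetical, and SHA-256 is an assumed choice of hash. Each replica hashes its live log entries in index order, so two replicas that have cleaned different sets of tombstones and client responses still produce the same value, while any difference in a live entry changes it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical entry layout: a log index plus opaque payload bytes.
    index: int
    data: bytes


@dataclass
class ReplicaState:
    # Hypothetical replica state. Only live_entries is covered by the checksum;
    # tombstones and client_responses may be cleaned independently per replica.
    live_entries: list[LogEntry]
    tombstones: set[int] = field(default_factory=set)
    client_responses: dict[int, bytes] = field(default_factory=dict)


def live_entries_checksum(state: ReplicaState) -> str:
    """Hash live log entries in index order, ignoring tombstones and responses."""
    h = hashlib.sha256()
    for entry in sorted(state.live_entries, key=lambda e: e.index):
        # Length-prefix each field so different entry boundaries cannot
        # serialize to the same byte stream.
        h.update(entry.index.to_bytes(8, "big"))
        h.update(len(entry.data).to_bytes(8, "big"))
        h.update(entry.data)
    return h.hexdigest()


if __name__ == "__main__":
    entries = [LogEntry(1, b"set x=1"), LogEntry(3, b"set y=2")]
    # Replica a still holds a tombstone and a client response; b has cleaned both.
    a = ReplicaState(entries, tombstones={2}, client_responses={7: b"ok"})
    b = ReplicaState(list(entries))
    print(live_entries_checksum(a) == live_entries_checksum(b))
```

Because the hash is computed over a canonical ordering of the live entries, it does not depend on when each replica last cleaned its tombstones or client responses. Those structures are simply left uncovered, which matches the trade-off described above.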

Problem 2: When checksums don't match

Supposing checksums are verified periodically, what should happen when checksums don't match?
