Back-of-the-envelope musings for the bit vector approach

Space overhead:

Overhead for the bit vector on backups is R/(8*MinObjSize) * MasterLogSize.

For example, if R = 4, MinObjSize = 40, and MasterLogSize = 64GB, we have 0.8GB of bits in memory across the backups holding that master's replicas. This is 1.25% overhead relative to the master's log.
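The overhead arithmetic above can be sanity-checked with a short sketch (parameter values taken from the text; the variable names are ours):

```python
# Back-of-the-envelope check of the bit-vector space overhead.
# Parameters from the text: R = 4 replicas, 40-byte minimum object
# size, 64GB master log.
R = 4
MIN_OBJ_SIZE = 40                 # bytes
MASTER_LOG_SIZE = 64 * 2**30      # bytes

# One bit per MinObjSize bytes of log, times R replicas.
bit_vector_bytes = R * MASTER_LOG_SIZE // (8 * MIN_OBJ_SIZE)

print(bit_vector_bytes / 2**30)            # 0.8 (GB)
print(bit_vector_bytes / MASTER_LOG_SIZE)  # 0.0125, i.e. 1.25%
```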

While 1.25% seems a little high, consider that no tombstones would be stored in the log at all. Tombstones are currently 42 bytes each, so the crossover point for a 64GB master is about 20.5e6 tombstones in the log. That sounds like a lot, but it's still only 1.25%. I don't know how low we could realistically expect to drive tombstone overhead.
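A quick sketch of the crossover arithmetic, using the same parameters from the text:

```python
# Crossover point: how many 42-byte tombstones cost as much as the
# bit vectors? Parameters from the text: R = 4, 40-byte MinObjSize,
# 64GB master log.
TOMBSTONE_BYTES = 42
bit_vector_bytes = 4 * 64 * 2**30 // (8 * 40)  # 0.8GB of bits

crossover = bit_vector_bytes // TOMBSTONE_BYTES
print(crossover)  # roughly 20.5 million tombstones
```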

There would also be 8 bytes per segment for the segment version number, but that would add only 256KB with 8MB segments, a 64GB master, and R = 4.
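The version-number figure works out as follows (a sketch, with parameters from the text):

```python
# Segment version-number overhead: an 8-byte version per segment,
# per replica. Parameters from the text: 8MB segments, 64GB master
# log, R = 4.
SEG_SIZE = 8 * 2**20
NUM_SEGMENTS = (64 * 2**30) // SEG_SIZE  # 8192 segments per master

version_bytes = 8 * NUM_SEGMENTS * 4     # 8 bytes, R = 4 replicas
print(version_bytes // 2**10)            # 256 (KB)
```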

Entry offset overhead:

If each log entry must record which bit on the replicas should be flipped when the entry dies, how big must that offset field be?

With 8MB segments and a 40-byte MinObjSize, we'd need to represent 209,716 possible offsets, which requires 18 bits. Using 24 bits would keep us safe up to 640MB segments.

Interestingly, this seems to argue for smaller segments. With 2.5MB segments we'd need only 16 bits, and 160KB segments would make 12 bits viable. Alternatively, a larger minimum object size would help, but we'd need to double it to shave off each additional bit.
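The bit-width figures above can be reproduced with a small helper (the function name is ours, for illustration; it simply counts one offset slot per MinObjSize bytes of segment):

```python
import math

def offset_bits(segment_size, min_obj_size=40):
    """Bits needed to address every possible entry offset in a segment,
    at one slot per min_obj_size bytes."""
    slots = math.ceil(segment_size / min_obj_size)
    return math.ceil(math.log2(slots))

print(offset_bits(8 * 2**20))         # 18 bits for 8MB segments
print(offset_bits(640 * 2**20))       # 24 bits covers up to 640MB
print(offset_bits(int(2.5 * 2**20)))  # 16 bits for 2.5MB segments
print(offset_bits(160 * 2**10))       # 12 bits for 160KB segments
print(offset_bits(8 * 2**20, 80))     # doubling MinObjSize: 17 bits
```

The last line illustrates the doubling point: going from a 40-byte to an 80-byte minimum object size shaves exactly one bit off the 8MB-segment case.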