
...

Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
- RPCs need to time out eventually (this should be reasonably aggressive).
Update the log cleaner.
Handle coordinator failures.
Handle backup failures.
Handle multiple master failures and other secondary failures.
Cold start.
- Backups need a superblock.
Threading.

Overall reliability model: the system can handle simple failures with no data loss and can survive any failure, but more complex failures (such as a total power failure) will cause data loss. If at any point the system cannot tell what to do (e.g. during a network partition), we can shut the whole system down and do a cold start, with potential data loss.
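
As a rough illustration of the RPC robustness item above (time out aggressively, and recover when the other end of a session fails), here is a minimal Python sketch. It is not the project's actual transport code: `call_rpc`, `open_session`, `RpcFailed`, and the `send`/`close` session interface are hypothetical names chosen for the example, and the timeout and retry values are placeholders.

```python
class RpcFailed(Exception):
    """Raised when every attempt timed out or lost its session."""


def call_rpc(open_session, request, timeout_s=0.05, max_attempts=3):
    """Send `request`, abandoning any session that times out or fails.

    `open_session` is a hypothetical factory returning an object with
    `send(request, timeout_s)` and `close()`. `send` is assumed to raise
    TimeoutError when the deadline passes and ConnectionError when the
    other end of the session has failed.
    """
    last_error = None
    for _ in range(max_attempts):
        # Always start from a fresh session so a dead peer cannot wedge us.
        session = open_session()
        try:
            return session.send(request, timeout_s)
        except (TimeoutError, ConnectionError) as err:
            # Treat the session as dead instead of waiting on it any longer.
            last_error = err
        finally:
            session.close()
    raise RpcFailed(f"gave up after {max_attempts} attempts") from last_error
```

The caller sees either a reply or a single `RpcFailed` error after a bounded number of attempts, so a failed peer never blocks it indefinitely. What to do after `RpcFailed` (for example, reporting the host as failed) is left open here.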

Tasks deferred until later:

...
