Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Another potential issue is writes received by a zombie after recovery starts, but before the zombie goes into limbo state (this is also a problem an issue with the "traditional lease" approach). For example, suppose recovery starts but it takes a while for zombie to enter limbo state. It might receive write requests, which it accepts. In the worst case, these write requests could arrive after recovery masters have read the backup data for the segment, in which case the writes will be lost. One solution is for backups to refuse to accept data for a condemned master, returning a "Dude, you're a zombie" response in the same way as for PINGs. This would guarantee that, once recovery has progressed past the initial phase with the coordinator contacts every server, no more writes will be processed by the condemned master. But, what if the zombie and some or all of the backups are in a separate partition? If the coordinator is able to contact any of the backups during recovery, then that will prevent the zombie from writing new data. If all of the backups are in the same partition as the zombie then the zombie will continue to accept write requests unaware of the partition. However, in this case the coordinator will not be able to recover since none of the replicas will be available, so recovery will be deferred until at least one of the replicas becomes available, at which point writes will no longer be possible.

...