Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • At the beginning of recovery the coordinator contacts every server in the cluster. This is already necessary for the proper operation of recovery; it means that once recovery has begun every server in the cluster knows that a master has been condemned.
  • If the condemned master is not really dead and then it will continue to send PING requests to other servers every 10ms.
  • When another server receives a PING from a condemned server it responds with a special "Dude, you're a zombie!" response. If a server receives such a response it places itself in a limbo state, which means it will not service any requests from clients until it has contacted the coordinator. This is analogous to expiration of a lease.
  • When a limbo server contacts the coordinator, the coordinator can respond in one of two ways:
    • It can tell the condemned master that it no longer owns any tablets.
    • It can pardon the condemned master, tell it to continue normal operation, and abort any recovery that was in progress for the master. This is analogous to renewing the master's lease. In this case the master removes itself from limbo state and continues business as usual.
  • If a server fails to receive a response from a PING request, it must contact the coordinator as described above so that the coordinator can decide whether the PING target should be recovered. But, in addition, whenever a PING times out a server the PING initiator must also place itself into limbo state until it has contacted the coordinator. This is necessary to handle partitions where a zombie loses communication with the rest of the cluster. In most cases, the server will respond by saying "continue normally" as described above.
  • Finally, if a server receives a PING request when it is in limbo state it indicates that fact in its response, and the server sending the PING request initiator must put itself in limbo and contact the coordinator. This ensures that, in the event of a partition, information about the partition spreads rapidly throughout a disconnected group of servers (see below for an example).

...