Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Create a log that is replicated across a cluster of servers. Each log entry will be identical on each of the servers.
  • The log is sequentially ordered: each entry has a unique integer id, and ids are assigned in ascending order.
  • If a server has accepted a given log entry, then it also has accepted all log entries with smaller ids than the given one.
  • If a majority of the servers have accepted a particular log entry, then that log entry is called accepted guaranteed: it will eventually be accepted by all servers, and it will not be lost unless a majority of the servers suffer simultaneous catastrophic failures that lose all of their persistent data.
  • One of the servers in the cluster is designated the leader; all client requests that modify the log must be processed by the leader.
  • Clients can read log entries from any server, though there will usually be a time delay between when data can be read from the leader and when it can be read from other servers.
  • Leadership can move among the servers in the cluster in response to failures.
  • There is at most one leader in the cluster at a time. Once a new leader has been elected, it will not be possible for the previous leader to make noticeable updates to the log until it has been reelected.

...

Clients append to the log by making a request through the leader. The leader adds the new entry to its log, then sends a request containing that log entry to each of the other servers. Each server appends the entry to its log and also writes the data to durable secondary storage; once this is done, the server is said to have accepted the log entry. Once a majority of the cluster has accepted the new log entry, the leader can respond to the client.  At this point the durability of the new entry is called guaranteed because its durability is assured; the only event that could cause permanent data loss is simultaneous catastrophic failures of more than half the servers in the cluster, causing them to lose their secondary storage.

If a passive server crashes then it will not be able to accept new data from the leader. The leader need not wait for the crashed server to restart before responding to client requests: as long as a majority of the cluster is responsive, the cluster can continue operation. When a server restarts after a crash, it enters passive mode (it does not attempt to contact the leader). If the leader does not receive an acceptance from a server when it sends a new log entry, it continues trying at regular intervals; eventually the server will restart, at which point the leader will "catch it up" on the log entries it has not yet received. This mechanism guarantees insurers that all servers will eventually mirror all log entries, in the absence of leader figures.

Leader failures are more interesting. At the time of a leader failure, there may be one or more log entries that have been partially accepted by the cluster (i.e., the leader has not yet responded to the requesting client). There may also be any number of log entries that have been accepted guaranteed by the cluster, but are not yet fully replicated on all servers. For the partially accepted entries, the new leader must guarantee that these entries are either fully replicated and accepted, or completely expunged from all logs. The entry that is expunged must never be returned to a client in a read operation: the system must behave as if the client never made its original request. For entries that have been accepted guaranteed but not fully replicated, the new leader must guarantee make sure that these entries are eventually fully replicated. 

First, let's handle the case of guaranteed but not fully replicated entries. ALPO makes sure that the new leader is chosen from among those servers that have accepted all of the guaranteed entries. It does this by modifying the notion of rank during leader election. When candidates request votes they include in the request the term for the election, their server id, and the log id of the most recent entry they have accepted. One candidate automatically outranks another if its "last accepted id" is higher than that of the other candidate. If they both have the same "last accepted id", then rank is determined by server id. Furthermore, a server will automatically reject a request for its vote if its "last accepted id" is higher than that of the requesting candidate, even if its vote is still available; when a candidate receives this form of rejection it immediately drops out of the election and returns to passive state.

Since a candidate requires votes from a majority of the cluster to become leader, and since it has accepted all of the log entries that were accepted by any of the other servers that voted for it, and since any guaranteed log entry must have been accepted by a majority of the servers in the cluster, the new leader is certain to store all of the guaranteed log entries. As it communicates with the other servers in the cluster it can update any that are running behind. In particular, when the leader sends a new log entry to a passive server, the passive server will reject the request unless it already stores all of the log entries with smaller ids. When this happens, the passive server indicates to the leader of the highest id that stores, so the leader can then send it all of the missing entries.

The second problem is the case of partially accepted (but not yet guaranteed) log entries. It is likely that the new leader will store these entries, in which case they will get fully replicated a new leader. However, if an entry has only been accepted by a small number of servers that it is possible that a new leader can be elected without any of its vote-givers storing the entry. In this case the new leader must make sure that the entry is expunged by all other servers. The way it does this is by initiating a log append for a new entry that indicates leadership change. The id for this entry will be the next one in order on the leader. When this entry arrives at a passive server, the passive server deletes any existing entries with this id or higher before it accepts the entry. This is the only situation where a passive server receives a log entry whose id it has already accepted.

Leader failures also introduce the potential for zombie leaders. A zombie leader is a leader that has been replaced but does not yet know it. ALPO must make sure that  zombie leaders cannot continue to modify the log. Each request issued by the leader includes the leader's term number. If a passive server receives a request whose term is lower than the server's current term, then it rejects the request. If a leader receives such a request then it knows it has been deposed, and returns to passive state. Before a server gives its vote to a candidate it must increase its term number to that of the new term. This guarantees that

Linearizability for clients

...