Problem:
Regular RPCs in RAMCloud are not linearizable and may result in inconsistent behavior. For example, a client requests deletion of a key, and the master fails after processing the request (it has recorded a tombstone on backup) but before responding to the client. The client will think the RPC was lost and retry the same deletion, which will cause the master either to reply with an error or to delete a new value that was written after the original delete RPC. The same problem exists for conditional writes. If the master fails after the conditional write succeeds but before the result is returned to the client, the current RAMCloud RPC protocol treats the request as lost and retries the conditional write. In that case, the recovered master already contains the new version of the value (written by the original conditional write), and the retried request will be rejected because the version number does not match. Here, the correct response should be success, since the conditional write RPC had already completed before the crash.
We resolve this problem by not re-executing an RPC if a previous attempt was already committed to the log on backup. This is done by keeping the status of RPCs in masters.
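The conditional write case can be reproduced with a toy model. The sketch below is only an illustration: the Master class, its conditional_write method, and the "OK"/"REJECTED" status strings are hypothetical and are not RAMCloud code. The crash is modeled simply by the client ignoring the first reply.

```python
class Master:
    """Toy key-value master with versioned objects (hypothetical, not RAMCloud code)."""

    def __init__(self):
        self.store = {}  # key -> (value, version)

    def conditional_write(self, key, value, expected_version):
        """Write only if the stored version equals expected_version."""
        _, version = self.store.get(key, (None, 0))
        if version != expected_version:
            return ("REJECTED", version)
        self.store[key] = (value, version + 1)
        return ("OK", version + 1)


master = Master()
master.store["k"] = ("old", 1)

# The first attempt is applied, but the reply never reaches the client
# because the master crashes before responding.
first = master.conditional_write("k", "new", expected_version=1)

# The client never saw `first`, so it retries the identical request against
# the recovered master, which already holds the new version.
retry = master.conditional_write("k", "new", expected_version=1)
```

The retry is rejected even though the client's write is the one stored on the master, which is the inconsistency described above.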
Overview of the solution.
In a client,
In a master server,
When a crash happens
Log cleaner in master
1. Idea. How to avoid duplicate processing.
Duplicate processing of an RPC (usually caused by retried RPCs) is avoided by
assigning a unique id to each RPC from a client. A master service keeps each
RPC's id and its accompanying result, and simply replies to duplicate RPCs
with the previously saved result.
To reduce the space required to keep such data, a client "acknowledges" its
receipt of RPC results and guarantees that it will not retry those RPCs.
This is done by attaching an "acknowledgement number" (a.k.a. ack id) to each
RPC request. The number indicates that all RPCs whose ids are smaller than or
equal to the ack id have been acknowledged by this client.
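As an illustration of the client side of this idea, here is a minimal sketch of id assignment and ack id computation. The ClientRpcIds class and its method names are hypothetical, and the rule used for the ack id (one less than the smallest id still awaiting a result) is an assumption consistent with the description above, not a statement of how RAMCloud implements it.

```python
class ClientRpcIds:
    """Hands out RPC ids and computes the ack id (hypothetical helper)."""

    def __init__(self):
        self.next_id = 1
        self.outstanding = set()  # ids sent whose results are not yet received

    def new_rpc_id(self):
        rpc_id = self.next_id
        self.next_id += 1
        self.outstanding.add(rpc_id)
        return rpc_id

    def result_received(self, rpc_id):
        self.outstanding.discard(rpc_id)

    def ack_id(self):
        """Largest id such that every id <= it has had its result received."""
        if self.outstanding:
            return min(self.outstanding) - 1
        return self.next_id - 1


ids = ClientRpcIds()
a, b, c = ids.new_rpc_id(), ids.new_rpc_id(), ids.new_rpc_id()  # ids 1, 2, 3
ids.result_received(b)
ack_before = ids.ack_id()  # RPC 1 is still outstanding, so nothing is acked
ids.result_received(a)
ack_after = ids.ack_id()   # RPCs 1 and 2 are done, RPC 3 is still outstanding
```

Because the ack id is a single number, a slow RPC with a small id holds back acknowledgement of every later RPC until its result arrives.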
Missing: Logging on disk?
2. Mechanisms on Master
2.1 Memory data structure.
Each master keeps an #UnackedRpcResults object to store the results of
linearizable RPCs. When a master receives an RPC request, it checks whether
the same RPC is in progress or already completed by calling checkDuplicate().
When processing of the RPC finishes, the master records its completion in
memory by calling recordCompletion(). On backup storage, it atomically writes
both the result of the RPC and the log record of the RPC's completion.
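The in-memory flow can be sketched as follows. This is not the real #UnackedRpcResults class: the Python class, the method signatures, the use of None to mean "in progress", and the choice to apply the ack id inside the duplicate check are all assumptions made for illustration. Backup writes are omitted.

```python
class UnackedRpcResultsSketch:
    """In-memory table: client_id -> {rpc_id: result, or None while in progress}."""

    def __init__(self):
        self.clients = {}
        self.max_acked = {}  # client_id -> highest ack id seen so far

    def check_duplicate(self, client_id, rpc_id, ack_id):
        """Return (is_duplicate, saved_result) and apply the attached ack id."""
        results = self.clients.setdefault(client_id, {})
        acked = max(self.max_acked.get(client_id, 0), ack_id)
        self.max_acked[client_id] = acked
        # Acknowledged results will never be requested again, so drop them.
        for old_id in [i for i in results if i <= acked]:
            del results[old_id]
        if rpc_id <= acked:
            raise ValueError("client retried an RPC it already acknowledged")
        if rpc_id in results:
            return True, results[rpc_id]  # None means still in progress
        results[rpc_id] = None            # mark the RPC as in progress
        return False, None

    def record_completion(self, client_id, rpc_id, result):
        self.clients[client_id][rpc_id] = result


table = UnackedRpcResultsSketch()
dup, _ = table.check_duplicate(client_id=7, rpc_id=1, ack_id=0)
table.record_completion(7, 1, "OK version=2")

# A retry of RPC 1 is answered from the saved result instead of re-executing.
dup_retry, saved = table.check_duplicate(client_id=7, rpc_id=1, ack_id=0)

# RPC 2 carries ack_id=1, which lets the master discard the result of RPC 1.
table.check_duplicate(client_id=7, rpc_id=2, ack_id=1)
```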
client_id | processed_list
---|---
1 |
2 |
3 |
Figure 1. Processed table: each master keeps a processed_list for each client. The processed_list tracks the processed rpc_ids that have not yet been ACKed by storing the start and end of each sequence.
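One way to read "sequence start and end" is that consecutive rpc_ids are collapsed into inclusive runs. The sketch below follows that reading; the ProcessedList class and its add, ack, and contains methods are hypothetical names, and the actual layout of the list is not specified in this note.

```python
class ProcessedList:
    """Un-ACKed processed rpc ids stored as sorted, disjoint [start, end] runs."""

    def __init__(self):
        self.runs = []  # list of [start, end], both inclusive

    def add(self, rpc_id):
        self.runs.append([rpc_id, rpc_id])
        self.runs.sort()
        merged = [self.runs[0]]
        for start, end in self.runs[1:]:
            if start <= merged[-1][1] + 1:  # touching or overlapping runs
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        self.runs = merged

    def ack(self, ack_id):
        """Forget every id <= ack_id."""
        kept = []
        for start, end in self.runs:
            if end > ack_id:
                kept.append([max(start, ack_id + 1), end])
        self.runs = kept

    def contains(self, rpc_id):
        return any(start <= rpc_id <= end for start, end in self.runs)


plist = ProcessedList()
for rpc_id in (1, 2, 3, 5, 6, 9):
    plist.add(rpc_id)
runs_before = [list(r) for r in plist.runs]
plist.ack(5)
runs_after = [list(r) for r in plist.runs]
```

With this layout the memory cost grows with the number of gaps in the id sequence rather than with the number of processed RPCs.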
... | cond_write <TableID, KeyHash> | ... | Tombstone <table_id, keyHash> <Client_id, rpc_id, ack_id> | ... |
Figure 2. Log structure with RPC id and ACK id.
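Because each log entry carries the client id, RPC id, and ack id, a recovering master could in principle rebuild its table of unacknowledged results by replaying the log. The sketch below illustrates that idea only. The LogEntry tuple, its result field, and the rebuild_results function are hypothetical, and the replay rule is an assumption rather than the documented recovery procedure.

```python
from collections import namedtuple

# Hypothetical log entry. The id fields follow Figure 2; `result` is assumed.
LogEntry = namedtuple(
    "LogEntry", "kind table_id key_hash client_id rpc_id ack_id result"
)


def rebuild_results(log):
    """Replay the log in order and return client_id -> {rpc_id: result}."""
    table = {}
    for entry in log:
        results = table.setdefault(entry.client_id, {})
        results[entry.rpc_id] = entry.result
        # Drop results the client had acknowledged by the time of this entry.
        for old_id in [i for i in results if i <= entry.ack_id]:
            del results[old_id]
    return table


log = [
    LogEntry("cond_write", 1, 0xA1, client_id=7, rpc_id=1, ack_id=0, result="OK"),
    LogEntry("tombstone", 1, 0xB2, client_id=7, rpc_id=2, ack_id=0, result="OK"),
    LogEntry("cond_write", 1, 0xA1, client_id=8, rpc_id=1, ack_id=0, result="REJECTED"),
    LogEntry("tombstone", 1, 0xC3, client_id=7, rpc_id=3, ack_id=1, result="OK"),
]
recovered = rebuild_results(log)
```

After replay, a retried RPC from client 7 with id 2 or 3 would find its saved result, while the result of RPC 1 is gone because a later entry acknowledged it.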