...
- A client reserves sequence numbers for RPC ids. It reserves M+1 consecutive ids, where M is the number of objects involved in the current transaction. The lowest seq# is not assigned to any object or RPC and works as a placeholder. The other M sequence numbers are assigned one per object.
- RPC (1) PREPARE: A client sends prepare messages to all data master servers participating in the transaction. For understandability, we send a separate RPC request for each object in the transaction.
- Request msg: <list of <tableId, keyHash, Seq#>, tableId, key, condition, newVal>
- list of <tableId, keyHash, Seq#>: used in case of client disconnection.
- TableId & Key: the object being operated on.
- Condition: an additional condition for COMMIT-VOTE beyond successful locking, expressed as RAMCloud RejectRules. This can be NULL.
- newVal: value to be written for “key” on the receipt of “COMMIT”.
- Handling:
- Grab a lock for “key” on lock table. Buffer newVal for the key.
- If the lock was grabbed & the condition is satisfied, log LockRecord (lock information. See figure~\ref{fig:lockRecord}) and RpcRecord with the result of "COMMIT-VOTE" and <list of <tableId, keyHash, Seq#>> (linearizability. See figure~\ref{fig:rpcRecord}).
- If the lock was grabbed but the condition is not satisfied, unlock immediately, and log RpcRecord with the result of “ABORT-VOTE” and <list of <tableId, keyHash, Seq#>>.
- If we failed to grab the lock, log RpcRecord with the result of “ABORT-VOTE” and <list of <tableId, keyHash, Seq#>>.
(JO: why do we need to log anything here? The abort condition is permanent, no? A: A retried PREPARE can successfully grab the lock. I suspect this can cause the client to see "ABORT" while the recovery process decides "COMMIT".)
(JO: Ahah, I see now: I was thinking of the case where the condition is not satisfied. This condition is permanent (any retries will also fail), so we only need to log ABORT-VOTE if we couldn't grab the lock, right? A: Well, it depends on the details of the protocol. Currently, we unlock immediately if the condition didn't match. A subsequent TX can then change the object. The retried PREPARE can now vote for COMMIT, leaving "COMMIT-VOTE" in the linearizability record. The same problem can happen.)
(JO: if the condition wasn't satisfied initially, how could an object modification result in the condition being satisfied in the future?)
- Sync log with backup.
- JO: I think that the server needs to record the <list of <tableId, KeyHash, Seq#>> as well; this needs to be durable, no? A: Yes, it is recorded with linearizability record in response field of RpcRecord.
- Response: either “COMMIT-VOTE” or “ABORT-VOTE”.
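The PREPARE handling above can be sketched roughly as follows. This is a minimal, in-memory illustration only, not RAMCloud code: `Master`, `key_hash`, the dictionary-based lock table, and the `syncs` counter are hypothetical names. The sketch also assumes that a retried PREPARE is answered from the stored RpcRecord, and that a seq# alone identifies the RPC (a single client), which the text does not state explicitly.

```python
import zlib
from dataclasses import dataclass, field

COMMIT_VOTE, ABORT_VOTE = "COMMIT-VOTE", "ABORT-VOTE"

def key_hash(key):
    # Hypothetical stand-in for the real key hash function.
    return zlib.crc32(key.encode())

@dataclass
class Master:
    objects: dict = field(default_factory=dict)      # (tableId, key) -> value
    lock_table: dict = field(default_factory=dict)   # (tableId, key) -> seq# holding the lock
    buffered: dict = field(default_factory=dict)     # (tableId, key) -> buffered newVal
    rpc_records: dict = field(default_factory=dict)  # seq# -> (vote, participant list)
    log: list = field(default_factory=list)          # in-memory stand-in for the log
    syncs: int = 0                                   # counts "sync log with backup"

    def prepare(self, participants, table_id, key, condition, new_val):
        # Find this object's seq# in <list of <tableId, keyHash, Seq#>>.
        seq = next(s for t, h, s in participants
                   if t == table_id and h == key_hash(key))
        # ASSUMPTION: a retry returns the vote stored in the RpcRecord, so it
        # can never flip from ABORT-VOTE to COMMIT-VOTE.
        if seq in self.rpc_records:
            return self.rpc_records[seq][0]
        obj = (table_id, key)
        if obj in self.lock_table:
            vote = ABORT_VOTE                        # failed to grab the lock
        else:
            self.lock_table[obj] = seq               # grab the lock
            self.buffered[obj] = new_val             # buffer newVal
            if condition is None or condition(self.objects.get(obj)):
                self.log.append(("LockRecord", obj, seq))
                vote = COMMIT_VOTE
            else:
                del self.lock_table[obj]             # unlock immediately
                del self.buffered[obj]
                vote = ABORT_VOTE
        # The RpcRecord carries both the vote and the participant list.
        self.rpc_records[seq] = (vote, tuple(participants))
        self.log.append(("RpcRecord", seq, vote, tuple(participants)))
        self.syncs += 1                              # sync before responding
        return vote
```

In this sketch, a PREPARE that voted ABORT because its condition failed keeps returning ABORT-VOTE on retry even if another transaction has since changed the object, which is the scenario raised in the comments above.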
- RPC(3) DECISION: After collecting all votes from data masters, the client broadcasts the decision to the cohorts that voted for COMMIT.
- Request: <tableId, keyHash, seq# for PREPARE, DECISION>
- Handling: if DECISION = COMMIT,
- If there is a buffered write, log Object (with new value), Tombstone for old Object, and Tombstone for LockRecord atomically.
- Unlock the object in lock table.
- Sync log with backup.
(It is not okay to delay this sync until we sync the next transaction’s LockRecord.)
- Response: ACK.
- After collecting “ACK” from all cohorts, the client acknowledges the lowest seq# reserved, so that ACK# can reach up to the highest seq# used in this transaction.
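The id reservation, DECISION handling, and final acknowledgment can be sketched together as below. This is an illustrative model under stated assumptions, not RAMCloud code: `Cohort`, `Client`, `reserve`, `decide`, and `ack_number` are hypothetical names. The text only specifies handling for DECISION = COMMIT, so the ABORT path here (discard the buffered write and unlock) is an assumption, as is the model in which per-object seq#s count as finished once their votes arrive while the placeholder holds ACK# back.

```python
from dataclasses import dataclass, field

COMMIT, ABORT, ACK = "COMMIT", "ABORT", "ACK"
COMMIT_VOTE = "COMMIT-VOTE"

@dataclass
class Cohort:
    """Hypothetical data master state after PREPARE has run."""
    objects: dict = field(default_factory=dict)     # key -> committed value
    lock_table: dict = field(default_factory=dict)  # key -> seq# of the PREPARE holding the lock
    buffered: dict = field(default_factory=dict)    # key -> newVal buffered by PREPARE
    log: list = field(default_factory=list)
    syncs: int = 0

    def decision(self, key, prepare_seq, decision):
        if self.lock_table.get(key) == prepare_seq:
            if decision == COMMIT and key in self.buffered:
                # One atomic append: new Object, Tombstone for the old Object,
                # and Tombstone for the LockRecord.
                self.log.append([("Object", key, self.buffered[key]),
                                 ("Tombstone", "Object", key),
                                 ("Tombstone", "LockRecord", key)])
                self.objects[key] = self.buffered[key]
            # ASSUMPTION: ABORT only discards the buffered write and unlocks.
            self.buffered.pop(key, None)
            del self.lock_table[key]                # unlock in the lock table
            self.syncs += 1                         # sync before ACK, never deferred
        return ACK

@dataclass
class Client:
    next_seq: int = 1
    done: set = field(default_factory=set)          # seq#s finished so far

    def reserve(self, m):
        # Reserve M+1 consecutive ids; the lowest is the placeholder.
        placeholder = self.next_seq
        self.next_seq += m + 1
        return placeholder, list(range(placeholder + 1, placeholder + m + 1))

    def ack_number(self):
        # Highest n such that every seq# <= n is finished.
        n = 0
        while n + 1 in self.done:
            n += 1
        return n

    def decide(self, placeholder, votes, cohorts):
        # votes: {key: (seq#, vote)}; cohorts: {key: Cohort}
        self.done.update(seq for seq, _ in votes.values())  # ASSUMPTION, see lead-in
        decision = COMMIT if all(v == COMMIT_VOTE for _, v in votes.values()) else ABORT
        acks = [cohorts[key].decision(key, seq, decision)
                for key, (seq, vote) in votes.items() if vote == COMMIT_VOTE]
        if all(a == ACK for a in acks):
            self.done.add(placeholder)              # releases ACK# up to the highest seq#
        return decision
```

Because the placeholder is the lowest of the M+1 ids, `ack_number()` in this model cannot pass any of the transaction's seq#s until every cohort has acknowledged the decision.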
...