Page Comparison

...

A client reserves sequence numbers for RPC ids. It reserves M+1 consecutive ids, where M is the number of objects involved in the current transaction. The lowest seq# is not assigned to any object or RPC and works as a placeholder. The other M sequence numbers are assigned one per object.
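The reservation rule can be sketched as follows. This is a minimal illustration, not RAMCloud code: `SeqAllocator`, `Reservation`, and the `(table_id, key_hash)` tuples are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    placeholder: int   # lowest seq#, never assigned to an object or RPC
    per_object: dict   # (table_id, key_hash) -> seq# used for that object's RPCs


class SeqAllocator:
    """Hypothetical per-client counter that hands out consecutive RPC ids."""

    def __init__(self, next_seq=1):
        self.next_seq = next_seq

    def reserve(self, objects):
        """Reserve M+1 consecutive ids for a transaction touching M objects."""
        base = self.next_seq
        self.next_seq += len(objects) + 1
        # The id `base` stays unassigned; objects get base+1 .. base+M.
        per_object = {obj: base + 1 + i for i, obj in enumerate(objects)}
        return Reservation(placeholder=base, per_object=per_object)
```

With three objects and a counter at 10, the placeholder is 10 and the objects receive 11, 12, and 13.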
RPC (1) PREPARE: A client sends prepare messages to all data master servers participating in the transaction. For ease of understanding, we send a separate RPC request for each object in the transaction.
1. Request msg: <list of <tableId, keyHash, Seq#>, tableId, key, condition, newVal>
  1. list of <tableId, keyHash, Seq#>: used in case of client disconnection.
  2. tableId & key: the object being operated on.
  3. Condition: an additional condition for COMMIT-VOTE beyond successful locking, expressed as RAMCloud RejectRules. This can be NULL.
  4. newVal: value to be written for “key” on the receipt of “COMMIT”.
2. Handling:
  1. Grab a lock for “key” in the lock table. Buffer newVal for the key.
  2. - If the lock was grabbed and the condition is satisfied, log a LockRecord (lock information; see figure~\ref{fig:lockRecord}) and an RpcRecord holding the result "COMMIT-VOTE" and <list of <tableId, keyHash, Seq#>> (for linearizability; see figure~\ref{fig:rpcRecord}).
    - If the lock was grabbed but the condition is not satisfied, unlock immediately and log an RpcRecord with the result “ABORT-VOTE” and <list of <tableId, keyHash, Seq#>>.
    - If the lock could not be grabbed, log an RpcRecord with the result “ABORT-VOTE” and <list of <tableId, keyHash, Seq#>>.
    (JO: why do we need to log anything here? The abort condition is permanent, no? A: a retried PREPARE could successfully grab the lock. I suspect this could let the client see "ABORT" while the recovery process decides "COMMIT".)
  3. Sync log with backup.
  4. JO: I think that the server needs to record the <list of <tableId, KeyHash, Seq#>> as well; this needs to be durable, no? A: Yes, it is recorded with linearizability record in response field of RpcRecord.
3. Response: either “COMMIT-VOTE” or “ABORT-VOTE”.
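The PREPARE handling above can be sketched as a toy in-memory master. Everything here is an assumption made for illustration: the `DataMaster` class, its dictionaries, modeling the condition as a Python predicate instead of RejectRules, and passing the RPC's own seq# as an explicit `seq` argument. The early return for a repeated seq# reflects the reason given in the comment above for logging ABORT votes.

```python
COMMIT_VOTE, ABORT_VOTE = "COMMIT-VOTE", "ABORT-VOTE"


class DataMaster:
    """Toy stand-in for a data master (all names hypothetical)."""

    def __init__(self):
        self.store = {}        # (table_id, key) -> current value
        self.locks = {}        # (table_id, key) -> seq# of the PREPARE holding the lock
        self.buffered = {}     # (table_id, key) -> newVal waiting for COMMIT
        self.log = []          # stand-in for the master's log
        self.synced = 0        # number of log entries replicated to backups
        self.rpc_results = {}  # seq# -> vote, mirrors the logged RpcRecords

    def prepare(self, participants, table_id, key, seq, condition, new_val):
        # A retried PREPARE must see the vote that was logged the first time.
        if seq in self.rpc_results:
            return self.rpc_results[seq]
        obj = (table_id, key)
        if obj in self.locks:
            vote = ABORT_VOTE  # failed to grab the lock
        elif condition is not None and not condition(self.store.get(obj)):
            # Same net effect as "grab the lock, then unlock immediately".
            vote = ABORT_VOTE
        else:
            self.locks[obj] = seq
            self.buffered[obj] = new_val
            self.log.append(("LockRecord", obj, seq))
            vote = COMMIT_VOTE
        # The participant list travels with the vote for use in recovery.
        self.log.append(("RpcRecord", seq, vote, tuple(participants)))
        self.rpc_results[seq] = vote
        self.synced = len(self.log)  # sync with backups before replying
        return vote
```

A second PREPARE on a locked key votes ABORT, while a retry of the first PREPARE returns its saved COMMIT-VOTE.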
RPC (3) DECISION: After collecting all votes from data masters, the client sends its decision to all cohorts that voted for COMMIT. (JO: no need to broadcast to servers that voted ABORT?)
1. Request: <tableId, keyHash, seq# for PREPARE, DECISION>
2. Handling: if DECISION = COMMIT,
  1. If there is a buffered write, log Object (with new value), Tombstone for old Object, and Tombstone for LockRecord atomically.
  2. Unlock the object in lock table.
  3. Sync log with backup.
    (It is not okay to delay sync until we sync a next transaction’s LockRecord.)
3. Response: ACK.
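A minimal sketch of the DECISION handler follows. The class and field names are hypothetical, and the single tuple appended to `log` merely stands in for one atomic log append. The text only specifies the COMMIT case; the ABORT branch here (drop the buffered write, release the lock) is an assumption.

```python
class DataMaster:
    """Toy stand-in for a data master handling DECISION (names hypothetical)."""

    def __init__(self):
        self.objects = {}   # (table_id, key_hash) -> committed value
        self.locks = {}     # (table_id, key_hash) -> seq# of the PREPARE holding the lock
        self.buffered = {}  # (table_id, key_hash) -> newVal buffered at PREPARE
        self.log = []       # each element is one atomic append
        self.synced = 0     # number of log entries replicated to backups

    def decision(self, table_id, key_hash, prepare_seq, decision):
        obj = (table_id, key_hash)
        if self.locks.get(obj) != prepare_seq:
            return "ACK"    # nothing is locked on behalf of this PREPARE
        if decision == "COMMIT":
            if obj in self.buffered:
                batch = [("Object", obj, self.buffered[obj])]
                if obj in self.objects:
                    batch.append(("Tombstone", "Object", obj))
                batch.append(("Tombstone", "LockRecord", obj, prepare_seq))
                self.log.append(tuple(batch))  # written atomically
                self.objects[obj] = self.buffered.pop(obj)
        else:
            # Assumption: an ABORT decision just discards the buffered write.
            self.buffered.pop(obj, None)
        del self.locks[obj]          # unlock the object
        self.synced = len(self.log)  # sync now; do not defer to a later transaction
        return "ACK"
```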
After collecting “ACK” from all cohorts, the client acknowledges the lowest seq# reserved, so that ACK# can advance up to the highest seq# used in this transaction.
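One way to read the role of the placeholder seq# is sketched below. It assumes that ACK# means "every RPC id up to and including this number is finished", so an unfinished placeholder holds ACK# below all ids of the transaction. `AckTracker` and its methods are hypothetical names, not part of RAMCloud.

```python
class AckTracker:
    """Hypothetical client-side tracker of outstanding RPC ids."""

    def __init__(self):
        self.outstanding = set()

    def start(self, seq):
        self.outstanding.add(seq)

    def finish(self, seq):
        self.outstanding.discard(seq)

    def ack_number(self, next_seq):
        # Largest id such that it and every smaller id are finished.
        if self.outstanding:
            return min(self.outstanding) - 1
        return next_seq - 1
```

While the placeholder (the lowest id) is outstanding, ACK# stays below every id of the transaction, even after all per-object RPCs have completed. Finishing the placeholder lets ACK# jump to the highest id used.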

...

RPC (5): When a DM (data master) detects the crash (or slowness) of a client through the WorkerTimer of a lock, it sends a “StartRecovery” request to the recovery coordinator (the server owning the 1st entry in the list of keyHashes).
1. Request: <clientId, list of <tableId, keyHash, rpcId>>
2. Handling: the recovery coordinator initiates the recovery protocol. Possible optimization: use UnackedRpcResults to avoid duplicate recoveries. CAUTION: avoid the deadlock that arises when recovery jobs occupy all threads in a master.
3. Response: Empty
RPC (6): The recovery coordinator sends requestAbort to clean up and release all locks in the masters.
1. Request: <clientId, seq#>
2. Handling:
  1. checkDuplicate with the given clientId & seq#.
  2. If a record exists, respond with the saved result.
  3. If not, respond “ABORT-VOTE”.
3. Response: COMMIT-VOTE | ABORT-VOTE
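The requestAbort handling reduces to a duplicate check, sketched below. The dictionary `saved_results` is a hypothetical stand-in for the master's table of logged RpcRecords. Saving the ABORT-VOTE when no record exists is an assumption of this sketch (so that a PREPARE arriving later would see the same answer); the steps above only say to respond.

```python
COMMIT_VOTE, ABORT_VOTE = "COMMIT-VOTE", "ABORT-VOTE"


def request_abort(saved_results, client_id, seq):
    """Return the vote recorded for (client_id, seq), or ABORT-VOTE if none."""
    key = (client_id, seq)
    if key in saved_results:
        # checkDuplicate hit: the PREPARE already ran, so repeat its vote.
        return saved_results[key]
    # Assumption: remember the abort so a late PREPARE cannot vote COMMIT.
    saved_results[key] = ABORT_VOTE
    return ABORT_VOTE
```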
After the recovery coordinator collects all votes, it sends the decision to the cohorts that voted for COMMIT.
1. Request: <DECISION, clientId, rpcId in RPC(6)>
2. Handling:
  1. Check whether a lock is held for rpcId. (Two methods; needs discussion. The 1st solution is to save “key” in RpcRecord::response and use the key to look up the lock table. The 2nd solution is to keep a separate table or list of all locks.) (JO: just allow locks to be looked up by rpcid? This is unique. Or, just scan the lock table for the rpcid; this won't happen very often. A: it depends on the implementation of the lock table. If the lock table is a separate table, we can just enumerate it. If the lock information is kept as part of the object hash table, I think it is not feasible to enumerate the whole hash table. Collin is thinking about the lock table implementation.)
  2. If no lock is grabbed, respond with “ACK”
  3. If a lock was grabbed, flush the buffered write (detail is same as normal operation.) and unlock the object.
3. Response: ACK (empty)
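As an illustration of one option from the discussion above (scanning a separate lock table for the rpcId), the recovery-time decision handler could look like the sketch below. This does not settle the open design question. The dictionary layout and function names are hypothetical, and discarding the buffered write on an ABORT decision is an assumption.

```python
def find_lock_by_rpc_id(locks, client_id, rpc_id):
    """Scan a separate lock table for the lock taken by (client_id, rpc_id)."""
    for obj, holder in locks.items():
        if holder == (client_id, rpc_id):
            return obj
    return None


def recovery_decision(master, decision, client_id, rpc_id):
    """Apply the recovery coordinator's decision on one master."""
    obj = find_lock_by_rpc_id(master["locks"], client_id, rpc_id)
    if obj is None:
        return "ACK"  # no lock is held for this rpcId
    if decision == "COMMIT":
        # Flush the buffered write, as in normal operation.
        master["objects"][obj] = master["buffered"].pop(obj)
    else:
        master["buffered"].pop(obj, None)  # assumption: drop the write on ABORT
    del master["locks"][obj]               # unlock the object
    return "ACK"
```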
The recovery coordinator is now finished with the transaction. Leaving the RpcRecord around is safe in case the client resurrects before its lease times out.

...
