...

  1. A client reserves sequence numbers for RPC ids. It reserves M+1 consecutive ids, where M is the number of objects involved in the current transaction. The lowest seq# is not assigned to any object or RPC and serves as a placeholder. The other M sequence numbers are assigned one to each object.
  2. RPC (1) PREPARE: A client sends prepare messages to all data master servers participating in the transaction. For clarity, we send a separate RPC request for each object in the transaction.
    1. Request msg: <list of <tableId, keyHash, Seq#>, tableId, key, condition, newVal>
      1. list of <tableId, keyHash, Seq#>: identifies every object in the transaction; used in case the client disconnects.
      2. TableId & Key: the object being operated on.
      3. Condition: a condition, besides successful locking, that must hold for a COMMIT-VOTE, expressed as RAMCloud RejectRules. This can be NULL.
      4. newVal: the value to be written for “key” upon receipt of “COMMIT”.
    2. Handling:
      1. Grab a lock for “key” in the lock table. Buffer newVal for the key.
      2. - If the lock was grabbed and the condition is satisfied, log a LockRecord (lock information; see figure~\ref{fig:lockRecord}) and an RpcRecord (for linearizability; see figure~\ref{fig:rpcRecord}).
        - If the lock was grabbed but the condition is not satisfied, unlock immediately and log an RpcRecord with the result “ABORT-VOTE”.
        - If the lock could not be grabbed, log an RpcRecord with the result “ABORT-VOTE”.
      3. Sync the log with backups.
    3. Response: either “COMMIT-VOTE” or “ABORT-VOTE”.
  3. RPC(3) DECISION: After collecting all votes from the data masters, the client broadcasts its decision to all cohorts.
    1. Request: <tableId, keyHash, seq# for PREPARE, DECISION>
    2. Handling: if DECISION = COMMIT,
      1. If there is a buffered write, atomically log the Object (with its new value), a Tombstone for the old Object, and a Tombstone for the LockRecord.
      2. Unlock the object in the lock table.
      3. Sync the log with backups? (Open question: is it safe to skip or delay this sync if the master crashes?)
        (It is not okay to delay the sync until the next transaction’s LockRecord is synced. We only need a guarantee that only one LockRecord exists per object.)
    3. Response: Done.
  4. After collecting “Done” from all cohorts, the client acknowledges the lowest reserved seq# (the placeholder), so that the ACK# can advance up to the highest seq# used in this transaction.
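
The client-side bookkeeping in steps 1 and 4 can be sketched as a small in-memory model. This is an illustration under assumptions: `ClientRpcTracker`, `reserve_for_transaction`, `finish`, and `ack_id` are hypothetical names, not RAMCloud APIs, and the ACK# is modeled as the largest id below every still-outstanding id. It shows why keeping the placeholder open stops the ACK# from passing the transaction's ids until every cohort has answered “Done”.

```python
# Hypothetical client-side model of steps 1 and 4; not a RAMCloud API.


class ClientRpcTracker:
    def __init__(self):
        self.next_id = 1          # next unused RPC sequence number
        self.outstanding = set()  # ids whose RPCs have not finished yet

    def reserve_for_transaction(self, num_objects):
        """Step 1: reserve M+1 consecutive ids; the lowest is an unassigned placeholder."""
        ids = list(range(self.next_id, self.next_id + num_objects + 1))
        self.next_id += num_objects + 1
        self.outstanding.update(ids)
        return ids[0], ids[1:]  # (placeholder, one id per object)

    def finish(self, rpc_id):
        """Mark one id complete (a reply arrived, or step 4 for the placeholder)."""
        self.outstanding.discard(rpc_id)

    def ack_id(self):
        """ACK#: the largest id such that every id <= it is complete."""
        if not self.outstanding:
            return self.next_id - 1
        return min(self.outstanding) - 1


tracker = ClientRpcTracker()
placeholder, object_ids = tracker.reserve_for_transaction(3)
for rpc_id in object_ids:      # votes and "Done" replies arrive
    tracker.finish(rpc_id)
print(tracker.ack_id())        # → 0 (placeholder still open)
tracker.finish(placeholder)    # step 4: acknowledge the lowest reserved seq#
print(tracker.ack_id())        # → 4
```

In this model the per-object ids can be marked finished as their replies arrive; the placeholder alone holds the ACK# back, and releasing it lets the ACK# jump to the highest id the transaction used.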

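The participant side of RPC(1) and RPC(3) can be sketched as a single-threaded, in-memory model. Everything here is an assumption for illustration: `Participant` and its fields are hypothetical, the condition is a Python predicate standing in for RAMCloud RejectRules, log records are tuples rather than the real LockRecord/RpcRecord formats, backup syncs are omitted, newVal is buffered only when the vote is COMMIT, and the ABORT branch of `decision` is a guess because the excerpt only specifies COMMIT handling.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical single-threaded model of one data master. Log records are tuples
# (not RAMCloud's LockRecord/RpcRecord formats) and backup syncs are omitted.


@dataclass
class Participant:
    objects: dict = field(default_factory=dict)      # key -> committed value
    locks: dict = field(default_factory=dict)        # key -> rpc_id holding the lock
    buffered: dict = field(default_factory=dict)     # rpc_id -> (key, new_val)
    rpc_results: dict = field(default_factory=dict)  # (client_id, rpc_id) -> vote
    log: list = field(default_factory=list)          # appended records (modeled)

    def prepare(self, client_id: int, rpc_id: int, key: str, new_val,
                condition: Optional[Callable] = None) -> str:
        """RPC(1): vote COMMIT only if the lock is grabbed and the condition holds."""
        saved = self.rpc_results.get((client_id, rpc_id))
        if saved is not None:
            return saved  # retried RPC: return the recorded vote
        if key in self.locks:
            vote = "ABORT-VOTE"  # failed to grab the lock
        else:
            self.locks[key] = rpc_id  # grab the lock
            if condition is not None and not condition(self.objects.get(key)):
                del self.locks[key]  # condition failed: unlock immediately
                vote = "ABORT-VOTE"
            else:
                self.buffered[rpc_id] = (key, new_val)  # buffer newVal
                self.log.append(("LockRecord", key, rpc_id))
                vote = "COMMIT-VOTE"
        self.rpc_results[(client_id, rpc_id)] = vote
        self.log.append(("RpcRecord", client_id, rpc_id, vote))
        return vote

    def decision(self, rpc_id: int, outcome: str) -> str:
        """RPC(3): on COMMIT, flush the buffered write atomically, then unlock."""
        pending = self.buffered.pop(rpc_id, None)
        if pending is None:
            return "Done"  # this RPC holds no lock (e.g. it voted ABORT)
        key, new_val = pending
        if outcome == "COMMIT":
            # One atomic append: new Object plus tombstones for old Object and LockRecord.
            self.log.append([("Object", key, new_val),
                             ("Tombstone", "Object", key),
                             ("Tombstone", "LockRecord", key, rpc_id)])
            self.objects[key] = new_val
        else:
            # Assumption: ABORT handling is not in this excerpt; drop the write.
            self.log.append([("Tombstone", "LockRecord", key, rpc_id)])
        del self.locks[key]
        return "Done"


p = Participant(objects={"x": 1})
print(p.prepare(1, 2, "x", 5))                  # COMMIT-VOTE: lock grabbed
print(p.prepare(1, 3, "x", 6))                  # ABORT-VOTE: "x" is locked by rpc 2
print(p.decision(2, "COMMIT"), p.objects["x"])  # Done 5
```

Saving the vote per (clientId, rpcId) is what makes a retried PREPARE return the same answer instead of re-running the lock attempt.
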
...

  1. RPC(5): When a data master (DM) detects that a client lease has expired, it checks whether it holds unacknowledged transaction information and sends a “StartCleanup” request to the recovery coordinator of each such transaction (the server owning the first entry in the list of keyHashes).
    1. Request: <clientId, list of <tableId, keyHash, rpcId>>
    2. Handling: the recovery coordinator initiates the cleanup protocol. Possible optimization: use UnackedRpcResults to avoid duplicate cleanups/recoveries.
    3. Response: Empty
  2. RPC(6): The recovery coordinator sends requestAbort to each participating master to clean up and release all locks.
    1. Request: <clientId, seq#>
    2. Handling:
      1. Call checkDuplicate with the given clientId and seq#.
      2. If a result exists, respond with the saved result.
      3. If not, respond with “ABORT-VOTE”.
    3. Response: COMMIT-VOTE | ABORT-VOTE
  3. Check whether the COMMITED set already has this TX’s record. After the recovery coordinator collects all votes, it durably logs the outcome of the TX (only if the outcome is COMMIT), adds the TX to the COMMITED set, and sends the decision to the masters, ordering cleanup.
    1. Request: <DECISION, clientId, rpcId in RPC(6)>
    2. Handling:
      1. Check whether a lock is held for rpcId.
      2. If so, flush the buffered write (details are the same as in normal operation) and unlock the object.
      3. Clean up the RpcRecord by manually marking it “acked” in UnackedRpcResults. UnackedRpcResults must be refactored to support marking individual entries “acked” and shrinking its window accordingly. All information about the client is deleted as soon as all of its TXs are marked “acked”.
      4. Respond ACK.
    3. Response: ACK (empty)
  4. The recovery coordinator deletes the logged outcome of the transaction (written in 7) by appending a tombstone for the TX outcome entry. It is then safe to remove the TX’s record from the COMMITED set.
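
The cleanup path (RPC(5) through step 4) can be modeled the same way. This sketch assumes a hypothetical `locate_server(tableId, keyHash)` lookup, hypothetical `Master` and `RecoveryCoordinator` classes, and a transaction identified by its clientId plus the participants' rpcIds; durable logging is a list append and none of these names are RAMCloud APIs.

```python
# Hypothetical in-memory model of the cleanup protocol; names are illustrative,
# not RAMCloud APIs. Durable logging is modeled as list appends.


def coordinator_for(participants, locate_server):
    """RPC(5) target: the server owning the first <tableId, keyHash> entry."""
    table_id, key_hash, _rpc_id = participants[0]
    return locate_server(table_id, key_hash)


class Master:
    def __init__(self, saved, buffered):
        self.saved = dict(saved)        # (client_id, rpc_id) -> vote saved at PREPARE
        self.buffered = dict(buffered)  # rpc_id -> (key, new_val) while locked
        self.objects = {}
        self.acked = set()

    def request_abort(self, client_id, rpc_id):
        """RPC(6): saved vote if PREPARE was processed, else ABORT-VOTE."""
        return self.saved.get((client_id, rpc_id), "ABORT-VOTE")

    def apply_decision(self, decision, client_id, rpc_id):
        """Step 3: flush on COMMIT, release the lock, mark the RPC acked."""
        pending = self.buffered.pop(rpc_id, None)  # popping models the unlock
        if pending is not None and decision == "COMMIT":
            key, new_val = pending
            self.objects[key] = new_val
        self.acked.add((client_id, rpc_id))
        return "ACK"


class RecoveryCoordinator:
    def __init__(self):
        self.log = []           # durable log (modeled)
        self.committed = set()  # the COMMITED set

    def recover(self, client_id, participants):
        """participants: list of (master, rpc_id) pairs for one transaction."""
        tx = (client_id, tuple(rpc_id for _, rpc_id in participants))
        if tx in self.committed:
            decision = "COMMIT"  # an earlier, interrupted cleanup decided COMMIT
        else:
            votes = [m.request_abort(client_id, r) for m, r in participants]
            decision = "COMMIT" if all(v == "COMMIT-VOTE" for v in votes) else "ABORT"
            if decision == "COMMIT":
                self.log.append(("TxOutcome", tx))  # only COMMIT is logged
                self.committed.add(tx)
        for m, rpc_id in participants:
            m.apply_decision(decision, client_id, rpc_id)
        if decision == "COMMIT":
            self.log.append(("Tombstone", tx))  # step 4: retire the outcome entry
            self.committed.discard(tx)
        return decision


m1 = Master({(7, 2): "COMMIT-VOTE"}, {2: ("a", 10)})
m2 = Master({}, {})  # PREPARE never reached this master
print(coordinator_for([(1, 10, 2), (1, 11, 3)], lambda t, h: f"server-{h % 4}"))  # server-2
rc = RecoveryCoordinator()
print(rc.recover(7, [(m1, 2), (m2, 3)]))  # ABORT
```

Because a master with no saved result answers ABORT-VOTE, a transaction commits during cleanup only if every participant had already voted COMMIT before the client disappeared.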

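The UnackedRpcResults refactoring called for in the cleanup handling could look roughly like the following. It is a hypothetical per-client structure, not RAMCloud's actual `UnackedRpcResults` class: a cumulative client ACK drops every entry at or below it, `mark_acked` removes a single entry on the cleanup path, the window is the range of ids still held, and a client's information is deleted once nothing remains.

```python
# Hypothetical sketch; not RAMCloud's UnackedRpcResults implementation.


class ClientRpcWindow:
    def __init__(self):
        self.results = {}     # rpc_id -> saved result for RPCs not yet acknowledged
        self.acked_up_to = 0  # cumulative client ACK: every id <= this is acked

    def record(self, rpc_id, result):
        if rpc_id > self.acked_up_to:  # never re-save an already acked id
            self.results[rpc_id] = result

    def check_duplicate(self, rpc_id):
        """Saved result for rpc_id, or None if unknown or already acknowledged."""
        return self.results.get(rpc_id)

    def process_ack(self, ack_id):
        """Normal path: the client acknowledges every id <= ack_id."""
        self.acked_up_to = max(self.acked_up_to, ack_id)
        for rpc_id in [r for r in self.results if r <= self.acked_up_to]:
            del self.results[rpc_id]

    def mark_acked(self, rpc_id):
        """Cleanup path: acknowledge one RPC without a cumulative client ACK."""
        self.results.pop(rpc_id, None)

    def window(self):
        """(lowest, highest) rpc id still held, or None when nothing is held."""
        if not self.results:
            return None
        return min(self.results), max(self.results)


class UnackedResults:
    def __init__(self):
        self.clients = {}  # client_id -> ClientRpcWindow

    def window_for(self, client_id):
        return self.clients.setdefault(client_id, ClientRpcWindow())

    def mark_acked(self, client_id, rpc_id):
        w = self.clients.get(client_id)
        if w is None:
            return
        w.mark_acked(rpc_id)
        if w.window() is None:  # every TX of this client is acked
            del self.clients[client_id]


table = UnackedResults()
w = table.window_for(7)
for rpc_id, vote in [(2, "COMMIT-VOTE"), (3, "ABORT-VOTE"), (5, "COMMIT-VOTE")]:
    w.record(rpc_id, vote)
table.mark_acked(7, 3)
print(w.window())          # (2, 5)
table.mark_acked(7, 2)
table.mark_acked(7, 5)
print(7 in table.clients)  # False
```

Tracking individual entries rather than only a contiguous range matters here because a single master usually sees a sparse subset of a client's rpc ids.
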
...