Because this process purges the durable records about votes, we cannot rely on the distributed durable vote information to recover from a crash. Instead, we switch to regular two-phase commit and rely on a single durable record of the transaction outcome: after collecting the votes, the recovery coordinator durably records its decision. In practice, only COMMIT decisions need to be logged, since we assume abort if no linearizability record exists for the RequestAbort step. We also keep the commit decisions in memory, in the COMMITTED set, for faster access. Without this record, a crash of the recovery coordinator after it has sent DECISION to only some participants in step 4 can cause inconsistency: a transaction originally decided to commit could turn into an abort, because the RequestAbort message returns ABORT-VOTE once garbage collection has run in step 4. (JO: I still don't understand this; perhaps the best thing is to discuss in person. Why can't garbage collection be separated entirely from aborting/completing transactions, and thereby made much simpler? For example, use locks on objects to force transaction completion/abort, and make this independent of the mechanism for garbage-collecting unacked RPC results?) (JO: also, if the COMMITTED set is needed here, isn't it also needed in the Client Crash Recovery section above?)
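The logging rule described above (log only COMMIT, presume abort otherwise, cache commits in the COMMITTED set) can be sketched as follows. This is a minimal illustration, not the actual implementation: `DecisionLog`, `record_decision`, and the tuple-based log entries are hypothetical names, and an in-memory list stands in for the durable log.

```python
from dataclasses import dataclass, field

COMMIT, ABORT = "COMMIT", "ABORT"


@dataclass
class DecisionLog:
    """Hypothetical stand-in for the recovery coordinator's durable log."""
    entries: list = field(default_factory=list)

    def append_commit(self, tx_id):
        self.entries.append(("COMMIT", tx_id))

    def append_tombstone(self, tx_id):
        self.entries.append(("TOMBSTONE", tx_id))

    def replay(self):
        """Rebuild the in-memory COMMITTED set after a coordinator crash."""
        committed = set()
        for kind, tx_id in self.entries:
            if kind == "COMMIT":
                committed.add(tx_id)
            else:
                committed.discard(tx_id)
        return committed


def record_decision(log, committed, tx_id, votes):
    """Log only COMMIT outcomes; ABORT is presumed when no record exists."""
    all_commit = bool(votes) and all(v == "COMMIT-VOTE" for v in votes)
    decision = COMMIT if all_commit else ABORT
    if decision == COMMIT:
        log.append_commit(tx_id)  # durable record first
        committed.add(tx_id)      # then the in-memory cache
    return decision


# Usage: the decision survives a crash that wipes the in-memory set.
log, committed = DecisionLog(), set()
tx = ("client-7", 1)  # hypothetical transaction identifier
record_decision(log, committed, tx, ["COMMIT-VOTE", "COMMIT-VOTE"])
committed = log.replay()  # simulated restart of the recovery coordinator
```

After the simulated restart, the replayed set still contains the transaction, so a retried cleanup can resend COMMIT instead of collecting fresh (and now ABORT-VOTE) responses.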

  1. RPC(5): when a DM detects the expiration of a client lease, it checks whether there is unacknowledged transaction information and sends a “StartCleanup” request to the recovery coordinator of each transaction (the server with the 1st entry in the list of keyHash). (JO: what is "unacknowledged transaction information"? In addition, why is this step necessary? If the transaction completed, then on lease expiration the DM can just discard its unacked RPC results, like it would for any other linearizable operation that completed. If the transaction didn't complete, then the timer mechanism for locks will already have triggered long before the lease expired, no?)
    1. Request: <clientId, list of <tableId, keyHash, rpcId>>  (+ also clusterTime).
    2. Handling: recovery coordinator initiates cleanup protocol. Possible optimization: use UnackedRpcResults to avoid duplicate cleanups/recoveries.
    3. Response: Empty
  2. Check whether the COMMITTED set has a record for this TX. If the TX was previously decided to commit, skip step 3 and send the COMMIT message in step 4.
    (JO: I'm still confused (my earlier comment on this seems to have gotten deleted without answering the questions). Exactly what is the COMMITTED set? Perhaps explain this above when you first mention the COMMITTED set? Why is this needed? Is there a problem if step 3 gets executed multiple times? A: answered in 2nd paragraph of this section.)
  3. RPC(6): Recovery coordinator sends requestAbort to clean up & release all locks in masters.
    1. Request: <clientId, seq#>
    2. Handling:
      1. checkDuplicate with the given clientId & seq#.
      2. If a record exists, respond with the saved result.
      3. If not, respond “ABORT-VOTE” (JO: durable? A: we don't expect a retried PREPARE, and we will reject the PREPARE anyway, so it is safe without durable logging here.)
    3. Response: COMMIT-VOTE | ABORT-VOTE
  4. After the recovery coordinator collects all votes, it durably logs the outcome of the TX (only if the outcome is COMMIT), adds the TX to the COMMITTED set, then sends the decision and orders cleanup.
    1. Request: <DECISION, clientId, rpcId in RPC(6)>
    2. Handling:
      1. Check whether a lock is held for rpcId.
      2. If a lock is held, flush the buffered write (details are the same as in normal operation) and unlock the object.
      3. Clean up RpcRecord by manually marking “acked” on UnackedRpcResults. Refactoring UnackedRpcResults is required to support marking “acked” and shrinking its window accordingly. We delete the whole client information as soon as all TX are marked as “acked”.
      4. Respond ACK.
    3. Response: ACK (empty)
  5. The recovery coordinator deletes the logged result (written in 7) of the transaction by appending a tombstone for the TX outcome entry. It is now safe to remove the TX’s record from the COMMITTED set.
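
Steps 2 through 5 above can be sketched end to end as follows. This is a rough model under stated assumptions, not the actual implementation: `Master`, `RecoveryCoordinator`, and their fields are hypothetical, RPCs are plain method calls, a list stands in for the durable log, the TX is identified by the client id plus its first rpcId, and the buffered write is assumed to be applied only when the decision is COMMIT.

```python
COMMIT_VOTE, ABORT_VOTE = "COMMIT-VOTE", "ABORT-VOTE"


class Master:
    """Hypothetical participant holding saved RPC results and object locks."""

    def __init__(self):
        self.saved_votes = {}  # (client_id, seq) -> vote saved by an earlier PREPARE
        self.locks = {}        # (client_id, seq) -> buffered write as (key, value)
        self.store = {}        # committed objects
        self.acked = set()     # RPC records marked "acked"

    def request_abort(self, client_id, seq):
        # Step 3: checkDuplicate; reply with the saved vote, else ABORT-VOTE.
        return self.saved_votes.get((client_id, seq), ABORT_VOTE)

    def decision(self, decision, client_id, seq):
        # Step 4 handling: release the lock, apply the write on COMMIT, mark acked.
        rpc = (client_id, seq)
        buffered = self.locks.pop(rpc, None)
        if buffered is not None and decision == "COMMIT":
            key, value = buffered
            self.store[key] = value
        self.saved_votes.pop(rpc, None)
        self.acked.add(rpc)
        return "ACK"


class RecoveryCoordinator:
    def __init__(self):
        self.durable_log = []   # stand-in for the durable log
        self.committed = set()  # the COMMITTED set

    def start_cleanup(self, client_id, participants):
        """participants: non-empty list of (master, seq) pairs for one TX."""
        tx_id = (client_id, participants[0][1])
        if tx_id in self.committed:  # step 2: already decided, skip step 3
            decision = "COMMIT"
        else:
            votes = [m.request_abort(client_id, seq) for m, seq in participants]
            decision = "COMMIT" if all(v == COMMIT_VOTE for v in votes) else "ABORT"
            if decision == "COMMIT":  # step 4: log only COMMIT outcomes
                self.durable_log.append(("COMMIT", tx_id))
                self.committed.add(tx_id)
        for m, seq in participants:  # step 4: send DECISION, collect ACKs
            m.decision(decision, client_id, seq)
        if decision == "COMMIT":  # step 5: tombstone the outcome record
            self.durable_log.append(("TOMBSTONE", tx_id))
            self.committed.discard(tx_id)
        return decision
```

In this model, a master that never saw the PREPARE has no saved result, so its ABORT-VOTE forces the whole transaction to abort and the remaining masters simply drop their locks and buffered writes.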