Coordinator Refactoring

  1. Split the rpc handling functions in CoordinatorService (and, with them, their helper functions) into multiple logical groupings:
    1. CoordinatorTabletManager - manages tables, tabletMap
      • createTable
      • dropTable
      • splitTablet
      • getTableId
      • getTabletMap
      • reassignTabletOwnership
    2. CoordinatorServerManager - manages serverList
      • enlistServer
      • getServerList
      • hintServerDown
      • sendServerList
      • setWill
    3. No separate manager (these exist only as rpc handlers)
      • recoveryMasterFinished
      • quiesce
  2. Think about how the functionality corresponding to each function gets split among the different modules.
    1. All the rpc handlers - CoordinatorService
    2. Each handler:
      1. Receives / processes the rpc. - work1
      2. Then calls into the corresponding function residing in a separate module. - work2
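
The grouping in points 1 and 2 can be sketched as follows. This is a minimal illustration, not the actual RAMCloud code: the rpc structs, the method signatures, and the in-memory containers are all assumptions made for the example, and only a few of the listed functions are shown.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified manager for tables (tabletMap omitted for brevity).
class CoordinatorTabletManager {
  public:
    // Creates the table if needed and returns its id.
    uint64_t createTable(const std::string& name) {
        auto it = tables.find(name);
        if (it != tables.end())
            return it->second;
        uint64_t id = nextTableId++;
        tables[name] = id;
        return id;
    }
    void dropTable(const std::string& name) { tables.erase(name); }
    bool hasTable(const std::string& name) const {
        return tables.count(name) != 0;
    }
  private:
    std::map<std::string, uint64_t> tables;
    uint64_t nextTableId = 0;
};

// Hypothetical, simplified manager for the server list.
class CoordinatorServerManager {
  public:
    // Appends the server and returns its index in the list.
    uint64_t enlistServer(const std::string& serviceLocator) {
        serverList.push_back(serviceLocator);
        return serverList.size() - 1;
    }
    const std::vector<std::string>& getServerList() const {
        return serverList;
    }
  private:
    std::vector<std::string> serverList;
};

// Assumed rpc request layouts; the real wire formats are not given here.
struct CreateTableRpc { std::string name; };
struct DropTableRpc { std::string name; };
struct EnlistServerRpc { std::string serviceLocator; };

// All rpc handlers live here. Each handler unpacks the rpc (work1) and
// then calls into the manager that owns the relevant state (work2).
class CoordinatorService {
  public:
    uint64_t createTable(const CreateTableRpc& rpc) {
        return tabletManager.createTable(rpc.name);
    }
    void dropTable(const DropTableRpc& rpc) {
        tabletManager.dropTable(rpc.name);
    }
    uint64_t enlistServer(const EnlistServerRpc& rpc) {
        return serverManager.enlistServer(rpc.serviceLocator);
    }
    // Public only so the example state is easy to inspect.
    CoordinatorTabletManager tabletManager;
    CoordinatorServerManager serverManager;
};
```

In this sketch only CoordinatorService sees the rpc structs, and only the managers touch the tables and the server list.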
  3. We will probably have another module, say CoordinatorServiceRecovery, that does the recovery work (workR).
  4. Decide the division of work between work1 and work2 (and similarly, between workR and work2). Options:
    1. work1 and workR function as dispatchers, real work done in work2.
      • work1 - processes the rpc and passes the arguments to the appropriate function in work2.
      • workR - iterates over the log, passes each entry to the appropriate function for work2.
      • work2 - if the request is coming from workR, then it first processes the logged state to extract the arguments. In all cases, it does all the real work.
    2. Split according to the recovery path.
      • work1 - everything up to (and including) the first time a log entry is written to LogCabin. This is work that will only ever be done by the current leader (not by the followers or a recovering leader).
      • workR - iterates over the log, passes each entry to the appropriate function for work2.
      • work2 - if the request is coming from workR, then it first processes the logged state to extract the arguments. It then does everything after work1.
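
The first option in point 4 (work1 and workR as dispatchers, real work in work2) might look roughly like this. It is a sketch under stated assumptions: the opcodes, the LogEntry layout, the function names, and the in-memory vector standing in for the LogCabin log are all hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical opcodes and log entry layout.
enum class Opcode { CREATE_TABLE, DROP_TABLE };

struct LogEntry {
    Opcode opcode;        // the only field workR inspects
    std::string payload;  // format known only to work2
};

// In-memory stand-in for the replicated log.
using Log = std::vector<LogEntry>;

// work2: owns the log entry format, decides when to log, does the real work.
class TabletManager {
  public:
    explicit TabletManager(Log* replicatedLog) : replicatedLog(replicatedLog) {}

    // Called from work1 with arguments already extracted from the rpc.
    uint64_t createTable(const std::string& name) {
        replicatedLog->push_back({Opcode::CREATE_TABLE, name});
        return applyCreateTable(name);
    }
    void dropTable(const std::string& name) {
        replicatedLog->push_back({Opcode::DROP_TABLE, name});
        tables.erase(name);
    }
    // Called from workR: extract the arguments from the entry, then do
    // the same work without writing to the log again.
    uint64_t recoverCreateTable(const LogEntry& entry) {
        return applyCreateTable(entry.payload);
    }
    void recoverDropTable(const LogEntry& entry) {
        tables.erase(entry.payload);
    }
    bool hasTable(const std::string& name) const {
        return tables.count(name) != 0;
    }
  private:
    uint64_t applyCreateTable(const std::string& name) {
        auto it = tables.find(name);
        if (it != tables.end())
            return it->second;
        uint64_t id = nextTableId++;
        tables[name] = id;
        return id;
    }
    Log* replicatedLog;
    std::map<std::string, uint64_t> tables;
    uint64_t nextTableId = 0;
};

// work1: knows the rpc format, passes the arguments on to work2.
struct CreateTableRpc { std::string name; };
uint64_t handleCreateTable(TabletManager& manager, const CreateTableRpc& rpc) {
    return manager.createTable(rpc.name);
}

// workR: iterates over the log and dispatches on the opcode only.
void recover(TabletManager& manager, const Log& log) {
    for (const LogEntry& entry : log) {
        switch (entry.opcode) {
            case Opcode::CREATE_TABLE: manager.recoverCreateTable(entry); break;
            case Opcode::DROP_TABLE: manager.recoverDropTable(entry); break;
        }
    }
}
```

Replaying the log through recover() into a fresh TabletManager rebuilds the same table state without appending new entries, since only the non-recovery entry points write to the log.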
  5. Look at the above options from a decision-hiding perspective:
    1. Option a (the first option in point 4):
      • Knowledge about format of rpcs - work1
      • Decisions about how the log is read during recovery - workR (which also knows how to read an entry's opcode, but nothing more).
      • Knowledge about format of log entries - work2
      • Decision about when to write to log - work2
      • Decisions about the implementation of the function (in all cases) - work2
      • Problem: Putting all the decisions in one module is probably not the answer to good decision-hiding.
      • Rebuttal of the previous problem: On the other hand, if decisions are related, then they should be in one place.
    2. Option b (the second option in point 4):
      • Knowledge about format of rpcs - work1
      • Decisions about how the log is read during recovery - workR (which also knows how to read an entry's opcode, but nothing more).
      • Knowledge about format of log entries - work1 & work2
      • Decision about when to write to log - work1 & work2
      • Decisions about the implementation of the function (in all cases) - work1 & work2.
      • Problem: The work seems to be split along the control flow, not along the major decisions.
  6. Decision from point 5: We're going with option a (the first option in point 4).