This is a dump of discussions on both high-level and implementation issues, mainly to help me keep track of them.

...

This is the detailed version of the createTable algo:


First try:

The following logging model could work:

...

Assume the coordinator fails and, on recovery, only the first log entry is seen. This means that the previous coordinator could have talked to 0, 1, 2, or all 3 of the masters. So, to ensure that the operation is successfully completed, the new coordinator talks to all three again and then logs done.
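A rough sketch of that redo step, to make the reasoning concrete. Everything named here is hypothetical and not taken from the actual coordinator code: the `Master` class, the dict-shaped log entries with `"start"` and `"done"` types, and the rule that tablet i goes to master i are all assumptions. The one property it relies on is that asking a master to take ownership twice is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for a master; tracks the tablets it owns."""
    name: str
    tablets: set = field(default_factory=set)

    def take_ownership(self, tablet):
        # Idempotent: repeating the request leaves the same state.
        self.tablets.add(tablet)


def recover_create_table(log, masters):
    """Redo every createTable that has a start entry but no done entry."""
    started = [e["table"] for e in log if e["type"] == "start"]
    done = {e["table"] for e in log if e["type"] == "done"}
    for table in started:
        if table in done:
            continue
        # The old coordinator may have reached any subset of the masters,
        # so contact all of them again (assumed: tablet i -> master i).
        for i, master in enumerate(masters):
            master.take_ownership((table, i))
        log.append({"type": "done", "table": table})
    return log
```

Running recovery a second time over the same log should change nothing, since the operation now has its done entry.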

Problem(s) with this model:

During recovery, there can be cases where an operation that was started (the first log entry for that operation was written) can't be completed ("done").

Consider the failure scenario described in the previous section. At this point, assume that the number of masters available is 0. There may be masters in the cluster that are trying to enlist, but we can't service those requests right now because of the restriction that we must first complete replay and only then service requests.

Undo when can't redo:

In such a scenario, it might be useful to just undo (or cleanly abort) the operation that can't be completed. In our example, that would mean asking each master to drop ownership of the corresponding tablet, if it had already taken ownership.
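A minimal sketch of what such an undo could look like. The names are hypothetical, and the `"aborted"` log entry in particular is an assumption of this sketch, not something the notes define. Dropping ownership is written as a no-op for a master that never received the tablet, which is what makes it safe to ask every master.

```python
class Master:
    """Hypothetical stand-in for a master; tracks the tablets it owns."""

    def __init__(self, name):
        self.name = name
        self.tablets = set()

    def drop_ownership(self, tablet):
        # discard() does nothing if this master never got the tablet.
        self.tablets.discard(tablet)


def undo_create_table(log, table, masters):
    """Cleanly abort a started createTable that can't be completed."""
    # Assumed placement rule for the sketch: tablet i -> master i.
    for i, master in enumerate(masters):
        master.drop_ownership((table, i))
    # Assumed log entry marking the operation as cleanly aborted.
    log.append({"type": "aborted", "table": table})
    return log
```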

Undo can help proceed with recovery when certain operations like createTable can't be completed. Note that for other operations like dropTable, it is easier to always complete the action rather than undo. Can there be a simpler solution? (Yes, see below.)

Get master recovery to do your work for you: Changing the definition of commit point:

We change the above model of logging state to be:

...

Concretely, this is how the model for createTable would change:

 

3. Multiple Coordinator Nodes:

Introduction:

As mentioned in the background at the top, the coordinator consists of a small group of nodes. The following properties should hold:

...