(John's comments are marked in red)
Aim: Split indexlet I (backed by backing table T) located on M1 into I1 (backing table remains T) and I2 (new backing table T2) and migrate I2 to M2.


Coordinator TableManager splitAndMigrateIndexlet:
Create table T2 on M2.
Call prepForIndexletMigration on M2 for I2.
Call splitAndMigrateIndexlet on M1 for I and get nextNodeId.
In index->indexlets: Truncate I to I1 (belonging to M1) and add I2 (belonging to M2). (This would also get persisted to external storage when we implement coord fault tolerance for indexes.)
Call takeIndexletOwnership on M2 for I2 with nextNodeId.
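
The coordinator steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual TableManager code: FakeMaster, the Indexlet fields, and all parameter lists are assumptions made up for the example; only the RPC names and their order come from the steps above.

```python
from dataclasses import dataclass


@dataclass
class Indexlet:
    first_key: str            # inclusive lower bound of the key range
    first_not_owned_key: str  # exclusive upper bound of the key range
    owner: str                # server id, e.g. "M1"
    backing_table: str


class FakeMaster:
    """Hypothetical stand-in for a master; it only records the RPCs it gets."""

    def __init__(self, name, next_node_id=0):
        self.name = name
        self.next_node_id = next_node_id
        self.calls = []

    def prep_for_indexlet_migration(self, first_key, end_key, backing_table):
        self.calls.append(("prepForIndexletMigration", first_key, end_key))

    def split_and_migrate_indexlet(self, split_key, target, new_backing_table):
        self.calls.append(("splitAndMigrateIndexlet", split_key, target.name))
        return self.next_node_id

    def take_indexlet_ownership(self, first_key, next_node_id):
        self.calls.append(("takeIndexletOwnership", first_key, next_node_id))


def split_and_migrate_indexlet(indexlets, tables, i, split_key, m1, m2, t2):
    """Split indexlet i at split_key; the upper part becomes I2 on m2."""
    # Create table T2 on M2 (modelled as a table -> server map entry).
    tables[t2] = m2.name
    # Tell M2 to get ready to receive I2.
    m2.prep_for_indexlet_migration(split_key, i.first_not_owned_key, t2)
    # Ask M1 to split I and ship I2's data; M1 reports nextNodeId.
    next_node_id = m1.split_and_migrate_indexlet(split_key, m2, t2)
    # Update index->indexlets: truncate I to I1 and add I2.
    i2 = Indexlet(split_key, i.first_not_owned_key, m2.name, t2)
    i.first_not_owned_key = split_key
    indexlets.append(i2)
    # Hand ownership of I2 to M2 together with nextNodeId.
    m2.take_indexlet_ownership(i2.first_key, next_node_id)
    return i2
```

Note that in this sketch the coordinator's map is updated before takeIndexletOwnership is sent, matching the order of the steps above; the failure cases discussed later depend on that ordering.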

...

In the indexletManager list of indexlets, truncate indexlet I to I1.
Send the last log segment.
(Or, we could send the entire log, truncate, check log if it has grown and if it has then migrate that segment.)
Return nextNodeId. Recompute on the receiving side as you receive segments? See recovery.
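
The parenthesized alternative (send the entire log, truncate, then send whatever was appended in the meantime) could look roughly like this. It is a toy model, not RAMCloud code: the log is a plain Python list of (key, version) entries, and send and truncate are hypothetical callbacks. Computing nextNodeId is left out.

```python
def migrate_then_truncate(log, split_key, send, truncate):
    """Ship all entries with key >= split_key, then catch up on the tail.

    log       -- list of (key, version) entries, append-only
    send      -- callback that ships a batch of entries to the receiver
    truncate  -- callback after which the sender no longer accepts
                 writes for keys >= split_key
    """
    # First pass: remember how long the log is, then send what we have.
    snapshot_len = len(log)
    send([e for e in log[:snapshot_len] if e[0] >= split_key])

    # Stop accepting writes for the migrated range.
    truncate()

    # The log may have grown during the first pass; send only the new part.
    tail = [e for e in log[snapshot_len:] if e[0] >= split_key]
    if tail:
        send(tail)
```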


M2 MasterService prepForIndexletMigration:

...

b: Since we can’t tell exactly what data was transferred, the solution would be the same as a. When M2 receives the same segment (from a different server), it will do the right thing: discard the new copy if it is the same, or keep the new copy if it has a newer version number, in which case the object with the older version number will get cleaned up by the log cleaner at some point. We’d need a mechanism on M2 to clean up old data (received from M1 before the crash) when it gets a new receiveMigrationData for the same indexlet but from a different server. (can the data transfer be made idempotent, so it doesn't matter if some records get transferred multiple times?)
c, d: Since the coord cannot distinguish between c/d and b, the solution would be the same as b even though that results in wasted work! (wasted work is fine, as long as it doesn't happen very often and is safe)
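
One way to picture the "discard new if same, keep new if newer" rule is a version-guarded merge, which makes receiving the same segment twice harmless. This is a minimal sketch under assumed data shapes (a dict from key to (version, value)), not the actual receiveMigrationData handler; in the real log the superseded object would linger until the cleaner removes it, whereas here it is simply overwritten.

```python
def receive_migration_data(store, records):
    """Merge migrated records into store, keeping the newest version per key.

    store   -- dict mapping key -> (version, value)
    records -- iterable of (key, version, value) tuples
    """
    for key, version, value in records:
        current = store.get(key)
        if current is None or version > current[0]:
            # Unknown key or strictly newer version: keep the incoming record.
            store[key] = (version, value)
        # Same or older version: discard the incoming record.
    return store
```

Because a record is only applied when its version is strictly newer, replaying a segment, or receiving it again from a different server, leaves the store unchanged.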

...

a: Client retries
b to l except d, f: If we log at b (after computing all local info for creating T2) and at l, then we can simply redo all of these operations, provided we implement them to be idempotent. (what happens for d?)
d: (TODO: make sure all of these operations are monotonic!)
f: Same as before. It is okay to transfer multiple times; we already have this in case of master crashes. It's probably bad to have multiple ongoing split-and-migrate requests from M1 to M2 (why? do you mean multiple requests for the same indexlet?)! Will linearizability take care of this issue? (I'm not sure; we should discuss)
m: No problem
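
The "log at b and at l, then redo" idea can be illustrated with a small sketch. Everything here is hypothetical: durable_log is a plain list standing in for external storage, and the steps are arbitrary callables that are assumed to be idempotent.

```python
def run_with_redo(durable_log, op_id, steps):
    """Run steps between a start record and a done record.

    Returns False if the operation had already completed, True otherwise.
    After a crash, calling this again with the same op_id redoes every
    step, which is only safe if each step is idempotent.
    """
    if ("done", op_id) in durable_log:
        return False  # finished before the crash; nothing to redo
    if ("start", op_id) not in durable_log:
        durable_log.append(("start", op_id))  # the record written at b
    for step in steps:
        step()
    durable_log.append(("done", op_id))  # the record written at l
    return True
```

A crash between the two records leaves only the start record behind, so recovery knows the operation must be rerun from the beginning.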

(What about crashes of M2?)
M2 failures:
Before prepForIndexletMigration: Choose a different M3 and go from there. (Note that we'll have to make the client-provided server id a hint.) Note that T2 will get recovered on a different server but I2 will not. Client retries.
During one of the receiveMigrationData requests: M1 aborts and returns to the coordinator. What does tabletMigration do?
Before takeIndexletOwnership: I think recovery could have taken care of it, except it won't have access to nextNodeId. So maybe we need to do the same as previous case?
After takeIndexletOwnership: Recovery will take care of it. Need to call trimAndBalance on recovery.
NOTE: What if I want to migrate the entire indexlet?