Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Table of Contents

...

For multi-threaded read/write non-locking access, no failures

(Can directly look at Object Update, since that includes the other cases.)

Simplest solution for all cases, wrap steps into a transaction. But, we'd like to avoid transactions as far as possible (to retain as much performance and scalability as possible).

...

  1. Insert index entry then write object: (tick) -> Object available (via all indexes) as soon as object written.
    1. Read before either insertion / write: no inconsistency
    2. Read after both insertion and write: no inconsistency
    3. Read between insertion and write: index entry found, matching object not found, read will return no value, as if object wasn't written yet. Same thing with multiple indexes.
  2. Write object then insert index entry:
    1. Read before either write / insertion: no inconsistency
    2. Read after both write and insertion: no inconsistency
    3. Read between write and insertion: index entry not found, read will return no value, as if write hadn't completed. This is inconsistent since object does exist. If we have multiple indexes, then lookup on one could succeed and lookup on another could fail.

Object delete: Options:

  1. Remove index entry then delete object
    1. Violates property 1.
    2. Example of undesirable situation: If I have index A and index B on same data, then removing entry from index A then from B then actually deleting data could cause a situation where lookup on index A doesn't return value, but lookup on index B returns a value.
  2. Delete object then remove entry: (tick) -> Object not available (via any index) as soon as object deleted.
    1. Read on this object before deletion and removal: object read, no inconsistency
    2. Read after deletion and removal: object not found, no inconsistency
    3. Read between deletion and removal: index entry found, corresponding object not found, no object returned.
    4. Note: delete() operation doesn't have information about indexes. Hence, before / while deleting object, read in the index keys so that they can be removed. How to handle failures?

Object update:

From the consistency requirement and RAMCloud operation linearizability, it follows that we want the following linearizability property for object updates:

...

  1. Insert new index entries
  2. Read old object (to get old index entry information) and write new object (this invalidates the old object) – Can return to client at this point
    • Note: Don't need to read out. Can keep track of position in log and look it up later.
  3. Remove old index entries – Can happen asynchronously after return to client; Can batch from various requests.
    • Note: Will have to mark old object in the some way to make sure it doesn't get deleted.
Step 2 is a single operation (new kind of write) where writing an obj returns the old one if it exists. This will have to be atomic. (A previous option achieve the same result with current reads and writes, no new atomic write/read. But that needed more rpcs, and more complex code for write/write access.)

TODO: What happens if step 2 done before step 1?

...

Previous state: Foo: Bob Brown. Client 1 writes: Foo: Sam Smith. Now, say client 2 writes: Foo: Jen Jones.

Allowed end states: Consistent data and index entries:

Data - Foo: Sam Smith; fname - Sam: 444444; lname - Smith: 444444 ---OR--- vice versa

Not allowed end states: Data X but index entries corresponding to Y:

Data - Foo: Sam Smith; fname - Jen: 444444; lname - Jones: 444444 ---OR--- vice versa

Not ideal end states: Consistent data and index entries plus dangling pointers:

Data - Foo: Sam Smith; fname -  Sam: 444444, Jen: 444444; lname - Smith: 444444, Jones: 444444 ---OR--- vice versa


It seems like there is no interleaving of actions leading to not allowed state.

Potentially bad interleaving produces ideal result:

  1. Master writes index entries for SS
  2. Master writes index entries for JJ
  3. Master reads BB and writes data SS (this invalidates BB)
  4. Master reads SS and writes data JJ (this invalidates SS)
  5. Master removes index entries for BB
  6. Master removes index entries for SS
 
Complication: What if client 2 wants to write BB (same as the original data)?

...

Need to prevent delayed repeated messages.
Can return early to client, but not release the lock. Given this, can also batch removes with next write.
(TODO)
 

Failure of an Index Server

The client request being processed gets blocked till the index server is recovered; completed and return to client once index server recovers.

In the meanwhile, two things can happen:

  1. The client might retry this request
    • The write (including index updates) is idempotent by design, so not a problem.
  2. Another client might send another request for the same object
    • This is the same as multi-threaded access to the same object problem.

If an update involves this index server, return RETRY to client. Can leave dangling pointers around. Failure (and returning retry) at any point will not violate required consistency properties.

No need to return RETRY if failure happens at removal of index entries stage.

Implementation-wise: Use RPC wrapper to throw error if index server failed.

For a reasonable first implementation: Just wait for index server to recover, no need to return RETRY.

 

...

Dangling pointers problem: Options:
  1. Just leave them around. How much do we expect to accumulate anyway?
  2. Remove dangling pointers if a lookup is done on that and object not found. (Will have to be smart to not remove a pointer for which object is being currently added. Can do so by giving ownership of this removal to the object master.)
  3. Cleaning! (See next subsection.)

 

...