Table of Contents
...
For multi-threaded read/write non-locking access, no failures
(Can directly look at Object Update, since that includes the other cases.)
Simplest solution for all cases, wrap steps into a transaction. But, we'd like to avoid transactions as far as possible (to retain as much performance and scalability as possible).
...
- Insert index entry then write object: -> Object available (via all indexes) as soon as object written.
- Read before either insertion / write: no inconsistency
- Read after both insertion and write: no inconsistency
- Read between insertion and write: index entry found, matching object not found, read will return no value, as if object wasn't written yet. Same thing with multiple indexes.
- Write object then insert index entry:
- Read before either write / insertion: no inconsistency
- Read after both write and insertion: no inconsistency
- Read between write and insertion: index entry not found, read will return no value, as if write hadn't completed. This is inconsistent since object does exist. If we have multiple indexes, then lookup on one could succeed and lookup on another could fail.
Object delete: Options:
- Remove index entry then delete object
- Violates property 1.
- Example of undesirable situation: If I have index A and index B on same data, then removing entry from index A then from B then actually deleting data could cause a situation where lookup on index A doesn't return value, but lookup on index B returns a value.
- Delete object then remove entry: -> Object not available (via any index) as soon as object deleted.
- Read on this object before deletion and removal: object read, no inconsistency
- Read after deletion and removal: object not found, no inconsistency
- Read between deletion and removal: index entry found, corresponding object not found, no object returned.
- Note: delete() operation doesn't have information about indexes. Hence, before / while deleting object, read in the index keys so that they can be removed. How to handle failures?
Object update:
From the consistency requirement and RAMCloud operation linearizability, it follows that we want the following linearizability property for object updates:
...
- Insert new index entries
- Read old object (to get old index entry information) and write new object (this invalidates the old object) – Can return to client at this point
- Note: Don't need to read out. Can keep track of position in log and look it up later.
- Remove old index entries – Can happen asynchronously after return to client; Can batch from various requests.
- Note: Will have to mark old object in the some way to make sure it doesn't get deleted.
...
Previous state: Foo: Bob Brown. Client 1 writes: Foo: Sam Smith. Now, say client 2 writes: Foo: Jen Jones.
Allowed end states: Consistent data and index entries:
Data - Foo: Sam Smith; fname - Sam: 444444; lname - Smith: 444444 ---OR--- vice versa
Not allowed end states: Data X but index entries corresponding to Y:
Data - Foo: Sam Smith; fname - Jen: 444444; lname - Jones: 444444 ---OR--- vice versa
Not ideal end states: Consistent data and index entries plus dangling pointers:
Data - Foo: Sam Smith; fname - Sam: 444444, Jen: 444444; lname - Smith: 444444, Jones: 444444 ---OR--- vice versa
It seems like there is no interleaving of actions leading to not allowed state.
Potentially bad interleaving produces ideal result:
- Master writes index entries for SS
- Master writes index entries for JJ
- Master reads BB and writes data SS (this invalidates BB)
- Master reads SS and writes data JJ (this invalidates SS)
- Master removes index entries for BB
- Master removes index entries for SS
...
Failure of an Index Server
The client request being processed gets blocked till the index server is recovered; completed and return to client once index server recovers.
In the meanwhile, two things can happen:
- The client might retry this request
- The write (including index updates) is idempotent by design, so not a problem.
- Another client might send another request for the same object
- This is the same as multi-threaded access to the same object problem.
If an update involves this index server, return RETRY to client. Can leave dangling pointers around. Failure (and returning retry) at any point will not violate required consistency properties.
No need to return RETRY if failure happens at removal of index entries stage.
Implementation-wise: Use RPC wrapper to throw error if index server failed.
For a reasonable first implementation: Just wait for index server to recover, no need to return RETRY.
...
- Just leave them around. How much do we expect to accumulate anyway?
- Remove dangling pointers if a lookup is done on that and object not found. (Will have to be smart to not remove a pointer for which object is being currently added. Can do so by giving ownership of this removal to the object master.)
- Cleaning! (See next subsection.)
...