Version Numbers
Objects in RAMCloud have version numbers to support atomic operations.
Note that only the latest version of an object is ever used — RAMCloud does not store all previous versions of an object.
- (1) We definitely want to guarantee that every distinct blob ever at a particular object ID will have a distinct version number, even across generations. (Ask Ryan for his lock example.)
- (2) We probably also want to guarantee that version numbers for a particular object ID monotonically increase over time. This allows us to efficiently compare two version numbers and tell which one is more recent.
- (3) We might also want to guarantee that the version number of an object increases by exactly one when it is updated. This allows clients to accurately predict the version numbers that they will write. (Ask Diego for his client-side transactions example.)
- Example (conditional write):
- We can tell that an object is NOT modified when its version number is the same.
- Suppose an object with version number V is removed.
- When a new object with the same key is created, if its version number is larger than V, we can tell that the object has been modified. However, if its version number is equal to or smaller than V, a client may erroneously conclude that the object is unmodified.
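The conditional-write hazard above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `ToyStore`, `Stored`, `conditional_write`, and `stale_write_succeeds` are hypothetical names, not RAMCloud APIs, and the "safe" policy here is simply a single counter that never hands out the same number twice.

```python
from dataclasses import dataclass


@dataclass
class Stored:
    value: str
    version: int


class ToyStore:
    """Hypothetical in-memory store for illustration; not the RAMCloud API."""

    def __init__(self, reuse_versions):
        self.reuse_versions = reuse_versions
        self.objects = {}   # key -> Stored
        self.counter = 1    # used only when versions are never reused

    def _next_version(self, previous):
        if self.reuse_versions:
            # Broken policy: a new object restarts at 1, an update adds one.
            return 1 if previous is None else previous + 1
        # Safe policy: every write takes a fresh, never-reused number.
        version = self.counter
        self.counter += 1
        return version

    def create(self, key, value):
        version = self._next_version(None)
        self.objects[key] = Stored(value, version)
        return version

    def remove(self, key):
        del self.objects[key]

    def conditional_write(self, key, value, expected_version):
        """Write only if the stored version equals expected_version."""
        current = self.objects.get(key)
        if current is None or current.version != expected_version:
            return False
        self.objects[key] = Stored(value, self._next_version(current.version))
        return True


def stale_write_succeeds(reuse_versions):
    store = ToyStore(reuse_versions)
    seen = store.create("lock", "held by A")  # a client remembers version V
    store.remove("lock")
    store.create("lock", "held by B")         # different object, same key
    # The client still believes nothing changed since it saw version V.
    return store.conditional_write("lock", "stolen", expected_version=seen)
```

With version reuse, the recreated object gets version V again, so the stale conditional write is accepted. With distinct versions across generations, the same write is rejected.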
The current code guarantees (1) and (2) in the following way:
- There's a "master vector clock" named safeVersion per master, which contains the next available version number on that master. This is initialized to a small integer and is recoverable after crashes (see Recovery).
- When an object is created, its new version number is set to the value of safeVersion, and safeVersion is incremented. (Do we really need to increment safeVersion each time it is referenced? --> No. See RAM-680.)
- When an object is deleted, if its version number is bigger than or equal to safeVersion, safeVersion is set to the version number plus one.
- Nothing happens to safeVersion when an object is updated because, as long as the object is alive, the new version number is derived from the version number of the object being updated.
- An unsigned 64-bit counter is big enough because 2^64 ~= 1.84 * 10^19. If we increment the counter every 1 ns (= 10^-9 sec), it takes 1.84 * 10^10 sec to overflow. That is about 213,504 days, or roughly 585 years, which is long enough.
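The safeVersion rules above can be sketched as a small model of one master. This is a minimal sketch, not the actual RAMCloud code: the class `Master` and its method names are hypothetical, and the update rule (old version plus one) is an assumption, since the text only says that the new version is taken from the object being updated.

```python
class Master:
    """Toy model of one master's version bookkeeping (hypothetical names)."""

    def __init__(self, initial_safe_version=1):
        self.safe_version = initial_safe_version  # next available version
        self.versions = {}                        # key -> current version

    def create(self, key):
        if key in self.versions:
            raise KeyError(f"{key!r} already exists")
        version = self.safe_version
        self.safe_version += 1
        self.versions[key] = version
        return version

    def update(self, key):
        # Assumption: the new version is the old version plus one.
        # safe_version is deliberately left alone while the object is alive.
        self.versions[key] += 1
        return self.versions[key]

    def delete(self, key):
        version = self.versions.pop(key)
        # Make sure a later object with this key starts above `version`.
        if version >= self.safe_version:
            self.safe_version = version + 1


master = Master()
first = master.create("a")    # takes safeVersion, which then advances
for _ in range(3):
    master.update("a")        # safeVersion is not touched here
master.delete("a")            # deleted version >= safeVersion, so it is raised
second = master.create("a")   # larger than any version "a" ever had
```

The delete step is what keeps guarantees (1) and (2) intact: without it, the recreated object could receive a version number that an earlier generation of the same key had already used.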
--- Below is obsolete ---
The current code guarantees (1) and (2) in the following way:
- There's a "master vector clock" per table per master which contains the next available version number for that table on that master. This is initialized to a small integer when the table is created and is recoverable after crashes (see Recovery).
- When an object is created or updated, its new version number is set to the value of the master vector clock, and the master vector clock is incremented.
- Nothing special happens when an object is deleted.
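For comparison, the obsolete per-table scheme just described can be sketched as follows. The class `TableClock` and its methods are hypothetical names used only for illustration. Every create or update takes the clock value, so versions are distinct and increasing, but they do not increase by exactly one per object.

```python
class TableClock:
    """Toy model of the obsolete per-table clock (hypothetical names)."""

    def __init__(self, initial=1):
        self.clock = initial   # next available version for this table
        self.versions = {}     # key -> current version

    def write(self, key):
        """Create or update: take the clock value, then advance the clock."""
        version = self.clock
        self.clock += 1
        self.versions[key] = version
        return version

    def delete(self, key):
        # Nothing special happens to the clock on delete.
        del self.versions[key]


table = TableClock()
a1 = table.write("a")   # create "a"
b1 = table.write("b")   # an unrelated write advances the shared clock
a2 = table.write("a")   # update "a": the version jumps by more than one
table.delete("a")
a3 = table.write("a")   # recreate "a": still above every earlier version
```

Because unrelated writes advance the shared clock, a client cannot predict the version its next write will receive, which is why this scheme gives (1) and (2) but not (3).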
A slightly different implementation could guarantee (1), (2), and (3):
- Keep the master vector clock, initialize and recover it the same.
- When an object is created, its new version number is set to the value of the master vector clock, and the master vector clock is incremented.
- When an object is updated, its new version number is set to the old blob's version number plus one.
- When an object is deleted, set the master vector clock to max(master vector clock, the deleted blob's version number plus one).
As of 2010-02-09, we have not decided whether we want to guarantee (3).