Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

This class implements a standard (n, m) B+ tree that maps variable
* length secondary key blobs to primary key hashes for retrieval in
* RAMCloud index operations. The property (n, m) means the smallest
* non-root node in the system can only have n entries and the largest
* node in the system can be up to n/m entries at most. In our case,
* n = mininnerslot = 4 and m = innerslotmax = 8. This B+ does caolesce and
* rebalance nodes to maintain this invariant.
*
* As with most B+ trees, there are leaf nodes which contain n keys/data
* and inner nodes that contain the n keys and n + 1 pointers to other
* nodes. However, unlike most trees, the key and data are one and the same
* in this case since it is desired to sort by not only the secondary key
* but also the pKHash number. Thus, the inner nodes in this B+ also
* contain data as well the keys. This key+data combination is called
* a BtreeEntry.
*
* The B+ tree is organized such that the root is always in a fixed location
* after a recovery. This is achieved by storing the B+ nodes as RAMCloud
* objects so that the nodes are automatically recovered during a crash
* and the root node is always stored with the same RAMCloud object key
* (ROOT_ID) so it is possible to continue using the B+ tree after recovery.
* One important note is that the nextNodeId variable is recovered elsewhere
* in the system and it is CRUCIAL to set properly after a recovery. The
* reason is because the system-generated RAMCloud Object Keys for the Nodes
* depends on that variable. If set incorrectly, new nodes could overwrite
* live data.
*
* One more property worth mentioning is that the entries stored in inner
* nodes slot refer to the rightmost key of the subtree where the subtree
* root is the child pointer at that position. This allows the B+ tree to
* store n entries for n + 1 child pointers as the last entry in the B+
* tree can be used as a pivot point to choose between the final child or
* the second last child.
*
* The properties of the B+ tree is summarized below
* - Keys in the system are (Secondary Key Blob) to (Primary Key Hash)
* mappings called Entry's
* - Smallest non-Root node contains at least mininnerslots entries
* - Largest node contains at most innerslotmax entries
* - Nodes are allocated as RAMCloud objects
* - Keys are sorted within a nodes
* - Inner Node Keys refer to the rightmost leaf key when traversing
* the subtree NodeId pointer.

 

Variable-Sized Nodes

/**
* IndexBtree::Node implements  the base class for the inner and leaf nodes of the B+ tree.
* The main responsibility of the base class is to abstract the complexity
* of storing variable-sized blobs in a RAMCloud Buffer and provide a
* vector-like API with set/insert/get/erase operations. The semantics of
* the data and how to arrange the items within the logical array are
* determined by the IndexBtree class.
*
* Nodes use Buffers to keep track of variable-sized secondary keys, which
* is also convenient since RAMCloud Objects are serialized to Buffers and
* read back with Buffers. Due to this structure, it's not uncommon
* for the Node metadata to be stored in the Buffer along with its secondary
* keys.
*
* Using Buffers also affords the use of virtual copies (externalAppend's),
* which are used copiously to update the node. For example, when an
* insert occurs, instead of shifting data, the node performs the following
* operations (a) virtual copy the first half of the original keys,
* (b) make a hard copy of the new inserted key, and (c) virtual copy the
* second half of the original keys.
*
* So a simple node structure after an insert may look like this:
* ------------------------------------------------------------------------
* | Node A (meta) | keya, keyb | virtual keya, hard keyb, virtual keyc |..
* ------------------------------------------------------------------------
*
* A simple erase (on c) may look something like this
* ------------------------------------------------------------------------
* | Node A (meta) | keya, keyb, keyc | virtual keya, virtual keyb | .....
* ------------------------------------------------------------------------
*
* The important thing to note is that the Node will ALWAYS append a copy
* (either virtual if it's part of the old keys or hard if it's passed in
* during an insert) of the keys to the end of the Buffer. This provides
* two useful properties. The first is that the contents of any pointer
* (such as a secondary key pointer) returned by the node during its
* lifetime will remain valid during its lifetime, even if the
* entry has been 'erased'. Here lifetime means from the time that the
* local copy is created/read from RAMCloud to the time that the local
* copy is garbage collected. The second useful property is this allows
* multiple nodes to be use the same buffer, saving on stack space. So it
* is also not uncommon to see a buffer layout such as
*
* ------------------------------------------------------------------------
* | Node A(meta)| keysA v1 | Node B(meta)| keysB v1 | keysA v2 | Node C...
* ------------------------------------------------------------------------
*
* The always-appending-a-copy-of-keys-to-the-end property ensures pointer
* safety within the B+ tree code and allows multiple nodes to share one
* keyBuffer without using additional space.
*/

 

 

 

 

  • No labels