Ashish's Notes

1. Indexing Survey

2. Memory Allocation

We need to allocate to b+ tree nodes in the master server. B+ tree is represented by B+ tree nodes linked with pointers. A general structure of a B+ tree node is as follows:

struct node {
int key;
int primary_key; // only for leaf nodes
node* left_child;
node* right_child;
}

We can use malloc() to allocate a memory for a node. However, we need to make sure memory cleaning is efficient. A better way is to use the existing RamCloud memory management to allocate memory for nodes which is also responsible for cleaning. This gives an added advantage of simple recovery mechanism in case of index server failures. The current RamCloud memory management handles independent memory allocations and does not handle pointers. We need to make sure that during cleaning, the node pointers are consistent i.e when a child is moved to a new location, the parent points to updated child location. There are two approach to handle this:

1) Parent Pointer: Having a parent pointer will enable us to reference the parent and change the child pointers during cleaning. However, this will have a space overhead and will require

2) Hash Table: Another approach is to create a level of indirection between the parent's child pointers and child nodes i.e instead of accessing child nodes in single step through pointers, we will go through a hash table to find the corresponding child location. This leads to easier cleaning mechanism where we need to only update the hash table corresponding to a node, when a node is moved. We can use the existing hash table in the master server to store the index node information. The goal is to create a pseudo-table 'Index Table' which will not be accessible to the client sides, but will be handled by the sever. The table will be handled by RamCloud memory management like an usual table, which is responsible for cleaning and recovery.The trade-off is the general tree traversal operations will now be expensive, since each child access will have to go through the hash table. The computations will increase by atmost 2x (1us -> 2us). The new structure of a B+ tree node will be

struct node {
int key;
int primary_key; // only for leaf node
int left_child_key; // index table key
int right_child_key; // index table key
}

Issues

Maintaining Index table during splits
Storing the root node key (for global access)

3. Indexing Schema Management

Table Manager:
- The new table manager needs to store the index information.
- Logical representation of index is a tree.
- Internally represented as an ordered list of siblings where each sibling contains StartKeyHash.
- Index Types: signed int8, signed int32, signed int64, string, float
- comparison function: implied by types

Object Finder
- tableId + primaryKeyHash -> tableServer
- tableId + indexId + indexKey -> IndexServer
- master: to find index partitions, and to find its own data

RPCs for schema management from client to coordinator
- createIndexRPC( tableId, indexType, indexId, serverSpan)
  - serverSpan=0 -> to be placed on the same server as the table
- dropIndexRPC( tableId, indexId)

RPCs for schema management from coordinator to master
- takeTabletOwnership()
- dropTabletOwnership()
- takeIndexOwnership()
- dropIndexOwnership()
- notifyIndexChange(): invoked from create index rpc or during re-partitioning

RPCs for schema information from coordinator to client (for lookups)
- getTabletMap()
- getTabletConfig()