Naming and Indexing

This page addresses issues related to naming and indexing objects in RAMCloud. In other words, what are the various mechanisms an application can use to retrieve objects stored in RAMCloud? The choices fall into three general classes:

Naming: each object has a unique identifying name, such as a 128-bit global identifier, or a multi-part name such as (application identifier) + (table identifier) + (primary key). The name-based lookup returns either zero objects or one object.
Indexing: the system may provide separate structures that allow objects to be located quickly based on their contents, e.g. "find all student records for students who have grade point averages greater than 3.0". An indexed lookup can return any number of objects.
Search: return a collection of objects whose contents match a given set of criteria, which can be simple or general. The difference between search and indexing is not very crisp, but search is likely to be more flexible in the criteria that can be specified, but it may require scanning every element (for some definition of "every"), whereas indexing typically implies a table that makes lookups fast.

Naming

Assuming each object has a unique name, what form might that name take and what advantages and disadvantages are associated with that form?

Single global identifier: large flat namespace with all objects for all applications in the same namespace.
- Looks simple and clean.
- Too unstructured; leaves too many problems to be solved by higher-level software, doesn't provide enough hooks for management.
- For example, need to be able to delete all data associated with an application.
- Need to associate access control information with every object.
- Result: system will have to create additional structures for this extra information; why not just design those structures in from the beginning?
- Lookups may be tricky: when an application starts up, how does it locate its own data? Certain identifiers reserved for special purposes?
- Are there any advantages to this approach?
Hierarchical name, such as (application id) + (table id) + (record id).
- Provides natural places to store metadata.
- Can reserve application id 0 for system information, table id 0 in each application for overall application information, etc.
- What is the right number of levels?

How are names assigned?

Large namespace, clients generate unique identifiers (e.g., based on id of creating machine).
Server generates names. For example, with hierarchical names, server assigns record ids consecutively starting at 1.
- This introduces potential synchronization issues for the server.
- Consecutive integer assignment can be useful: for example, easy to implement log-like tables where order of insertion is clear. Might also be useful for implementing message queues in tables.

Indexing

One possibility: no indexing provided by RAMCloud

RAMCloud provides only name-based lookups?
Implement indexing as a service or library on top of RAMCloud.
However, virtually every application will need some kind of indexing; probably better to build it into RAMCloud.
Also, RAMCloud will need indexing itself (e.g., find the application named "Facebook").
Indexing may be expensive to implement outside RAMCloud:
- Multiple RPCs to traverse an index tree to find particular objects.
- Consistency: maintaining index as tables are modified.

Suppose RAMCloud implements indexing; a minimal approach is to separate the management of the indexes from the generation of index keys:

Each table can have one or more named indexes associated with it.
An index maps from a key to one or more object identifiers.
An index knows nothing about the actual objects and never touches them; it deals exclusively in keys and object identifiers, which are provided to it.
Indexes take two forms:
- Exact match (based on hash table)
- Ordered (based on trees, with keys that can be strings, integers, or floating-point numbers)
  - Provide an extension mechanism for custom comparison functions?
Operations:
- addIndexEntry(objectId, index, key)
  - Creates a new entry in an index associated with a particular table.
  - "index" name and index associated with objectId's table.
  - "key" is the value associated with this index entry (string, integer, etc.)
- findEntries(table, index, key1, key2)
  - Returns object identifiers for all objects in a particular index for a particular table whose keyis in the range between "key1" and "key2".
  - May want additional options to exclude endpoints of range (or, just filter on the client side?).
- deleteEntry(table, index, key)
With this approach, indexing is explicit:
- The application must explicitly request the creation of an index entry, either
  at the same time that it creates/updates the corresponding object, or in a separate operation.
- The application must also explicitly request the deletion of an index entry when it believes the corresponding object.
- The keys used for indexes need not necessarily consist of data fields from the objects in the table, and not every object in a table necessarily must be indexed.
- The same object can appear multiple times in a single index, under different keys.
This approach makes indexes almost completely separate from objects:
- No need for them to be stored in the same place, for example.
- But, can't store the objects inline in the index, so an additional RPC will be required to fetch the objects once the index has returned their identifiers.
- Will RAMCloud guarantee consistency between index and table (see below)?

Other possible approaches to indexing:

Traditional SQL approach:
- Indexes are defined in terms of fields stored in a table.
- RAMCloud automatically maintains indexes once defined.
- RAMCloud must parse objects in a table in order to extract fields for indexing.
- This may be more transparent than the approach above; on the other hand, a client-level library may be able to manage indexes just as transparently as this, but using the approach above.
- Requires the server to parse objects, which seems undesirable.
Same as "minimal" approach, but allow a "primary" index for table, with the objects guaranteed to be co-located on the same server as the index.
- The index would provide a form that returns objects as well as identifiers.
- No need for clustered indexes, where the objects are stored as part of the index: since everything is in RAM, prisoner with a server can retrieve the object extremely quickly once it knows its identifier.
- If our RPCs are fast enough, do we need to worry about this optimization?

Distributed System Issues

There are several issues that arise because applications run on different machines from the servers, and because there could be thousands of servers; data for a particular application or even a particular table may spread across multiple servers. This section assumes that object names are application-table-id.

How does a client know which server to ask for an object, given its identifier?

The client-side RAMCloud library should be able to cache configuration information for its application, which allows it to map table-id pairs to particular storage servers.
Configuration information can be retrieved initially from an overall configuration manager (to be discussed under a different topic).
Configuration information changes slowly.
When it does change, it is self-validating:
- If a client's configuration information becomes stale it will send a request to the wrong server.
- The server responds "this id doesn't belong to me", which is different from "no such id".
- Upon receiving this response, the client will request updated information from the configuration manager.
- If a client attempts to talk to a server and gets no response at all, it also contacts the configuration manager: either (a) the configuration has changed, so it's not surprising the old server didn't respond (in this case the client gets new configuration information) or (b) the server has crashed, in which case the configuration manager needs to know so it can initiate recovery (in this case the client also gets new configuration information, for a backup).

What if an indexed lookup refers to objects on multiple servers?

The initial index lookup returns some combination of ids and objects, depending on whether any or all of the objects are stored on the same server(s) that contain(s) the index.
The client-side library initiates additional server requests for the ids.
The multi-step retrieval should be transparent to the actual application.

What if an index is split across multiple servers?

The client-side library should be smart enough to figure out which servers it needs to contact.
- For a range-based index, there will be enough configuration information to identify the range of keys stored on each server, so the client-side library can pick relevant server(s).
- For an exact-match index the configuration information will include enough information about the hashing function for the client-side library to compute the hash and map it to a particular server. In fact, perhaps servers do not even have to know what the hash function is: incoming requests include a hash value plus the original key.
It should be possible to use the same approach to configuration consistency here as for name-based lookups.

Other issues related to indexes that are split across multiple servers:

Is this a good idea? Would it be better to keep an index entirely on a single machine if possible?
How to decide when to change the server assignments for an index?
How to make these changes while the system is running?

What about consistency between objects and their indexes?

Approach #1: no help from RAMCloud: the client-side library first creates an object, then creates indexes for it. Same for updates and deletions.
- Hard to maintain consistency (e.g., the client might crash between creating the object and creating its index entries).
Approach #2: servers manage consistency:
- During a create or update or delete, the client can provide key values and index names.
- The server automatically updates the index(es) along with the object.
- The server logs this information along with other data about the operation, so it will be recovered properly after a crash.

What about crash recovery for indexes?

If index changes are logged by the object servers, this can help with crash recovery.
But, don't want to reread entire logs to reconstruct an index.
Will need some sort of checkpointing mechanism for indexes.

Searching

Not addressed here.