  • Single global identifier: large flat namespace with all objects for all applications in the same namespace.
    • Looks simple and clean.
    • Too unstructured; leaves too many problems to be solved by higher-level software, doesn't provide enough hooks for management.
    • For example, need to be able to delete all data associated with an application.
    • Need to associate access control information with every object.
    • Result: system will have to create additional structures for this extra information; why not just design those structures in from the beginning?
    • Lookups may be tricky: when an application starts up, how does it locate its own data? Certain identifiers reserved for special purposes?
    • Are there any advantages to this approach?
  • Hierarchical name, such as (application id) + (table id) + (record id).
    • Provides natural places to store metadata.
    • Can reserve application id 0 for system information, table id 0 in each application for overall application information, etc.
    • What is the right number of levels?
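The hierarchical option can be sketched as follows. This is only an illustration, assuming a three-level name and the reserved id 0 conventions suggested above; `ObjectName`, `Store`, and `delete_application` are hypothetical names, not part of any RAMCloud interface.

```python
from typing import NamedTuple

SYSTEM_APP_ID = 0      # assumption: application id 0 holds system information
APP_INFO_TABLE_ID = 0  # assumption: table id 0 holds per-application information


class ObjectName(NamedTuple):
    """Hypothetical three-level name: (application id, table id, record id)."""
    app_id: int
    table_id: int
    record_id: int


class Store:
    """Toy in-memory store keyed by hierarchical names."""

    def __init__(self):
        self._objects = {}

    def write(self, name, value):
        self._objects[name] = value

    def read(self, name):
        return self._objects[name]

    def delete_application(self, app_id):
        """Delete every object belonging to app_id and return how many were removed."""
        doomed = [name for name in self._objects if name.app_id == app_id]
        for name in doomed:
            del self._objects[name]
        return len(doomed)
```

Because the application id is part of every name, "delete all data associated with an application" becomes a selection on the first component instead of requiring a separate bookkeeping structure.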

How are names assigned?

  • Large namespace, clients generate unique identifiers (e.g., based on id of creating machine).
  • Server generates names. For example, with hierarchical names, server assigns record ids consecutively starting at 1.
    • This introduces potential synchronization issues for the server.
    • Consecutive integer assignment can be useful: for example, easy to implement log-like tables where order of insertion is clear. Might also be useful for implementing message queues in tables.
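The two assignment schemes can be compared with a small sketch. The bit layout, the class names, and the use of a lock are assumptions made for illustration; the lock stands in for the synchronization the server would need.

```python
import itertools
import threading


class ClientIdGenerator:
    """Client-side ids: machine id in the high bits, local counter in the low bits."""

    COUNTER_BITS = 32  # assumption: counter overflow past 32 bits is not handled here

    def __init__(self, machine_id):
        self._machine_id = machine_id
        self._counter = itertools.count()

    def next_id(self):
        return (self._machine_id << self.COUNTER_BITS) | next(self._counter)


class ServerTable:
    """Server-side ids: record ids assigned consecutively starting at 1."""

    def __init__(self):
        self._next_id = 1
        self._records = {}
        self._lock = threading.Lock()  # concurrent inserts must serialize here

    def insert(self, value):
        with self._lock:
            record_id = self._next_id
            self._next_id += 1
            self._records[record_id] = value
        return record_id

    def scan_in_insertion_order(self):
        """Consecutive ids make insertion order recoverable, as in a log or queue."""
        return [self._records[i] for i in sorted(self._records)]
```

Client-generated ids need no coordination but carry no ordering across machines; server-assigned ids give a total insertion order at the cost of a serialization point.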

Indexing

One possibility: no indexing provided by RAMCloud

  • RAMCloud provides only name-based lookups?
  • Implement indexing as a service or library on top of RAMCloud.
  • However, virtually every application will need some kind of indexing; probably better to build it into RAMCloud.
  • Also, RAMCloud will need indexing itself (e.g., find the application named "Facebook").
  • Indexing may be expensive to implement outside RAMCloud:
    • Multiple RPCs to traverse an index tree to find particular objects.
    • Consistency: maintaining index as tables are modified.
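A rough way to see the RPC cost of an external index is to count round trips. The sketch below assumes each tree node is stored as a separate object fetched with one RPC, that nothing is cached on the client, and that the tree is full with a fixed fanout; `rpcs_per_lookup` is a hypothetical helper.

```python
def rpcs_per_lookup(num_entries, fanout):
    """Count RPCs needed to find one object through an index tree kept outside the store.

    Assumptions: one RPC per tree node visited, no client-side caching,
    and a full tree in which every node has `fanout` children or entries.
    """
    if num_entries < 1 or fanout < 2:
        raise ValueError("need at least one entry and a fanout of at least 2")
    levels = 1
    capacity = fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels + 1  # one more RPC to fetch the object itself
```

Under these assumptions, an index of one million entries with fanout 100 needs three node fetches plus one object fetch, so four RPCs per lookup, compared with one or two if the index lives inside the storage servers.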

Suppose RAMCloud implements indexing; a minimal approach is to separate the management of the indexes from the generation of index keys:

  • Each table can have one or more named indexes associated with it.
  • An index maps from a key to one or more object identifiers.
  • An index knows nothing about the actual objects and never touches them; it deals exclusively in keys and object identifiers, which are provided to it.
  • Indexes take two forms:
    • Exact match (based on hash table)
    • Ordered (based on trees, with keys that can be strings, integers, or floating-point numbers)
      • Provide an extension mechanism for custom comparison functions?
  • RAMCloud makes no association between index keys and fields in an object; the application does this.
  • Operations:
    • addIndexEntry(objectId, index, key)
      • Creates a new entry in an index associated with a particular table.
      • "index" names an index associated with objectId's table.
      • "key" is the value associated with this index entry (string, integer, etc.)
    • findEntries(table, index, key)
      • Returns object identifiers for all objects in a particular index for a particular table whose key matches "key".
    • findEntries(table, index, key1, key2)
      • Returns object identifiers for all objects in a particular index for a particular table whose key is in the range between "key1" and "key2".
      • May want additional options to exclude endpoints of range (or, just filter on the client side?).
    • deleteEntry(table, index, key)
  • With this approach, indexing is explicit:
    • The application must explicitly request the creation of an index entry, either at the same time that it creates/updates the corresponding object, or in a separate operation.
    • The application must also explicitly request the deletion of an index entry when it deletes the corresponding object.
    • The keys used for indexes need not necessarily consist of data fields from the objects in the table, and not every object in a table necessarily must be indexed.
  • This approach makes indexes almost completely separate from objects:
    • No need for them to be stored in the same place, for example.
    • But, can't store the objects inline in the index, so an additional RPC will be required to fetch the objects once the index has returned their identifiers.
    • May not be able to guarantee consistency between index and table.
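The minimal approach can be sketched in a few lines. This is an illustration under stated assumptions, not a proposed implementation: the class and method names are hypothetical, a sorted list stands in for a tree, the table is passed explicitly instead of being derived from the object identifier, and the delete operation takes the object identifier as well as the key.

```python
import bisect
from collections import defaultdict


class ExactMatchIndex:
    """Hash-table index: key -> set of object identifiers."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, key, object_id):
        self._entries[key].add(object_id)

    def find(self, key):
        return sorted(self._entries.get(key, ()))

    def delete(self, key, object_id):
        ids = self._entries.get(key)
        if ids:
            ids.discard(object_id)
            if not ids:
                del self._entries[key]


class OrderedIndex:
    """Sorted list of (key, object_id) pairs standing in for a tree."""

    def __init__(self):
        self._pairs = []

    def add(self, key, object_id):
        pair = (key, object_id)
        pos = bisect.bisect_left(self._pairs, pair)
        if pos == len(self._pairs) or self._pairs[pos] != pair:
            self._pairs.insert(pos, pair)

    def find_range(self, key1, key2):
        """Object ids whose key lies in [key1, key2], endpoints included."""
        start = bisect.bisect_left(self._pairs, (key1,))
        result = []
        for key, object_id in self._pairs[start:]:
            if key > key2:
                break
            result.append(object_id)
        return result

    def find(self, key):
        return self.find_range(key, key)

    def delete(self, key, object_id):
        pair = (key, object_id)
        pos = bisect.bisect_left(self._pairs, pair)
        if pos < len(self._pairs) and self._pairs[pos] == pair:
            del self._pairs[pos]


class IndexService:
    """Named indexes per table; deals only in keys and object identifiers."""

    def __init__(self):
        self._indexes = {}

    def create_index(self, table, index, ordered=False):
        self._indexes[(table, index)] = OrderedIndex() if ordered else ExactMatchIndex()

    def add_index_entry(self, table, index, key, object_id):
        self._indexes[(table, index)].add(key, object_id)

    def find_entries(self, table, index, key1, key2=None):
        idx = self._indexes[(table, index)]
        return idx.find(key1) if key2 is None else idx.find_range(key1, key2)

    def delete_entry(self, table, index, key, object_id):
        self._indexes[(table, index)].delete(key, object_id)
```

Note that nothing here ever reads an object: the caller supplies both the key and the identifier, which is what keeps the index independent of object format and location, and also why a second lookup is needed to fetch the objects themselves.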

Other possible approaches to indexing:

  • Traditional SQL approach:
    • Indexes are defined in terms of fields stored in a table.
    • RAMCloud automatically maintains indexes once defined.
    • RAMCloud must parse objects in a table in order to extract fields for indexing; requiring the server to parse objects seems undesirable.
    • This may be more transparent than the approach above; on the other hand, a client-level library may be able to manage indexes just as transparently as this, but using the approach above.
  • Same as "minimal" approach, but allow a "primary" index for table, with the objects guaranteed to be co-located on the same server as the index. The index would provide a form that returns objects as well as identifiers.
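The point about a client-level library can be made concrete with a sketch. Here the application registers a key-extraction function per index, and a wrapper issues the explicit index operations on every write, so the server never parses objects. `IndexedTable` and its methods are hypothetical, and plain dictionaries stand in for the table and the indexes.

```python
class IndexedTable:
    """Client-side wrapper that keeps explicit index entries in step with writes."""

    def __init__(self):
        self._objects = {}     # object_id -> object (stands in for the table)
        self._extractors = {}  # index name -> function(object) -> key
        self._indexes = {}     # index name -> {key: set of object ids}

    def define_index(self, name, extractor):
        self._extractors[name] = extractor
        self._indexes[name] = {}
        for object_id, obj in self._objects.items():
            self._add_entry(name, extractor(obj), object_id)

    def _add_entry(self, name, key, object_id):
        self._indexes[name].setdefault(key, set()).add(object_id)

    def _delete_entry(self, name, key, object_id):
        ids = self._indexes[name].get(key)
        if ids is not None:
            ids.discard(object_id)
            if not ids:
                del self._indexes[name][key]

    def write(self, object_id, obj):
        # Remove entries derived from the old version before adding new ones.
        if object_id in self._objects:
            old = self._objects[object_id]
            for name, extractor in self._extractors.items():
                self._delete_entry(name, extractor(old), object_id)
        for name, extractor in self._extractors.items():
            self._add_entry(name, extractor(obj), object_id)
        self._objects[object_id] = obj

    def delete(self, object_id):
        old = self._objects.pop(object_id)
        for name, extractor in self._extractors.items():
            self._delete_entry(name, extractor(old), object_id)

    def find(self, name, key):
        """Two steps, as in the minimal approach: index lookup, then object fetch."""
        ids = sorted(self._indexes[name].get(key, ()))
        return [self._objects[i] for i in ids]
```

In a real deployment the index update and the object write would be separate operations, so this wrapper inherits the consistency concern noted for the minimal approach; a co-located "primary" index would let the final fetch in `find` be folded into the index lookup.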

Distributed System Issues

Miscellaneous Notes

  • Probably needs to be customizable to meet needs of different applications. For example, perhaps the application computes the value(s) on which to index particular items, and RAMCloud simply implements the low-level index lookup.
  • Indexing should be much easier for RAMCloud than for a disk-based database: no need to reorganize the data to match the index.