...

  • Large namespace; clients generate unique identifiers (e.g., based on the id of the creating machine).
  • The server generates names. For example, with hierarchical names, the server assigns record ids consecutively, starting at 1.
    • This introduces potential synchronization issues for the server.
    • Consecutive integer assignment can be useful: for example, it makes it easy to implement log-like tables where the order of insertion is clear. It might also be useful for implementing message queues in tables.
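A minimal sketch of the server-assigned scheme, in Python, assuming a single server owns each table. The names ConsecutiveIdAllocator and LogTable are hypothetical. The lock stands in for the synchronization the server would need when concurrent requests create objects in the same table, and read_from shows why consecutive ids make log-like tables and simple queues easy: reading "everything after id N" is just a range over ids.

```python
import threading
from collections import defaultdict


class ConsecutiveIdAllocator:
    """Hypothetical server-side allocator: hands out ids 1, 2, 3, ... per table."""

    def __init__(self):
        # One lock serializes allocation so concurrent creates never get the same id.
        self._lock = threading.Lock()
        self._next_id = defaultdict(lambda: 1)  # table name -> next id to hand out

    def allocate(self, table):
        with self._lock:
            new_id = self._next_id[table]
            self._next_id[table] = new_id + 1
            return new_id


class LogTable:
    """Hypothetical log-like table: id order is insertion order."""

    def __init__(self, name, allocator):
        self.name = name
        self._allocator = allocator
        self._records = {}  # record id -> value

    def append(self, value):
        record_id = self._allocator.allocate(self.name)
        self._records[record_id] = value
        return record_id

    def read_from(self, start_id):
        # All records with id >= start_id, oldest first (a queue consumer's read).
        return [(i, self._records[i]) for i in sorted(self._records) if i >= start_id]
```

A queue consumer would remember the last id it processed and call read_from with the next one.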

...

  • Each table can have one or more named indexes associated with it.
  • An index maps from a key to one or more object identifiers.
  • An index knows nothing about the actual objects and never touches them; it deals exclusively in keys and object identifiers, which are provided to it.
  • Indexes take two forms:
    • Exact match (based on hash table)
    • Ordered (based on trees, with keys that can be strings, integers, or floating-point numbers)
      • Provide an extension mechanism for custom comparison functions?
  • Operations:
    • addIndexEntry(objectId, index, key)
      • Creates a new entry in an index associated with a particular table.
      • "index" names an index associated with objectId's table.
      • "key" is the value associated with this index entry (string, integer, etc.)
    • findEntries(table, index, key1, key2)
      • Returns object identifiers for all objects in a particular index for a particular table whose key is in the range between "key1" and "key2".
      • May want additional options to exclude endpoints of range (or, just filter on the client side?).
    • deleteEntry(table, index, key)
  • With this approach, indexing is explicit:
    • The application must explicitly request the creation of an index entry, either at the same time that it creates or updates the corresponding object or in a separate operation.
    • The application must also explicitly request the deletion of an index entry when it believes the entry is no longer valid (for example, because the corresponding object has been deleted or its key has changed).
    • The keys used for indexes need not necessarily consist of data fields from the objects in the table, and not every object in a table necessarily must be indexed.
    • The same object can appear multiple times in a single index, under different keys.
  • This approach makes indexes almost completely separate from objects:
    • No need for them to be stored in the same place, for example.
    • But, can't store the objects inline in the index, so an additional RPC will be required to fetch the objects once the index has returned their identifiers.
    • Will RAMCloud guarantee consistency between index and table (see below)?
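A minimal sketch of the index operations listed above, in Python. Everything here is hypothetical: object ids are modeled as (table, id) pairs, create_index is an invented setup call (these notes do not say how indexes are created), the ordered index uses a sorted list with bisect as a stand-in for a tree, and delete_entry takes an optional object_id because deleteEntry(table, index, key) as listed would otherwise remove every entry stored under key. As in the text, the indexes only ever see keys and object ids, never the objects.

```python
import bisect
from collections import defaultdict


class ExactIndex:
    """Exact-match index: hash table from key to the set of object ids under it."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, key, object_id):
        self._entries[key].add(object_id)

    def find(self, key1, key2):
        if key1 != key2:
            raise ValueError("an exact-match index only supports key1 == key2")
        return sorted(self._entries.get(key1, ()))

    def delete(self, key, object_id=None):
        if object_id is None:
            self._entries.pop(key, None)  # drop every entry under this key
            return
        ids = self._entries.get(key)
        if ids is not None:
            ids.discard(object_id)
            if not ids:
                del self._entries[key]


class OrderedIndex:
    """Ordered index: sorted keys (one key type per index) supporting range queries."""

    def __init__(self):
        self._keys = []  # sorted keys, duplicates allowed
        self._ids = []   # object id for the entry at the same position

    def add(self, key, object_id):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, object_id)

    def find(self, key1, key2):
        # Inclusive range [key1, key2]; excluding endpoints is left to the caller.
        lo = bisect.bisect_left(self._keys, key1)
        hi = bisect.bisect_right(self._keys, key2)
        return self._ids[lo:hi]

    def delete(self, key, object_id=None):
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        kept = [] if object_id is None else [o for o in self._ids[lo:hi] if o != object_id]
        self._keys[lo:hi] = [key] * len(kept)
        self._ids[lo:hi] = kept


class IndexRegistry:
    """Hypothetical holder of all named indexes, keyed by (table, index name)."""

    def __init__(self):
        self._indexes = {}

    def create_index(self, table, index, ordered):
        self._indexes[(table, index)] = OrderedIndex() if ordered else ExactIndex()

    def add_index_entry(self, object_id, index, key):
        table = object_id[0]  # assumes object ids are (table, id) pairs
        self._indexes[(table, index)].add(key, object_id)

    def find_entries(self, table, index, key1, key2):
        return self._indexes[(table, index)].find(key1, key2)

    def delete_entry(self, table, index, key, object_id=None):
        self._indexes[(table, index)].delete(key, object_id)
```

A lookup is then two steps: find_entries returns object ids, and the client issues a separate read for each object, which is the additional RPC noted above. Because the keys are supplied by the application, the same object can be added under several keys.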

Other possible approaches to indexing:

...

There are several issues that arise because applications run on different machines from the servers, and because there could be thousands of servers: data for a particular application, or even a particular table, may be spread across multiple servers. This section assumes that object names are (application, table, id) triples.

How does a client know which server to ask for an object, given its identifier?

  • The client-side RAMCloud library should be able to cache configuration information for its application, which allows it to map table-id pairs to particular storage servers.
  • Configuration information can be retrieved initially from an overall configuration manager (to be discussed under a different topic).
  • Configuration information changes slowly.
  • When it does change, it is self-validating:
    • If a client's configuration information becomes stale it will send a request to the wrong server.
    • The server responds "this id doesn't belong to me", which is different from "no such id".
    • Upon receiving this response, the client will request updated information from the configuration manager.
    • If a client attempts to talk to a server and gets no response at all, it also contacts the configuration manager: either (a) the configuration has changed, so it's not surprising the old server didn't respond (in this case the client gets new configuration information) or (b) the server has crashed, in which case the configuration manager needs to know so it can initiate recovery (in this case the client also gets new configuration information, for a backup).
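A sketch of the client-side retry logic described above, in Python. The exception names, RamCloudClient, and the configuration manager's lookup and report_unreachable methods are all hypothetical. For simplicity the cache maps a whole table to one server, whereas the text allows table-id pairs to map to different servers; a finer-grained cache would key on id ranges instead.

```python
class NoSuchObject(Exception):
    """Server owns this id, but no object exists under it."""


class WrongServer(Exception):
    """Server reply meaning "this id doesn't belong to me"."""


class ServerUnreachable(Exception):
    """No response at all (timeout, crash, ...)."""


class RamCloudClient:
    """Hypothetical client library caching table -> server configuration."""

    def __init__(self, app, config_manager, max_attempts=3):
        self.app = app
        self._config_manager = config_manager
        self._cache = {}  # table -> server handle
        self._max_attempts = max_attempts

    def _server_for(self, table, refresh=False):
        # Ask the configuration manager only on a cache miss or a forced refresh.
        if refresh or table not in self._cache:
            self._cache[table] = self._config_manager.lookup(self.app, table)
        return self._cache[table]

    def read(self, table, object_id):
        refresh = False
        for _ in range(self._max_attempts):
            server = self._server_for(table, refresh)
            try:
                # NoSuchObject is not caught: it is a real answer, not a stale config.
                return server.read(self.app, table, object_id)
            except WrongServer:
                refresh = True  # stale configuration: fetch fresh info and retry
            except ServerUnreachable:
                # Config changed or server crashed; tell the manager (which can
                # start recovery) and retry with whatever server it now reports.
                self._config_manager.report_unreachable(self.app, table, server)
                refresh = True
        raise RuntimeError(f"no server answered for table {table!r}")
```

The key design point is that stale configuration costs at most one wasted request, because the server's "not mine" reply is distinguishable from "no such id".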

...

  • If index changes are logged by the object servers, this can help with crash recovery.
  • But, don't want to reread entire logs to reconstruct an index.
  • Will need some sort of checkpointing mechanism for indexes.
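One possible checkpointing scheme, sketched in Python under assumptions not in these notes: each index change is logged as a hypothetical LogEntry, and a checkpoint stores a copy of the index together with the log position it reflects. Recovery then loads the checkpoint and replays only the log suffix after that position, rather than rereading the entire log.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    op: str          # "add" or "delete" (hypothetical record format)
    key: object
    object_id: object


@dataclass
class Checkpoint:
    log_position: int                            # log entries already reflected
    entries: dict = field(default_factory=dict)  # key -> set of object ids


def apply(entries, record):
    """Apply one logged index change to an in-memory index (key -> set of ids)."""
    ids = entries.setdefault(record.key, set())
    if record.op == "add":
        ids.add(record.object_id)
    else:
        ids.discard(record.object_id)
        if not ids:
            del entries[record.key]


def take_checkpoint(live_entries, log):
    # Copy the live index so later changes do not alter the checkpoint.
    return Checkpoint(len(log), {k: set(v) for k, v in live_entries.items()})


def recover(checkpoint, log):
    # Start from the checkpoint and replay only entries logged after it.
    entries = {k: set(v) for k, v in checkpoint.entries.items()}
    for record in log[checkpoint.log_position:]:
        apply(entries, record)
    return entries
```

How often to checkpoint trades checkpoint cost against the length of the log suffix that must be replayed after a crash.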

Searching

Not addressed here.