Several issues arise because applications run on different machines from the servers and because there could be thousands of servers: data for a particular application, or even for a particular table, may be spread across multiple servers. This section assumes that object names are application-table-id

How does a client know which server to ask for an object, given its identifier?

  • The client-side RAMCloud library should be able to cache configuration information for its application, which allows it to map table-id pairs to particular storage servers.
  • Configuration information can be retrieved initially from an overall configuration manager (to be discussed under a different topic).
  • Configuration information changes slowly.
  • When it does change, it is self-validating:
    • If a client's configuration information becomes stale it will send a request to the wrong server.
    • The server will respond "this id doesn't belong to me", which is different from "no such id".
    • Upon receiving this response, the client will request updated information from the configuration manager.
    • If a client attempts to talk to a server and gets no response at all, it also contacts the configuration manager. Either (a) the configuration has changed, so it is not surprising that the old server did not respond, and the client gets new configuration information; or (b) the server has crashed, in which case the configuration manager needs to know so that it can initiate recovery, and the client also gets new configuration information, for a backup.
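
The self-validating behavior described above might be sketched as follows. This is a minimal in-process model, not RAMCloud's actual client library: the class names (ConfigManager, Server, Client), the two exceptions, and the report_unresponsive call are all hypothetical, and real requests would travel over the network rather than as method calls.

```python
class WrongServer(Exception):
    """Models the reply "this id doesn't belong to me"."""


class NoResponse(Exception):
    """Models a server that does not answer at all."""


class ConfigManager:
    """Hypothetical stand-in for the overall configuration manager."""

    def __init__(self, table_to_server):
        self.table_to_server = dict(table_to_server)
        self.crash_reports = []

    def lookup(self, table):
        return self.table_to_server[table]

    def report_unresponsive(self, server_name):
        # Assumption: the manager decides whether this is a crash
        # that requires recovery; here we only record the report.
        self.crash_reports.append(server_name)


class Server:
    def __init__(self, name):
        self.name = name
        self.tables = {}      # table -> {object id -> value}
        self.alive = True

    def read(self, table, obj_id):
        if not self.alive:
            raise NoResponse(self.name)
        if table not in self.tables:
            raise WrongServer(self.name)       # not the same as "no such id"
        return self.tables[table].get(obj_id)  # None models "no such id"


class Client:
    def __init__(self, manager, servers):
        self.manager = manager
        self.servers = servers    # server name -> Server
        self.cache = {}           # table -> server name (may go stale)

    def read(self, table, obj_id, max_attempts=3):
        for _ in range(max_attempts):
            if table not in self.cache:
                self.cache[table] = self.manager.lookup(table)
            name = self.cache[table]
            try:
                return self.servers[name].read(table, obj_id)
            except WrongServer:
                # Stale entry: drop it and ask the manager again.
                del self.cache[table]
            except NoResponse:
                # Either the configuration changed or the server crashed;
                # in both cases the manager is contacted.
                self.manager.report_unresponsive(name)
                del self.cache[table]
        raise RuntimeError("could not locate table %r" % (table,))
```

In this sketch the client never checks its cache for freshness in advance. It simply uses the cached mapping and refreshes it only when a server rejects the request or fails to answer, which is cheap if configuration information changes slowly.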

What if an indexed lookup refers to objects on multiple servers?

  • The initial index lookup returns some combination of ids and objects, depending on whether any or all of the objects are stored on the same server(s) that contain(s) the index.
  • The client-side library initiates additional server requests for the ids.
  • The multi-step retrieval should be transparent to the actual application.
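
A rough sketch of how the client-side library could hide the multi-step retrieval. The function name indexed_lookup and the two callables it takes are hypothetical; the only assumption carried over from the notes is that the index server returns a mix of ids and objects.

```python
def indexed_lookup(index_server, fetch_object, key):
    """Return every object matching key, hiding the extra round trips.

    index_server(key) is assumed to return (obj_id, obj) pairs, where obj
    is None when the object is not stored alongside the index.
    fetch_object(obj_id) is assumed to locate and read one object by id.
    """
    results = []
    for obj_id, obj in index_server(key):
        if obj is None:
            # Object lives elsewhere: issue the additional server request.
            obj = fetch_object(obj_id)
        results.append(obj)
    return results


# Toy usage: object 1 is co-located with the index, object 2 is not.
local_objects = {1: "alice"}
remote_objects = {2: "bob"}
index = {"smith": [1, 2]}


def toy_index_server(key):
    return [(i, local_objects.get(i)) for i in index.get(key, [])]


def toy_fetch(obj_id):
    return remote_objects[obj_id]


matches = indexed_lookup(toy_index_server, toy_fetch, "smith")
```

The application only sees the final list of objects. A real library would probably batch or parallelize the follow-up requests instead of issuing them one at a time.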

What if an index is split across multiple servers?

  • The client-side library should be smart enough to figure out which servers it needs to contact.
    • For a range-based index, there will be enough configuration information to identify the range of keys stored on each server, so the client-side library can pick relevant server(s).
    • For an exact-match index, the configuration information will include enough information about the hashing function for the client-side library to compute the hash and map it to a particular server. In fact, perhaps servers do not even have to know what the hash function is: incoming requests could include a hash value plus the original key.
  • It should be possible to use the same approach to configuration consistency here as for name-based lookups.
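
Both cases might look roughly like this on the client side. Everything here is an assumption made for illustration: the layout of the configuration (a sorted list of split keys for the range-based index, a plain server list for the exact-match index), the choice of SHA-256 as the hash, and the modulo mapping from hash to server.

```python
import bisect
import hashlib


def servers_for_range(boundaries, servers, lo, hi):
    """Pick the servers whose key ranges overlap [lo, hi].

    Assumed layout: boundaries is sorted, servers[0] holds keys below
    boundaries[0], and servers[i + 1] holds keys >= boundaries[i].
    """
    first = bisect.bisect_right(boundaries, lo)
    last = bisect.bisect_right(boundaries, hi)
    return servers[first:last + 1]


def key_hash(key):
    """Client-side hash; the servers never need to compute it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def exact_match_request(servers, key):
    """Return the target server and a request carrying hash plus key."""
    h = key_hash(key)
    target = servers[h % len(servers)]
    return target, {"hash": h, "key": key}


servers = ["A", "B", "C"]
boundaries = ["g", "p"]   # A: keys < "g", B: "g" <= key < "p", C: keys >= "p"

range_targets = servers_for_range(boundaries, servers, "e", "h")
target, request = exact_match_request(servers, "alice")
```

Because the request carries both the hash value and the original key, a server in this sketch can place or find the entry without knowing which hash function the client used.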

Miscellaneous Notes