Split of functionality between servers and clients

How do we split the functionality between server and client code?

What are our metrics?

  • Good Design
    • Whether something "should" be on the server or client because of good decomposition
    • Probably not terribly important.
  • Library API
    • Do not want to complicate the API and make the developer write more or trickier code
    • Complicating the library is ok, just not the library API
  • Latency
    • Depends on what we're doing. Might increase.
  • Throughput
    • A net plus (that's why we're doing this)
  • Scalability
    • We only want to add servers if we run out of storage capacity.
  • Security
    • The client cannot be trusted
    • Obviously we still need some authentication code on the client
  • RAM
    • We want servers to reserve as much memory as possible for holding data. If certain tasks use too much memory, then that might not be acceptable.
    • Remember, can't swap data to disk!

Questions

  • Ratio of application servers to storage servers?
  • What will be saturated? CPU? Network? Something else?
    • We want to increase the load of whatever is not being utilized.

Client RAM

  • Does it make sense for client machines to contribute their RAM?
  • How much RAM does a client actually need?
  • What are the cost savings of adding RAM to a client machine compared to buying a new server?
  • Again, what is the bottleneck?

Types of features

  • Accessing the data
    • What level of queries will we allow?
    • Simple operations (get, put): probably nothing interesting here
    • But what if you need to do something with the data, such as sort?
    • Basically it comes down to "do we need to do computation on this data"?
    • Are we willing to make trade-offs that increase network traffic?
    • Also depends on whether the data needs to be in the same place (sort) or not (sum)
  • Self-management
    • Code that manages servers (moving data, crash recovery)
    • How many cycles does this actually need? If not that many, then does it even matter?
    • Probably security issues with this
  • Indexing
    • Ask John :)
    • Who does the indexing? Client can probe index, or the server can
    • Putting it on the client allows it to be more flexible.
  • Data replication
    • Probably makes more sense on the server (but I am not sure)
  • Map reduce
    • Is this done separately?
    • If not, it's just a question of who's got the spare cycles
  • Locking
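
The sort-versus-sum distinction under "Accessing the data" can be made concrete with a small sketch. It assumes a hypothetical in-memory stand-in for three storage servers (the names SERVERS, server_partial_sum and server_fetch_all are invented for illustration and are not RAMCloud APIs). A sum can be reduced on each server so only one number per server crosses the network, while a sort needs every value in one place.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for storage servers: each holds one fragment of the data.
SERVERS: Dict[str, List[int]] = {
    "server-a": [7, 1, 9],
    "server-b": [4, 8],
    "server-c": [3, 6, 2],
}


def server_partial_sum(name: str) -> int:
    """Would run on the server: reduces its fragment to a single value."""
    return sum(SERVERS[name])


def server_fetch_all(name: str) -> List[int]:
    """Would run on the server: returns the raw fragment unchanged."""
    return list(SERVERS[name])


def client_sum() -> Tuple[int, int]:
    """Returns (total, number of values shipped to the client)."""
    partials = [server_partial_sum(name) for name in SERVERS]
    return sum(partials), len(partials)


def client_sort() -> Tuple[List[int], int]:
    """Returns (sorted values, number of values shipped to the client)."""
    fragments = [server_fetch_all(name) for name in SERVERS]
    shipped = sum(len(fragment) for fragment in fragments)
    merged = sorted(value for fragment in fragments for value in fragment)
    return merged, shipped


if __name__ == "__main__":
    print(client_sum())   # one value per server crosses the network
    print(client_sort())  # every stored value crosses the network
```

In this toy setup the sum ships one value per server and the sort ships all of them, which is the network-traffic trade-off the bullets above ask about.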

John's stuff:

  • RAMCloud is built on the notion of distribution: there will be many storage servers, and there will also be application servers. Not all of the functionality of the system will run on the storage servers.
  • Wherever it can be done efficiently, we should try to offload functionality from the storage servers to the client machines: this will increase the scalability of the storage system.
  • The RAMCloud system software can include a component that runs on the client machines, so we can choose whether to implement pieces of functionality on the client or on the servers.
  • For example, consider an operation that collects data and sorts it:
    • The data will almost certainly come from multiple storage servers.
    • It would probably make sense to do the final merge on the client machine, not on a storage server: the RAMCloud client software identifies the desired data, fetches pieces from various servers, and merges them together into the final sorted result.
    • It might not even make sense for the storage servers to do the initial sorts of the data fragments: just collect raw data and return it for sorting on the client machine.
  • What is the right boundary between RAMCloud client software and server software?
  • Things that absolutely must be done on servers:
    • Anything that requires trust, such as access control. Particularly in a multi-tenant environment, the client can't be trusted.
    • Locking (if it potentially impacts other client machines).
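
The two options in the sorting example above can be sketched as follows. This is a minimal illustration, not RAMCloud code: the function names are hypothetical, and fragments are plain Python lists standing in for data fetched from different storage servers. Option A has each server sort its own fragment and the client do only the final merge; option B has the servers return raw data and the client do all the sorting.

```python
import heapq
from typing import Iterable, List


def server_sorted_fragment(fragment: Iterable[int]) -> List[int]:
    """Option A, server side (hypothetical): sort the local fragment first."""
    return sorted(fragment)


def client_merge(sorted_fragments: List[List[int]]) -> List[int]:
    """Option A, client side: k-way merge of fragments that are already sorted."""
    return list(heapq.merge(*sorted_fragments))


def client_sort_raw(raw_fragments: List[List[int]]) -> List[int]:
    """Option B, client side: servers return raw data, the client sorts it all."""
    combined: List[int] = []
    for fragment in raw_fragments:
        combined.extend(fragment)
    combined.sort()
    return combined


if __name__ == "__main__":
    fragments = [[5, 1, 3], [2, 6], [4]]
    option_a = client_merge([server_sorted_fragment(f) for f in fragments])
    option_b = client_sort_raw(fragments)
    print(option_a == option_b)
```

Both options move the same amount of data over the network; they differ only in where the sorting cycles are spent, which is why the choice depends on whether servers or clients have spare CPU.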