
How do we split the functionality between server and client code?

What are our metrics?

  • Good Design
    • Whether something "should" be on the server or client because of good decomposition
    • Probably not terribly important.
  • Library API
    • Do not want to complicate the API and make the developer write more or trickier code
    • Complicating the library is ok, just not the library API
  • Latency
    • Depends on what we're doing. Might increase.
  • Throughput
    • A net plus (that's why we're doing this)
  • Scalability
    • We only want to add servers if we run out of storage capacity.
  • Security
    • The client cannot be trusted
    • Obviously we still need some authentication code on the client
  • RAM
    • We want servers to reserve as much memory as possible for holding data. If certain tasks use too much memory, then that might not be acceptable.
    • Remember, can't swap data to disk!

Questions

  • Ratio of application servers to storage servers?
  • What will be saturated? CPU? Network? Something else?
    • We want to increase the load of whatever is not being utilized.

Client RAM

  • Does it make sense for client machines to contribute their RAM?
  • How much RAM does a client actually need?
  • What are the cost savings of adding RAM to a client machine compared to buying a new server?
  • Again, what is the bottleneck?

Types of features

  • Accessing the data
    • What level of queries will we allow?
    • Simple operations (get, put): probably nothing interesting here
    • But what if you need to do something with the data, such as sort?
    • Basically it comes down to "do we need to do computation on this data"?
    • Are we willing to make trade-offs that increase network traffic?
    • Also depends on whether the data needs to be in the same place (sort) or not (sum)
  • Self-management
    • Code that manages servers (moving data, crash recovery)
    • How many cycles does this actually need? If not that many, then does it even matter?
    • Probably security issues with this
  • Indexing
    • Ask John :)
    • Who does the indexing? Client can probe index, or the server can
    • Putting it on the client allows it to be more flexible.
  • Data replication
    • Probably makes more sense on the server (but I am not sure)
  • Map reduce
    • Is this done separately?
    • If not, it's just a question of who's got the spare cycles
  • Locking
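
The sort-versus-sum distinction under "Accessing the data" can be made concrete with a small sketch. It is illustrative only: the in-memory lists stand in for storage servers, and `server_partial_sum`, `client_sum`, and `client_sort` are hypothetical names, not RAMCloud APIs.

```python
from typing import List

# Hypothetical stand-in: each inner list is the data held by one storage server.
shards: List[List[int]] = [[5, 1, 9], [4, 4], [7, 2, 8]]

def server_partial_sum(values: List[int]) -> int:
    """Would run on a storage server: reduce local data to one number."""
    return sum(values)

def client_sum(all_shards: List[List[int]]) -> int:
    """Would run on the client: combine one small partial result per server."""
    return sum(server_partial_sum(shard) for shard in all_shards)

def client_sort(all_shards: List[List[int]]) -> List[int]:
    """Would run on the client: a sort needs all values in one place,
    so every value crosses the network before the result exists."""
    gathered = [v for shard in all_shards for v in shard]
    return sorted(gathered)

total = client_sum(shards)
ordered = client_sort(shards)
```

In this sketch a sum ships one integer per server, while a sort ships every value, which is the network-traffic trade-off the notes ask about.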

John's stuff:

  • RAMCloud is built on the notion of distribution: there will be many storage servers, and there will also be application servers. Not all of the functionality of the system will run on the storage servers.
  • Wherever it can be done efficiently, we should try to offload functionality from the storage servers to the client machines: this will increase the scalability of the storage system.
  • The RAMCloud system software can include a component that runs on the client machines, so we can choose whether to implement pieces of functionality on the client or on the servers.
  • For example, consider an operation that collects data and sorts it:
    • The data will almost certainly come from multiple storage servers.
    • It would probably make sense to do the final merge on the client machine, not on a storage server: the RAMCloud client software identifies the desired data, fetches pieces from various servers, and merges them together into the final sorted result.
    • It might not even make sense for the storage servers to do the initial sorts of the data fragments: just collect raw data and return it for sorting on the client machine.
  • What is the right boundary between RAMCloud client software and server software?
  • Things that absolutely must be done on servers:
    • Anything that requires trust, such as access control. Particularly in a multi-tenant environment, the client can't be trusted.
    • Locking (if it potentially impacts other client machines).
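
The collect-and-sort example above can be sketched in two variants: servers sort their own fragments and the client merges them, or servers return raw data and the client does all the sorting. This is a minimal sketch under stated assumptions: `SERVERS` is a hypothetical in-memory stand-in for the storage servers, and none of the function names are RAMCloud APIs.

```python
import heapq
from typing import Dict, List

# Hypothetical stand-in for storage servers: server name -> values it holds.
SERVERS: Dict[str, List[int]] = {
    "s1": [42, 7, 19],
    "s2": [3, 88],
    "s3": [15, 1, 60, 7],
}

def fetch_sorted_fragment(server: str) -> List[int]:
    """Variant A: the server spends its own cycles sorting its fragment."""
    return sorted(SERVERS[server])

def fetch_raw_fragment(server: str) -> List[int]:
    """Variant B: the server just returns raw data, doing no sorting work."""
    return list(SERVERS[server])

def client_merge_sorted(servers: List[str]) -> List[int]:
    """Client does only the final k-way merge of already-sorted fragments."""
    fragments = [fetch_sorted_fragment(s) for s in servers]
    return list(heapq.merge(*fragments))

def client_sort_raw(servers: List[str]) -> List[int]:
    """Client collects raw fragments and does the whole sort itself."""
    raw: List[int] = []
    for s in servers:
        raw.extend(fetch_raw_fragment(s))
    raw.sort()
    return raw

merged = client_merge_sorted(["s1", "s2", "s3"])
sorted_raw = client_sort_raw(["s1", "s2", "s3"])
```

Both variants move the same amount of data over the network and yield the same result; they differ only in whether the storage servers or the client spend the cycles on the initial sorts.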