Split of functionality between servers and clients

How do we split the functionality between server and client code?

What are our metrics?

  • Good Design

    • Whether something "should" be on the server or client because of good decomposition

    • Probably not terribly important.

  • Library API

    • Do not want to complicate the API and make the developer write more or trickier code

    • Complicating the library is ok, just not the library API

  • Latency

    • Depends on what we're doing. Might increase.

  • Throughput

    • A net plus (that's why we're doing this)

  • Scalability

    • We only want to add servers if we run out of storage capacity.

  • Security

    • The client cannot be trusted

    • Obviously we still need some authentication code on the client

  • RAM

    • We want servers to reserve as much memory as possible for holding data. If certain tasks use too much memory, then that might not be acceptable.

    • Remember, can't swap data to disk!

Questions

  • Ratio of application servers to storage servers?

  • What will be saturated? CPU? Network? Something else?

    • We want to increase the load of whatever is not being utilized.

Client RAM

  • Does it make sense for client machines to contribute their RAM?

  • How much RAM does a client actually need?

  • What are the cost savings of adding RAM to a client machine compared to buying a new server?

  • Again, what is the bottleneck?

Types of features

  • Accessing the data

    • What level of queries will we allow?

    • Simple operations (get, put): probably nothing interesting

    • But what if you need to do something with the data, such as sort?

    • Basically it comes down to "do we need to do computation on this data"?

    • Are we willing to make trade-offs that increase network traffic?

    • Also depends on whether the data needs to be in the same place (sort) or not (sum)

  • Self-management

    • Code that manages servers (moving data, crash recovery)

    • How many cycles does this actually need? If not that many, then does it even matter?

    • Probably security issues with this

  • Indexing

    • Ask John

    • Who does the indexing? Client can probe index, or the server can

    • Putting it on the client allows it to be more flexible.

  • Data replication

    • Probably makes more sense on the server (but I am not sure)

  • Map reduce

    • Is this done separately?

    • If not, it's just a question of who's got the spare cycles

  • Locking
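The sort-versus-sum point under "Accessing the data" can be made concrete with a small sketch. It assumes three storage servers modeled as plain Python lists; the names servers, server_side_sum, and client_side_sort are hypothetical and are not part of any RAMCloud API. It only counts how many values would cross the network in each case.

```python
# Hypothetical stand-ins for the data held by three storage servers.
servers = [[5, 1, 9], [7, 3], [8, 2, 6]]

def server_side_sum(shards):
    # A sum does not need the data in one place: each server reduces
    # its own shard, so only one value per server is sent to the client.
    partials = [sum(shard) for shard in shards]
    return sum(partials), len(partials)

def client_side_sort(shards):
    # A sort needs all values in one place, so every value is sent.
    fetched = [value for shard in shards for value in shard]
    return sorted(fetched), len(fetched)

total, sum_values_sent = server_side_sum(servers)
ordered, sort_values_sent = client_side_sort(servers)
print(total, sum_values_sent)       # one partial result per server
print(ordered, sort_values_sent)    # every stored value is transferred
```

Under these assumptions the sum costs some server cycles but very little network traffic, while the sort moves all of the data no matter where the computation runs.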

John's stuff:

  • RAMCloud is built on the notion of distribution: there will be many storage servers, and there will also be application servers. Not all of the functionality of the system will run on the storage servers.

  • Wherever it can be done efficiently, we should try to offload functionality from the storage servers to the client machines: this will increase the scalability of the storage system.

  • The RAMCloud system software can include a component that runs on the client machines, so we can choose whether to implement pieces of functionality on the client or on the servers.

  • For example, consider an operation that collects data and sorts it:

    • The data will almost certainly come from multiple storage servers.

    • It would probably make sense to do the final merge on the client machine, not on a storage server: the RAMCloud client software identifies the desired data, fetches pieces from various servers, and merges them together into the final sorted result.

    • It might not even make sense for the storage servers to do the initial sorts of the data fragments: just collect raw data and return it for sorting on the client machine.

  • What is the right boundary between RAMCloud client software and server software?

  • Things that absolutely must be done on servers:

    • Anything that requires trust, such as access control. Particularly in a multi-tenant environment, the client can't be trusted.

    • Locking (if it potentially impacts other client machines).
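The collect-and-sort example in John's notes can be sketched as follows. This is a minimal illustration, not RAMCloud client code: fetch_fragment is a hypothetical stand-in for an RPC to one storage server, and the presorted flag is an assumption used to show both options (servers sort their fragments first, or servers return raw data and the client does all the sorting).

```python
import heapq

def fetch_fragment(shard, presorted):
    # Hypothetical stand-in for an RPC to one storage server.
    # With presorted=True the server spends cycles sorting its fragment;
    # otherwise it returns the raw data untouched.
    return sorted(shard) if presorted else list(shard)

def client_sorted_result(shards, presorted=True):
    # The client fetches one fragment per server.
    fragments = [fetch_fragment(shard, presorted) for shard in shards]
    if presorted:
        # Final merge of already-sorted fragments happens on the client.
        return list(heapq.merge(*fragments))
    # Raw fragments: the client does the whole sort itself.
    return sorted(value for fragment in fragments for value in fragment)

shards = [[5, 1, 9], [7, 3], [8, 2, 6]]
merged = client_sorted_result(shards, presorted=True)
raw_sorted = client_sorted_result(shards, presorted=False)
```

Both paths produce the same result; they differ only in whether the storage servers or the client machine spends the cycles on the initial sorts.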