Split of functionality between servers and clients
How do we split the functionality between server and client code?
What are our metrics?
Good Design
Whether something "should" be on the server or client because of good decomposition
Probably not terribly important.
Library API
Do not want to complicate the API and make the developer write more or trickier code
Complicating the library is ok, just not the library API
Latency
Depends on what we're doing. Might increase.
Throughput
A net plus (that's why we're doing this)
Scalability
We only want to add servers if we run out of storage capacity.
Security
The client cannot be trusted
Obviously we still need some authentication code on the client
RAM
We want servers to reserve as much memory as possible for holding data. If certain tasks use too much memory, then that might not be acceptable.
Remember, can't swap data to disk!
Questions
Ratio of application servers to storage servers?
What will be saturated? CPU? Network? Something else?
We want to increase the load of whatever is not being utilized.
Client RAM
Does it make sense for client machines to contribute their RAM?
How much RAM does a client actually need?
What are the cost savings of adding RAM to a client machine compared to buying a new server?
Again, what is the bottleneck?
Types of features
Accessing the data
What level of queries will we allow?
Simple operations (get, put): probably nothing interesting
But what if you need to do something with the data, such as sort?
Basically it comes down to "do we need to do computation on this data"?
Are we willing to make trade-offs that increase network traffic?
Also depends on whether the data needs to be in the same place (sort) or not (sum)
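To make the sort/sum distinction concrete, here is a minimal sketch. The `servers` lists are hypothetical in-memory stand-ins for storage servers, and the "values sent" counts are only a rough proxy for network traffic. A sum can be reduced on each server so that one number per server crosses the network, whereas a sort needs every value to end up in one place.

```python
import heapq

# Hypothetical stand-ins for the fragments held by three storage servers.
servers = [
    [7, 2, 9],
    [4, 1],
    [8, 3, 6, 5],
]

def distributed_sum(servers):
    # Each server reduces its own fragment, so only one number per
    # server has to cross the network.
    partials = [sum(fragment) for fragment in servers]
    return sum(partials), len(partials)

def distributed_sort(servers):
    # Each server can sort its fragment locally, but every value still
    # has to travel to one place for the final merge.
    sorted_fragments = [sorted(fragment) for fragment in servers]
    merged = list(heapq.merge(*sorted_fragments))
    return merged, sum(len(f) for f in sorted_fragments)

total, sum_values_sent = distributed_sum(servers)
ordered, sort_values_sent = distributed_sort(servers)
print(total, sum_values_sent)     # 45 3
print(ordered, sort_values_sent)  # [1, 2, 3, 4, 5, 6, 7, 8, 9] 9
```

In this toy model the sum moves 3 values and the sort moves all 9, which is the kind of trade-off the question about network traffic is asking about.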
Self-management
Code that manages servers (moving data, crash recovery)
How many cycles does this actually need? If not that many, then does it even matter?
Probably security issues with this
Indexing
Ask John
Who does the indexing? Client can probe index, or the server can
Putting it on the client allows it to be more flexible.
Data replication
Probably makes more sense on the server (but I am not sure)
Map reduce
Is this done separately?
If not, it's just a question of who's got the spare cycles
Locking
John's stuff:
RAMCloud is built on the notion of distribution: there will be many storage servers, and there will also be application servers. Not all of the functionality of the system will run on the storage servers.
Wherever it can be done efficiently, we should try to offload functionality from the storage servers to the client machines: this will increase the scalability of the storage system.
The RAMCloud system software can include a component that runs on the client machines, so we can choose whether to implement pieces of functionality on the client or on the servers.
For example, consider an operation that collects data and sorts it:
The data will almost certainly come from multiple storage servers.
It would probably make sense to do the final merge on the client machine, not on a storage server: the RAMCloud client software identifies the desired data, fetches pieces from various servers, and merges them together into the final sorted result.
It might not even make sense for the storage servers to do the initial sorts of the data fragments: just collect raw data and return it for sorting on the client machine.
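The two options in this example can be sketched as follows. This is an illustration only, not RAMCloud's actual client API: `FakeStorageServer`, `fetch`, `client_sorted_read`, and the `sort_on_server` flag are all hypothetical names. The final merge always happens on the client; the flag only decides who performs the initial sort of each fragment.

```python
import heapq

class FakeStorageServer:
    """Hypothetical stand-in for a storage server holding one data fragment."""

    def __init__(self, values):
        self._values = list(values)

    def fetch(self, sort_on_server):
        # Either spend server cycles on the initial sort, or return raw data.
        return sorted(self._values) if sort_on_server else list(self._values)

def client_sorted_read(servers, sort_on_server=True):
    """Client-side piece: fetch fragments and build the final sorted result."""
    fragments = [server.fetch(sort_on_server) for server in servers]
    if sort_on_server:
        # Fragments arrive sorted, so the client only does a k-way merge.
        return list(heapq.merge(*fragments))
    # Fragments arrive raw, so the client does all of the sorting work.
    return sorted(value for fragment in fragments for value in fragment)

servers = [FakeStorageServer([5, 1]), FakeStorageServer([4, 2, 3])]
print(client_sorted_read(servers, sort_on_server=True))   # [1, 2, 3, 4, 5]
print(client_sorted_read(servers, sort_on_server=False))  # [1, 2, 3, 4, 5]
```

Both paths return the same result, so the choice comes down to whether the storage servers or the client machines have the spare cycles.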
What is the right boundary between RAMCloud client software and server software?
Things that absolutely must be done on servers:
Anything that requires trust, such as access control. Particularly in a multi-tenant environment, the client can't be trusted.
Locking (if it potentially impacts other client machines).
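As a rough illustration of why access control belongs on the server, the sketch below keeps the permission check inside the server object, so a client library that skips its own checks still cannot read another tenant's data. Everything here is hypothetical: the `StorageServer` class, the per-key ACL, and the "first writer owns the key" rule are assumptions made for the example, and `tenant` is assumed to be an identity the server has already authenticated.

```python
class StorageServer:
    """Hypothetical server that enforces access control itself."""

    def __init__(self):
        self._data = {}
        self._acl = {}  # key -> set of tenant ids allowed to access it

    def put(self, tenant, key, value):
        # Assumption for this toy model: the first writer owns the key.
        allowed = self._acl.setdefault(key, {tenant})
        if tenant not in allowed:
            raise PermissionError(f"{tenant} may not write {key}")
        self._data[key] = value

    def get(self, tenant, key):
        # The check runs on the server, regardless of what the client
        # library did or did not verify before sending the request.
        if tenant not in self._acl.get(key, set()):
            raise PermissionError(f"{tenant} may not read {key}")
        return self._data[key]

server = StorageServer()
server.put("tenant_a", "k", 1)
print(server.get("tenant_a", "k"))  # 1
try:
    server.get("tenant_b", "k")
except PermissionError as err:
    print("denied:", err)
```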