Split of functionality between servers and clients
How do we split the functionality between server and client code?
What are our metrics?
- Good design
  - Whether something "should" be on the server or the client for reasons of clean decomposition
  - Probably not terribly important.
- Library API
  - We do not want to complicate the API or make developers write more (or trickier) code.
  - Complicating the library internals is OK; complicating the library API is not.
- Latency
  - Depends on the operation; it might increase.
- Throughput
  - Should be a net gain (that's why we're doing this).
- Scalability
  - We want to add servers only when we run out of storage capacity.
- Security
  - The client cannot be trusted.
  - Obviously, we still need some authentication code on the client.
- RAM
  - We want servers to reserve as much memory as possible for holding data. Tasks that use too much server memory might not be acceptable.
  - Remember: we can't swap data to disk!
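The security metric means the client may carry authentication code (for example, to sign its requests), but every check that grants access must be repeated on the server. Below is a minimal sketch, assuming a hypothetical shared-secret HMAC scheme and a hypothetical per-table access list. None of these names come from RAMCloud, and a real scheme would also need nonces or timestamps to prevent replay.

```python
import hashlib
import hmac

# Hypothetical server-side state: per-client secrets and per-table ACLs.
# These live only on the server; the client never sees the ACL.
SECRETS = {"client-a": b"secret-a"}
ACL = {"users": {"client-a"}}


def _message(client_id: str, table: str, op: str) -> bytes:
    """Canonical bytes that both sides sign (hypothetical format)."""
    return f"{client_id}|{table}|{op}".encode()


def sign_request(client_id: str, secret: bytes, table: str, op: str) -> dict:
    """Client-side: attach an HMAC tag. This proves identity but grants nothing."""
    tag = hmac.new(secret, _message(client_id, table, op), hashlib.sha256).hexdigest()
    return {"client": client_id, "table": table, "op": op, "tag": tag}


def server_authorize(req: dict) -> bool:
    """Server-side: verify the tag, then check the ACL. Never trust client claims."""
    secret = SECRETS.get(req["client"])
    if secret is None:
        return False  # unknown client
    expected = hmac.new(
        secret, _message(req["client"], req["table"], req["op"]), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, req["tag"]):
        return False  # tampered request or wrong secret
    return req["client"] in ACL.get(req["table"], set())
```

A correctly signed request for `users` is authorized. Changing the table after signing, signing with the wrong secret, or requesting a table the client is not listed for are all rejected by the server, regardless of what the client-side code claims.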
Questions
- What is the ratio of application servers to storage servers?
- What will be saturated? CPU? Network? Something else?
  - We want to increase the load on whatever is underutilized.
Client RAM
- Does it make sense for client machines to contribute their RAM?
- How much RAM does a client actually need?
- What are the cost savings of adding RAM to a client machine compared to buying a new server?
- Again, what is the bottleneck?
Types of features
- Accessing the data
  - What level of queries will we allow?
    - Simple operations (get, put): probably nothing interesting here.
    - But what if you need to do something with the data, such as sort it?
  - Basically it comes down to: "Do we need to do computation on this data?"
  - Are we willing to make trade-offs that increase network traffic?
  - It also depends on whether the data needs to be in one place (sort) or not (sum).
- Self-management
  - Code that manages servers (moving data, crash recovery)
  - How many cycles does this actually need? If not many, does it even matter where it runs?
  - There are probably security issues with this.
- Indexing
  - Ask John.
  - Who does the indexing? The client can probe the index, or the server can.
  - Putting it on the client makes it more flexible.
- Data replication
  - Probably makes more sense on the server (but I am not sure).
- MapReduce
  - Is this done separately?
  - If not, it's just a question of who has the spare cycles.
- Locking
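The sort/sum distinction can be made concrete. A sum decomposes: each storage server reduces its own fragment to one number, so only one value per server crosses the network. A sort does not decompose that way, because every value must eventually meet in one place. Below is a minimal sketch that models each server's data as a hypothetical in-memory list and counts how many values would be shipped.

```python
from typing import List, Tuple

# Hypothetical fragments of one logical data set, one list per storage server.
Fragments = List[List[int]]


def distributed_sum(fragments: Fragments) -> Tuple[int, int]:
    """Each server sums locally; only one partial result per server is shipped."""
    partials = [sum(frag) for frag in fragments]  # would run on each server
    values_shipped = len(partials)
    return sum(partials), values_shipped  # final combine on the client


def distributed_sort(fragments: Fragments) -> Tuple[List[int], int]:
    """Every value must be shipped to one place before the result is complete."""
    gathered = [v for frag in fragments for v in frag]
    values_shipped = len(gathered)
    return sorted(gathered), values_shipped


fragments = [[5, 1, 9], [3, 7], [8, 2, 6, 4]]
print(distributed_sum(fragments))   # (45, 3)
print(distributed_sort(fragments))  # ([1, 2, 3, 4, 5, 6, 7, 8, 9], 9)
```

Pre-sorting the fragments on the servers would not shrink what crosses the network for the sort; it would only move CPU work from the client to the servers.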
John's stuff:
- RAMCloud is built on the notion of distribution: there will be many storage servers, and there will also be application servers. Not all of the functionality of the system will run on the storage servers.
- Wherever it can be done efficiently, we should try to offload functionality from the storage servers to the client machines: this will increase the scalability of the storage system.
- The RAMCloud system software can include a component that runs on the client machines, so we can choose whether to implement pieces of functionality on the client or on the servers.
- For example, consider an operation that collects data and sorts it:
  - The data will almost certainly come from multiple storage servers.
  - It would probably make sense to do the final merge on the client machine, not on a storage server: the RAMCloud client software identifies the desired data, fetches pieces from various servers, and merges them together into the final sorted result.
  - It might not even make sense for the storage servers to do the initial sorts of the data fragments: just collect raw data and return it for sorting on the client machine.
- What is the right boundary between RAMCloud client software and server software?
  - Things that absolutely must be done on servers:
    - Anything that requires trust, such as access control. Particularly in a multi-tenant environment, the client can't be trusted.
    - Locking (if it potentially impacts other client machines).
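The collect-and-sort example above amounts to a client-side k-way merge: the client gathers one sorted run per storage server and merges them with a heap, in O(n log k) time for k servers. Below is a minimal sketch using Python's `heapq.merge`. `SERVERS` and `fetch_fragment` are hypothetical stand-ins for the storage servers and for whatever RPC the RAMCloud client library would actually use. The `servers_sort` flag covers both options in the notes: servers pre-sort their fragments, or they return raw data and the client sorts it.

```python
import heapq
from typing import Dict, Iterable, List

# Hypothetical stand-in for the storage servers: server id -> its data fragment.
SERVERS: Dict[str, List[int]] = {
    "s1": [42, 7, 19],
    "s2": [3, 88],
    "s3": [15, 4, 61, 23],
}


def fetch_fragment(server_id: str, presorted: bool) -> List[int]:
    """Hypothetical RPC. If presorted is True, the server sorts before replying."""
    data = list(SERVERS[server_id])
    return sorted(data) if presorted else data


def client_sorted_query(server_ids: Iterable[str], servers_sort: bool) -> List[int]:
    """Collect fragments from several servers and produce one sorted result."""
    runs = []
    for sid in server_ids:
        frag = fetch_fragment(sid, presorted=servers_sort)
        if not servers_sort:
            frag.sort()  # raw data: the client does the initial sort as well
        runs.append(frag)
    # The final k-way merge always happens on the client.
    return list(heapq.merge(*runs))


print(client_sorted_query(["s1", "s2", "s3"], servers_sort=False))
# [3, 4, 7, 15, 19, 23, 42, 61, 88]
```

Both settings produce the same result; the flag only decides whether the initial per-fragment sorts consume storage-server cycles or client cycles.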