These are Diego's meeting notes for 2011-03-30.
Goals of a Least Usable System
The ideal least usable system would be the smallest system that would:
- Provide significant value for some set of users
- Get us feedback from users
- Be as close as possible to the path towards the final system
Option 1: Cache
- memcached protocol compatibility
- Implement their RPCs
- Need operations like increment, prepend
- They use 250-byte keys
- Implement their TCP protocol, which should be very similar to that of TcpTransport.
- Unfortunately, it appears to takes deep knowledge of the RPCs to figure out the number of bytes in the request.
- Implement their simple UDP protocol. This should use the existing Driver interface and would be like a very stripped down version of FastTransport.
- See http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
- Implement their RPCs
- Disable the coordinator.
- Memcached libraries handle all necessary cluster management, so the coordinator is not necessary.
- Masters should not enlist with the coordinator. They should instead start with exactly one table.
- Build a userspace 10GigE Driver for a strategically chosen NIC.
- Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
- Update the log cleaner (garbage collector).
- We'll need an eviction policy for when masters fill up (memcached uses LRU).
- One cheap and reasonable strategy might be to free the oldest segment in its entirety.
- We'll need an eviction policy for when masters fill up (memcached uses LRU).
Option 2: Key-Value Store
- Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
- RPCs need to time out eventually (this should be reasonably aggressive).
- Build a userspace 10GigE Driver for a strategically chosen NIC.
- Update the log cleaner.
- Enable the failure detector.
- Handle coordinator failures.
- Handle backup failures.
- Handle multiple master failures and other secondary failures.
- Cold boot.
- Backups need a superblock.
- Make FastTransport fast.
- Mechanism for splitting and moving tablets.
- Threading?
- Batteries for backup buffers?
- Use SSDs instead?
- Accept some data loss?
- Network partitions?
- Administration and diagnosis tools
- Table enumeration?