These are Diego's meeting notes for 2011-03-30.

Goals of a Least Usable System

The ideal least usable system would be the smallest system that would:

Option 1: Cache

memcached protocol compatibility
- Implement their RPCs
  - Need operations like increment, prepend
  - They use 250-byte keys
- Implement their TCP protocol, which should be very similar to that of TcpTransport.
  - Unfortunately, it appears to take deep knowledge of the RPCs to figure out the number of bytes in the request.
- Implement their simple UDP protocol. This should use the existing Driver interface and would resemble a heavily stripped-down version of FastTransport.
- See http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
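
A rough sketch of why TCP framing needs knowledge of the RPCs, assuming the text protocol described in protocol.txt: every request begins with a line ending in "\r\n", but only storage commands (set, add, replace, append, prepend, cas) are followed by a data block whose size is given by the <bytes> field of that line. The function name `request_length` is hypothetical and is not part of memcached or TcpTransport.

```python
# Commands whose command line is followed by a data block of <bytes> bytes
# plus a trailing "\r\n" (per the memcached text protocol).
STORAGE_COMMANDS = {b"set", b"add", b"replace", b"append", b"prepend", b"cas"}


def request_length(buf: bytes):
    """Return the byte length of the first complete request in buf.

    Returns None when buf does not yet hold a complete request, so the
    caller knows to read more bytes from the socket first.
    """
    end = buf.find(b"\r\n")
    if end == -1:
        return None  # the command line itself is still incomplete
    header_len = end + 2
    fields = buf[:end].split()
    if not fields:
        raise ValueError("empty command line")
    if fields[0] in STORAGE_COMMANDS:
        # Layout: <command> <key> <flags> <exptime> <bytes> ...
        if len(fields) < 5:
            raise ValueError("malformed storage command")
        data_len = int(fields[4])
        total = header_len + data_len + 2  # data block plus its "\r\n"
        return total if len(buf) >= total else None
    # Retrieval, delete, incr/decr and similar commands are a single line.
    return header_len
```

The transport cannot simply read a fixed-size length prefix: it has to parse the command name and, for storage commands, the <bytes> field before it knows where the request ends.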
Disable the coordinator.
- Memcached libraries handle all necessary cluster management, so the coordinator is not necessary.
- Masters should not enlist with the coordinator. They should instead start with exactly one table.
Build a userspace 10GigE Driver for a strategically chosen NIC.
Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
Update the log cleaner (garbage collector).
- We'll need an eviction policy for when masters fill up (memcached uses LRU).
  - One cheap and reasonable strategy might be to free the oldest segment in its entirety.
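
The "free the oldest segment" idea can be sketched as below. This is a toy model, not the actual log or cleaner code: the class `SegmentedCache` and its parameters are hypothetical, segments hold a fixed number of entries rather than bytes, and values live in the index for simplicity.

```python
from collections import deque


class SegmentedCache:
    """Toy log-structured cache that evicts by freeing its oldest segment."""

    def __init__(self, max_segments, entries_per_segment):
        self.max_segments = max_segments
        self.entries_per_segment = entries_per_segment
        self.next_id = 0
        self.segments = deque()  # (segment_id, [keys written]), oldest first
        self.index = {}          # key -> (segment_id, value) of the live copy
        self._open_segment()

    def _open_segment(self):
        # When the log is full, free the oldest segment before opening a new one.
        if len(self.segments) == self.max_segments:
            self._evict_oldest()
        self.segments.append((self.next_id, []))
        self.next_id += 1

    def _evict_oldest(self):
        seg_id, keys = self.segments.popleft()
        for key in keys:
            # Drop a key only if its live copy was in the freed segment;
            # keys rewritten into a newer segment survive.
            if key in self.index and self.index[key][0] == seg_id:
                del self.index[key]

    def write(self, key, value):
        seg_id, keys = self.segments[-1]
        if len(keys) == self.entries_per_segment:
            self._open_segment()
            seg_id, keys = self.segments[-1]
        keys.append(key)
        self.index[key] = (seg_id, value)

    def read(self, key):
        entry = self.index.get(key)
        return None if entry is None else entry[1]
```

Note that this evicts by write time, not by last access, so it only approximates memcached's LRU: an object that is read often but never rewritten is still dropped once its segment becomes the oldest.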

Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
- RPCs need to time out eventually (this should be reasonably aggressive).
Build a userspace 10GigE Driver for a strategically chosen NIC.
Update the log cleaner.
Enable the failure detector.
Handle coordinator failures.
Handle backup failures.
Handle multiple master failures and other secondary failures.
Cold boot.
- Backups need a superblock.
Make FastTransport fast.
Mechanism for splitting and moving tablets.
Threading?
Batteries for backup buffers?
- Use SSDs instead?
- Accept some data loss?
Network partitions?
Administration and diagnosis tools
- Table enumeration?

(This target would meet a lesser definition of "usable", probably only usable here at Stanford)

Make RPCs more robust.
- Hosts should be able to recover from the other end of a session failing.
- RPCs need to time out eventually (this should be reasonably aggressive).
Update the log cleaner.
Handle coordinator failures.
Handle backup failures.
Handle multiple master failures and other secondary failures.
Cold boot.
- Backups need a superblock.
Threading

Tasks deferred until later:

User-space 10GigE driver (just use InfiniBand)
Enable failure detector (failure detection comes from clients)
Make FastTransport fast (just use InfiniBand)
Mechanism for splitting and moving tablets
Non-volatile log buffers (allow data loss during datacenter-wide power failures)
Network partitions
Administration and diagnosis tools (implement only things that we desperately need, as they are discovered)
- Table enumeration?