Least Usable System

These are Diego's meeting notes for 2011-03-30.

Goals of a Least Usable System

The ideal least usable system would be the smallest system that would:

  • Provide significant value for some set of users
  • Get us feedback from users
  • Be as close as possible to the path towards the final system

Option 1: Cache

  • memcached protocol compatibility
    • Implement their RPCs
      • Need operations like increment, prepend
      • Their keys are limited to 250 bytes
    • Implement their TCP protocol, which should be very similar to that of TcpTransport.
      • Unfortunately, it appears to take deep knowledge of the RPCs to determine the number of bytes in a request.
    • Implement their simple UDP protocol. This should use the existing Driver interface and would be like a heavily stripped-down version of FastTransport.
    • See http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
  • Disable the coordinator.
    • Memcached libraries handle all necessary cluster management, so the coordinator is not necessary.
    • Masters should not enlist with the coordinator. They should instead start with exactly one table.
  • Build a userspace 10GigE Driver for a strategically chosen NIC.
  • Make RPCs more robust.
    • Hosts should be able to recover from the other end of a session failing.
  • Update the log cleaner (garbage collector).
    • We'll need an eviction policy for when masters fill up (memcached uses LRU).
      • One cheap and reasonable strategy might be to free the oldest segment in its entirety.
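
The framing problem noted above for memcached's TCP protocol can be illustrated with a short sketch. In the text protocol, a storage command such as set or prepend carries the size of its data block as a field on the command line, so a transport cannot know how many bytes a request occupies without parsing the command itself. The function and constant names below are hypothetical and are not part of memcached or of our transports; the sketch assumes the text protocol described in protocol.txt.

```python
# Sketch only: computes how many bytes one memcached text-protocol request
# occupies on a TCP stream. All names here are hypothetical.

STORAGE_COMMANDS = {b"set", b"add", b"replace", b"append", b"prepend", b"cas"}
MAX_KEY_LENGTH = 250  # memcached's limit on key size, in bytes


def request_length(buffer: bytes):
    """Return the length of the first request in buffer, or None if more
    bytes must arrive before the length can be determined."""
    line_end = buffer.find(b"\r\n")
    if line_end == -1:
        return None  # the command line itself is not complete yet
    fields = buffer[:line_end].split()
    if not fields:
        raise ValueError("empty command line")
    # Only the first key is checked; get may name several keys.
    if len(fields) > 1 and len(fields[1]) > MAX_KEY_LENGTH:
        raise ValueError("key too long")
    header_length = line_end + 2  # command line plus its \r\n
    if fields[0] in STORAGE_COMMANDS:
        # <command> <key> <flags> <exptime> <bytes> ...
        # The fifth field is the size of the data block that follows.
        if len(fields) < 5:
            raise ValueError("malformed storage command")
        data_length = int(fields[4])
        total = header_length + data_length + 2  # data block ends in \r\n
        return total if len(buffer) >= total else None
    # Commands such as get, incr, decr, and delete are a single line.
    return header_length
```

For example, request_length(b"incr counter 1\r\n") covers just the one line, while a prepend request also covers the data block whose size is given on its command line.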
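
The "free the oldest segment" strategy above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the log cleaner's design: the class, its fixed entries-per-segment limit, and the in-memory index are all hypothetical. The point it shows is that an evicted segment may hold stale entries for keys that were later rewritten, and those keys must stay readable.

```python
from collections import deque


class SegmentedCache:
    """Toy log-structured cache that evicts whole segments, oldest first.
    Hypothetical sketch; sizes are counted in entries, not bytes."""

    def __init__(self, max_segments, entries_per_segment):
        self.max_segments = max_segments
        self.entries_per_segment = entries_per_segment
        self.next_segment_id = 0
        self.segments = deque()  # (segment_id, [keys]), oldest on the left
        self.index = {}          # key -> (segment_id, value)
        self._open_segment()

    def _open_segment(self):
        # Make room first if the log is already at its segment limit.
        if len(self.segments) == self.max_segments:
            self._evict_oldest_segment()
        self.segments.append((self.next_segment_id, []))
        self.next_segment_id += 1

    def _evict_oldest_segment(self):
        segment_id, keys = self.segments.popleft()
        for key in keys:
            # Drop a key only if its live copy is in the evicted segment;
            # keys rewritten into a newer segment survive.
            if key in self.index and self.index[key][0] == segment_id:
                del self.index[key]

    def write(self, key, value):
        segment_id, keys = self.segments[-1]
        if len(keys) == self.entries_per_segment:
            self._open_segment()
            segment_id, keys = self.segments[-1]
        keys.append(key)
        self.index[key] = (segment_id, value)

    def read(self, key):
        entry = self.index.get(key)
        return None if entry is None else entry[1]
```

Note that this evicts by write order only: reads do not refresh an object, so the policy is FIFO at segment granularity rather than true LRU.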

Option 2: Key-Value Store

  • Make RPCs more robust.
    • Hosts should be able to recover from the other end of a session failing.
    • RPCs need to time out eventually (this should be reasonably aggressive).
  • Build a userspace 10GigE Driver for a strategically chosen NIC.
  • Update the log cleaner.
  • Enable the failure detector.
  • Handle coordinator failures.
  • Handle backup failures.
  • Handle multiple master failures and other secondary failures.
  • Cold start.
    • Backups need a superblock.
  • Make FastTransport fast.
  • Mechanism for splitting and moving tablets.
  • Threading?
  • Batteries for backup buffers?
    • Use SSDs instead?
    • Accept some data loss?
  • Network partitions?
  • Administration and diagnosis tools
    • Table enumeration?
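
The RPC robustness items above (aggressive timeouts and recovery from a failed session) can be sketched as follows. This is an illustration under assumptions, not our transport code: the exception classes, rpc, and rpc_with_recovery are hypothetical names, responses are assumed to have a known fixed length, and open_session stands for whatever mechanism creates a fresh session.

```python
import socket


class RpcTimeout(Exception):
    """The peer did not respond within the timeout."""


class SessionFailed(Exception):
    """The other end of the session closed or the connection broke."""


def rpc(sock, request, response_length, timeout_seconds):
    """Send request and read exactly response_length bytes, or raise."""
    sock.settimeout(timeout_seconds)
    try:
        sock.sendall(request)
        response = b""
        while len(response) < response_length:
            chunk = sock.recv(response_length - len(response))
            if not chunk:
                raise SessionFailed("peer closed the session")
            response += chunk
        return response
    except socket.timeout:
        raise RpcTimeout("no response in %.3f s" % timeout_seconds) from None
    except OSError as error:
        raise SessionFailed(str(error)) from None


def rpc_with_recovery(open_session, request, response_length,
                      timeout_seconds, attempts=2):
    """Retry on a fresh session after a timeout or session failure."""
    last_error = None
    for _ in range(attempts):
        sock = open_session()  # hypothetical factory for a new session
        try:
            return rpc(sock, request, response_length, timeout_seconds)
        except (RpcTimeout, SessionFailed) as error:
            last_error = error
        finally:
            sock.close()
    raise last_error
```

Blind retries like this are only safe for idempotent operations; retrying something like an increment after a timeout could apply it twice.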

Option 3: Stripped-Down Key-Value Store

(This target would meet a lesser definition of "usable": it would probably be usable only here at Stanford.)

  • Make RPCs more robust.
    • Hosts should be able to recover from the other end of a session failing.
    • RPCs need to time out eventually (this should be reasonably aggressive).
  • Update the log cleaner.
  • Handle coordinator failures.
  • Handle backup failures.
  • Handle multiple master failures and other secondary failures.
  • Cold start.
    • Backups need a superblock.
  • Threading
  • Overall reliability model: the system can handle simple failures with no data loss and can recover from any failure, but more complex failures (such as a total power failure) will cause data loss. Whenever we are unsure what to do (e.g., during a network partition), we can shut the whole system down and do a cold start, with potential data loss.

Tasks deferred until later:

  • Userspace 10GigE driver (just use InfiniBand)
  • Enable failure detector (failure detection comes from clients)
  • Make FastTransport fast (just use InfiniBand)
  • Mechanism for splitting and moving tablets
  • Non-volatile log buffers (allow data loss during datacenter-wide power failures)
  • Network partitions
  • Administration and diagnosis tools (implement only things that we desperately need, as they are discovered)
    • Table enumeration?