...

  • Make RPCs more robust.
    • Hosts should be able to recover from the other end of a session failing.
    • RPCs need to time out eventually (this should be reasonably aggressive).
  • Build a userspace 10GigE driver for a strategically chosen NIC.
  • Update the log cleaner.
  • Enable the failure detector.
  • Handle coordinator failures.
  • Handle backup failures.
  • Handle multiple master failures and other secondary failures.
  • Cold boot.
    • Backups need a superblock.
  • Make FastTransport fast.
  • Mechanism for splitting and moving tablets.
  • Threading?
  • Batteries for backup buffers?
    • Use SSDs instead?
    • Accept some data loss?
  • Network partitions?
  • Administration and diagnosis tools.
    • Table enumeration?

Option 3: Stripped-Down Key-Value Store

(This target meets a weaker definition of "usable": it would probably be usable only here at Stanford.)

  • Make RPCs more robust.
    • Hosts should be able to recover from the other end of a session failing.
    • RPCs need to time out eventually (this should be reasonably aggressive).
  • Update the log cleaner.
  • Handle coordinator failures.
  • Handle backup failures.
  • Handle multiple master failures and other secondary failures.
  • Cold boot.
    • Backups need a superblock.
  • Threading.
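
As an illustration of the "Make RPCs more robust" item above, here is a minimal sketch of an RPC that gives up after an aggressive deadline and then fails over to another session. It is not the project's code: the names RpcTimeout, call_with_deadline, and call_with_failover are hypothetical, each session is modeled as a plain Python callable, and a thread pool stands in for a real transport.

```python
import concurrent.futures


class RpcTimeout(Exception):
    """Hypothetical error: the remote end did not reply before the deadline."""


def call_with_deadline(pool, rpc, deadline_s):
    """Run one RPC (modeled as a callable) and give up after deadline_s seconds."""
    future = pool.submit(rpc)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # Best effort only: a call that is already running cannot be interrupted,
        # so the caller simply stops waiting for it.
        future.cancel()
        raise RpcTimeout(f"no reply within {deadline_s} s") from None


def call_with_failover(pool, sessions, deadline_s):
    """Try each session in order; move on when the other end appears to have failed."""
    for rpc in sessions:
        try:
            return call_with_deadline(pool, rpc, deadline_s)
        except RpcTimeout:
            continue  # treat the timeout as a failed session and try the next one
    raise RpcTimeout("all sessions timed out")
```

With a pool such as concurrent.futures.ThreadPoolExecutor(max_workers=4), call_with_failover(pool, [slow_call, fast_call], 0.05) returns the result of fast_call once slow_call misses its deadline. How aggressive the deadline should be is left open here, as it is in the list above.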

Tasks deferred until later:

  • Userspace 10GigE driver (just use InfiniBand)
  • Enable the failure detector (failure detection comes from clients instead)
  • Make FastTransport fast (just use InfiniBand)
  • Mechanism for splitting and moving tablets
  • Non-volatile log buffers (allow data loss during datacenter-wide power failures)
  • Network partitions
  • Administration and diagnosis tools (implement only things that we desperately need, as they are discovered)
    • Table enumeration?