...
- Make RPCs more robust.
  - Hosts should be able to recover when the other end of a session fails.
  - RPCs need to time out eventually (the timeout should be reasonably aggressive).
- Build a user-space 10GigE driver for a strategically chosen NIC.
- Update the log cleaner.
- Enable the failure detector.
- Handle coordinator failures.
- Handle backup failures.
- Handle multiple master failures and other secondary failures.
- Cold boot.
- Backups need a superblock.
- Make FastTransport fast.
- Mechanism for splitting and moving tablets.
- Threading?
- Batteries for backup buffers?
- Use SSDs instead?
- Accept some data loss?
- Network partitions?
- Administration and diagnosis tools.
- Table enumeration?
Option 3: Stripped-Down Key-Value Store
(This target would meet a lesser definition of "usable"; it would probably be usable only here at Stanford.)
- Make RPCs more robust.
  - Hosts should be able to recover when the other end of a session fails.
  - RPCs need to time out eventually (the timeout should be reasonably aggressive).
- Update the log cleaner.
- Handle coordinator failures.
- Handle backup failures.
- Handle multiple master failures and other secondary failures.
- Cold boot.
- Backups need a superblock.
- Threading.
Tasks deferred until later:
- User-space 10GigE driver (just use InfiniBand)
- Enable failure detector (failure detection comes from clients)
- Make FastTransport fast (just use InfiniBand)
- Mechanism for splitting and moving tablets
- Non-volatile log buffers (allow data loss during datacenter-wide power failures)
- Network partitions
- Administration and diagnosis tools (implement only what we desperately need, as the needs are discovered)
- Table enumeration?
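
Illustration of the RPC robustness items above (time out aggressively, recover when the other end of a session fails). This is only a rough sketch, not the actual transport code: the names Session, wait_for_reply, and call_with_recovery are hypothetical, and the 50 ms timeout and 3 attempts are assumed placeholder values, not measured or agreed-upon numbers.

```python
import time


class RpcTimeout(Exception):
    """Raised when an RPC gets no reply before its deadline."""


class Session:
    """Hypothetical session. `poll` returns a reply, or None if none has arrived."""

    def __init__(self, poll):
        self.poll = poll
        self.alive = True


def wait_for_reply(session, timeout_s):
    """Poll the session until a reply arrives or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        reply = session.poll()
        if reply is not None:
            return reply
        time.sleep(0.001)  # avoid spinning at 100% CPU in this sketch
    # No reply in time: treat the other end of the session as failed.
    session.alive = False
    raise RpcTimeout("no reply within %.3f s" % timeout_s)


def call_with_recovery(open_session, timeout_s=0.05, max_attempts=3):
    """Retry the RPC on a fresh session each time the previous one times out.

    `open_session` is a hypothetical factory returning a new Session.
    The default timeout and attempt count are assumptions for illustration.
    """
    for _ in range(max_attempts):
        session = open_session()
        try:
            return wait_for_reply(session, timeout_s)
        except RpcTimeout:
            continue  # abandon the dead session and try again
    raise RpcTimeout("gave up after %d attempts" % max_attempts)
```

The caller never blocks forever: each attempt is bounded by the timeout, and a session that stops answering is marked dead and replaced rather than reused.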