


  • 7/12/12

    • Coordinator: Implementation in progress, making basic state persistent
    • Log Cabin: Implementing consensus module; interface and durability already working; coordinator work is not blocked on it
    • Client retry: Should be done in about a week; converting existing RPCs to the new architecture
    • Enumeration: Functional, needs real-world testing
    • Log Cleaner: Redesign done; integrating with log refactoring, should be done in a week
    • Fault tolerance: Recovery can survive all sorts of failures (recovery master crashes, loss of backups); recovery of multiple hosts works; still smoking out bugs
    • Cold start: Awaiting client retry, but a hack allows some basic testing; have found and fixed a few bugs, but haven't been able to successfully cold start yet
    • New potential requirement: leases?
  • 5/11/12
    • Fault-tolerant coordinator: new design in progress
    • Cold start attempted; fails on enlistment since CoordinatorServiceList isn't persisted
    • Enumerate: designed, coding
    • Fault tolerance: new Python class for scripting more interesting failure scenarios for RAMCloud
    • Log cleaner: gathering metrics


  • Fault-tolerant coordinator (Ankita)
    • Log Cabin (Diego)
  • Cold start (Ryan)
  • Client retry (John)
  • Enumerate (Elliott?)
  • Synchronous backup write mode
  • Leases?

Stability and Testing

  • Fault-tolerance (Ryan)

    • Master recovery
    • Backup recovery
    • Cold start
  • Log cleaner (Steve)
  • Overload (Steve)