
This page tracks the work remaining to reach version 1.0 of RAMCloud. This version is intended to be a "Least Usable System": the smallest amount of functionality that could be sufficient to support actual applications.

Target Timeframe

July - October 2012



  • 7/12/12

    • Coordinator: Implementation in progress, making basic state persistent
    • Log Cabin: Implementing consensus module; interface and durability already working; coordinator work is not blocked on it
    • Client retry: Should be done in about a week; converting existing RPCs to the new architecture
    • Enumeration: Functional, needs real-world testing
    • Log Cleaner: Redesign done; integrating with log refactoring, should be done in a week
    • Fault tolerance: Recovery can survive all sorts of failures (recovery master crashes, loss of backups); recovery of multiple hosts works; still smoking out bugs
    • Cold start: Awaiting client retry, but a hack allows some basic testing; have found and fixed a few bugs, but haven't yet been able to cold start successfully
    • New potential requirement: leases?
  • 5/11/12
    • Fault-tolerant coordinator: new design in progress
    • Cold start attempted; fails on enlistment since CoordinatorServiceList isn't persisted
    • Enumerate: designed, coding
    • Fault tolerance: new python class for scripting more interesting failure scenarios for RAMCloud
    • Log cleaner: gathering metrics
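The client-retry work tracked above amounts to wrapping each RPC so that transient failures (such as a crashed recovery master) are retried transparently instead of surfacing to the application. A minimal sketch of that pattern, assuming a hypothetical `rpc` callable and a `TransientError` exception (names invented for illustration; the real RAMCloud client code differs):

```python
import time

class TransientError(Exception):
    """Raised by an RPC when the target server is temporarily unavailable."""

def call_with_retry(rpc, *args, retries=5, initial_backoff=0.01):
    """Invoke rpc(*args), retrying transient failures with exponential backoff."""
    backoff = initial_backoff
    for attempt in range(retries):
        try:
            return rpc(*args)
        except TransientError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff)
            backoff *= 2  # back off exponentially between attempts
```

In the real system the retry loop also has to re-resolve which server owns the tablet, since recovery may have moved it; that lookup is omitted here.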


  • Support a high-volume website
    • Requires durability & availability
  • Support experimental applications
    • May not require durability, only minimal availability
  • Expect users to require serious hand-holding and interaction with RAMCloud team to develop, deploy, and support their application


  • Fault-tolerant coordinator (Ankita)
    • Log cabin (Diego)
  • Cold start (Ryan)
  • Client retry (John)
  • Enumerate (Elliott?)
  • Synchronous backup write mode
  • Leases?
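The enumerate feature listed above lets a client iterate over every object in a table, which also underpins the archival/extraction item below. A rough sketch of the client-side loop, assuming a hypothetical `enumerate_chunk(table_id, cursor)` RPC that returns a batch of objects plus a continuation cursor (`None` when exhausted); these names are invented for illustration and the real interface may differ:

```python
def enumerate_table(enumerate_chunk, table_id):
    """Yield every object in a table by repeatedly fetching chunks.

    enumerate_chunk(table_id, cursor) -> (objects, next_cursor), where a
    next_cursor of None signals the table has been fully enumerated.
    """
    cursor = 0  # assumed starting cursor for the first chunk
    while cursor is not None:
        objects, cursor = enumerate_chunk(table_id, cursor)
        yield from objects
```

Chunked iteration with a server-provided cursor is what lets enumeration survive tablet migration and recovery mid-scan, since the cursor can be reinterpreted against the table's new layout.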

Stability and Testing

  • Fault-tolerance (Ryan)

    • Master recovery
    • Backup recovery
    • Cold start
  • Log cleaner (Steve)
  • Overload (Steve)


  • Documentation for development and deployment (as much as the group can collectively generate in 1 day)
  • Client interface cleanup (as much as the group can collectively do in 1 day)
  • Packaging (make install)
  • Archival/Extraction via enumerate (see above)


  • Planned supported transports
    • TCP: Easy deployment on vanilla hardware, low performance
    • InfRc: Requires Infiniband NICs/switches, high performance
  • Planned supported scale
    • 80 nodes
    • Test scale down so we can at least give a lower-bound on usable cluster size
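Servers in RAMCloud are addressed by service-locator strings that name the transport along with its parameters, so choosing between the TCP and InfRc transports above is a matter of configuration rather than code. A hedged sketch of building such strings, assuming a `tcp:`/`infrc:` prefix with `host` and `port` parameters (the exact locator syntax should be checked against the transport documentation):

```python
def make_locator(transport, host, port):
    """Build a RAMCloud-style service locator string.

    transport is 'tcp' for vanilla Ethernet or 'infrc' for Infiniband
    reliable-connected queue pairs; the format shown is an assumption.
    """
    if transport not in ("tcp", "infrc"):
        raise ValueError("unsupported transport: %s" % transport)
    return "%s:host=%s,port=%d" % (transport, host, port)
```

For example, `make_locator("tcp", "10.0.0.1", 11100)` would yield `tcp:host=10.0.0.1,port=11100`.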


  • Tablet migration
  • Supporting additional transports (e.g., 10 Gigabit Ethernet)
  • Performance testing
  • Scale up testing
  • Monitoring/Management
  • Additional bindings