Target Timeframe

July - October 2012

Issues

Progress

  • 5/11/12

    • Fault-tolerant coordinator: new design in progress
    • Cold start attempted; fails on enlistment since CoordinatorServiceList isn't persisted
    • Enumerate: designed, coding
    • Fault tolerance: new Python class for scripting more interesting failure scenarios for RAMCloud
    • Log cleaner: gathering metrics

Goals

  • Support a high-volume website
    • Requires durability & availability
  • Support experimental applications
    • May not require durability, only minimal availability
  • Expect users to require serious hand-holding and interaction with the RAMCloud team to develop, deploy, and support their applications

Features

  • Fault-tolerant coordinator (Ankita)
    • LogCabin (Diego)
  • Cold start (Ryan)
  • Client retry (John)
  • Enumerate (Elliott?)
  • Synchronous backup write mode

Stability and Testing

  • Fault-tolerance (Ryan)

    • Master recovery
    • Backup recovery
    • Cold start
  • Log cleaner (Steve)
  • Overload (Steve)

Deployment

  • Documentation for development and deployment (as much as the group can collectively generate in 1 day)
  • Client interface cleanup (as much as the group can collectively do in 1 day)
  • Packaging (make install)
  • Archival/Extraction via enumerate (see above)

Notes

  • Planned supported transports
    • TCP: Easy deployment on vanilla hardware, low performance
    • InfRc: Requires InfiniBand NICs/switches, high performance
  • Planned supported scale
    • 80 nodes
    • Test scaling down so we can at least give a lower bound on usable cluster size

Deferred

  • Tablet migration
  • Supporting additional transports/10 Gb Ethernet
  • Performance testing
  • Scale up testing
  • Monitoring/Management
  • Additional bindings