RAMCloud 1.0

Created by Ryan Stutsman, last modified on May 14, 2012

Target Timeframe

July - October 2012

Issues

Tablet map needs restructuring

Progress

5/11/12
- Fault-tolerant coordinator: new design in progress
- Cold start attempted; fails on enlistment since CoordinatorServiceList isn't persisted
- Enumerate: designed, coding
- Fault tolerance: new Python class for scripting more interesting failure scenarios for RAMCloud
- Log cleaner: gathering metrics

Goals

Support a high-volume website
- Requires durability & availability
Support experimental applications
- May not require durability, only minimal availability
Expect users to require serious hand-holding and interaction with the RAMCloud team to develop, deploy, and support their applications

Features

Fault-tolerant coordinator (Ankita)
- LogCabin (Diego)
Cold start (Ryan)
Client retry (John)
Enumerate (Elliott?)
Synchronous backup write mode

Stability and Testing

Fault-tolerance (Ryan)
- Master recovery
- Backup recovery
- Cold start
Log cleaner (Steve)
Overload (Steve)

Deployment

Documentation for development and deployment (as much as the group can collectively generate in 1 day)
Client interface cleanup (as much as the group can collectively do in 1 day)
Packaging (make install)
Archival/Extraction via enumerate (see above)

Notes

Planned supported transports
- TCP: Easy deployment on vanilla hardware, low performance
- InfRc: Requires InfiniBand NICs/switches, high performance
Planned supported scale
- 80 nodes
- Test scaling down so we can at least give a lower bound on usable cluster size

Deferred

Tablet migration
Supporting additional transports/10G Ethernet
Performance testing
Scale up testing
Monitoring/Management
Additional bindings
