...
Progress
- 11/8/12
- Risks: may need to add these.
- Gathering statistics for balancing recovery.
- LogCabin cleaning.
- Log Cabin: Reconsidering API.
- In progress
- Coordinator: Tablet map persistence is still to do.
- Cold Start: Try larger tests?
- Tablet map: Stop using protocol buffers on masters.
- Done
- Coordinator: Server list management. Recovery works.
- Log Cabin: Implementing consensus module.
- Log Cleaner: Done.
- Cold Start: Works.
- Leases: Done.
- Risks: may need to add these.
- 8/30/12
- Coordinator: Starting to test server list management; tablet map persistence is still to do.
- Log Cabin: Reimplementing consensus module.
- Client retry: Done.
- Log Cleaner: A week out from merge.
- Fault Tolerance: No progress; probably regressions due to churn in backup rewrite.
- Cold Start: Needs to be reimplemented due to backup storage changes.
- Leases: Officially a 1.0 feature, no other progress, though.
- Synchronous write mode: Done and merged.
- Tablet map: Awaiting code review fixes.
- 7/12/12
- Coordinator: Implementation in progress, making basic state persistent
- Log Cabin: Implementing consensus module; interface and durability already working; coordinator work is not blocked on it
- Client retry: Should be done in about a week; converting existing RPCs to the new architecture
- Table Enumeration: Functional, needs real-world testing
- Log Cleaner: Redesign done; integrating with log refactoring, should be done in a week
- Fault tolerance: Recovery can survive all sorts of failures (recovery master crashes, loss of backups); recovery of multiple hosts works; still smoking out bugs
- Cold start: Awaiting client retry, but a hack allows some basic testing; have found and fixed a few bugs, but haven't been able to successfully cold start yet
- New potential requirement: leases?
- 5/11/12
- Fault-tolerant coordinator: new design in progress
- Cold start attempted; fails on enlistment since CoordinatorServiceList isn't persisted
- Enumerate: designed, coding
- Fault tolerance: new Python class for scripting more interesting failure scenarios for RAMCloud
- Log cleaner: gathering metrics
...