This page documents the goals and results for a series of milestones for the first implementation of RAMCloud. Each milestone lasts 2 weeks, ending on a Monday.
Milestone 17: ends October 4, 2010
- Ryan goals:
- Finish edits in response to code review.
- File bugs on remaining important issues in FastTransport.
- Ryan actual:
- Steve goals:
- Finish Infiniband Transport and Driver
- Simple benchmarks of the above
- Restart work on new Log and Segment code
- Steve actual:
- Diego goals:
- Finish updates to Transport API.
- Update Driver API.
- Finish Ethernet packet driver.
- Diego actual:
Milestone 16: ends September 21, 2010
- Ryan goals:
- Code review FastTransport, commit with review changes
- Analyze performance of FastTransport.
- Ryan actual:
- Working through code review:
- Done with everything except FastTransport*
- Worked through easy comments in FastTransport.h
- Steve goals:
- Infiniband transport & driver
- Continue working on performance discrepancy with Mellanox
- Steve actual:
- Driver mostly done (blocked on ServiceLocator); it currently copies data to avoid memory registration issues
- Transport 80% coded
- Mellanox: our hardware is slower, but we're not sure why (they get latency up to 40% better than us)
- Diego goals:
- Finish ServiceLocator.
- Finish Ethernet packet driver.
- Resolve monitor issue.
- Diego actual:
- Class/code review done, but integration still in progress.
- No packet driver work.
- Monitor #5 works
- John goals:
- TCPTransport performance improvements.
- Class prep.
- John actual:
- No TCPTransport work yet.
Milestone 15: ends September 7, 2010
- Ryan goals:
- Tuning/code review for FastTransport
- Ryan actual:
- Code review imminent
- Steve goals:
- Gone one week
- Infiniband transport working
- Steve actual:
- Gone one week
- Partial Infiniband transport (waiting for ServiceLocators)
- Talking with Mellanox about performance
- Diego goals:
- Working monitor
- Driver to send raw Ethernet packets through the kernel
- Diego actual:
- Monitor still broken
- Ethernet packet driver exists but not in proper form
- Implemented ServiceLocator class; undergoing code review, not integrating cleanly
- John goals:
- Really really check in Client/Server changes.
- Start on smaller issues.
- John actual:
- Committed Client/Server changes
- Fixed several smaller issues (cpplint, etc.)
Milestone 14: ends August 23, 2010
- Ryan goals:
- Finish documentation and tests for FastTransport (1 week)
- Possible next steps:
- Redo sessions
- Add UDP layer
- Ryan actual:
- A few more unit tests to write for FastTransport
- More work needed on docs
- FastTransport currently ~1300 lines
- Steve goals:
- Learn about Infiniband software layers
- Start on Infiniband Transport class
- Steve actual:
- Still learning Infiniband, writing simple stand-alone app
- Setup code for Infiniband is complex
- Mellanox person coming tomorrow for training
- Diego goals:
- On vacation
- Diego actual:
- Vacation
- John goals:
- Finish Client/Server changes and check in (1 week)
- Odds and ends
- John actual:
- Client/Server changes code-reviewed, working on followups
- Switched to using exceptions instead of Status returns.
Milestone 13: ends August 9, 2010
- Ryan:
- New transport working without timers
- Take basic performance measurements of new transport
- Timers working with new transport
- Ryan results:
- C++ FastTransport implementation compiles and is showing signs of life, but has almost no tests or documentation yet.
- It looks like the Session mechanism is going to need a redesign.
- Steve goals:
- Intel driver completely working in user space
- Basic user-space 10G performance measurements (without fast transport)
- Ideal: measure performance with user-space driver and new transport
- Steve results:
- Mellanox NICs and switch arrived
- Got Infiniband and 10GE pings working using "out-of-the-box" software.
- Starting to learn about Infiniband verbs
- Intel work is on hold for now
- Diego:
- Finish translating ClientSession into C++.
- On vacation
- John goals:
- Client converted to C++, upgraded for full use of Buffers, integrated with RAMCloud
- John results:
- Client converted to C++ and tested, along with Server.
- Several new header files, such as Rpc.h and Table.h
- All existing C++ code (including ClientMain and Bench) now uses the new Client.
- Still need to convert the Python bindings and Python code.
Milestone 12: ends July 26, 2010
- Ryan:
- Implement a few specific benchmarks using Metrics counters (actual: no change)
- Commit Metrics class (actual: done)
- Aggregator classes (actual: started)
- Additional: started translating Transport from Python to C++; coded InboundMessage; working on OutboundMessage.
- Steve:
- Rerun E1000 measurements on new machines (actual: done, but slightly slower than HiStar machines: 19 us for 64 bytes vs. 18.4 us)
- 10G driver running in user space (actual: kernel pings work; using "HugeTLB" to map big pages, pin; wrote "HugeTLBPhys" driver to return physical addresses; all that's left is to map device registers into user space)
- Diego goals:
- Finish Python version of fast transport
- Possibly start on logging mechanism
- Diego results:
- Python version done
- Committed basic logging (prints to stderr)
- Helping Ryan with Python->C++ translation of Transport
Milestone 11: ends July 12, 2010
- Ryan:
- 2 or 3 basic benchmarks (est: 3 days; actual: some simple benchmarks running, Metrics class ready for code review)
- Integrate into Makefile (est: 1 day; actual: done, server started automatically for tests now)
- Steve:
- Set up machines, send RAMCloud packets to/from each machine (actual: done)
- Experiment with 10G user-space driver (actual: works using standard in-kernel approach, not yet working from user space)
- Rerun E1000 tests on new machines (actual: not done)
- Log stuff (actual: not done)
- Diego:
- Finish buffer integration (actual: done, merged into master)
- More transport work (actual: still working on Python prototype; feature complete)
- Vacation July 28-Aug. 13
Milestone 10: ends June 28, 2010
- Ryan:
- Performance infrastructure (actual: 3 days, basic measurements working)
- Steve:
- Set up machines? (actual: machines just arrived)
- Revisions to log code (actual: complete rewrite of log code, mostly coded but no tests/docs; just starting on cleaner)
- Diego:
- Finish buffer integration (partial progress, not done: .25 day)
- Decompose fast transport into tasks (actual: task changed, see below)
- Start on fast transport tasks (actual: task changed, see below)
- Revised task: full working Python prototype on UDP (actual: mostly there: 6 days)
- Additional task: Python mock script (reduce duplication of prototypes, other improvements: .3 day)
Milestone 9: ends June 14, 2010
- Ryan:
- Benchmark planning (actual: worked mostly on this; some initial end-to-end measurements working)
- Try out simulator for performance analysis (actual: tried it, but not sure how interesting it is yet)
- Steve:
- 6 machines ordered (est: 1 day; actual: in purchasing)
- Set machines up (est: 1 day; actual: N/A)
- Clean up log code (TBD; actual: just getting started)
- Understand existing drivers (w. Aravind) (actual: done)
- Aravind:
- Transfer knowledge
- Diego:
- Replicate Aravind 11 usec numbers (est: 1 day) (actual: done)
- Take over fast transport from Aravind (to be fleshed out ASAP; actual: done)
- Integrate Buffer into code base? Improvements to Python mock pre-processor? (actual: partial progress, 1 day)
Milestone 8: ends May 31, 2010
...