Design Seminars - Spring Quarter 2009

This page was used during the Spring Quarter of 2009 to collect design ideas for RAMCloud, a new kind of long-term storage based entirely in the main memories of a collection of commodity server machines.

Working Assumptions for RAMCloud:

Designed primarily for use within a datacenter: there will be a collection of hundreds or thousands of dedicated RAMCloud server machines, which provide storage for hundreds or thousands of application machines running Web front-ends or other applications.
Supports 10,000 or more RAMCloud servers.
Servers are commodity rack-mounted machines with as much memory as is cost-effective (32-64 GB today?).
All information is stored at all times in main memory: disk accesses are not in the critical path for either reading or writing data.
Low latency access from other machines in the same datacenter: 5-10 microseconds total round-trip for small requests.
High throughput: 1 million requests/second for small requests, using servers with modest numbers of cores (4-8?).
Data durability equal to the best disk-based systems.
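
As a rough illustration of what these assumptions imply at full scale, the sketch below multiplies the per-server figures by the 10,000-server target. The function name `cluster_totals` is hypothetical, and the calculation assumes a perfectly even load and ignores any memory set aside for replication or metadata, so the results are upper bounds rather than design numbers.

```python
# Back-of-envelope totals implied by the working assumptions above.
# All inputs are the page's stated targets, not measurements.
# Assumption: load is spread evenly and all memory holds user data.

def cluster_totals(servers, gb_per_server, requests_per_sec_per_server):
    """Return (total capacity in decimal TB, total requests/second)."""
    total_tb = servers * gb_per_server / 1000
    total_rps = servers * requests_per_sec_per_server
    return total_tb, total_rps

for gb in (32, 64):
    tb, rps = cluster_totals(10_000, gb, 1_000_000)
    print(f"{gb} GB/server: {tb:.0f} TB total, {rps:.0e} requests/second")
```

Under these assumptions, a 10,000-server cluster would hold roughly 320-640 TB in main memory and serve on the order of 10 billion small requests per second.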

Goals for the Quarter

Explore the design space around RAMClouds
Err on the side of inclusivity of ideas and approaches rather than exclusivity: we don't have to make any actual design decisions this quarter, but we'd like to understand the consequences of a range of alternatives
Collect our ideas on this Wiki

Schedule

April 1: Data persistence (Ousterhout)
April 7: Node architecture (Leverich/Kozyrakis)
April 8: Low latency RPCs (Narayanan)
April 14: no meeting
April 15: no meeting
April 21: Distribution of data among servers, replication, locality (Stutsman)
April 22: Scalability (Rumble)
April 28: Applications (Erickson)
April 29: PNUTS: weak consistency (Agrawal)
May 5: Data model (Rosenblum)
May 6: The role of flash memory and other technologies (Kozyrakis)
May 12: Security and access control (Mazieres)
May 13: no meeting
May 19: Split of functionality between servers and clients (Stratmann)
May 20: Naming and Indexing (Ousterhout)
May 26: Reliability (Mitra)
May 27: Network substrate (Rumble)
June 2: TBD
June 3: Spring Discussion Wrapup (Kozyrakis)

Future Discussion Topics

The names in parentheses after a topic are the people who have volunteered to lead a discussion on that topic:

Concurrency model: locking, consistency, transactions
Lights-out automated management
Multi-tenancy: supporting independent applications in a datacenter (Aravind Narayanan)
Energy management (Jacob Leverich)
Memory management within a server (David Mazieres)
Online schema changes (Eric Stratmann)
Undo, redo, audit trail (Ryan Stutsman)
Media and large files (Mendel Rosenblum)
Using client DRAM
Monitoring and logging tools: what information needs to be tracked in order to support dynamic configuration?
Dynamic reconfiguration: when and how does the system reconfigure itself?
Suppose we wanted to support a stronger consistency model than single-record: what would be the implications for the design of the system? Is this feasible?
Object naming

Miscellaneous Topics
