What is RAMCloud?

The RAMCloud project is creating a new class of super-high-speed storage for large-scale datacenter applications, based entirely in DRAM, that is 2-3 orders of magnitude faster than existing storage systems. If successful, it will enable new applications that manipulate large-scale datasets much more intensively than has ever been possible before. It is designed for applications in which a large number of servers in a datacenter need low-latency access to a large durable datastore. RAMCloud offers the following properties:

Low latency: RAMCloud keeps all data in DRAM at all times, so applications can read RAMCloud objects remotely over a datacenter network in as little as 5μs. Writes take less than 15μs. Unlike systems such as memcached, applications never have to deal with cache misses or wait for disk/flash accesses. As a result, RAMCloud storage is 10-1000x faster than other available alternatives.
Large scale: RAMCloud aggregates the DRAM of thousands of servers to support total capacities of 1PB or more.
Durability: RAMCloud replicates all data on nonvolatile secondary storage such as disk or flash, so no data is lost if servers crash or the power fails. One of RAMCloud's unique features is that it recovers very quickly from server crashes (only 1-2 seconds) so the availability gaps after crashes are almost unnoticeable. As a result, RAMCloud combines the durability of replicated disk with the speed of DRAM. If you have used memcached, you have probably experienced the challenges of managing a second durable storage system and maintaining consistency between it and memcached. With RAMCloud, there is no need for a second storage system.
Powerful data model: RAMCloud's basic data model is a key-value store, but we have extended it with several additional features, such as:
- Multiple tables, each with its own key space.
- Transactional updates that span multiple objects in different tables.
- Secondary indices.
- Strong consistency: unlike other NoSQL storage systems, all updates in RAMCloud are consistent, immediately visible, and durable.
Easy deployment: RAMCloud is a software package that runs on commodity Intel servers with the Linux operating system. RAMCloud is available freely in open source form.
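
The idea of multiple tables, each with its own key space, can be pictured with a toy in-memory model. The sketch below is purely illustrative: `ToyStore` and its methods are hypothetical names invented here, not RAMCloud's client API, and the sketch ignores networking, durability, transactions, and secondary indices. It shows only that the same key can name unrelated objects in different tables.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for illustration; not the RAMCloud API."""

    def __init__(self):
        # Maps table name -> {key: value}; every table has a separate key space.
        self._tables = {}

    def create_table(self, name):
        """Create an empty table if it does not already exist."""
        self._tables.setdefault(name, {})

    def write(self, table, key, value):
        """Store value under key in the named table, replacing any old value."""
        self._tables[table][key] = value

    def read(self, table, key):
        """Return the value stored under key; raises KeyError if absent."""
        return self._tables[table][key]


store = ToyStore()
store.create_table("users")
store.create_table("sessions")

# The same key refers to unrelated objects in different tables.
store.write("users", "42", b"alice")
store.write("sessions", "42", b"token-xyz")

users_value = store.read("users", "42")
sessions_value = store.read("sessions", "42")
```

Here `users_value` and `sessions_value` differ even though both were stored under the key "42", because each table resolves keys independently.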

From a practical standpoint, RAMCloud enables a new class of applications that manipulate large data sets very intensively. Using RAMCloud, an application can combine tens of thousands of items of data in real time to provide instantaneous responses to user requests. Unlike traditional databases, RAMCloud scales to support very large applications, while still providing a high level of consistency. We believe that RAMCloud, or something like it, will become the primary storage system for structured data in cloud computing environments such as Amazon's AWS or Microsoft's Azure.

The role of DRAM in storage systems has been increasing rapidly in recent years, driven by the needs of large-scale Web applications. These applications manipulate very large datasets with an intensity that cannot be satisfied by disks alone. As a result, applications are keeping more and more of their data in DRAM. For example, large-scale caching systems such as memcached are being widely used (in 2009 Facebook used a total of 150 TB of DRAM in memcached and other caches for a database containing 200 TB of disk storage), and the major Web search engines now keep their search indexes entirely in DRAM.

Although DRAM's role is increasing, it still tends to be used in limited or specialized ways. In most cases DRAM is just a cache for some other storage system such as a database; in other cases (such as search indexes) DRAM is managed in an application-specific fashion. It is difficult for developers to use DRAM effectively in their applications; for example, the application must manage consistency between caches and the backing storage. In addition, cache misses and backing store overheads make it difficult to capture DRAM's full performance potential.

Our goal for RAMCloud is to create a general-purpose storage system that makes it easy for developers to harness the full performance potential of large-scale DRAM storage. It keeps all data in DRAM all the time, so there are no cache misses. RAMCloud storage is durable and available, so developers need not manage a separate backing store. RAMCloud is designed to scale to thousands of servers and hundreds of terabytes of data while providing uniform low-latency access to all machines within a large datacenter.

As of Fall 2011, we had initial implementations of many of the components of RAMCloud, and the system ran well enough to be used for simple tests. On our 60-node test cluster we were able to perform remote reads of 100-byte objects in about 5 microseconds, and an individual server could process more than 800,000 small read requests per second. The basic crash recovery mechanism was running, and RAMCloud could recover 35 GB of memory from a failed server in about 1.6 seconds.
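
A quick back-of-the-envelope calculation puts the figures above in perspective. The sketch below uses only the 5-microsecond read latency and the 35 GB / 1.6 second recovery measurement quoted in the text; the choice of 10,000 objects, the assumption that one client thread issues reads strictly one after another, and the use of decimal gigabytes are illustrative assumptions, not measurements.

```python
# Figures quoted in the text above.
READ_LATENCY_S = 5e-6      # about 5 microseconds per remote read
RECOVERED_BYTES = 35e9     # 35 GB (decimal gigabytes assumed)
RECOVERY_TIME_S = 1.6      # seconds to recover a failed server

# Assumption: one client thread issues reads back-to-back with no pipelining.
sequential_reads_per_second = 1 / READ_LATENCY_S

# Assumption: 10,000 objects fetched one after another by that thread.
time_for_10k_reads_ms = 10_000 * READ_LATENCY_S * 1e3

# Aggregate rate at which data is restored during crash recovery.
recovery_rate_gb_per_s = RECOVERED_BYTES / RECOVERY_TIME_S / 1e9

print(f"{sequential_reads_per_second:,.0f} sequential reads/s per client thread")
print(f"{time_for_10k_reads_ms:.0f} ms to read 10,000 objects serially")
print(f"{recovery_rate_gb_per_s:.1f} GB/s aggregate recovery rate")
```

Under these assumptions a single thread can complete roughly 200,000 dependent reads per second, fetching 10,000 objects serially takes about 50 ms, and the recovery measurement corresponds to restoring data at about 22 GB/s across the cluster.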

The RAMCloud project is still young, so there are many interesting research issues still to explore, such as the following:

...

We have built the system not as a research prototype, but as a production-quality software system, suitable for use by real applications.

RAMCloud is also interesting from a research standpoint. Its two most important attributes are latency and scale. The first goal is to provide the lowest possible end-to-end latency for applications accessing the system from within the same datacenter. We currently achieve latencies of around 5μs for reads and 15μs for writes, but hope to improve these in the future. In addition, the system must scale, since no single machine can store enough DRAM to meet the needs of large-scale applications. We have designed RAMCloud to support at least 10,000 storage servers; the system must automatically manage all the information across the servers, so that clients do not need to deal with any distributed systems issues. The combination of latency and scale has created a large number of interesting research issues, such as how to ensure data durability without sacrificing the latency of reads and writes, how to take advantage of the scale of the system to recover very quickly after crashes, how to manage storage in DRAM, and how to provide higher-level features such as secondary indexes and multiple-object transactions without sacrificing the latency or scalability of the system.

Our solutions to these problems are described in a series of technical papers.

The RAMCloud project was based in the Department of Computer Science at Stanford University. The project is no longer active and the students working on RAMCloud have graduated, so we cannot provide support for anyone wishing to use RAMCloud.

Learning About RAMCloud

General information about RAMCloud, such as talks and papers. Much of the information here is related to the research aspects of the project, as opposed to information on how to use RAMCloud.

Introductory talk on RAMCloud by John Ousterhout, given at LinkedIn on October 12, 2011.
The RAMCloud Storage System: a comprehensive paper describing RAMCloud, including the log-structured storage mechanism, RAMCloud's thread architecture and approach to low latency, and its crash recovery mechanisms. Published in ACM TOCS in September 2015.
The Case for RAMCloud: an early position paper that discusses the motivation for RAMCloud, the new kinds of applications it may enable, and some of the research issues that will have to be addressed to create a working system. Appeared in CACM in July 2011.
An earlier and slightly longer version of the position paper appeared in Operating Systems Review in December 2009.
It's Time for Low Latency: HotOS 2011 workshop paper arguing for the OS community to focus on network latency.
Fast Recovery in RAMCloud: describes RAMCloud's mechanism for recovering crashed servers in 1-2 seconds. Appeared in SOSP in October, 2011.
Log-Structured Memory for DRAM-based Storage: describes how RAMCloud manages the storage of objects both in DRAM and on disk. Appeared in FAST in February, 2014; won Best Paper Award.
Toward Common Patterns for Distributed, Concurrent, Fault-Tolerant Code: HotOS 2013 workshop paper describing a rules-based approach for building "DCFT" systems.
Articles about RAMCloud (Web and print media, written by people outside the RAMCloud group)
RAMCloud Papers (complete listing of all papers written by the RAMCloud group)
RAMCloud Presentations (Slides from talks about RAMCloud)
Glossary of RAMCloud Terms

How to Deploy and Use RAMCloud

RAMCloud has now reached a level of maturity where it is suitable for production use with real applications. The links below provide information on how to set up a RAMCloud cluster and on the RAMCloud APIs for applications.

Deciding Whether to Use RAMCloud
Supported Platforms
Setting Up a RAMCloud Cluster
Creating a RAMCloud Client
Application APIs (what features are available to applications)
Python Bindings
Service Locators
Technical Support

RAMCloud Performance

Measurements of RAMCloud performance, as well as comparisons between RAMCloud and other systems.

clusterperf benchmarks (benchmarks run on a cluster to measure basic things such as read and write latency and throughput)
How To Run Clusterperf
Perf benchmarks (microbenchmarks measuring various low-level operations on a single machine, such as atomic increment)
Performance Improvement Log
Recovery Performance Benchmark
RAMCloud Papers
Project History
Current team members

Resources

Latency Patterns in Infiniband (talk by Alex Mordkovich, May 2012)
RPC Latency Profile (the lifetime of a write operation, measured January 2012)
SSD Experiments (July 2011)
Redis vs. RAMCloud
Older Performance Measurements

Information for RAMCloud Developers

Information for people who are working on the RAMCloud code base; it is intended primarily for the internal use of the RAMCloud team at Stanford, but may be useful to other people as well.

General Information for Developers (how to get started as a RAMCloud developer)
Build System Structure
RAMCloud Tech Talks (Videos of RAMCloud developers describing the internals of various system components)
Want to Contribute to RAMCloud? (notes for people who would like to contribute code to RAMCloud)
Running Recoveries with recovery.py
Coding Conventions
Style Guide
Documentation Guidelines
Writing Unit Tests
Amendments to Current Documentation and Testing Guidelines
Software Design Philosophy – John Ousterhout's pet peeves
How To Measure Performance: John's pet peeves (and ideas for a possible paper)
RAMCloud C Style for EMACS
Vim Settings
Copyright Notice
Mfence – x86 instructions for limiting instruction reordering
Inside Concurrency Primitives
Wireshark Plugin
DallyFastNetwork.pdf
NetBeans IDE tips
Measuring RAMCloud Performance
Code review tool
Phabricator code review tool
Git repo: see General Information for Developers
IRC channel: #ramcloud on freenode.
- See rcres for coordinating usage of the RAMCloud cluster. Any time you are using the cluster you should be listening on this channel; if you don't respond to comments on the channel, your jobs may be killed.
- Transcripts of this channel may be found here
RAMCloud Cluster Resource Manager (rcres): rcres is a shell command available on the "rcmaster" machine of the RAMCloud cluster. Any time you are using the cluster, you should use rcres to lease the machines you are using.
Dumpstr tool for viewing reports (mostly performance data)
Documentation, generated nightly from the source code

The RAMCloud Test Cluster

Information about the cluster we use for RAMCloud testing at Stanford. Unfortunately not all of this information is completely up to date.

Cluster Intro – information about our cluster for newcomers
New Contributor Checklist (how to set up access for new team members)
/wiki/spaces/RAM/pages/6848593 – for sysadmins
Cluster Custodian - rotating responsibility for managing the cluster and providing technical support
Cluster Issues - central location for keeping track of problems in the cluster
Cluster Inventory - includes notes about cluster setup and spare components
Intel 530 Performance - recent performance issues with Intel 530 SSDs
SSD Latency Experiments - Performance measurements of our cluster's SSDs (2016)
Cluster Tasks - (not so) recent issues with cluster machines
Machine Evaluations
Compiling RAMCloud on CentOS
Tips from Charlie & Co
Cluster Configuration – for sysadmins
Reimaging a Cluster Machine
Installing New Software on the Cluster
Controlling Machines Remotely via IPMI
Updating BIOS automatically with PXE and FreeDOS
Infiniband Tools and Debugging
Updating Mellanox NIC Firmware (to eliminate limit on timeouts)
Dead Machines
New Infiniband Fabric Notes
Mellanox HW and Infiniband Notes

Informational

Performance Measurements
Redis vs. RAMCloud
Assumptions
Back-of-the-Envelope Calculations: rough estimates of various interesting properties of the system
RPC Protocol
The Fastest Possible Datacenter Network (Bill Dally talk)
Garbage Collection Resources

Current work

Future Projects and PhD Topics
How To Measure Performance: John's pet peeves (and ideas for a possible paper)
RAMCloud 1.0
Tablet Migration
LogCabin
Rethinking Tombstones
Open Questions
Data Operations
Detecting Incomplete Logs
Usability Features and Research Topics

Old Topics

Distributed Leases - A proposal for ensuring that a "dead" server does not continue serving requests after it has been replaced.
The ALPO consensus protocol
/wiki/spaces/RAM/pages/6848654 (for BIOS and boot-time configuration)

New Cluster

ATOM Cluster: Micro Modular Server Cluster – 132 ATOM servers

Design Notes

These documents were used at various points in the project to record our early ideas about various parts of the system. Most of these pages are now out of date (they typically are not updated once serious coding begins) but they may still provide useful background information as well as alternatives that we considered. Entries below are in reverse chronological order (most recent design notes first).

Project History, Schedules, Milestones

Project History
Linearizable RPC & TX progress
RAMCloud 1.0 (used in 2012-2013 to track progress towards first usable release)
Least Usable System - Candidates for the "next major goal" (early April 2011).
Recovery Blitz (Autumn Quarter 2010)
Milestones from 2010
Design Meetings from Winter Quarter 2010
Design Meetings from Spring Quarter 2009
Backup and Recovery Revisited
Coordinator
FastTransport
Index API
Primary Keys
Proposed Server API
Protocol Buffers
RAMCloud Filesystem
Recovery
RPC API
Server Memory Architecture
Transactions
Version Numbers
Workload Generator - A benchmark for testing and understanding characteristics of RAMCloud under load
Inf Under Load - Understanding Infiniband under load

Miscellaneous Topics

Ideas for Future Work

Miscellaneous Topics

Distributed Systems Reading Group
Team Members
Group Photos
Lunch Ideas
Current Applications (applications that are using RAMCloud or considering it)
SEDCL/PlatformLab Retreat - Industrial Feedback
Server Prices: sample server configurations and prices
Memory Prices
Facebook Information
References
Interesting Links
Interesting Statistics
Infolunch Notes
Old Miscellaneous Topics
New Cluster Wishlist

Personal Wikis

Steve/wiki/spaces/RAM/pages/6848529
Ankita's Coordinator Notes
Ankita's Datamodel Notes
DCFT Paper Notes
Behnam's Coordinator Notes
AnkitaNotes
Henry's Notes on Arachne-RAMCloud Integration
Satoshi's Datamodel Notes