Deciding Whether to Use RAMCloud

This page collects notes that may help you decide whether RAMCloud is a good match for your application's needs.

Reasons why you might want to use RAMCloud

RAMCloud's main "claim to fame" is that it provides a key-value store with extremely fast access. If you have high-speed networking in your datacenter, you can access any small object anywhere in a RAMCloud cluster in about 5 μs.
In addition, RAMCloud can scale across hundreds or thousands of servers. For example, with 64 GB of DRAM on each of 1000 nodes, RAMCloud can store 64 TB of data; with 256 GB of DRAM on each of 4000 nodes, RAMCloud can store about 1 PB.
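The capacity figures above are simple multiplication. The following is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and ignoring any memory that is not available for storing objects; the function name is hypothetical and is not part of RAMCloud.

```python
# Back-of-the-envelope DRAM capacity, using decimal units (1 TB = 1000 GB).
# cluster_capacity_tb is a hypothetical helper, not a RAMCloud API.
def cluster_capacity_tb(nodes: int, dram_gb_per_node: int) -> float:
    """Total DRAM across the cluster, in terabytes."""
    return nodes * dram_gb_per_node / 1000


print(cluster_capacity_tb(1000, 64))   # 64.0 TB
print(cluster_capacity_tb(4000, 256))  # 1024.0 TB, i.e. about 1 PB
```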
RAMCloud keeps backup copies of data on disk or flash, and it does this in a way that has almost no effect on normal performance. Thus, you can perform small write operations in RAMCloud in about 15 μs with a high level of confidence that the data will survive server crashes and power failures. There are several other storage systems that keep data in DRAM, but none of them provides as high a level of durability and availability as RAMCloud.
RAMCloud is particularly well suited to applications that need to use hundreds or thousands of independent pieces of data in a given request, and need to produce responses in real time. If your application started off using a traditional relational database such as MySQL, but you have discovered that the database can't scale to meet your needs (e.g., it can't return data fast enough to respond in real time, or it can't scale in throughput as your user community has grown), then RAMCloud is likely to be a good match. If you are considering sharding your data to use multiple independent database instances, RAMCloud is probably a better solution.
RAMCloud is particularly advantageous for applications that must make a series of requests, where each request depends on the preceding request, so that the requests cannot be issued in parallel. Graph algorithms are one example of this access pattern.
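To see why latency matters so much for dependent requests, note that requests that cannot overlap simply add up. The sketch below is only an illustration: the 5 μs figure is the one quoted on this page, while the 300 μs figure and the function name are hypothetical values chosen for comparison.

```python
# Total time for a chain of requests where each one depends on the
# result of the previous one, so none of them can be issued in parallel.
def chain_latency_us(num_requests: int, per_request_us: float) -> float:
    """Sum of latencies for strictly sequential requests, in microseconds."""
    return num_requests * per_request_us


RAMCLOUD_READ_US = 5.0   # access time quoted on this page
SLOWER_STORE_US = 300.0  # hypothetical slower store, for comparison only

for n in (10, 100, 1000):
    fast_ms = chain_latency_us(n, RAMCLOUD_READ_US) / 1000
    slow_ms = chain_latency_us(n, SLOWER_STORE_US) / 1000
    print(f"{n:5d} dependent reads: {fast_ms:6.2f} ms vs {slow_ms:7.2f} ms")
```

Under these assumptions, a traversal of 1000 dependent reads takes about 5 ms at 5 μs per read, but about 300 ms at 300 μs per read.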

RAMCloud vs. memcached

In recent years many organizations have started using memcached to offload their databases and improve overall scalability. In this approach, memcached is used to cache in DRAM the results of recent database queries, which speeds up accesses and improves throughput. Although this approach can improve performance significantly, it still has four disadvantages in comparison to RAMCloud:

In order to ensure durability and availability, any modifications to data must go to the database, which is slow. RAMCloud performs writes 100-1000x faster than a database.
Application developers must manually manage consistency between the data in memcached and that in the database: if data is changed in the database, the application must find and flush any cache entries dependent on that data. This is a complex and error-prone process, and often results in applications using "stale" data that wasn't properly flushed from memcached. With RAMCloud, developers need not worry about these consistency issues.
The database is so much slower than memcached that it limits performance even with high hit rates in the cache. In contrast, RAMCloud keeps all data in DRAM all the time; there is no such thing as a cache miss in RAMCloud. With the memcached approach, a miss rate of even 1% reduces performance by roughly a factor of 10 compared to RAMCloud. Unfortunately, most large-scale applications using memcached have miss rates higher than 1% (for example, in a 2013 NSDI paper, Facebook reported miss rates of 0.053%, 1.76%, 6.35%, and 7.85% on four representative memcached servers).
RAMCloud has been optimized to provide extremely low latency, and it can take advantage of the fastest networking to provide access times as low as 5 μs. In contrast, Facebook reported typical memcached latencies of several hundred microseconds.
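The miss-rate argument above can be checked with a weighted average of hit and miss costs. The sketch below assumes that a miss (a trip to the database) costs about 1000 times as much as a hit served from DRAM. That ratio is an assumption made here for illustration, taken from the upper end of the 100-1000x range quoted above for writes; the function name is also hypothetical.

```python
# Average cost per access, in units of one DRAM hit, for a cache in
# front of a slower database. The default miss_cost of 1000 is an assumed
# ratio for illustration, not a measured value.
def mean_access_cost(miss_rate: float,
                     miss_cost: float = 1000.0,
                     hit_cost: float = 1.0) -> float:
    """Weighted average of hit and miss costs."""
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost


# Miss rates reported by Facebook, as quoted on this page.
FACEBOOK_MISS_RATES = [0.00053, 0.0176, 0.0635, 0.0785]

print(f"1% misses: {mean_access_cost(0.01):.2f}x the cost of a hit")
for rate in FACEBOOK_MISS_RATES:
    print(f"{rate:.3%} misses: {mean_access_cost(rate):.1f}x")
```

With the assumed 1000x miss penalty, a 1% miss rate gives an average cost of about 11 times that of a pure DRAM access, which is in line with the factor of 10 mentioned above. A smaller penalty would give a proportionally smaller slowdown.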

Reasons why you might not want to use RAMCloud

RAMCloud is not necessarily the best storage system for every possible application. Here are some situations where it may not be the right choice:

If your application doesn't make very intensive use of data, then it's probably not worth the extra cost of keeping the data in DRAM; you can use a disk-based system just as well.
If your primary applications are batch-oriented ones that read all of the data in the entire data set, then low latency probably doesn't matter. Most analytics applications fit this model. In these situations, application performance will be limited by the throughput of the storage system, and other systems more optimized for throughput, such as Spark, may provide higher performance than RAMCloud. RAMCloud is more appropriate for online applications where response time is important.
In order to achieve the fastest possible performance, you will need high-speed networking with two features:
- Fast switching time (less than 1 μs delay per switch). Most new 10 Gb/s gear meets this requirement.
- NICs supporting kernel bypass (which means that the NIC can be mapped into the address space of applications, so they can send and receive packets without invoking kernel calls). Many newer NICs provide this feature.