Rotation and CURIS Ideas
New Small Project Ideas
- MultiFileStorage
- Backups should be able to use multiple disks at the same time. A MultiFileStorage could instantiate multiple SingleFileStorages and load-balance across them. This shouldn't be too hard, since the interface for the storage class is fairly narrow, and it is probably even reasonably fun to do.
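A rough sketch of the MultiFileStorage idea, in C++ since that is what RAMCloud is written in. Everything here is hypothetical: the `SegmentStore` interface, the `InMemoryDisk` stand-in for SingleFileStorage, and the round-robin placement policy are simplifications for illustration, not RAMCloud's actual backup storage API. The point is that the wrapper implements the same interface as a single disk, so the backup code above it would not need to change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified storage interface (not RAMCloud's real one).
class SegmentStore {
  public:
    virtual ~SegmentStore() = default;
    virtual bool write(uint64_t segmentId, const std::string& data) = 0;
    virtual bool read(uint64_t segmentId, std::string* out) const = 0;
    virtual size_t freeSegments() const = 0;
};

// Hypothetical stand-in for a SingleFileStorage: fixed number of slots, kept in memory.
class InMemoryDisk : public SegmentStore {
  public:
    explicit InMemoryDisk(size_t capacity) : capacity(capacity) {}
    bool write(uint64_t segmentId, const std::string& data) override {
        if (segments.count(segmentId) == 0 && segments.size() >= capacity)
            return false;  // disk is full
        segments[segmentId] = data;
        return true;
    }
    bool read(uint64_t segmentId, std::string* out) const override {
        auto it = segments.find(segmentId);
        if (it == segments.end())
            return false;
        *out = it->second;
        return true;
    }
    size_t freeSegments() const override { return capacity - segments.size(); }

  private:
    size_t capacity;
    std::map<uint64_t, std::string> segments;
};

// Wraps several disks behind one SegmentStore and spreads new segments
// across them round-robin, skipping disks that are full.
class MultiFileStorage : public SegmentStore {
  public:
    explicit MultiFileStorage(std::vector<std::unique_ptr<SegmentStore>> backends)
        : disks(std::move(backends)), next(0) {
        assert(!disks.empty());
    }
    bool write(uint64_t segmentId, const std::string& data) override {
        // Overwrites go back to the disk that already holds the segment.
        auto it = location.find(segmentId);
        if (it != location.end())
            return disks[it->second]->write(segmentId, data);
        for (size_t i = 0; i < disks.size(); ++i) {
            size_t d = (next + i) % disks.size();
            if (disks[d]->freeSegments() > 0 && disks[d]->write(segmentId, data)) {
                location[segmentId] = d;
                next = (d + 1) % disks.size();  // start after this disk next time
                return true;
            }
        }
        return false;  // every disk is full
    }
    bool read(uint64_t segmentId, std::string* out) const override {
        auto it = location.find(segmentId);
        if (it == location.end())
            return false;
        return disks[it->second]->read(segmentId, out);
    }
    size_t freeSegments() const override {
        size_t total = 0;
        for (const auto& disk : disks)
            total += disk->freeSegments();
        return total;
    }
    // Index of the disk holding a segment (throws if unknown).
    size_t diskOf(uint64_t segmentId) const { return location.at(segmentId); }

  private:
    std::vector<std::unique_ptr<SegmentStore>> disks;
    std::map<uint64_t, size_t> location;  // segment id -> disk index
    size_t next;                          // round-robin cursor
};
```

Round-robin is just the simplest policy to show; a real version might instead pick the disk with the most free space or the shortest I/O queue.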
CURIS 2012 Proposals
Overview
CURIS (curis.stanford.edu) is a CS undergraduate summer research program. Below are some RAMCloud proposals we may submit for 2012.
Proposals are apparently around two paragraphs long. We may submit several separate proposals, each of which can be taken up by one or more students. Each proposal will consist of a generic RAMCloud paragraph followed by a more specific description of the project(s) it covers.
Generic Introduction
The RAMCloud project is creating a new class of high-speed storage for datacenters, where all data is kept in DRAM at all times. RAMCloud is a software system that aggregates the DRAM of thousands of servers into a single large-scale and extremely fast storage system (small objects can be read from any server in the same datacenter in 5-10 microseconds, which is 100-1000x faster than today's disk-based storage systems). The end goal is to make exciting new applications possible by pushing the boundaries of scale and latency in datacenter storage systems. This is a large, open-source project headed by Professors John Ousterhout and Mendel Rosenblum, and four full-time graduate students are currently working on various aspects of the system. We are committed to making RAMCloud a robust, production-ready system rather than just a research prototype. We currently build and test RAMCloud on an 80-node, 320-core Linux cluster with an aggregate of nearly 2 TB of main memory and 20 TB of flash storage, all connected by a high-performance InfiniBand network.
Proposal 1: Web Dashboard for Cluster Monitoring & Management
Since RAMCloud is a large and complicated distributed system, being able to monitor and visualize what is going on in real time is critically important. We are looking for students interested in Web development and visualization to design and build a Web-based dashboard for RAMCloud that gives us better insight into how the system behaves as a whole. The dashboard will monitor machines in the cluster, aggregate and report server statistics, and visualize the system in real time. The same dashboard will also be used to manage the system: adding and removing machines from the cluster and migrating data between individual servers. Prior Web experience (especially with AJAX and JavaScript) and/or CS 142 will be useful for students on this project. CS 140 and CS 144 also provide useful background.
Proposal 2: RAMCloud Core Systems Development
RAMCloud currently consists of about 75,000 lines of C++, docs, and unit tests, but it is far from complete. In this project students will work on one or more aspects of the RAMCloud implementation, such as: making more pieces of the system multithreaded to increase server throughput; expanding the simple read/write operations with additional operators (such as atomic increment); improving the crash-recovery system, which recovers so quickly after server crashes that most users don't even notice the crash; or implementing additional operations to support the development of a Web-based dashboard for RAMCloud. For this project, prior experience with C++ is highly desirable; however, students skilled in C and familiar with other object-oriented languages like Java should be able to learn C++ on the job. CS 140 and CS 144 provide good background for students interested in this project, though they are not essential.
Generic Footer
For more information about RAMCloud, you can refer to:
- RAMCloud wiki
- White paper that describes the motivation for RAMCloud and introduces some of the research issues we are addressing
- The paper "Fast Crash Recovery in RAMCloud", which appeared in SOSP 2011