This page is an attempt to create an official, unambiguous set of terms to be used throughout RAMCloud discussions, texts, and code. These terms are listed here to minimize the confusion and overhead that goes along with constantly redefining vocabulary, but of course these terms and definitions should evolve over time.

The glossary is split into sections to make terms easier to find. The partitions are not entirely clean, though, and (circular) references are all over the place. Perhaps a single alphabetical list with no partitions would be better; ideas?

Machine Types

Client

A client machine has a library that queries servers on behalf of an application.

Server

A server machine services queries from clients.

Master Server

A master server is in charge of handling requests for a set of shards. A master server is usually also a backup server for another set of shards.

Backup Server

A backup server is in charge of backing up a set of shards, usually from various master servers. A backup server is usually also a master server for another set of shards.

Master

The master for an object is the master server for the shard which contains the object.

Data Types

Blob

A blob is some opaque, binary data stored in the system.

Object

An object is an identifiable container for a blob. It is identified by a table and a primary key. The object contains a checksum for the blob to verify its integrity. The object also keeps a running version number, referring to the revision of the blob stored.
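
As a rough illustration of the fields listed above, an object could be modelled as follows. This is only a sketch: the class name `StoredObject`, the field names, the use of CRC-32 as the checksum, and the initial version of 0 are assumptions made for the example, not decisions recorded on this page.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical in-memory form of an object: a container for one blob."""
    table: int          # identifier of the table that scopes the primary key
    primary_key: int    # identifies the object within its table
    blob: bytes         # opaque data; the server never interprets it
    version: int = 0    # revision of the blob currently stored (assumed start)
    checksum: int = field(init=False)

    def __post_init__(self) -> None:
        # CRC-32 is an assumption; the glossary does not name an algorithm.
        self.checksum = zlib.crc32(self.blob)

    def is_intact(self) -> bool:
        """Return True if the blob still matches the stored checksum."""
        return zlib.crc32(self.blob) == self.checksum


obj = StoredObject(table=1, primary_key=42, blob=b"hello")
print(obj.is_intact())  # True
```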

Object ID

Steve: A (key, version) tuple.

In a previous meeting we had used Object ID as a (table, primary key) tuple. Which is it?

A (key, version) tuple scoped to a table would identify a blob. If that's useful, maybe it should be a Blob ID.

Table

A table is a logical grouping of objects which share a set of indexes. An object lives in exactly one table and an index is associated with exactly one table. A table also scopes primary keys within the system.

Primary key

A primary key is a system-assigned 64-bit integer (which will never be reused?) that uniquely identifies an object within a table.

Version Number

The version number of an object is an integer that refers to the revision of its blob. It is used by the overwrite request, which takes the previous version number as a parameter: if the server finds that the given version number is out of date, the overwrite request is aborted. When an object is modified, the system increments the version number.
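
The version check described above might look something like the sketch below. `VersionedStore`, `VersionMismatch`, and the method names are hypothetical, and the store covers a single table only; the point is just that an overwrite carrying a stale version number is rejected and a successful one increments the version.

```python
from typing import Dict, Tuple


class VersionMismatch(Exception):
    """Raised when the caller's version number is out of date."""


class VersionedStore:
    """Hypothetical single-table store mapping key -> (blob, version)."""

    def __init__(self) -> None:
        self._objects: Dict[int, Tuple[bytes, int]] = {}

    def create(self, key: int, blob: bytes) -> int:
        # Assumption: a newly created object starts at version 0.
        self._objects[key] = (blob, 0)
        return 0

    def read(self, key: int) -> Tuple[bytes, int]:
        return self._objects[key]

    def overwrite(self, key: int, expected_version: int, blob: bytes) -> int:
        _, current = self._objects[key]
        if current != expected_version:
            # The caller saw an older revision: abort without changing anything.
            raise VersionMismatch(f"expected {expected_version}, found {current}")
        new_version = current + 1
        self._objects[key] = (blob, new_version)
        return new_version


store = VersionedStore()
store.create(7, b"first")
print(store.overwrite(7, 0, b"second"))  # 1
```

A second `store.overwrite(7, 0, ...)` would now raise `VersionMismatch`, because the object has moved on to version 1.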

Shard

A shard is a set of objects corresponding to a contiguous region of a table's primary keys, that is, a subset of the table's key space. An object is a member of exactly one shard. Shards are sized for efficient disk access (i.e., they can be loaded into memory from a backup server's disk within a small amount of time if the master server fails).
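
Because each shard covers a contiguous key range and every object belongs to exactly one shard, finding the shard for a key can be a binary search over the shard boundaries. The following is a minimal sketch under that reading; `ShardMap` and its representation (a sorted list of each shard's first key) are assumptions for illustration.

```python
import bisect
from typing import Iterable


class ShardMap:
    """Hypothetical map from one table's key space to shard numbers."""

    def __init__(self, start_keys: Iterable[int]) -> None:
        # start_keys[i] is the first primary key of shard i; shard i extends
        # up to (but not including) the first key of shard i + 1.
        self._starts = sorted(start_keys)

    def shard_for(self, primary_key: int) -> int:
        index = bisect.bisect_right(self._starts, primary_key) - 1
        if index < 0:
            raise KeyError(f"key {primary_key} is below the first shard")
        return index


shards = ShardMap([0, 1000, 5000])
print(shards.shard_for(999))   # 0
print(shards.shard_for(1000))  # 1
print(shards.shard_for(7000))  # 2
```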

Block

A block is the main unit of memory allocation on a server. Blocks have a fixed size, and much of a server's memory will be treated as an array of blocks. Blobs are stored in a set of blocks. A block is mapped into the blob of an object by that object's inode. A blob may start partway through the first block and end partway through the last block. In either case, space may be shared with another object. A block thus stores one or more blobs or portions of blobs.

Inode

An inode stores an object. This includes the blob's checksum, its version, and all of its index entries. It also includes the mapping of blocks for its blob, the start offset, and the length.
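
To make the block mapping concrete, here is a small sketch of how a blob could be reassembled from its blocks using the start offset and length. Only the block-mapping part of the inode is modelled; the checksum, version, and index entries are left out. The names `Inode`, `read_blob`, and `BLOCK_SIZE`, and the tiny 8-byte block size, are made up for the example.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 8  # deliberately tiny so the example is easy to follow


@dataclass
class Inode:
    """Hypothetical inode holding only the block mapping of a blob."""
    block_numbers: List[int]  # blocks holding the blob, in order
    start_offset: int         # where the blob begins inside the first block
    length: int               # blob length in bytes


def read_blob(memory: List[bytes], inode: Inode) -> bytes:
    """Concatenate the mapped blocks and cut out the blob's byte range."""
    data = b"".join(memory[n] for n in inode.block_numbers)
    return data[inode.start_offset:inode.start_offset + inode.length]


# Server memory treated as an array of fixed-size blocks. The blob starts
# partway through block 0 and ends partway through block 2; the remaining
# space in those blocks belongs to other objects (shown as A and B bytes).
memory = [b"AAAAAhel", b"lo, worl", b"d!BBBBBB"]
inode = Inode(block_numbers=[0, 1, 2], start_offset=5, length=13)
print(read_blob(memory, inode))  # b'hello, world!'
```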

Shardlet

...

backup

Each server in the RAMCloud system serves two roles: it is master for some objects (it stores those objects in its DRAM and handles reads and writes for the object), and it also serves as backup for other objects. A backup server is responsible for storing information on disk in response to requests from masters, and retrieving that information from disk during crash recovery. Each object in RAMCloud is typically backed up on several machines, each master divides its data among many different backup machines, and each backup records information for many masters.

block

The unit of memory management for object storage on masters. A master divides its stored data into shards (the unit of assignment to backups), each of which is divided into segments (the unit of disk I/O), each of which is divided into blocks in the master's memory. The master can clean blocks within a segment individually in order to avoid fragmentation of its memory.

client machine

A machine running one or more applications that use the RAMCloud storage system. The applications normally use a client library package to communicate with the RAMCloud servers. Client machines cannot necessarily be trusted by the RAMCloud servers, and the RAMCloud system does not depend on any particular behavior of client machines.

client library

A collection of functions used by applications to access the RAMCloud storage system. The client library may include significant functionality that extends the base functions provided by the RAMCloud servers. For example, the client library will probably understand the contents of stored objects, whereas the servers treat the objects as opaque blobs. It is possible for there to be different client libraries that implement different abstractions on top of the base RAMCloud features; examples might be a memcached API, a full relational model, or a file system API.

index

Used to provide efficient forms of lookup on data within tables. Each table may have any number of indexes associated with it; each index maps from keys in some form (strings, numbers, timestamps, etc.) to a set of objects within the table. Indexes support range lookups as well as exact matches.

key

A 64-bit identifier that names an object within its table. By default keys are assigned sequentially by the RAMCloud system starting at 1 and are never reused; however, applications can choose keys explicitly if they wish, in which case they may also be reused.
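
The default key assignment can be pictured as a per-table counter that only ever moves forward, which is what guarantees that a system-assigned key is never handed out twice. The sketch below shows that default path only; `KeyAllocator` is a hypothetical name, and how explicitly chosen keys interact with the counter is not covered here.

```python
class KeyAllocator:
    """Hypothetical per-table allocator for system-assigned 64-bit keys."""

    MAX_KEY = 2**64 - 1

    def __init__(self) -> None:
        self._next_key = 1  # keys are assigned sequentially starting at 1

    def allocate(self) -> int:
        if self._next_key > self.MAX_KEY:
            raise OverflowError("64-bit key space exhausted")
        key = self._next_key
        # The counter never goes backwards, so deleting an object does not
        # make its key available again.
        self._next_key += 1
        return key


keys = KeyAllocator()
print(keys.allocate(), keys.allocate(), keys.allocate())  # 1 2 3
```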

master

Each object lives in the DRAM of a particular server, which has primary responsibility for managing the object. That server is called the master for the object. All reads and writes of an object must be directed to the master server for that object. In normal usage, each server in the RAMCloud system is master for many different objects.

object

The basic unit of data stored in the RAMCloud system. Each object is named with a key that uniquely identifies the object within its table. Objects are variable-length, and the RAMCloud servers do not interpret the contents of objects: they are just blobs of data.

segment

A portion of a shard; this is the unit in which backups write information to disk and is chosen large enough to ensure efficient disk I/O. The segment is also the unit of log cleaning: when a master has copied all live data from a segment, it instructs the backup to delete the segment. Segments are divided into blocks.

server

One of the machines implementing the RAMCloud storage system. Server machines are "owned" by RAMCloud: they only execute trusted RAMCloud code. Server machines execute RPC requests coming from clients, and also communicate among each other to manage the RAMCloud system.

shard

In order to speed up recovery, each master spreads its data across multiple backups; during recovery, the backups can all retrieve their respective portions of the data in parallel. The portion of a master's data that is assigned to a single backup is called a shard. A master's data will typically divide into hundreds or thousands of shards; furthermore, each shard is typically backed up on more than one machine, to provide safety against multiple crashes. Shards are divided into segments, which are in turn divided into blocks.
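
One way to picture this spreading is a placement table from shard number to the backups that hold it. The sketch below uses simple round-robin placement with a fixed number of replicas per shard; both the placement policy and the names (`assign_shards`, `replicas`, the backup labels) are assumptions for illustration, since this page does not say how backups are chosen.

```python
from typing import Dict, List


def assign_shards(num_shards: int, backups: List[str],
                  replicas: int) -> Dict[int, List[str]]:
    """Place each shard on `replicas` distinct backups, round-robin."""
    if replicas > len(backups):
        raise ValueError("not enough backups for the requested replication")
    placement: Dict[int, List[str]] = {}
    for shard in range(num_shards):
        # Consecutive positions in the backup list are distinct as long as
        # replicas <= len(backups), so no backup holds a shard twice.
        placement[shard] = [backups[(shard + i) % len(backups)]
                            for i in range(replicas)]
    return placement


placement = assign_shards(4, ["b0", "b1", "b2"], replicas=2)
print(placement[0])  # ['b0', 'b1']
print(placement[2])  # ['b2', 'b0']
```

With this layout, every backup holds some of the master's shards, so after a crash all backups can read their portions from disk in parallel.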

table

Used to group related objects and to separate data from different applications. Objects are named using a table identifier and a key within the table. Access control information is based on tables, and indexes are associated with particular tables.

version number

An integer value associated with each object, which starts at 0 when the object is created and is incremented every time the object is modified. Used to implement atomic operations on the object.

Index

An index is a lookup structure from arbitrary index keys to objects. An index is made up of index entries and can service range queries.

Index Key

An index key is a short, application-controlled value used in index lookup queries and range queries on indexes. Its type may be string, int, float, etc.

Index Entry

An index entry is an (index key, object) pair in a given index.
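
The three index terms above fit together roughly as in the following sketch: an index is a sorted collection of index entries, which makes both exact-match lookups and range queries a matter of binary search. The class `Index` and its methods are hypothetical, each entry stores the object's primary key as a stand-in for the object itself, and range bounds are treated as inclusive, which is also an assumption.

```python
import bisect
from typing import Any, List, Tuple


class Index:
    """Hypothetical index: a sorted list of (index key, primary key) entries."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Any, int]] = []

    def add(self, index_key: Any, primary_key: int) -> None:
        # insort keeps the entries ordered by index key, then primary key.
        bisect.insort(self._entries, (index_key, primary_key))

    def range(self, low: Any, high: Any) -> List[int]:
        """Return primary keys whose index key lies in [low, high]."""
        # The 1-tuple (low,) sorts before every (low, primary_key) entry,
        # so this finds the first entry with index key >= low.
        start = bisect.bisect_left(self._entries, (low,))
        result = []
        for index_key, primary_key in self._entries[start:]:
            if index_key > high:
                break
            result.append(primary_key)
        return result

    def lookup(self, index_key: Any) -> List[int]:
        """Exact match: a range query whose bounds coincide."""
        return self.range(index_key, index_key)


by_name = Index()
for name, key in [("bob", 7), ("alice", 3), ("carol", 2), ("bob", 1)]:
    by_name.add(name, key)
print(by_name.lookup("bob"))          # [1, 7]
print(by_name.range("alice", "bob"))  # [3, 1, 7]
```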

Request Types

Get Request

The get request asks the server to return the object for a given table and primary key.

Overwrite Request

The overwrite request asks the server to replace the blob for a given table and primary key at a specific version number with a new blob.

Insert Request

The insert request asks the server to create a new object for a given table with the given blob and return the newly allocated primary key.

Put Request

The put request first runs the overwrite request. If no such object exists, it then runs the insert request. This happens atomically on the server.
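
The four requests above can be sketched against a single in-memory table as shown below. Several details are assumptions rather than anything settled on this page: the class and method names, the use of one lock to make put atomic, the idea that put carries an expected version number for its overwrite step, and the choice that a put on a missing key falls through to an ordinary insert with a newly allocated primary key.

```python
import threading
from typing import Dict, Tuple


class Table:
    """Hypothetical single-table server state: key -> (blob, version)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._objects: Dict[int, Tuple[bytes, int]] = {}
        self._next_key = 1

    def get(self, key: int) -> Tuple[bytes, int]:
        with self._lock:
            return self._objects[key]

    def insert(self, blob: bytes) -> int:
        with self._lock:
            return self._insert(blob)

    def overwrite(self, key: int, version: int, blob: bytes) -> int:
        with self._lock:
            return self._overwrite(key, version, blob)

    def put(self, key: int, version: int, blob: bytes) -> int:
        # Holding the lock across both steps is what makes put atomic here.
        with self._lock:
            if key in self._objects:
                self._overwrite(key, version, blob)
                return key
            return self._insert(blob)

    def _insert(self, blob: bytes) -> int:
        key = self._next_key
        self._next_key += 1
        self._objects[key] = (blob, 0)
        return key

    def _overwrite(self, key: int, version: int, blob: bytes) -> int:
        _, current = self._objects[key]
        if current != version:
            raise ValueError("version number is out of date")
        self._objects[key] = (blob, current + 1)
        return current + 1


table = Table()
key = table.insert(b"a")          # allocates primary key 1
table.overwrite(key, 0, b"b")     # version 0 -> 1
print(table.get(key))             # (b'b', 1)
print(table.put(99, 0, b"new"))   # no object 99, so this inserts: 2
```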

Index Lookup Query

The index lookup query asks a server(s?) for objects matching a certain index key value in an index.

Range Query

The range query asks a server(s?) for objects matching certain index key ranges in an index.