Comment: Migrated to Confluence 5.3

This page is an attempt to create an official, unambiguous set of terms to be used throughout RAMCloud discussions, texts, and code. These terms are listed here to minimize the confusion and overhead that goes along with constantly redefining vocabulary, but of course these terms and definitions should evolve over time.

The glossary is split up into sections to make it easier to find terms. The partitions don't play out entirely cleanly, though, and (circular) references are all over the place. Maybe an alphabetical list with no partitions would be better... ideas?

Machine Types

Client

A client machine has a library that queries servers on behalf of an application.

Server

A server machine services queries from clients.

Master Server

A master server is in charge of handling requests for a set of shards. A master server is usually also a backup server for another set of shards.

Backup Server

A backup server is in charge of backing up a set of shards, usually from various master servers. A backup server is usually also a master server for another set of shards.

Master

The master for an object is the master server for the shard which contains the object.

Data Types

Blob

A blob is some opaque, binary data stored in the system.

Object

An object is an identifiable container for a blob. It is identified by a table and a primary key. The object contains a checksum for the blob to verify its integrity. The object also keeps a running version number, referring to the revision of the blob stored.
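The following Python sketch shows the pieces of an object described above: table, primary key, blob, checksum, and version number. It is for illustration only. The class name `StoredObject` is hypothetical. The glossary does not name a checksum algorithm, so CRC-32 from the standard `zlib` module is assumed.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredObject:
    """Hypothetical in-memory form of an object: an identifiable container for a blob."""
    table: str
    primary_key: int
    blob: bytes
    version: int   # running revision number of the stored blob
    checksum: int  # checksum over the blob; CRC-32 is an assumption of this sketch

    @classmethod
    def create(cls, table: str, primary_key: int, blob: bytes,
               version: int = 1) -> "StoredObject":
        """Build an object and compute the checksum of its blob."""
        return cls(table, primary_key, blob, version, zlib.crc32(blob))

    def verify(self) -> bool:
        """Return True if the blob still matches the checksum stored with it."""
        return zlib.crc32(self.blob) == self.checksum
```

Storing the checksum next to the blob lets a server detect corruption when it reads the blob back. It does not need another copy of the blob to do so.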

Object ID

Steve: A (key, version) tuple.

Table

A table is a logical grouping of objects which share a set of indexes.

Steve: A scoping of keys within the system. Every object lives in exactly one table, and every index is associated with exactly one table.

Primary key

A primary key is a system-assigned 64-bit integer (which will never be reused?). Together with a table, it identifies an object.

Steve: The system-generated key that uniquely identifies an object within a table.

Version Number

The version number of an object is an integer that refers to the revision of its blob. It is used by the overwrite request, which takes the object's expected current version number as a parameter: if the server finds that the given version number is out of date, the overwrite request is aborted.

Steve: A number associated with a primary key that is system-incremented when an object is modified.

Shard

A shard is a set of objects corresponding to a contiguous region of primary keys of a table. An object is a member of exactly one shard.

Steve: A subset of a table's key space. Shards are sized for efficient disk access (that is, after a failure a shard can be read from disk into memory within a small amount of time).
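Shards are contiguous ranges of primary keys, and each object belongs to exactly one shard. Because of this, the shard for a key (and so its master) can be found with a binary search over the shard start keys. The following is a minimal sketch under that assumption. `ShardMap` and its methods are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple


class ShardMap:
    """Hypothetical map from (table, primary key) to the shard and master holding it.

    Each table's key space is split into contiguous, non-overlapping ranges;
    a shard is identified here by the first primary key in its range.
    """

    def __init__(self) -> None:
        self._starts: Dict[str, List[int]] = {}
        self._masters: Dict[str, List[str]] = {}

    def add_table(self, table: str, shards: List[Tuple[int, str]]) -> None:
        """shards: (first primary key, master server) pairs; must start at key 0."""
        shards = sorted(shards)
        if not shards or shards[0][0] != 0:
            raise ValueError("shards must cover the key space starting at key 0")
        self._starts[table] = [start for start, _ in shards]
        self._masters[table] = [master for _, master in shards]

    def shard_for(self, table: str, primary_key: int) -> Tuple[int, str]:
        """Return (shard start key, master) for the unique shard containing the key."""
        if primary_key < 0:
            raise ValueError("primary keys are non-negative")
        starts = self._starts[table]
        # bisect_right - 1 selects the last shard whose start key is <= primary_key
        i = bisect.bisect_right(starts, primary_key) - 1
        return starts[i], self._masters[table][i]
```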

Block

A block is a fixed-size memory allocation unit on a server, and much of a server's memory will be treated as an array of blocks. Blobs are stored in a set of blocks. While the blob need not start or end on block boundaries, it must fully utilize any blocks in the middle.

Steve: A shardlet is composed of multiple fixed-size blocks. Blocks are the main unit of memory allocation in the system. They store one or more objects or portions of objects and are mapped into shardlets by an explicit table structure (inode).

Inode

An inode stores an object. This includes the blob's checksum, its version, and all of its index entries. It also includes the list of blocks for its blob, the start offset, and the length.

Steve: Each object is mapped to one or more blocks by an inode mapping. An object may start partway through its first block and end partway through its last block; in either case, that space may be shared with another object. An inode contains metadata including an object checksum, a version, and (index name, key) tuples.
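Both definitions agree on the layout rule. A blob may begin partway into its first block and end partway into its last, but it fills every block in between. This sketch derives the byte range used in each block from an inode's block list, start offset, and length. The 4096-byte `BLOCK_SIZE` and the `Inode`, `block_extents`, and `read_blob` names are hypothetical; the glossary fixes none of them.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4096  # hypothetical; the glossary does not specify a block size


@dataclass
class Inode:
    """Hypothetical inode: where an object's blob lives in block memory."""
    blocks: List[int]   # indexes into the server's array of blocks, in blob order
    start_offset: int   # byte offset of the blob within blocks[0]
    length: int         # blob length in bytes


def block_extents(inode: Inode, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (block index, offset in block, byte count) for each block the blob uses.

    Only the first extent may start past offset 0, and only the last may stop
    short of the block's end; every middle block is used in full.
    """
    extents = []
    remaining = inode.length
    offset = inode.start_offset
    for block in inode.blocks:
        if remaining == 0:
            break
        n = min(block_size - offset, remaining)
        extents.append((block, offset, n))
        remaining -= n
        offset = 0  # every block after the first is read from its beginning
    if remaining:
        raise ValueError("inode lists too few blocks for its length")
    return extents


def read_blob(memory: List[bytearray], inode: Inode, block_size: int = BLOCK_SIZE) -> bytes:
    """Reassemble a blob from the server's array of fixed-size blocks."""
    return b"".join(bytes(memory[b][off:off + n])
                    for b, off, n in block_extents(inode, block_size))
```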

Shardlet

A shardlet is a constant-size array of blocks. A shardlet will likely have a maximum size that is convenient to write to disk, as a backup server will write one shardlet at a time to disk. A shardlet on a master server will be an array of pointers to blocks, while a shardlet on a backup server will probably be an array of blocks (directly).

Steve: Shards are broken up into shardlets, which are akin to LFS segments. Each shardlet is written to disk sequentially and is sized so that it can be accessed efficiently despite disk seek times.

Index

An index, made up of index entries, maps index keys to objects.

Steve: A lookup structure for arbitrary keys to object identifiers. Can be range-queryable.

Index Key

An index key is a short, application-controlled string used in range queries on indexes.

Steve: A typed index into an index structure (types may include string, int, float, etc).

Index Entries

An index entry is an (index key, object) pair in a given index.
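An index is a set of (index key, object) entries that supports exact lookups and range queries. One simple way to model it is a list of entries kept sorted by index key. The sketch below assumes string index keys and refers to objects by (table, primary key). `Index`, `lookup`, and `range_query` are hypothetical names, and the sorted list stands in for whatever structure a real index would use.

```python
import bisect
from typing import List, Tuple

ObjectRef = Tuple[str, int]  # (table, primary key); hypothetical object reference


class Index:
    """Hypothetical index: index entries kept sorted by index key."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, ObjectRef]] = []

    def add(self, index_key: str, obj: ObjectRef) -> None:
        """Insert an (index key, object) entry, keeping the list sorted."""
        bisect.insort(self._entries, (index_key, obj))

    def lookup(self, index_key: str) -> List[ObjectRef]:
        """Index lookup query: all objects with exactly this index key."""
        return self.range_query(index_key, index_key)

    def range_query(self, low: str, high: str) -> List[ObjectRef]:
        """Objects whose index key k satisfies low <= k <= high, in key order."""
        # (low,) sorts before every entry whose key equals low
        i = bisect.bisect_left(self._entries, (low,))
        result = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            result.append(self._entries[i][1])
            i += 1
        return result
```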

Request Types

Get Request

The get request asks the server to return the object for a given table and primary key.

Overwrite Request

The overwrite request asks the server to replace the blob for a given table and primary key at a specific version number with a new blob.

Insert Request

The insert request asks the server to create a new object for a given table with the given blob and return the newly allocated primary key.

Put Request

The put request first runs the overwrite request. If no such object exists, it then runs the insert request. This happens atomically on the server.
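The four requests above can be modeled with a toy single-process server. The sketch assumes:

- A lock stands in for the server's atomicity.
- Primary keys come from a counter and are never reused.
- Put takes the same parameters as overwrite. The glossary does not say which version number put supplies, so that is an assumption.

`ToyServer` and `VersionMismatch` are hypothetical names.

```python
import itertools
import threading
from typing import Dict, Tuple


class VersionMismatch(Exception):
    """The overwrite named a version number that is out of date."""


class ToyServer:
    """Hypothetical single-process model of the get/overwrite/insert/put requests."""

    def __init__(self) -> None:
        # (table, primary key) -> (blob, version number)
        self._objects: Dict[Tuple[str, int], Tuple[bytes, int]] = {}
        self._keys = itertools.count(1)  # system-assigned primary keys, never reused
        self._lock = threading.Lock()    # makes each request, including put, atomic

    def get(self, table: str, key: int) -> Tuple[bytes, int]:
        with self._lock:
            return self._objects[(table, key)]  # KeyError if no such object

    def overwrite(self, table: str, key: int, version: int, blob: bytes) -> int:
        with self._lock:
            return self._overwrite(table, key, version, blob)

    def insert(self, table: str, blob: bytes) -> int:
        with self._lock:
            return self._insert(table, blob)

    def put(self, table: str, key: int, version: int, blob: bytes) -> Tuple[int, int]:
        """Overwrite if the object exists, otherwise insert; returns (key, version)."""
        with self._lock:  # one critical section covers both steps
            if (table, key) in self._objects:
                return key, self._overwrite(table, key, version, blob)
            return self._insert(table, blob), 1

    def _overwrite(self, table: str, key: int, version: int, blob: bytes) -> int:
        _, current = self._objects[(table, key)]  # KeyError if no such object
        if current != version:
            raise VersionMismatch(f"expected version {version}, object is at {current}")
        self._objects[(table, key)] = (blob, current + 1)
        return current + 1

    def _insert(self, table: str, blob: bytes) -> int:
        key = next(self._keys)
        self._objects[(table, key)] = (blob, 1)
        return key
```

Holding the lock across both steps of put means no other request can create or modify the object between the failed overwrite check and the insert.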

Index Lookup Query

The index lookup query asks a server(s?) for objects matching a certain index key value in an index.

Range Query

...

backup

Each server in the RAMCloud system serves two roles: it is master for some objects (it stores those objects in its DRAM and handles reads and writes for the object), and it also serves as backup for other objects. A backup server is responsible for storing information on disk as directed by masters, and retrieving that information from disk during crash recovery. Each object in RAMCloud is typically backed up on several machines; each master divides its data among many different backup machines, and each backup records information for many masters.

client machine

A machine running one or more applications that use the RAMCloud storage system. The applications normally use a client library package to communicate with the RAMCloud servers. Client machines cannot necessarily be trusted by the RAMCloud servers, and the RAMCloud system does not depend on any particular behavior of client machines.

client library

A collection of functions used by applications to access the RAMCloud storage system. The client library may include significant functionality that extends the base functions provided by the RAMCloud servers. For example, the client library will probably understand the contents of stored objects, whereas the servers treat the objects as opaque blobs. It is possible for there to be different client libraries that implement different abstractions on top of the base RAMCloud features; examples might be a memcached API, a full relational model, or a file system API.

crashed master

A master that has failed and must be recovered.

coordinator

A distinguished server that manages the other servers in the RAMCloud cluster. Some of the coordinator functions are:

  • The coordinator manages a list of all active servers in the cluster.
  • The coordinator keeps track of which servers contain which tablets; client machines retrieve this information to manage their own caches of configuration information.
  • The coordinator manages access control information, which it makes available to other servers in the cluster (this feature is not yet implemented).
  • The coordinator is responsible for deciding that a server has crashed and initiating recovery of that server.
  • The coordinator is responsible for moving data between servers in the cluster in order to balance load (this feature is not yet implemented).

index

Indexes are only an idea and are not yet implemented. They will be used to provide efficient forms of lookup on data within tables. Each table may have any number of indexes associated with it; each index maps from keys in some form (strings, numbers, timestamps, etc.) to a set of objects within the table. Indexes support range lookups as well as exact matches.

key

A variable-length byte string (up to 64 KB) that names an object within its table.

log

Used by a master to hold object data. Each log is divided into an ordered list of segments. Logs are used in an append-only fashion: the contents of a segment are never modified once written. Each master manages one log, and different masters have different logs.
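Here is a minimal sketch of an append-only log divided into an ordered list of segments. The default segment size is the 8 MBytes quoted under "segment"; the test uses a tiny size. The sketch makes two assumptions that the glossary does not state. First, an entry never spans two segments. Second, once a segment fills, it is closed and never modified again. `Log`, `append`, and `read` are hypothetical names.

```python
from typing import List, Tuple

SEGMENT_SIZE = 8 * 1024 * 1024  # 8 MBytes, the segment size given in this glossary


class Log:
    """Hypothetical append-only log made of an ordered list of fixed-size segments."""

    def __init__(self, segment_size: int = SEGMENT_SIZE) -> None:
        self.segment_size = segment_size
        self.segments: List[bytearray] = [bytearray()]  # the last one is the open head

    def append(self, entry: bytes) -> Tuple[int, int]:
        """Append an entry and return (segment number, offset) where it was written.

        If the entry does not fit in the head segment, that segment is closed
        (its contents are never modified again) and a new head is opened.
        """
        if len(entry) > self.segment_size:
            raise ValueError("entry larger than a segment")
        head = self.segments[-1]
        if len(head) + len(entry) > self.segment_size:
            head = bytearray()
            self.segments.append(head)
        offset = len(head)
        head.extend(entry)
        return len(self.segments) - 1, offset

    def read(self, segment: int, offset: int, length: int) -> bytes:
        """Read back bytes previously appended to the given segment."""
        return bytes(self.segments[segment][offset:offset + length])
```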

master

Each object lives in the DRAM of a particular server, which has primary responsibility for managing the object. That server is called the master for the object. All reads and writes of an object must be directed to the master server for that object. Each server in the RAMCloud system is master for many different objects.

mini-transaction

Mini-transactions are only an idea and are not yet implemented. A mini-transaction is a collection of updates to one or more objects that RAMCloud performs atomically. The current design is closely modeled on the Sinfonia system: a mini-transaction consists of one or more updates to objects, which are performed only if one or more objects have specified version numbers. If the updates are performed, they happen atomically.
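The conditional, all-or-nothing behaviour described above can be sketched on a single node. First check every version condition, then apply every update, all inside one critical section. This is a hypothetical illustration of the semantics only. A RAMCloud mini-transaction would span several masters, which would need a distributed commit protocol that this lock does not model. Treating a missing object as version 0 is also an assumption of the sketch.

```python
import threading
from typing import Dict, Hashable, Tuple


class MiniTransactionStore:
    """Hypothetical single-node store applying Sinfonia-style mini-transactions."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Tuple[bytes, int]] = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def execute(self, conditions: Dict[Hashable, int],
                updates: Dict[Hashable, bytes]) -> bool:
        """Apply all updates iff every key in `conditions` has the expected version.

        Returns True if the updates were applied; either all happen or none do.
        """
        with self._lock:
            for key, expected in conditions.items():
                if self._data.get(key, (b"", 0))[1] != expected:
                    return False  # a condition failed: perform no updates
            for key, value in updates.items():
                old_version = self._data.get(key, (b"", 0))[1]
                self._data[key] = (value, old_version + 1)
            return True

    def read(self, key: Hashable) -> Tuple[bytes, int]:
        with self._lock:
            return self._data.get(key, (b"", 0))
```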

object

The basic unit of data stored in the RAMCloud system. Each object is named with a key that uniquely identifies the object within its table. Objects are variable-length up to a limit of 1 MB, and the RAMCloud servers do not interpret the contents of objects: they are just opaque blobs of data. Each object has a 64-bit version number that increases monotonically whenever the object is modified.

recovery

The period of time immediately after a server crash, during which that server's data is unavailable. During recovery, backups read data from their disks into memory, and one or more recovery masters retrieve enough data from the backups to resume system operation.

recovery master

A role that a master can take on during recovery; a recovery master takes over one or more tablets from a crashed master.

segment

A fixed-size portion of a log (currently 8 MBytes). Segments are the unit of replication and backup: each segment exists in the memory of its master and is also replicated on one or more backups. Different segments within a log are typically backed up on different servers. The segment size is chosen so that full-segment writes to disk utilize 90% or more of the maximum disk bandwidth. The segment is also the unit of log cleaning: when most of the data in a segment has been deleted the master can copy the remaining live data to another segment and delete the old segment.
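The paragraph above gives segments two jobs: they are the unit of replication and the unit of log cleaning. This sketch shows both. For replication, a master picks a set of distinct backups for each segment, never itself. Choosing independently per segment spreads a log across many servers. For cleaning, segments whose live data has fallen below a threshold are selected. The function names, the use of random selection, and the 50% threshold are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def choose_backups(master: str, servers: Sequence[str], replicas: int,
                   rng: random.Random) -> List[str]:
    """Pick `replicas` distinct backups for one segment, excluding the master itself.

    Random selection is an assumption of this sketch; calling it once per
    segment scatters a log's segments across many backups.
    """
    candidates = [s for s in servers if s != master]
    if len(candidates) < replicas:
        raise ValueError("not enough backups for the requested replication")
    return rng.sample(candidates, replicas)


def segments_to_clean(live_bytes: Dict[int, int], segment_size: int,
                      max_live_fraction: float = 0.5) -> List[int]:
    """Return ids of segments whose live data is below the threshold, emptiest first.

    A cleaner would copy each chosen segment's live data into another segment
    and then delete the old one.
    """
    victims = [seg for seg, live in live_bytes.items()
               if live / segment_size < max_live_fraction]
    return sorted(victims, key=lambda seg: live_bytes[seg])
```

Cleaning the emptiest segments first minimizes the amount of live data that must be copied per segment freed.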

server

One of the machines implementing the RAMCloud storage system. Server machines are "owned" by RAMCloud: they typically only execute trusted RAMCloud code. Server machines execute RPC requests coming from clients, and also communicate among each other to manage the RAMCloud system. Typically, each server acts as both a master and a backup.

service locator

Provides information needed to communicate with a particular server, including the form of network transport to be used and additional parameters for that transport, such as a host name and port number. See Service Locators for details.
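The exact syntax is defined on the Service Locators page, not here. This sketch assumes a transport name, a colon, and comma-separated `key=value` options, for example `tcp:host=rc01,port=11100`. The parser shows how such a string separates into the transport and its parameters. `parse_service_locator`, the example host, and the example port are hypothetical.

```python
from typing import Dict, NamedTuple


class ServiceLocator(NamedTuple):
    transport: str           # form of network transport, e.g. "tcp" (assumed)
    options: Dict[str, str]  # transport parameters such as host and port


def parse_service_locator(text: str) -> ServiceLocator:
    """Parse an assumed 'transport:key=value,key=value' locator string."""
    transport, sep, rest = text.partition(":")
    if not sep or not transport.strip():
        raise ValueError(f"missing transport in {text!r}")
    options: Dict[str, str] = {}
    for pair in filter(None, rest.split(",")):  # skip empty pieces
        key, eq, value = pair.partition("=")
        if not eq:
            raise ValueError(f"malformed option {pair!r}")
        options[key.strip()] = value.strip()
    return ServiceLocator(transport.strip(), options)
```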

table

Used to group related objects and to separate data from different applications. Objects are named using a table identifier and a key within the table. Access control information is based on tables, and indexes are associated with particular tables.

tablet

A portion of a table, all of whose objects are stored on a single master. In the simplest case a table has a single tablet and all of the objects of the table live on a single master. If a table becomes too large to store on a single master then it is divided into multiple tablets that are assigned to different servers.
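The glossary does not say how a table's objects are divided among tablets. The sketch below assumes each tablet covers a contiguous range of a 64-bit hash of the variable-length key. The map plays the role of a client's cached copy of the coordinator's tablet information. It also shows splitting a tablet when a table outgrows one master, with the upper part assigned to another server. The BLAKE2b-based `key_hash`, `TabletMap`, and its methods are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def key_hash(key: bytes) -> int:
    """Map a variable-length key to a 64-bit integer (hash choice is hypothetical)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


class TabletMap:
    """Hypothetical client-side cache mapping (table id, key) to a master.

    Assumes each tablet covers a contiguous range of key hashes and that a
    table's tablets together cover [0, 2**64).
    """

    def __init__(self) -> None:
        self._tablets: Dict[int, Tuple[List[int], List[str]]] = {}

    def set_tablets(self, table_id: int, tablets: List[Tuple[int, str]]) -> None:
        """tablets: (first hash in range, master) pairs for one table."""
        tablets = sorted(tablets)
        if not tablets or tablets[0][0] != 0:
            raise ValueError("tablets must start at hash 0")
        self._tablets[table_id] = ([start for start, _ in tablets],
                                   [master for _, master in tablets])

    def master_for(self, table_id: int, key: bytes) -> str:
        starts, masters = self._tablets[table_id]
        i = bisect.bisect_right(starts, key_hash(key)) - 1  # last start <= hash
        return masters[i]

    def split(self, table_id: int, at_hash: int, new_master: str) -> None:
        """Split the tablet containing at_hash; hashes >= at_hash move to new_master."""
        if not 0 < at_hash < 2**64:
            raise ValueError("split point must lie strictly inside the hash range")
        starts, masters = self._tablets[table_id]
        i = bisect.bisect_right(starts, at_hash) - 1
        if starts[i] == at_hash:
            raise ValueError("a tablet already starts at this hash")
        starts.insert(i + 1, at_hash)
        masters.insert(i + 1, new_master)
```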

version number

An integer value associated with each object, which is guaranteed to increase monotonically whenever the object is modified. Used to implement atomic operations on the object. Version numbers maintain their monotonic behavior even if an object is deleted and later re-created.
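Version numbers must keep increasing even across a delete followed by a re-create. One simple way to get this, assumed here, is for the master to remember the highest version it has ever assigned and to start each newly created object above that mark. `VersionedTable` and its methods are hypothetical names, not the RAMCloud implementation.

```python
from typing import Dict, Optional, Tuple


class VersionedTable:
    """Hypothetical store whose version numbers stay monotonic across delete/re-create."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, Tuple[bytes, int]] = {}  # key -> (value, version)
        self._highest_version = 0  # highest version this master has ever assigned

    def write(self, key: bytes, value: bytes) -> int:
        """Create or modify an object; returns its new version number."""
        current = self._objects.get(key)
        if current is None:
            # new or re-created object: start above every version ever handed out
            version = self._highest_version + 1
        else:
            version = current[1] + 1
        self._highest_version = max(self._highest_version, version)
        self._objects[key] = (value, version)
        return version

    def delete(self, key: bytes) -> None:
        self._objects.pop(key, None)

    def read(self, key: bytes) -> Optional[Tuple[bytes, int]]:
        return self._objects.get(key)
```

Because the high-water mark survives deletes, a client that saw version 2 of a deleted object can never be fooled by a re-created object that also claims version 2.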

workspace

Workspaces are only an idea and are not yet implemented. Used to hold all of the data for one or more applications. A workspace consists of any number of tables and indexes. A workspace is also the unit of access control: if an application has access to a workspace then it can read or write any information in that workspace.