
Data Models

A list of data models used by various web-storage systems.

Block-based storage

Example: SAN LUNs, each a linear array of fixed-size blocks

Name space: (array#, LUN, block number, snapshot)

Operations:  Read Block, Write Block
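The block model above can be sketched as a map from the four-part name to a fixed-size byte string. This is only an illustration: the `BlockStore` class, the 512-byte `BLOCK_SIZE`, and the zero-fill behavior for unwritten blocks are assumptions, not properties of any particular SAN.

```python
from typing import Dict, Tuple

BLOCK_SIZE = 512  # assumed block size in bytes; real devices vary

# (array#, LUN, block number, snapshot), as in the name space above
BlockAddress = Tuple[int, int, int, int]


class BlockStore:
    """Toy in-memory stand-in for a block device (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._blocks: Dict[BlockAddress, bytes] = {}

    def write_block(self, addr: BlockAddress, data: bytes) -> None:
        # Blocks are fixed size, so reject anything that is not exactly one block.
        if len(data) != BLOCK_SIZE:
            raise ValueError(f"block must be exactly {BLOCK_SIZE} bytes")
        self._blocks[addr] = data

    def read_block(self, addr: BlockAddress) -> bytes:
        # In this sketch, blocks that were never written read back as zeros.
        return self._blocks.get(addr, bytes(BLOCK_SIZE))


store = BlockStore()
addr = (0, 3, 42, 0)  # array 0, LUN 3, block 42, snapshot 0
store.write_block(addr, b"x" * BLOCK_SIZE)
print(len(store.read_block(addr)))
```

The interface carries no notion of files, records, or keys; any higher-level structure has to be layered on top by the client.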

Blob Stores

Example: Amazon's S3

Store blobs of data (0 to 5 GB in size)

Name Space: (Bucket, Key)

Operations: Get/Put/Delete objects (entire-object update only)

Memcache

Data: Blocks identified by a key

Memcache operations: set, add, replace, append, prepend, cas, get, gets, delete, incr, decr

Amazon S3

Memcache
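A minimal sketch of the blob model, assuming an in-memory dictionary keyed by (bucket, key). The `BlobStore` class and its method names are hypothetical and do not reproduce the S3 or memcached APIs; the version counter stands in for the token that a memcached-style `gets` returns and `cas` checks.

```python
from typing import Dict, Optional, Tuple


class BlobStore:
    """Toy key -> blob store (hypothetical; not the S3 or memcached API)."""

    def __init__(self) -> None:
        # (bucket, key) -> (version, blob); the version acts as a cas token
        self._objects: Dict[Tuple[str, str], Tuple[int, bytes]] = {}
        self._next_version = 1

    def put(self, bucket: str, key: str, blob: bytes) -> int:
        # The whole object is replaced; there is no partial update.
        version = self._next_version
        self._next_version += 1
        self._objects[(bucket, key)] = (version, blob)
        return version

    def get(self, bucket: str, key: str) -> Optional[bytes]:
        entry = self._objects.get((bucket, key))
        return entry[1] if entry is not None else None

    def gets(self, bucket: str, key: str) -> Optional[Tuple[int, bytes]]:
        # Like get, but also returns the version for a later cas.
        return self._objects.get((bucket, key))

    def cas(self, bucket: str, key: str, blob: bytes, version: int) -> bool:
        # Store only if nobody has written the object since `version` was read.
        entry = self._objects.get((bucket, key))
        if entry is None or entry[0] != version:
            return False
        self.put(bucket, key, blob)
        return True

    def delete(self, bucket: str, key: str) -> bool:
        return self._objects.pop((bucket, key), None) is not None


blobs = BlobStore()
blobs.put("photos", "cat.jpg", b"v1")
version, _ = blobs.gets("photos", "cat.jpg")
blobs.put("photos", "cat.jpg", b"v2")                     # a concurrent writer
stale_ok = blobs.cas("photos", "cat.jpg", b"v3", version)  # rejected: stale version
print(stale_ok, blobs.get("photos", "cat.jpg"))
```

Because the store treats each value as opaque bytes, a compare-and-swap on the whole object is the only way a client can do a safe read-modify-write in this sketch.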

Blobs with attributes store

Example: SimpleDB

Blobs with attribute-value pairs

GET, PUT, or DELETE items in a domain, along with their attribute-value pairs

Query items using lexicographic comparisons on attribute values

Amazon SimpleDB
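The consequence of lexicographic queries is easiest to see with numbers stored as strings. The sketch below assumes a simplified domain in which each attribute holds a single string value; `put_attributes` and `select_greater_than` are hypothetical helpers, not SimpleDB calls.

```python
from typing import Dict, List

# item name -> attribute name -> value; every value is a string
Domain = Dict[str, Dict[str, str]]


def put_attributes(domain: Domain, item: str, attrs: Dict[str, str]) -> None:
    # Create the item if needed and merge in the attribute-value pairs.
    domain.setdefault(item, {}).update(attrs)


def select_greater_than(domain: Domain, attr: str, bound: str) -> List[str]:
    # Plain string comparison, so the ordering is lexicographic.
    return sorted(
        name
        for name, attrs in domain.items()
        if attr in attrs and attrs[attr] > bound
    )


prices = {"pen": 9, "book": 10, "lamp": 100}

unpadded: Domain = {}
padded: Domain = {}
for item, price in prices.items():
    put_attributes(unpadded, item, {"price": str(price)})
    put_attributes(padded, item, {"price": f"{price:03d}"})  # zero-pad to 3 digits

print(select_greater_than(unpadded, "price", "5"))   # only "pen": "10" sorts before "5"
print(select_greater_than(padded, "price", "005"))   # all three items
```

Zero-padding to a fixed width is one common way to make string order agree with numeric order.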

Big Table

Sparse, multidimensional sorted map

(row:string, column:string, time:int64) -> string

Column/row database tradeoff

Row key: unique; data is ordered on it, which gives locality

Key components: (row, column family, column qualifier, timestamp)

Different rows may have different numbers of columns

Hybrid column/row-oriented storage (user-specified locality)

Big Table Paper
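The (row, column, time) -> string map can be sketched with a sorted list of keys. This is a single-process illustration only: the `SortedMap` class is hypothetical, and negating the timestamp so that the newest version sorts first is a choice made for this sketch.

```python
import bisect
from typing import List, Optional, Tuple


class SortedMap:
    """Toy sorted map keyed by (row, column, time); hypothetical, in memory only."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, str, int]] = []  # kept sorted at all times
        self._values: List[str] = []

    def put(self, row: str, column: str, time: int, value: str) -> None:
        key = (row, column, -time)  # negate so the newest version sorts first
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get_latest(self, row: str, column: str) -> Optional[str]:
        # A 2-tuple sorts before every 3-tuple it prefixes, so this finds
        # the first (newest) version of the cell if one exists.
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[i]
        return None

    def scan_rows(self, start_row: str, end_row: str) -> List[Tuple[str, str, int, str]]:
        # Rows are contiguous in key order, so a range scan is a slice.
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        return [
            (row, column, -neg_time, value)
            for (row, column, neg_time), value in zip(self._keys[lo:hi], self._values[lo:hi])
        ]


table = SortedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 1, "<html>ex")

print(table.get_latest("com.cnn.www", "contents:"))
print(len(table.scan_rows("com.", "com/")))  # every row starting with "com."
```

The map is sparse in the sense that a row only has entries for the columns actually written; "org.example" has no anchor column and costs nothing for it.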

Document-oriented database

Schema-free.

Example: CouchDB

Data types: JSON objects

All-or-nothing update of documents

Views: JavaScript functions (the map of map-reduce) that add structure back

Queryable and indexable, featuring a table-oriented reporting engine that uses JavaScript as a query language

MongoDB (JSON/BSON, with indexes, nested URL structure)
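The way a map-style view adds structure back to schema-free documents can be sketched as follows. CouchDB views are written in JavaScript; this Python version only mimics the idea, and `build_view` and `by_author` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Document = Dict[str, Any]
MapFn = Callable[[Document], Iterator[Tuple[str, Any]]]


def build_view(docs: List[Document], map_fn: MapFn) -> List[Tuple[str, Any, str]]:
    # Run the map function over every document and keep the emitted rows
    # sorted by key, which is what makes the view queryable by key or range.
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, value, doc["_id"]))
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def by_author(doc: Document) -> Iterator[Tuple[str, Any]]:
    # Documents are schema-free, so the map function decides which ones apply.
    if "author" in doc:
        yield doc["author"], doc.get("title")


docs: List[Document] = [
    {"_id": "1", "author": "knuth", "title": "TAOCP"},
    {"_id": "2", "kind": "note", "text": "no author field here"},
    {"_id": "3", "author": "dijkstra", "title": "EWD"},
]

view = build_view(docs, by_author)
for key, value, doc_id in view:
    print(key, value, doc_id)
```

Documents that lack the field simply emit nothing, so the view behaves like a table over whichever subset of documents fits it.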

SQL data services

Relational data model

Simplified relational data (SELECT; toss the hard parts)

Object-oriented database

Structure: Arbitrary graph

Message queues

Amazon Simple Queue Service

File Servers

Unix or Windows file system data models  

Streaming Video Servers 

  • Traditional relational model?

    • Relational databases tend to fragment data into lots of small pieces. For example, consider an order with order items; each order item will be a separate record in a table.
    • In a distributed system like RAMCloud, each fragment is likely to end up on a different server, resulting in lots of requests to collect an interesting amount of data.
    • The distribution also exacerbates consistency issues during updates.
  • Opaque variable-length blobs, like memcached?
  • Hierarchical hashes (JSON, Fiz datasets)?
    • In this model an entire order, including the main order and its items, would be a single object stored on a single server.
  • Does it make sense to support multiple tables, or is this a flat store that simply maps ids to objects?
  • Should RAMCloud be designed for small objects only? Any upper limit on size?
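The fragmentation concern in the list above can be made concrete by counting how many keys, and therefore how many requests, one order needs under each model. The sketch assumes an 8-server cluster, placement by CRC32 of the key, and a made-up key naming scheme; none of this is RAMCloud's actual design.

```python
import zlib
from typing import List, Set

NUM_SERVERS = 8  # assumed cluster size


def server_for(key: str) -> int:
    # Deterministic placement: hash the key and pick a server (an assumption).
    return zlib.crc32(key.encode("utf-8")) % NUM_SERVERS


def servers_touched(keys: List[str]) -> Set[int]:
    return {server_for(key) for key in keys}


# Relational model: the order and each of its items are separate records.
relational_keys = ["orders/1001"] + [f"order_items/1001/{i}" for i in range(5)]

# Hierarchical model: the whole order, items included, is one object.
order_object = {
    "id": 1001,
    "items": [{"line": i, "sku": f"sku-{i}"} for i in range(5)],
}
hierarchical_keys = ["orders/1001"]

print("relational:", len(relational_keys), "requests,",
      len(servers_touched(relational_keys)), "servers")
print("hierarchical:", len(hierarchical_keys), "request,",
      len(servers_touched(hierarchical_keys)), "server")
```

The hierarchical object is read or updated with one request to one server, which also means an update to the order and its items touches a single server rather than several.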