Data model
Data Models
A list of data models used by various web-storage systems.
Block-based storage
Example: SAN LUNs - Linear array of fixed-size blocks
Name space: (array#, LUN, block number, snapshot)
Operations: Read Block, Write Block
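The block model above can be sketched in a few lines. This is only an illustrative in-memory toy, not how any SAN is implemented: the names BlockAddress and BlockStore, the 512-byte block size, and the zero-fill behavior for unwritten blocks are all assumptions made for the example.

```python
from typing import Dict, NamedTuple

BLOCK_SIZE = 512  # assumed block size in bytes (hypothetical)


class BlockAddress(NamedTuple):
    """The (array#, LUN, block number, snapshot) name space."""
    array: int
    lun: int
    block: int
    snapshot: int


class BlockStore:
    """Toy in-memory block device: fixed-size blocks addressed by a tuple."""

    def __init__(self) -> None:
        self._blocks: Dict[BlockAddress, bytes] = {}

    def write_block(self, addr: BlockAddress, data: bytes) -> None:
        # Blocks are fixed size, so partial writes are rejected.
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self._blocks[addr] = data

    def read_block(self, addr: BlockAddress) -> bytes:
        # Assumption: blocks that were never written read back as zeros.
        return self._blocks.get(addr, bytes(BLOCK_SIZE))
```

The point of the sketch is that the store knows nothing about the contents of a block; all structure is imposed by the client (typically a file system).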
Blob stores
Example: Amazon's S3
Store blobs of data (0 to 5 GB in size)
Name Space: (Bucket, Key)
Operations: Get/Put/Delete objects - Entire object update only
Memcache
Data: Blocks identified by a key
Memcache Operations: (set, add, replace, append, prepend, cas, get, gets, delete, incr, decr)
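The difference between the conditional operations (add, replace, cas) and the unconditional set is easiest to see in code. The following is a minimal sketch of their semantics only, assuming an in-memory dictionary and an integer version counter standing in for the CAS token. ToyCache is a hypothetical class for illustration, not the memcached client API or wire protocol.

```python
from typing import Dict, Optional, Tuple


class ToyCache:
    """In-memory stand-in for a memcached-style key-value store (hypothetical)."""

    def __init__(self) -> None:
        # key -> (value, version); the version plays the role of the CAS token.
        self._data: Dict[str, Tuple[bytes, int]] = {}
        self._next_version = 1

    def _store(self, key: str, value: bytes) -> bool:
        self._data[key] = (value, self._next_version)
        self._next_version += 1
        return True

    def set(self, key: str, value: bytes) -> bool:
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key: str, value: bytes) -> bool:
        """Store only if the key is absent."""
        return key not in self._data and self._store(key, value)

    def replace(self, key: str, value: bytes) -> bool:
        """Store only if the key is already present."""
        return key in self._data and self._store(key, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        return entry[0] if entry is not None else None

    def gets(self, key: str) -> Optional[Tuple[bytes, int]]:
        """Like get, but also return the version token for a later cas."""
        return self._data.get(key)

    def cas(self, key: str, value: bytes, version: int) -> bool:
        """Store only if the entry is unchanged since gets returned `version`."""
        entry = self._data.get(key)
        if entry is None or entry[1] != version:
            return False
        return self._store(key, value)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None
```

A client that wants an atomic read-modify-write calls gets, computes the new value, and retries the whole sequence whenever cas returns False.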
Blobs with attributes store
Example: SimpleDB
Blobs with attribute-value pairs
GET, PUT or DELETE items in your domain, along with the attribute-value pairs
Query on objects with various lexicographical queries
Bigtable
Sparse, multidimensional sorted map
(row:string, column:string, time:int64) -> string
Column/Row database tradeoff
Row keys are unique; data is kept ordered by row key, which gives locality
Cells are addressed by (row, column family, column qualifier, timestamp)
Different numbers of columns per row
Hybrid column/row oriented storage (user-specified locality)
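The (row, column, time) -> string map can be sketched as follows. This is a toy model under stated assumptions, not Bigtable's implementation: SparseSortedMap and its method names are hypothetical, and it sorts keys on every scan rather than keeping them in sorted storage.

```python
from typing import Dict, Iterator, Optional, Tuple

Key = Tuple[str, str, int]  # (row, column, timestamp)


class SparseSortedMap:
    """Toy (row, column, time) -> string map; only written cells take space."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get_latest(self, row: str, column: str) -> Optional[str]:
        """Return the value with the highest timestamp for (row, column)."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        return max(versions)[1] if versions else None

    def scan_rows(self, start_row: str, end_row: str) -> Iterator[Tuple[Key, str]]:
        """Yield cells with start_row <= row < end_row, in sorted key order."""
        for key in sorted(self._cells):
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]
```

Because rows are ordered, a client can choose row keys so that related data is adjacent: for example, reversed host names keep pages from one domain together in a range scan.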
Document-oriented database
Schema-free. Example: CouchDB
JSON objects
Data types: All-or-nothing update of documents
Views - JavaScript (the map of MapReduce), add structure back
Queryable and indexable, featuring a table-oriented reporting engine that uses JavaScript as a query language
MongoDB (JSON/BSON, with indexes, nested URL structure)
SQL data services
Relational data model
Simplified relational data (SELECT - toss hard parts)
Object-oriented database
Structure: Arbitrary graph
Message queues
File Servers
Unix or Windows file system data models
Streaming Video Servers
Traditional relational model?
Relational databases tend to fragment data into lots of small pieces. For example, consider an order with order items; each order item will be a separate record in a table.
In a distributed system like RAMCloud, each fragment is likely to end up on a different server, resulting in lots of requests to collect an interesting amount of data.
The distribution also exacerbates consistency issues during updates.
Opaque variable-length blobs, like memcached?
Hierarchical hashes (JSON, Fiz datasets)?
In this model an entire order, including the main order and its items, would be a single object stored on a single server.
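The contrast between the two models can be illustrated with the order example. This is a hedged sketch with made-up data: the field names (order_id, customer, items, sku, qty) and the helper to_relational_rows are hypothetical, and the record count is only a rough proxy for the number of server requests.

```python
import json
from typing import Any, Dict, List

# Hierarchical model: the whole order is one object (hypothetical fields).
order: Dict[str, Any] = {
    "order_id": 1001,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_relational_rows(order: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split the order the way a relational schema would: one record for
    the order itself plus one record per order item."""
    order_row = {k: v for k, v in order.items() if k != "items"}
    item_rows = [dict(item, order_id=order["order_id"]) for item in order["items"]]
    return [order_row] + item_rows


rows = to_relational_rows(order)  # several records, possibly on several servers
blob = json.dumps(order)          # one value, stored under one key on one server
```

In the relational form, reading the order back may touch as many servers as there are records; in the hierarchical form, a single lookup returns everything.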
Does it make sense to support multiple tables, or is this a flat store that simply maps ids to objects?
Should RAMCloud be designed for small objects only? Any upper limit on size?