Data stores reference for Indexing

Ankita's quick guide (still a WIP) to key features of data stores, with a focus on indexing support. (If you find something inaccurate, or know of another data store that does well on most or all of these parameters, let me know!)

| Data Store (& Link) | Indexed | In-mem | Dist/Scalable (data, index) | Durable | Latency | Throughput | Consistency | API | Additional comments |
|---|---|---|---|---|---|---|---|---|---|
| RAMCloud | y | y | y, y | y [n] | ? | ? | Linearizable | keys - val | |
| MySQL SE | y | n | y, ? (replication) | y | | | Transaction safe, ACID compliant | relational | |
| MySQL Cluster | y | n | y, ? (replication & partitioning) | y | | | | SQL and NoSQL | |
| Cassandra | y | cache | y, y* | y | | | BASE (Basically Available, Soft state, Eventual consistency); consistency level can be chosen | partitioned rows (similar to SQL) + denormalization + materialized views | *: each node indexes data it holds locally |
| MongoDB | y | cache | | y | | | | document-oriented storage: JSON-style docs with dynamic schemas | |
| H-Store | ? (1) | y | y, ? | | | | | row-based relational | |
| VoltDB | ? (1) | y | y, ? | | | | ACID for transactions; unclear otherwise | relational | Commercial H-Store |
| G-Store | n | | | | | | | | |
| LevelDB | n | | | | | | | key-val | |
| Spanner | n | | | | | | Externally consistent | | |
| F1 | y | n | y, ? | y | > MySQL | | Strong in general; consistent global indexes | relational + SQL | Uses Spanner. Google Ads previously used MySQL; F1 made their database scalable. |
| PNUTS [paper] | kind-of* | n | y, - | y (2) | | | Relaxed | Basic relational | *: Optional secondary table, lazily maintained; keyed on index key |
| DynamoDB | y | n (SSD) | y, ? | y (2) | single-digit ms | | Strong consistency on reads | tables, no fixed schema; each item can have a different number of attributes | |
| BigTable | n | | | | | | Strong | multi-dimensional map that supports basic operations | |
| Espresso | y | n | y, ? | y | | | Timeline-consistent | document-oriented NoSQL; has secondary index | Uses MySQL/InnoDB as storage engine. Also uses Lucene + Databus + Helix |
| Postgres | ? (1) | n | y, ? | y (2) | | | ACID compliant, MVCC | object-relational | |
| COPS | n | | | | | | Causal+ | key-val | |
| Eiger | n | | | | | | Causal | column-store | |
| HyperDex [paper] | y | n | y, ? | | | | Key ops are linearizable; Warp (commercial extension) has ACID transactions | key-val; rich datatypes | |
| HBase | n | | y, - | | | | Strictly consistent reads and writes | versioned, non-relational; has global and local indexes | Modeled after BigTable |
| Cloudera Impala | ? (1) | n | y, ? | | | | | SQL interface | Massively parallel processing architecture for performance, with Hadoop scalability |
| Redis | n | y | y, - | optional | | | | key-val; keys can contain strings, hashes, lists, sets, sorted sets | |
| CouchDB | y | | y, ? | y | | | Eventually consistent | JSON | |
| BigCouch | y | | | | | | | | Commercial CouchDB on steroids |
| Voldemort | n | cache | y | y | | | | hash table | |
| NuDB | | | | | | | | relational? | |

Footnotes:

(1): Probably yes, since they claim to be relational/SQL.

(2): Probably yes since it is not in-memory.
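
The table distinguishes stores where each node indexes only the data it holds locally (the Cassandra note) from stores that offer global indexes (the F1 and HBase rows). The sketch below is a toy illustration of that difference, not how any of these systems is implemented. The names `Node`, `Cluster`, `query_local`, and `query_global` are hypothetical, keys are assumed to be integers placed by `key % num_nodes`, and each key is assumed to be written only once.

```python
from collections import defaultdict


class Node:
    """One storage node: a partition of rows plus a local index over them."""

    def __init__(self):
        self.rows = {}                       # primary key -> row (dict)
        self.local_index = defaultdict(set)  # indexed value -> keys on THIS node

    def put(self, key, row, attr):
        self.rows[key] = row
        self.local_index[row[attr]].add(key)


class Cluster:
    """Toy cluster (hypothetical) that maintains both index styles side by side."""

    def __init__(self, num_nodes, indexed_attr):
        self.nodes = [Node() for _ in range(num_nodes)]
        self.indexed_attr = indexed_attr
        # Global index: indexed value -> set of (node_id, primary key).
        self.global_index = defaultdict(set)

    def _node_for(self, key):
        # Assumption: integer keys, simple modulo placement.
        return key % len(self.nodes)

    def put(self, key, row):
        # Assumption: each key is written once, so no stale index entries
        # need to be cleaned up on update.
        node_id = self._node_for(key)
        self.nodes[node_id].put(key, row, self.indexed_attr)
        self.global_index[row[self.indexed_attr]].add((node_id, key))

    def query_local(self, value):
        """Local indexes: scatter the query to every node and gather results."""
        keys = set()
        for node in self.nodes:
            keys |= node.local_index.get(value, set())
        return keys, len(self.nodes)  # every node is contacted

    def query_global(self, value):
        """Global index: one lookup tells us which nodes hold matching rows."""
        entries = self.global_index.get(value, set())
        keys = {key for _, key in entries}
        nodes_contacted = len({node_id for node_id, _ in entries})
        return keys, nodes_contacted


if __name__ == "__main__":
    cluster = Cluster(num_nodes=4, indexed_attr="city")
    cluster.put(1, {"city": "NYC"})
    cluster.put(2, {"city": "SF"})
    cluster.put(5, {"city": "NYC"})
    print(cluster.query_local("NYC"))
    print(cluster.query_global("NYC"))
```

Both queries return the same keys. The difference is in how many nodes a lookup touches: a local index forces a scatter-gather across the whole cluster, while a global index narrows the lookup to the nodes that actually hold matches, at the cost of keeping one more distributed structure consistent on every write.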

Sources:

Websites/papers linked in the first column, plus official blogs and wikis.