Data stores reference for Indexing

Ankita's quick guide (still a WIP) to key features of data stores, with a focus on indexing support. (If you find something inaccurate, or know of another data store that does well on most or all of these parameters, let me know!)

| Data Store (& Link) | Indexed | In-memory | Distributed/scalable (data, index) | Durable | Latency | Throughput | Consistency | API | Additional comments |
|---|---|---|---|---|---|---|---|---|---|
| RAMCloud | y | y | y, y | y [n] | ? | ? | Linearizable | keys - val | |
| MySQL SE | y | n | y, ? (replication) | y | | | Transaction safe, ACID compliant | relational | |
| MySQL Cluster | y | n | y, ? (replication & partitioning) | y | | | | SQL and NoSQL | |
| Cassandra | y | cache | y, y* | y | | | BASE (Basically Available, Soft state, Eventual consistency) + can choose consistency level | partitioned rows (similar to SQL) + denormalization + materialized views | *: each node indexes data it holds locally |
| MongoDB | y | cache | | y | | | | document-oriented storage: JSON-style docs with dynamic schemas | |
| H-Store | ? (1) | y | y, ? | | | | | row-based relational | |
| VoltDB | ? (1) | y | y, ? | | | | ACID for transactions; unclear otherwise. | relational | Commercial H-Store |
| G-Store | n | | | | | | | | |
| LevelDB | n | | | | | | | key-val | |
| Spanner | n | | | | | | Externally-consistent | | |
| F1 | y | n | y, ? | y | > MySQL | | Strong in general, consistent global indexes | relational + SQL | Uses Spanner; Google ads used MySQL. This made their db scalable. |
| PNUTS [paper] | kind-of* | n | y, - | y (2) | | | Relaxed | Basic relational | *: Optional secondary table lazily maintained; keyed on index key |
| DynamoDB | y | n (SSD) | y, ? | y (2) | single-digit ms | | Strong consistency on reads | tables, no fixed schemas; each item can have a different number of attributes | |
| BigTable | n | | | | | | Strong | multi-dimensional map which supports basic operations | |
| Espresso | y | n | y, ? | y | | | Timeline-consistent | document-oriented NoSQL; has secondary index | Uses MySQL/InnoDB as storage engine. Also uses Lucene+Databus+Helix |
| Postgres | ? (1) | n | y, ? | y (2) | | | ACID compliant, MVCC | object-relational | |
| COPS | n | | | | | | Causal+ | key-val | |
| Eiger | n | | | | | | Causal | column-store | |
| HyperDex [paper] | y | n | y, ? | | | | Key ops are linearizable; Warp (commercial extension) has ACID transactions | key-val; rich datatypes | |
| HBase | n | | y, - | | | | Strictly consistent reads and writes | versioned, non-relational; has global and local indexes | Modeled after BigTable |
| Cloudera Impala | ? (1) | n | y, ? | | | | | SQL interface | Massively parallel processing architecture for performance with Hadoop scalability |
| Redis | n | y | y, - | optional | | | | key-val; keys can contain strings, hashes, lists, sets, sorted sets | |
| CouchDB | y | | y, ? | y | | | Eventually consistent | JSON | |
| BigCouch | y | | | | | | | | Commercial CouchDB on steroids |
| Voldemort | n | cache | y | y | | | | hash table | |
| NuDB | | | | | | | | relational? | |

Footnotes:

(1): Probably yes, since they claim to be relational/SQL.

(2): Probably yes since it is not in-memory.
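
The "Dist/Scalable data, index" column and the Cassandra note ("each node indexes data it holds locally") rest on the difference between a per-node local index and a global index partitioned by the indexed value. The following is a minimal Python sketch of that difference, not how any of the listed systems is implemented: the class names, the `partition` helper, and the CRC32-based placement are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition(value, num_nodes):
    """Stable hash placement (hypothetical policy chosen for this sketch)."""
    return zlib.crc32(str(value).encode()) % num_nodes


class LocalIndexCluster:
    """Rows are partitioned by primary key; each node indexes only its own rows."""

    def __init__(self, num_nodes, attr):
        self.attr = attr
        self.rows = [dict() for _ in range(num_nodes)]
        self.index = [defaultdict(set) for _ in range(num_nodes)]

    def put(self, key, row):
        n = partition(key, len(self.rows))
        old = self.rows[n].get(key)
        if old is not None:
            # Drop the stale index entry; row and index live on the same node.
            self.index[n][old[self.attr]].discard(key)
        self.rows[n][key] = row
        self.index[n][row[self.attr]].add(key)

    def lookup(self, value):
        """Scatter-gather: every node is asked. Returns (keys, nodes contacted)."""
        keys = set()
        for node_index in self.index:
            keys |= node_index.get(value, set())
        return sorted(keys), len(self.index)


class GlobalIndexCluster:
    """Rows are partitioned by primary key; the index is partitioned by value."""

    def __init__(self, num_nodes, attr):
        self.attr = attr
        self.rows = [dict() for _ in range(num_nodes)]
        self.index = [defaultdict(set) for _ in range(num_nodes)]

    def put(self, key, row):
        n = partition(key, len(self.rows))
        old = self.rows[n].get(key)
        if old is not None:
            # The stale entry may live on a different node than the row.
            old_part = partition(old[self.attr], len(self.index))
            self.index[old_part][old[self.attr]].discard(key)
        self.rows[n][key] = row
        part = partition(row[self.attr], len(self.index))
        self.index[part][row[self.attr]].add(key)

    def lookup(self, value):
        """One index partition answers. Returns (keys, index nodes contacted)."""
        part = partition(value, len(self.index))
        return sorted(self.index[part].get(value, set())), 1


local = LocalIndexCluster(4, "city")
glob = GlobalIndexCluster(4, "city")
for cluster in (local, glob):
    cluster.put("u1", {"city": "Paris"})
    cluster.put("u2", {"city": "Oslo"})
    cluster.put("u3", {"city": "Paris"})
    cluster.put("u2", {"city": "Paris"})  # update moves u2 within the index

print(local.lookup("Paris"))  # (['u1', 'u2', 'u3'], 4)
print(glob.lookup("Paris"))   # (['u1', 'u2', 'u3'], 1)
```

In this sketch the local index keeps writes on a single node but makes every lookup touch all nodes, while the global index answers a lookup from one partition at the cost of writes that can span several nodes. A real system would need to coordinate those multi-node writes or maintain the index lazily, as the PNUTS note describes.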

Sources:

Websites/papers linked in the first column and official blog/wikis.