
User view for target applications:


| | Web Datastore | Analytics System | Transactional System |
|---|---|---|---|
| Data | Semi-structured records, up to 2 MB | Files, sets of records (often unstructured), possibly large records | Structured records, typically small |
| Queries | Simple lookups, sometimes secondary-key access | Rich analytic, procedural | Complex, declarative |
| Consistency | Weak, per-record | Static files | |
| Latency | ~10s of milliseconds | | Query-dependent: 3 milliseconds to minutes |
| Scale | High, elastic | | |
| Examples | Yahoo PNUTS, Google AppEngine Datastore, Facebook Cassandra, Amazon Dynamo | MapReduce, Hadoop, Aster, Teradata | MySQL, Oracle, IBM DB2, MS SQL Server |

Web Datastore

More detailed user view
  • Queries
    • Primary key lookups, multigets
    • Secondary key lookups
    • Simple joins
    • Simple aggregations
    • Range queries
    • Often predeclared; cannot be ad hoc
  • Consistency, transactions
    • Per-record, ACID within record
    • Consistency within a record is sometimes dropped under certain failures, in favor of availability
    • Stale data may be served
    • Often expose consistency trade-offs for availability and performance
  • Flexible schemas
    • Semi-structured records
  • Self managing
  • Very high availability
  • Ordered tables, hash tables
  • Low latency across geographic regions
  • Option of in-memory tables/records
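The query shapes listed above (primary key lookups, multigets, range queries over an ordered table, flexible schemas) can be sketched with a small in-memory table. This is a minimal illustration only: the `OrderedTable` class and its method names are hypothetical and do not correspond to the API of any system named on this page.

```python
import bisect


class OrderedTable:
    """Toy in-memory ordered table keyed by primary key (illustrative only)."""

    def __init__(self):
        self._keys = []     # primary keys, kept sorted for range scans
        self._records = {}  # primary key -> semi-structured record (a dict)

    def put(self, key, record):
        if key not in self._records:
            bisect.insort(self._keys, key)
        self._records[key] = record

    def get(self, key):
        """Primary-key lookup; returns None if the key is absent."""
        return self._records.get(key)

    def multiget(self, keys):
        """Batch lookup; absent keys are omitted from the result."""
        return {k: self._records[k] for k in keys if k in self._records}

    def range_scan(self, start, end):
        """Return (key, record) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._records[k]) for k in self._keys[lo:hi]]


table = OrderedTable()
table.put("user:bob", {"name": "Bob"})  # flexible schema: no "city" field
table.put("user:alice", {"name": "Alice", "city": "Oslo"})
table.put("video:1", {"title": "Intro"})

# All keys with the "user:" prefix (";" is the character after ":" in ASCII).
users = table.range_scan("user:", "user;")
```

A hash table would support `get` and `multiget` equally well; only the ordered layout makes `range_scan` cheap, which is one reason both table types are offered.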
Common techniques
  • Partitioning/Sharding
    • Often uses an extra query-router layer
    • Load balancing is done by a manager (not in query path)
    • Coerced to the same data model
  • No locks across storage nodes
  • Replication
    • Replicas are used for reads and recovery
    • Asynchronously maintained
    • May be geographic, may be selective
  • Record mastership
    • Only master writes
  • Record versioning
    • For consistency
  • Secondary indexing
    • Coerced to the same data model
    • May be maintained as a replica
    • Sometimes co-location for joins
  • Logging/Messaging/PubSub
    • Used for logging
    • Used for asynchronous replication
  • Big caches on storage nodes
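Several of the techniques above fit together in a few lines: a query router that hashes keys to partitions, a master that alone accepts writes and assigns record versions, and replicas that are updated asynchronously from a log. The sketch below is a simplification under stated assumptions: all class names are hypothetical, mastership is per partition rather than per record, and a plain in-process queue stands in for the messaging/pub-sub layer.

```python
import hashlib
from collections import deque


class StorageNode:
    def __init__(self, name):
        self.name = name
        self.records = {}  # key -> (version, value)

    def apply(self, key, version, value):
        """Apply an update only if it is newer than the copy held here."""
        current = self.records.get(key)
        if current is None or version > current[0]:
            self.records[key] = (version, value)


class Partition:
    """One shard: a master copy plus asynchronously maintained replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas
        self.log = deque()  # stand-in for the messaging layer (assumption)

    def write(self, key, value):
        # Only the master writes, so no locks across storage nodes are needed.
        version = self.master.records.get(key, (0, None))[0] + 1
        self.master.apply(key, version, value)
        self.log.append((key, version, value))  # shipped to replicas later
        return version

    def read(self, key, replica_index=None):
        """Read from the master, or from a replica (possibly stale)."""
        node = self.master if replica_index is None else self.replicas[replica_index]
        return node.records.get(key)

    def deliver_pending(self):
        """Drain the log into the replicas (the asynchronous replication step)."""
        while self.log:
            key, version, value = self.log.popleft()
            for replica in self.replicas:
                replica.apply(key, version, value)


class Router:
    """Query router: maps a key to its partition by hashing the key."""

    def __init__(self, partitions):
        self.partitions = partitions

    def partition_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]


router = Router(
    [Partition(StorageNode(f"master{i}"), [StorageNode(f"replica{i}")]) for i in range(2)]
)
partition = router.partition_for("user:alice")
partition.write("user:alice", {"city": "Oslo"})
before = partition.read("user:alice", replica_index=0)  # replica has not caught up
partition.deliver_pending()
after = partition.read("user:alice", replica_index=0)   # replica now has version 1
```

Because replicas apply an update only when its version is newer, redelivering or reordering log entries leaves them in a consistent state, which is one way record versioning supports consistency.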

PNUTS deep-dive to show some of the above in use.

  • Max : ACID
  • Min : Eventual consistency
  • Commonly used : Per-record
    • Single timeline
    • Stale reads
    • Option of latest (for a cost)
    • Critical reads (self writes always visible)
    • Branching
      • System managed reconciliation
      • Application managed reconciliation
      • Needed for very high availability apps
      • Often for network partitions
      • For performance in some apps
  • Index consistency
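The commonly used per-record options above (single timeline, stale reads, latest reads at a cost, critical reads that always see one's own writes) can be illustrated with a toy record that has a master copy and a lagging replica. This is a sketch under assumptions, not the interface of any particular system: the class and method names are illustrative, and "cost" is modelled only as having to go to the master copy.

```python
class ReplicatedRecord:
    """One record with a master copy and a lagging replica (toy model)."""

    def __init__(self):
        self.master = (0, None)   # (version, value); versions form a single timeline
        self.replica = (0, None)  # may lag behind the master

    def write(self, value):
        """All writes go through the master and advance the version by one."""
        version = self.master[0] + 1
        self.master = (version, value)
        return version  # the writer keeps this for later critical reads

    def sync(self):
        """Asynchronous replication catching up with the master."""
        self.replica = self.master

    def read_any(self):
        """Cheapest read: may return stale data from the replica."""
        return self.replica

    def read_critical(self, required_version):
        """Return a copy at least as new as required_version."""
        if self.replica[0] >= required_version:
            return self.replica
        return self.master  # replica is too old: pay the cost of the master

    def read_latest(self):
        """Always the newest version, at the cost of going to the master."""
        return self.master


record = ReplicatedRecord()
my_version = record.write("draft")

stale = record.read_any()                  # replica has not seen the write yet
mine = record.read_critical(my_version)    # the writer still sees its own write
latest = record.read_latest()
```

Every copy moves through the same sequence of versions and never goes backwards, so a stale read returns an older point on the timeline rather than a divergent value. Branching, by contrast, gives up the single timeline and requires reconciliation afterwards.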
Current directions
  • Richer queries
  • Richer consistency, transactions
  • Better self-management, even under failures
  • Lowering latency
    • Using Flash
Points of interest
  • Geographical replication
  • Can RAMCloud provide richer consistency along with performance and scale?
  • Virtual worlds are an application that can benefit: low latency and a high request rate per server allow interaction at a large scale.
  • What queries to allow?