PNUTS
User view for target applications:
| Property | Web Datastore | Analytics System | Transactional System |
|---|---|---|---|
| Data | Semi-structured records, up to 2 MB | Files, sets of records (often unstructured), possibly large records | Structured records, typically small |
| Queries | Simple lookups, sometimes secondary key access | Rich analytic, procedural | Complex, declarative |
| Consistency | Weak, per-record | Static files | ACID |
| Latency | Tens of milliseconds | Minutes to hours | Query-dependent: 3 milliseconds to minutes |
| Scalability | High, elastic | High | Low |
| Examples | Yahoo PNUTS, Google AppEngine Datastore, Facebook Cassandra, Amazon Dynamo | MapReduce, Hadoop, Aster, Teradata | MySQL, Oracle, IBM DB2, MS SQL Server |
Web Datastore
More detailed user view
Queries
Primary key lookups, multigets
Secondary key lookups
Simple joins
Simple aggregations
Range queries
Often predeclared; cannot be ad hoc
Consistency, transactions
Per-record, ACID within record
Consistency within a record is sometimes relaxed under certain failures to preserve availability
Stale data may be served
Often expose trade-offs for availability and performance
Flexible schemas
Semi-structured records
Self managing
Very high availability
Ordered tables, hash tables
Geographically low latency
Option of in-memory tables/records
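The user view above can be illustrated with a small in-memory sketch. It is not the API of PNUTS or of any other system listed here: the class names `HashTable` and `OrderedTable` and their methods are hypothetical, chosen only to show why primary key lookups and multigets work on either table organization while range queries need an ordered table, and how semi-structured records can differ in their fields.

```python
import bisect


class HashTable:
    """Hash-organized table: primary key lookups and multigets only."""

    def __init__(self):
        self._rows = {}  # primary key -> record (a free-form dict)

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        return self._rows.get(key)

    def multiget(self, keys):
        # Return only the keys that exist; missing keys are skipped.
        return {k: self._rows[k] for k in keys if k in self._rows}


class OrderedTable(HashTable):
    """Ordered table: also keeps keys sorted so range scans are possible."""

    def __init__(self):
        super().__init__()
        self._keys = []  # sorted list of primary keys

    def put(self, key, record):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        super().put(key, record)

    def range(self, lo, hi):
        # Return (key, record) pairs with lo <= key < hi, in key order.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return [(k, self._rows[k]) for k in self._keys[i:j]]


# Flexible schema: records in the same table carry different fields.
users = OrderedTable()
users.put("alice", {"city": "Paris"})
users.put("bob", {"city": "Oslo", "age": 31})
users.put("carol", {})
```

A call such as `users.range("b", "d")` scans a contiguous slice of the sorted keys, which a hash-organized table cannot offer without visiting every record.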
Common techniques
Partitioning/Sharding
Often an extra query-router layer is used
Load balancing is done by a manager (not in the query path)
Coerced to the same data model
No locks across storage nodes
Replication
Replicas are used for reads and recovery
Asynchronously maintained
May be geographic, may be selective
Record mastership
Only master writes
Record versioning
For consistency
Secondary indexing
Coerced to the same data model
May be maintained as a replica
Sometimes co-location for joins
Logging/Messaging/PubSub
Used for logging
Used for asynchronous replication
Big caches on storage nodes
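Several of the techniques above (routing by key, record mastership, record versioning, and asynchronous replication through a message log) can be sketched together in a few lines. This is a toy model under stated assumptions, not how PNUTS or any other system implements them: `StorageNode`, `Cluster`, `master_of`, and `deliver` are hypothetical names, the in-process `deque` stands in for a real messaging layer, and choosing the master by hashing the key is a simplification (a real system may pick the master differently, for example by where writes originate).

```python
import hashlib
from collections import deque


class StorageNode:
    """One replica; stores key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def apply(self, key, version, value):
        # Record versioning: only a newer version replaces the stored one.
        current_version = self.records.get(key, (0, None))[0]
        if version > current_version:
            self.records[key] = (version, value)


class Cluster:
    def __init__(self, regions):
        self.regions = list(regions)
        self.nodes = {r: StorageNode(r) for r in self.regions}
        self.log = deque()  # stand-in for the logging / pub-sub layer

    def master_of(self, key):
        # Router step. Assumption: the master is chosen by hashing the key.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.regions[int(digest, 16) % len(self.regions)]

    def write(self, key, value):
        # Only the record's master accepts the write and assigns the version.
        master = self.nodes[self.master_of(key)]
        version = master.records.get(key, (0, None))[0] + 1
        master.apply(key, version, value)
        self.log.append((key, version, value))  # published for replicas
        return version

    def deliver(self):
        # Asynchronous replication: replicas catch up when messages arrive.
        while self.log:
            key, version, value = self.log.popleft()
            for node in self.nodes.values():
                node.apply(key, version, value)


cluster = Cluster(["west", "east", "asia"])
cluster.write("user:42", {"status": "away"})
cluster.write("user:42", {"status": "online"})
cluster.deliver()  # until this runs, non-master replicas are stale
```

Because every write for a record goes through a single master, versions for that record form one sequence, and no locks across storage nodes are needed in this sketch.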
PNUTS deep-dive to show some of the above in use.
Consistency
Max: ACID
Min: Eventual consistency
Commonly used: Per-record
Single timeline
Stale reads
Option of latest (for a cost)
Critical reads (self writes always visible)
Branching
System managed reconciliation
Application managed reconciliation
Needed for very high availability apps
Often for network partitions
For performance in some apps
Index consistency
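The read options on a single per-record timeline (stale reads, latest reads at a cost, and critical reads that always see one's own writes) can be made concrete with a minimal sketch. The `Record` class and the method names `read_any`, `read_latest`, and `read_critical` are illustrative assumptions for this example only, and the model has just one master copy and one lagging local replica.

```python
class Copy:
    """One copy of a record: a version number and a value."""

    def __init__(self):
        self.version = 0
        self.value = None


class Record:
    """A record with a master copy and one asynchronously updated replica."""

    def __init__(self):
        self.master = Copy()
        self.local = Copy()
        self.pending = []  # updates not yet applied to the local replica

    def write(self, value):
        # All writes go to the master, so versions form a single timeline.
        self.master.version += 1
        self.master.value = value
        self.pending.append((self.master.version, value))
        return self.master.version

    def replicate(self):
        # Apply pending updates to the local replica in timeline order.
        for version, value in self.pending:
            self.local.version, self.local.value = version, value
        self.pending.clear()

    def read_any(self):
        # Cheapest read: served locally, possibly stale.
        return self.local.value

    def read_latest(self):
        # Always current, at the cost of a trip to the master.
        return self.master.value

    def read_critical(self, min_version):
        # Served locally if the replica has caught up to min_version;
        # otherwise falls back to the master, so own writes stay visible.
        if self.local.version >= min_version:
            return self.local.value
        return self.master.value


profile = Record()
my_version = profile.write("photo-v1")
stale = profile.read_any()                    # replica has not caught up
mine = profile.read_critical(my_version)      # sees the caller's own write
```

Passing the version returned by a write into `read_critical` is what lets a client see its own update without paying the latest-read cost on every request.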
Current directions
Richer queries
Richer consistency, transactions
Better self-management, even under failures
Lowering latency
Using Flash
Points of interest
Geographical replication
Can RAMCloud provide richer consistency along with performance and scale?
Virtual worlds, as an application, can benefit: low latency and a high request rate per server allow interaction at a large scale.
What queries to allow?