PNUTS
User view for target applications:
Property |
Web Datastore |
Analytics System |
Transactional System |
---|---|---|---|
Data |
Semi-structured records, upto 2MB |
Files, sets of records (often unstructured), maybe large records |
Structured records, typically small |
Queries |
Simple, lookups, sometimes secondary key access |
Rich analytic, procedural |
Complex, declarative |
Consistency |
Weak, per-record |
Static files |
ACID |
Latency |
~10s of milliseconds |
minutes/hours |
query-dependent: 3 milliseconds to minutes |
Scalability |
High, Elastic |
High |
Low |
Examples |
Yahoo PNUTS, Google AppEngine Datastore, Facebook Cassandra, Amazon Dynamo |
MapReduce, Hadoop, Aster, Terradata |
MySQL, Oracle, IBM DB2, MS SQL Server |
Web Datastore
More detailed user view
- Queries
- Primary key lookups, multigets
- Secondary key lookups
- Simple joins
- Simple aggregations
- Range queries
- Often predeclared , can't be adhoc
- Consistency, transcations
- Per-record, ACID within record
- Sometimes consistency dropped within record for certain failures for availability
- Stale data served
- Often expose trade offs for availability and performance
- Flexible schemas
- Semi-structured records
- Self managing
- Very high availability
- Ordered tables, hash tables
- Geographically low latency
- Option of in-memory tables/records
Common techniques
- Partitioning/Sharding
- Often an extra layer of query router used
- Load balancing is done by a manager (not in query path)
- Coerced to same data-model
- No locks across storage nodes
- Replication
- Replicas are used for reads and recovery
- Asynchronously maintained
- Maybe geographic, maybe selective
- Record mastership
- Only master writes
- Record versioning
- For consistency
- Secondary indexing
- Coerced to same data-model
- Maybe as a replica
- Sometimes co-location for joins
- Logging/Messaging/PubSub
- Used for logging
- Used for asynchronous replication
- Big caches on storage nodes
PNUTS deep-dive to show some of the above in use.
Consistency
- Max : ACID
- Min : Eventual consistency
- Commonly used : Per-record
- Single timeline
- Stale reads
- Option of latest (for a cost)
- Critical reads (self writes always visible)
- Branching
- System managed reconciliation
- Application managed reconciliation
- Needed for very high availability apps
- Often for network partitions
- For performance in some apps
- Index consistency
Current directions
- Richer queries
- Richer consistency, transactions
- Better self-management, even under failures
- Lowering latency
- Using Flash
Points of interest
- Geographical replication
- Can RAMCloud provide richer consistency along with performance and scale?
- Virtual Worlds as an application can benefit because of low latency high request rate per server by allowing interaction at a large scale.
- What queries to allow?