Distribution
Effects
- + Capacity
- + Throughput
- + Avoid hot spots
- - Consistency
- How do durability and reliability tie into the discussion?
Addresses
- Structured, Unstructured
- Random, Hashes, Sequential
- User-specified, generated
- Need capacity for at least 2^48 objects
- Hence, unstructured addresses probably need a space of at least 2^64 (64 bits), so that the address space stays sparsely populated
application (16 bits) | table (16 bits) | address (64 bits)
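The three-field layout above can be sketched as a single 96-bit integer key. This is only an illustration: the field order (application in the most significant bits) and the names `pack_key` and `unpack_key` are assumptions, not something these notes specify.

```python
# Field widths taken from the layout in the notes.
APP_BITS, TABLE_BITS, ADDR_BITS = 16, 16, 64


def pack_key(app, table, addr):
    """Pack (application, table, address) into one 96-bit integer.

    Assumption: application occupies the most significant bits.
    """
    for name, value, bits in (("app", app, APP_BITS),
                              ("table", table, TABLE_BITS),
                              ("addr", addr, ADDR_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (app << (TABLE_BITS + ADDR_BITS)) | (table << ADDR_BITS) | addr


def unpack_key(key):
    """Inverse of pack_key: return (application, table, address)."""
    addr = key & ((1 << ADDR_BITS) - 1)
    table = (key >> ADDR_BITS) & ((1 << TABLE_BITS) - 1)
    app = key >> (TABLE_BITS + ADDR_BITS)
    return app, table, addr
```

With this ordering, all keys of one table sort next to each other, which matters if a range-based mapping is used.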
Approaches
Mapping
B-Trees
- Supports range queries on totally ordered keys
- Allows several records from the same table to be returned with a single request
- May cause a server to become a hot spot
- RP*
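A minimal sketch of range-based mapping, assuming a flat sorted list of split points rather than a real B-tree or an RP* structure. The class `RangeMap` and its methods are hypothetical names used only for illustration. It shows why a range query over adjacent keys can often be answered by one server, and also why that server can become a hot spot.

```python
import bisect


class RangeMap:
    """Maps totally ordered keys to servers by key range.

    split_points[i] is the smallest key owned by servers[i + 1];
    servers[0] owns every key below split_points[0].
    """

    def __init__(self, split_points, servers):
        if len(servers) != len(split_points) + 1:
            raise ValueError("need exactly one more server than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.splits = list(split_points)
        self.servers = list(servers)

    def server_for(self, key):
        """Return the server that owns a single key."""
        return self.servers[bisect.bisect_right(self.splits, key)]

    def servers_for_range(self, lo, hi):
        """Return every server owning some key in the closed range [lo, hi]."""
        first = bisect.bisect_right(self.splits, lo)
        last = bisect.bisect_right(self.splits, hi)
        return self.servers[first:last + 1]
```

A narrow range usually falls inside one server's interval, so a single request suffices; a range that spans a split point needs one request per server it touches.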
Hashing
- Simple
- Likely to spread the load better
- If a single request needs multiple records from a table, it's likely to require separate requests to multiple servers, which adds overhead
- Extensible
- Linear
- Consistent
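As one illustration of the hashing variants listed above, here is a minimal consistent-hashing sketch. The class name `HashRing`, the use of SHA-1, and the choice of 64 virtual nodes per server are all assumptions made for the example; the notes do not prescribe any of them.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=64):
        if not servers:
            raise ValueError("need at least one server")
        # Each server appears at `vnodes` pseudo-random positions on the ring.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        """Return the server at the first ring position after hash(key)."""
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Keys spread roughly evenly across servers, and adding a server moves keys only onto the new server. Adjacent keys from the same table land on unrelated servers, which is the multi-request overhead noted above.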
Replication
Effects
- + Throughput
- + Mitigates hot spots
- + Latency
- Eliminates cross-data-center requests
- - Consistency
- Is replication needed for performance reasons?
- If a single server can handle 1M requests/second, is there any need to replicate?
- If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
- If the load comes from a hot spot, reducing the amount of data on the server may not relieve the overload
- A system without replication would be much easier to manage
- Perhaps replicas are needed to reduce latency (e.g., East Coast datacenter and West Coast datacenter)
Locality
Effects
- + Network traffic
- + Latency for serial requests
- + Performance isolation in multi-tenant environments
- + Economy of metadata
- For example, a host needs a replica of the access control information only for the data that resides on that host
- Is there any locality in interesting database applications?
- The most interesting form of locality is locality within a request: ideally, each request is satisfied with a single call to a single server
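Locality within a request can be made concrete by counting how many servers one request touches. The sketch below is hypothetical: `plan_request` and the `locate` callback are names invented for this example, and the two placement functions in the usage are toy stand-ins for range-based and hash-based mapping.

```python
from collections import defaultdict


def plan_request(keys, locate):
    """Group the keys of one request by the server that owns them.

    `locate` is any function mapping a key to a server identifier.
    The number of entries in the result is the number of calls needed.
    """
    batches = defaultdict(list)
    for key in keys:
        batches[locate(key)].append(key)
    return dict(batches)


# Toy placements (assumptions): ranges of 100 keys per server vs. key mod 4.
by_range = plan_request([10, 11, 12, 13], lambda k: k // 100)
by_hash = plan_request([10, 11, 12, 13], lambda k: k % 4)
print(len(by_range), len(by_hash))
```

Under the range placement the four adjacent keys need one call; under the modulo placement they need four.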