Distribution
Effects
- + Capacity
- + Throughput
- + Avoid hot spots
- - Consistency
- How do durability and reliability tie into the discussion?
Addresses
- Structured, Unstructured
- Random, Hashes, Sequential
- User-specified, generated
- Need at least 2^48 capacity for objects
- Hence, unstructured addresses probably need to be at least 2^64
application (16 bit) | table (16 bit) | address (64 bit) |
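A minimal sketch of how the layout above could be packed into a single 96-bit integer key. The function names `pack` and `unpack` are hypothetical, and putting the application field in the most significant bits is an assumption taken from the left-to-right order of the layout, not something the notes specify.

```python
# Field widths taken from the layout above.
APP_BITS, TABLE_BITS, ADDR_BITS = 16, 16, 64


def pack(app: int, table: int, address: int) -> int:
    """Combine the three fields into one 96-bit integer (assumed bit order)."""
    for name, value, bits in (
        ("app", app, APP_BITS),
        ("table", table, TABLE_BITS),
        ("address", address, ADDR_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (app << (TABLE_BITS + ADDR_BITS)) | (table << ADDR_BITS) | address


def unpack(key: int) -> tuple[int, int, int]:
    """Split a 96-bit key back into (app, table, address)."""
    address = key & ((1 << ADDR_BITS) - 1)
    table = (key >> ADDR_BITS) & ((1 << TABLE_BITS) - 1)
    app = key >> (TABLE_BITS + ADDR_BITS)
    return app, table, address
```

With this ordering, all keys of one table in one application form a contiguous integer range, which may matter if the mapping scheme is range-based.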
Approaches
Mapping
B-Trees
- Supports range queries on totally ordered keys
- Allows several records from the same table to be returned with a single request
- May cause a server to become a hot spot
- RP*
Hashing
- Simple
- Likely to spread the load better
- If a single request needs multiple records from a table, it's likely to require separate requests to multiple servers, which adds overhead
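The trade-off above can be illustrated with a small sketch that counts how many servers a multi-record request touches under each scheme. Everything here is assumed for illustration: the four-server cluster, the split points, the integer keys, and the helper names `range_server`, `hash_server`, and `servers_for`. Sorted range partitioning stands in for a tree-based mapping.

```python
import bisect
import hashlib

NUM_SERVERS = 4

# Range (tree-like) partitioning: server i holds keys below SPLITS[i],
# and the last server holds everything from the final split point up.
SPLITS = [250, 500, 750]


def range_server(key: int) -> int:
    """Locate a key by its position among the sorted split points."""
    return bisect.bisect_right(SPLITS, key)


def hash_server(key: int) -> int:
    """Locate a key by hashing it, ignoring key order."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SERVERS


def servers_for(keys, locate) -> set[int]:
    """Return the set of servers a request for these keys must contact."""
    return {locate(k) for k in keys}


if __name__ == "__main__":
    request = range(100, 120)  # 20 adjacent records from one table
    print("range:", sorted(servers_for(request, range_server)))
    print("hash: ", sorted(servers_for(request, hash_server)))
```

Under range partitioning the whole request lands on one server, which is also why a popular key range can turn that server into a hot spot. Under hashing the same keys scatter, so the load spreads but the request fans out.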
B-Trees
- RP*
Hashing
- Extensible
- Linear
- Consistent
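As one illustration of the last variant listed, here is a minimal consistent-hashing ring. It is a sketch under assumptions, not a design from these notes: the class name `Ring`, the use of SHA-256, and the 64 virtual nodes per server are all arbitrary choices.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on a 64-bit ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, server name)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        """Place `vnodes` points for the server on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._points, (_point(f"{server}#{i}"), server))

    def locate(self, key: str) -> str:
        """Return the server owning the first point at or after the key."""
        idx = bisect.bisect_right(self._points, (_point(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around


if __name__ == "__main__":
    ring = Ring(["A", "B", "C"])
    keys = [f"key{i}" for i in range(1000)]
    before = {k: ring.locate(k) for k in keys}
    ring.add("D")
    moved = [k for k in keys if ring.locate(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding D")
```

The property of interest is that adding a server only moves keys onto the new server. Keys never shuffle between the existing servers, which is what makes this variant attractive when capacity grows incrementally.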
Replication
Effects
- + Throughput
- + Mitigates hot spots
- + Latency
- Eliminates cross-datacenter requests
- - Consistency
- Is replication needed for performance reasons?
- If a single server can handle 1M requests/second, is there any need to replicate?
- If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
- If there is a hot spot, data reduction may not solve overloading problems
- A system without replication would be much easier to manage
- Perhaps replicas are needed to reduce latency (e.g., East Coast datacenter and West Coast datacenter)
Locality
Effects
- + Network traffic
- + Latency for serial requests
- + Performance isolation in multi-tenant environments
- + Economy of metadata
- For example, a host needs replicas of the access control information only for the data that resides on it
...