...
- + Simple
- + Natural replication
- Partition address space (needed for distribution anyway), multiple servers cover each partition
- - Latency
- Address to shard mapping has log(# shards) time in general, if shards are not fixed width
- Can be mitgated for index space tradeoff using radix tree or tries
- How many levels is too deep? Even 2-3 in the face of cache misses?
- + Load sharing
- - More difficult to co-locate related data on a single machine
- Probably the case that we want to intentionally distrbute related data (more network overhead, but reduces latency because lookups happen on independent machines)
...
- + Supports range queries on totally ordered keys
- +/- Allows several records from the same table to be returned with a single request
- May cause a server to become a hot spot
- Is this anymore true than with hashing?
Replication
Effects
- - Capacity
- + Throughput
- + Mitgates hot spots
- If there is a hot spot, data reduction may not solve overloading problems
- If a single server can handle 1M requests/second, is there any need to replicate?
- If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
- + Latency
- Eliminates cross data center requests (East to West Coast datacenters)
- - Consistency
- A system without replication would be much easier to manage
...