...

  • + Capacity
    • Strictly necessary for this reason alone
  • + Throughput/Latency
    • + Avoid hot spots
    • Multiple hosts can service requests in parallel
    • Does either of these matter if nodes can handle 1M requests/sec?
  • - Throughput/Latency
    • Cost to map address -> shard, shard -> host on client
    • Cost to map address -> metadata (physical location, size, and permissions)
  • - Consistency
  • - Reliability
  • How do durability and reliability tie in to the discussion?

...

Mapping/Address Space Partitioning

...

Map a RAMCloud address -> metadata as quickly as possible (i.e. with as few requests and as little processing as possible), where metadata includes at least physical address, size, and permissions.

...

Implies all clients are aware of the mapping.

Possibly update the mapping only on lookup failures: a request sent to an incorrect host gets back a reply carrying the mapping correction.
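The lazy-update scheme above could look roughly like this; `send_request`, `WrongHostError`, and the modulo partition function are all hypothetical placeholders, not the actual protocol:

```python
# Sketch of lazy mapping updates: the client keeps a possibly-stale
# shard -> host map and only corrects it when a host rejects a request.

class WrongHostError(Exception):
    """Conceptually: the reply from a host that does not own the
    address, carrying the corrected mapping."""
    def __init__(self, corrected_host: str):
        self.corrected_host = corrected_host

shard_map: dict[int, str] = {}   # shard id -> host, possibly stale

def shard_of(address: int, num_shards: int = 64) -> int:
    return address % num_shards   # placeholder partition function

def read(address: int, send_request) -> bytes:
    shard = shard_of(address)
    try:
        return send_request(shard_map[shard], address)
    except WrongHostError as e:
        # Update the mapping only on a lookup failure: the incorrect
        # host's reply told us who actually owns the shard.
        shard_map[shard] = e.corrected_host
        return send_request(e.corrected_host, address)
```

The retry costs one extra round trip only when the cached mapping is stale, so steady-state reads stay at a single RPC.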

Complication: access-control requires highly-consistent mapping replication if control is on addresses (e.g. the application/table is part of the structured address).

Objects (likely as whole shards) may need to relocate due to failures, load, or capacity.

...

Some of these issues are addressable with linear hashing techniques, but they erode the O(1) lookup time.

DHT

  • + Simple
  • + Natural replication
    • Partition address space (needed for distribution anyway), multiple servers cover each partition
  • - Latency
    • Address-to-shard mapping takes O(log #shards) time in general if shards are not fixed width
    • Can be mitigated, trading index space, by using radix trees or tries
    • How many levels is too deep?
    • Even 2-3 in the face of cache misses? (what is cache miss cost)
      • Definitely; at 3.0 GHz we get 3,000 cycles per µs
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distribute related data (more network overhead, but reduces latency because lookups happen on independent machines)
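The latency point above can be made concrete: with variable-width shards, mapping an address to its shard is a binary search over shard boundaries, O(log #shards), whereas fixed-width shards need only one divide. The boundary values below are made up for illustration:

```python
import bisect

# Hypothetical sorted start addresses of variable-width shards.
shard_starts = [0, 1 << 20, 5 << 20, 6 << 20, 32 << 20]

def shard_for(address: int) -> int:
    # Binary search for the last boundary <= address: O(log #shards).
    return bisect.bisect_right(shard_starts, address) - 1

def shard_for_fixed(address: int, shard_width: int = 1 << 20) -> int:
    # With fixed-width shards the mapping collapses to one divide: O(1).
    return address // shard_width
```

A radix tree or trie over the boundaries sits between these two: constant-ish depth bought with index space, which is the tradeoff noted in the Latency bullets.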

...