...

  • + Simple
  • + Natural replication
    • Shard address space (needed for distribution anyway), multiple servers cover each shard
  • - Latency
    • Address-to-shard mapping takes log(# shards) time in general, if shards are not fixed-width in terms of # of addresses
    • Can be mitigated, at the cost of index space, by using a radix tree or trie
    • How many levels is too deep?
    • Even in the face of cache misses? (what is cache miss cost)
      • At 3.0 GHz we get 3000 cycles per us
      • A cache miss seems to cost about 60 cycles ≈ 20 ns at 3.0 GHz?
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distribute related data (more network overhead, but lower latency because lookups happen on independent machines in parallel)
  • Extendible
    • Still incurs an expensive "rehash" step when the index doubles in size; it just consists of making an entire copy of the index
  • Linear
    • Not great unless shards are split prematurely since the original algorithm still requires bucket chaining
    • Even so, doesn't provide load relief where it's needed, but rather on nodes in a predefined order
  • Consistent
    • log(# shards) lookup time (see the sketch after this list)
    • can be mitigated by fixing the address space chunk size
    • opens up the same can of worms
      • balancing enough keys per shard to keep the index small while staying within the capacity of a single machine
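
To make the log(# shards) lookup and the "natural replication" point concrete, here is a minimal consistent-hashing sketch. It is not from the original notes; the class name, shard names, replica count, and the md5-based ring hash are illustrative assumptions. A sorted list of ring positions plus a binary search gives the O(log #shards) lookup, and taking the next few shards clockwise on the ring covers each key with multiple servers.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map an arbitrary key onto the ring's address space (128-bit here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class ConsistentHashRing:
    """Toy consistent-hash ring: sorted shard positions + binary search."""

    def __init__(self, shard_names, replicas=3):
        self.replicas = replicas          # how many shards cover each key
        self.positions = []               # sorted ring positions
        self.shard_at = {}                # position -> shard name
        for name in shard_names:
            pos = ring_hash(name)
            self.shard_at[pos] = name
            bisect.insort(self.positions, pos)

    def lookup(self, key):
        """Return the shards responsible for `key`.

        bisect gives the log(# shards) lookup mentioned above; the next
        `replicas` shards clockwise on the ring give natural replication.
        """
        start = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.positions)
        owners = []
        for i in range(min(self.replicas, len(self.positions))):
            pos = self.positions[(start + i) % len(self.positions)]
            owners.append(self.shard_at[pos])
        return owners


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
print(ring.lookup("user:42"))   # three shard names; which ones depends on the hash
```

A real implementation would place several virtual nodes per shard to even out load; the sketch only shows the sorted-positions lookup and replication by walking the ring.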

B-Trees

  • + Supports range queries on totally ordered keys (see the range-partitioning sketch after this list)
  • +/- Allows several records from the same table to be returned with a single request
    • Can give control over placement if they can take advantage of it for locality
  • May cause a server to become a hot spot
    • Is this any more true than with hashing?
  • - Latency (at least as bad as DHT)
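
To contrast with hashing, here is a minimal range-partitioning sketch in the spirit of the B-Tree bullets; the class name, split keys, and shard names are made up for illustration. Because keys are totally ordered, a range query touches a contiguous run of shards, so several related records can come back with a single request per shard, and the point lookup is still a binary search over the split keys (hence latency at least as bad as the DHT case).

```python
import bisect


class RangePartitionedIndex:
    """Toy range partitioning: shard i owns keys in [split[i-1], split[i])."""

    def __init__(self, split_keys, shard_names):
        # split_keys must be sorted; len(shard_names) == len(split_keys) + 1
        self.split_keys = split_keys
        self.shard_names = shard_names

    def shard_for(self, key):
        """Point lookup: binary search over the split keys."""
        return self.shard_names[bisect.bisect_right(self.split_keys, key)]

    def shards_for_range(self, lo, hi):
        """Range scan [lo, hi]: returns a contiguous run of shards, so one
        request per shard can fetch many records from the same table."""
        first = bisect.bisect_right(self.split_keys, lo)
        last = bisect.bisect_right(self.split_keys, hi)
        return self.shard_names[first:last + 1]


idx = RangePartitionedIndex(["g", "n", "t"], ["s0", "s1", "s2", "s3"])
print(idx.shard_for("alice"))            # 's0'
print(idx.shards_for_range("m", "p"))    # ['s1', 's2'] – a contiguous run
```

The explicit split keys are also what gives the placement control mentioned above: adjacent key ranges can be deliberately assigned to the same machine for locality.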

...