...
- Extendible
- Still has an expensive "rehash" step while the index is doubled in size; it just consists of making an entire copy of the index
- Linear
- Not great unless shards are split prematurely, since the original algorithm still requires bucket chaining
- Even so, it doesn't provide load relief where it's needed, but rather splits nodes in a predefined order
- Consistent
- log(# shards) lookup time
- can be mitigated by fixing the address space chunk size
- Opens up the same can of worms: balancing enough keys per shard to keep the index small while staying under the capacity cap of a single machine

In reality, high-radix radix trees can solve this.
Example: create one node (shard) in the consistent hash map per GB in the system.
2^14 machines * 64 nodes = 2^20 nodes (shards)
Use a 64-entry radix tree, like the one that ships with Linux for reverse address mappings, and we get an expected ~3.3 accesses per lookup (log64(2^20)), with the index likely between 1 and 128 MB
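A minimal sketch of the ring side of this scheme, assuming 64 virtual nodes (shards) per machine as in the example above; names like `ConsistentHashRing` are illustrative, not from any particular library. The sorted array plus binary search is what gives the log(# shards) lookup cost noted above.

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Hash a string to a point on the ring (md5 used only for determinism).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, machines, vnodes_per_machine=64):
        # One virtual node ("shard") per GB of capacity, e.g. 64 per machine.
        self.ring = sorted(
            (h(f"{m}#{i}"), m)
            for m in machines
            for i in range(vnodes_per_machine)
        )
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Binary search for the first vnode clockwise from the key's hash;
        # wrap around past the last point. O(log #shards).
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]
```

A radix tree keyed on the hash points could replace the sorted array here, trading the binary search for the fixed-fanout walk described above.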
B-Trees
- + Supports range queries on totally ordered keys
- +/- Allows several records from the same table to be returned with a single request
- Can give control over key placement, which applications can exploit for locality
- May cause a server to become a hot spot
- Is this any more true than with hashing?
- - Latency (at least as bad as DHT)
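To illustrate the range-query advantage over hashing, a toy sketch; `OrderedIndex` is a hypothetical stand-in for a B-tree leaf scan, using a flat sorted array instead of tree pages.

```python
import bisect

class OrderedIndex:
    """Toy ordered index: stands in for a B-tree over totally ordered keys."""

    def __init__(self, records):
        # records: {key: value} with totally ordered keys
        self.keys = sorted(records)
        self.records = records

    def range(self, lo, hi):
        # All records with lo <= key <= hi come from one contiguous scan;
        # a hash index would need a separate lookup per candidate key.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [(k, self.records[k]) for k in self.keys[i:j]]
```

The contiguous scan is also why several records from the same table can come back in a single request: neighbors in key order are neighbors in the index.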
...