...

  • + Capacity
    • Strictly necessary for this reason alone
  • + Throughput/Latency
    • + Avoid hot spots
    • Multiple hosts can service requests in parallel
    • Does either of these matter if nodes can handle 1M requests/sec?
  • - Throughput/Latency
    • Cost to map address -> shard, shard -> host on client
    • Cost to map address -> metadata (physical location, size, and permissions)
  • - Consistency
  • - Reliability
  • How do durability and reliability tie in to the discussion?

...

Mapping/Address Space Partitioning

...

Map a RAMCloud address -> metadata as quickly as possible (i.e. with as few requests and as little processing as possible), where metadata includes at least physical address, size, and permissions.

...

Implies all clients are aware of the mapping.

Possibly update the mapping only on lookup failures: a request sent to an incorrect host gets back a reply carrying the mapping correction.
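The lazy-update scheme above could look roughly like this; `send_request`, `WrongHostError`, and the modulo partition function are all hypothetical placeholders, not the actual protocol:

```python
# Sketch of lazy mapping updates: the client keeps a possibly-stale
# shard -> host map and only corrects it when a host rejects a request.

class WrongHostError(Exception):
    """Conceptually: the reply from a host that does not own the
    address, carrying the corrected mapping."""
    def __init__(self, corrected_host: str):
        self.corrected_host = corrected_host

shard_map: dict[int, str] = {}   # shard id -> host, possibly stale

def shard_of(address: int, num_shards: int = 64) -> int:
    return address % num_shards   # placeholder partition function

def read(address: int, send_request) -> bytes:
    shard = shard_of(address)
    try:
        return send_request(shard_map[shard], address)
    except WrongHostError as e:
        # Update the mapping only on a lookup failure: the incorrect
        # host's reply told us who actually owns the shard.
        shard_map[shard] = e.corrected_host
        return send_request(e.corrected_host, address)
```

The retry costs one extra round trip only when the cached mapping is stale, so steady-state reads stay at a single RPC.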

Complication: access-control requires highly-consistent mapping replication if control is on addresses (e.g. the application/table is part of the structured address).

Objects (likely as whole shards) may need to relocate due to failures, load, or capacity.

...

Some of these issues are addressable with linear hashing techniques, but they erode the O(1) lookup time.

DHT

  • + Simple
  • + Natural replication
    • Partition address space (needed for distribution anyway), multiple servers cover each partition
  • - Latency
    • Address-to-shard mapping takes O(log #shards) time in general if shards are not fixed width
    • Can be mitigated, trading index space, by using radix trees or tries
    • How many levels is too deep?
    • Even 2-3 in the face of cache misses? (what is cache miss cost)
      • Definitely; at 3.0 GHz we get 3,000 cycles per µs
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distribute related data (more network overhead, but reduces latency because lookups happen on independent machines)
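The latency point above can be made concrete: with variable-width shards, mapping an address to its shard is a binary search over shard boundaries, O(log #shards), whereas fixed-width shards need only one divide. The boundary values below are made up for illustration:

```python
import bisect

# Hypothetical sorted start addresses of variable-width shards.
shard_starts = [0, 1 << 20, 5 << 20, 6 << 20, 32 << 20]

def shard_for(address: int) -> int:
    # Binary search for the last boundary <= address: O(log #shards).
    return bisect.bisect_right(shard_starts, address) - 1

def shard_for_fixed(address: int, shard_width: int = 1 << 20) -> int:
    # With fixed-width shards the mapping collapses to one divide: O(1).
    return address // shard_width
```

A radix tree or trie over the boundaries sits between these two: constant-ish depth bought with index space, which is the tradeoff noted in the Latency bullets.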

...