...

  • + Balanced storage
    • Statistically, if keys are random
    • Need other protections if keys are not random (e.g. user can control placement)
  • - Statistically requires the exact same amount of storage on all machines
  • - Adding/removing machines
    • Requires all objects to move, or for the cluster to only double in size each time (see the sketch after this list)
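
A minimal sketch of that resize cost, assuming placement by hashing the key modulo the machine count (the key format and machine counts below are illustrative, not from these notes):

```python
import hashlib

def place(key: str, n_machines: int) -> int:
    """Place a key on a machine by hash; balanced if key hashes are uniform."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n_machines

def moved_fraction(keys, n_before: int, n_after: int) -> float:
    """Fraction of keys whose placement changes when the cluster resizes."""
    moved = sum(place(k, n_before) != place(k, n_after) for k in keys)
    return moved / len(keys)

keys = [f"key-{i}" for i in range(100_000)]
# 10 -> 11 machines: a key keeps its slot only with probability ~1/11,
# so roughly 91% of all objects move.
print(moved_fraction(keys, 10, 11))
# 10 -> 20 machines (doubling): h % 20 is either h % 10 or h % 10 + 10,
# so about half the keys move and each has exactly one possible new home.
print(moved_fraction(keys, 10, 20))
```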

...

Possibly update the mapping only on lookup failures: a request sent to the incorrect host is answered with a mapping correction (sketched below). The exception is that replicas must still be notified quickly of failures.
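
A sketch of that lazy-correction idea; the class and method names are made up for illustration, not an API from these notes:

```python
class WrongHost(Exception):
    """Reply from a host that does not own the requested shard."""
    def __init__(self, shard: int, correct_host: str):
        self.shard, self.correct_host = shard, correct_host

class Host:
    def __init__(self, name: str, owned: dict, authoritative: dict):
        self.name = name
        self.owned = owned                  # shard -> {key: value} stored here
        self.authoritative = authoritative  # shard -> host name, kept current

    def get(self, shard: int, key: str):
        if shard not in self.owned:
            # Answer with a mapping correction instead of the data.
            raise WrongHost(shard, self.authoritative[shard])
        return self.owned[shard][key]

class Client:
    def __init__(self, hosts: dict, mapping: dict):
        self.hosts = hosts      # host name -> Host
        self.mapping = mapping  # cached shard -> host name, possibly stale

    def get(self, shard: int, key: str):
        try:
            return self.hosts[self.mapping[shard]].get(shard, key)
        except WrongHost as correction:
            self.mapping[shard] = correction.correct_host  # fix the stale entry
            return self.hosts[correction.correct_host].get(shard, key)
```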

Complication: access control requires highly consistent mapping replication if control is based on addresses (e.g. the application/table is part of the structured address). Otherwise, a missing entry causes spurious denial, and an extra entry allows access after revocation (see the sketch below).
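
A toy illustration of that failure mode, assuming the table id lives in the top 16 bits of a 64-bit address (the bit split, ids, and names are assumptions):

```python
TABLE_BITS = 16

def table_of(addr: int) -> int:
    """Extract the table id packed into the high bits of a structured address."""
    return addr >> (64 - TABLE_BITS)

def allowed(acl_replica: dict, principal: str, addr: int) -> bool:
    # With a stale ACL replica: a missing entry wrongly denies access,
    # and a not-yet-removed entry wrongly allows it after revocation.
    return principal in acl_replica.get(table_of(addr), set())

acl_replica = {0x0001: {"alice"}}  # this copy may lag the authoritative ACL
print(allowed(acl_replica, "alice", 0x0001_0000_0000_0000))  # True
print(allowed(acl_replica, "alice", 0x0002_0000_0000_0000))  # False: entry missing
```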

Objects (likely as whole shards) may need to relocate due to failures, load, or capacity.

...

Assuming 64-bit addrs, 32-bit host addrs, and 4 replicas (i.e. 128 bits stored per shard entry):

log_2 # shards | log_2 # addrs/shard | index size
30             | 34                  | 16 GB
40             | 24                  | 16 TB
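
The arithmetic behind the table, as a small check (4 replicas × 32-bit host addrs = 128 bits = 16 bytes per shard entry):

```python
BYTES_PER_ENTRY = 4 * 32 // 8  # 4 replicas x 32-bit host addrs = 16 bytes

def index_size_bytes(log2_shards: int) -> int:
    return (2 ** log2_shards) * BYTES_PER_ENTRY

for log2_shards in (30, 40):
    log2_addrs_per_shard = 64 - log2_shards
    gib = index_size_bytes(log2_shards) / 2**30
    print(log2_shards, log2_addrs_per_shard, f"{gib:.0f} GiB")
# Prints 16 GiB for 30 shards-bits and 16384 GiB (16 TiB) for 40,
# matching the table above (which uses binary GB/TB).
```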

...

Allows variable or fixed address-space shards and/or variable or fixed storage-space shards. Allows flexibility along both dimensions, traded against lookup time and data-structure code complexity. Could be hard in an FPGA?

  • + Simple
  • + Natural replication
    • Shard the address space (needed for distribution anyway), and have multiple servers cover each shard
  • - Latency
    • Address-to-shard mapping takes O(log # shards) time in general, if shards are not fixed width in terms of # of addresses (see the lookup sketch after this list)
    • Can be mitigated, trading index space, by using a radix tree or tries
    • How many levels is too deep?
    • Even in the face of cache misses? (what is the cache-miss cost?)
      • At 3.0 GHz we get 3000 cycles per us
      • A cache miss seems to be about 60 cycles ≈ 20 ns?
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distribute related data (more network overhead, but reduces latency because lookups happen on independent machines)
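
A minimal sketch of the O(log # shards) lookup for variable-width shards, using a sorted boundary array and binary search (the shard boundaries and host names are invented for illustration):

```python
import bisect

class ShardMap:
    def __init__(self, starts: list, hosts: list):
        # starts[i] is the first address covered by shard i; must be sorted,
        # with starts[0] == 0 so every address falls in some shard.
        assert starts[0] == 0 and starts == sorted(starts)
        self.starts, self.hosts = starts, hosts

    def lookup(self, addr: int) -> str:
        i = bisect.bisect_right(self.starts, addr) - 1  # O(log # shards) probes
        return self.hosts[i]

m = ShardMap([0, 1 << 32, 3 << 32], ["host-a", "host-b", "host-c"])
print(m.lookup(42))        # host-a
print(m.lookup(2 << 32))   # host-b
```

Each probe of the binary search can miss the cache, which is where the cycle estimates above come in; a radix tree or trie reduces the number of dependent probes at the cost of index space.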

...

  • + Supports range queries on totally ordered keys (see the range-scan sketch after this list)
  • +/- Allows several records from the same table to be returned with a single request
    • Can give control over placement if they can take advantage of it for locality
  • - May cause a server to become a hot spot
    • Is this any more true than with hashing?
  • - Latency (at least as bad as a DHT)
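
A sketch of why ordered keys help range queries: contiguous keys land on one or a few adjacent shards, so a scan touches few servers (the boundaries below are illustrative):

```python
import bisect

def shards_for_range(boundaries: list, lo_key: str, hi_key: str) -> list:
    """Indices of the shards overlapping [lo_key, hi_key];
    shard i covers keys in [boundaries[i], boundaries[i+1])."""
    first = max(bisect.bisect_right(boundaries, lo_key) - 1, 0)
    last = max(bisect.bisect_right(boundaries, hi_key) - 1, 0)
    return list(range(first, last + 1))

boundaries = ["", "g", "n", "t"]
print(shards_for_range(boundaries, "alpha", "delta"))  # [0]: a single server answers
print(shards_for_range(boundaries, "m", "p"))          # [1, 2]: two adjacent shards
```

With hashed placement, the same scan would scatter across essentially every server.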

Hard in FPGA?

Replication

Effects

  • - Capacity
  • + Throughput
    • + Mitigates hot spots
    • If there is a hot spot, data reduction may not solve overloading problems
    • If a single server can handle 1M requests/second, is there any need to replicate?
    • If the load gets too high, perhaps reduce it by storing less data per server rather than by replicating the data (see the sketch after this list)
  • + Latency
    • Eliminates cross-datacenter requests (e.g. East Coast to West Coast datacenters)
  • - Consistency
    • A system without replication would be much easier to manage
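
A back-of-the-envelope sketch for the throughput question above: if one server already absorbs the offered load, extra copies buy availability and locality rather than throughput, and overload may be better handled by storing less data per server. All numbers are assumptions, not measurements from these notes:

```python
import math

def servers_needed(total_rps: int, rps_per_server: int,
                   total_bytes: int, bytes_per_server: int, replicas: int) -> int:
    by_load = math.ceil(total_rps / rps_per_server)
    by_capacity = math.ceil(total_bytes * replicas / bytes_per_server)
    return max(by_load, by_capacity)

# 5M requests/s, servers handling 1M/s each, 100 TB of data,
# 4 TB per server, 3 replicas: capacity (75 servers), not load (5), dominates.
print(servers_needed(5_000_000, 1_000_000, 100 * 2**40, 4 * 2**40, 3))  # 75
```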

...