Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • + Simple
  • + Natural replication
    • Partition address space (needed for distribution anyway), multiple servers cover each partition
  • - Latency
    • Address to shard mapping has log(# shards) time in general, if shards are not fixed width
    • Can be mitgated for index space tradeoff using radix tree or tries
    • How many levels is too deep? Even 2-3 in the face of cache misses?
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distrbute related data (more network overhead, but reduces latency because lookups happen on independent machines)

...

  • + Supports range queries on totally ordered keys
  • +/- Allows several records from the same table to be returned with a single request
  • May cause a server to become a hot spot
    • Is this anymore true than with hashing?

Replication

Effects

  • - Capacity
  • + Throughput
    • + Mitgates hot spots
    • If there is a hot spot, data reduction may not solve overloading problems
    • If a single server can handle 1M requests/second, is there any need to replicate?
    • If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
  • + Latency
    • Eliminates cross data center requests (East to West Coast datacenters)
  • - Consistency
    • A system without replication would be much easier to manage

...