Distribution

Effects

  • + Capacity
  • + Throughput
    • + Avoid hot spots
  • - Consistency
  • How do durability and reliability tie into the discussion?

Assumptions

Addresses

  • Structured, Unstructured
  • Random, Hashes, Sequential
  • User-specified, generated
  • Need at least 2^48 capacity for objects
    • Hence, unstructured addresses probably need to be at least 2^64

application (16 bit)

table (16 bit)

address (64 bit)
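
As a rough illustration of the structured layout above, the three fields can be packed into one 96-bit key. This is only a sketch: the field order (application, then table, then address) and the names pack and unpack are assumptions made for the example, not part of the design.

```python
# Field widths taken from the layout above; the ordering is an assumption.
APP_BITS, TABLE_BITS, ADDR_BITS = 16, 16, 64


def pack(app, table, addr):
    """Pack (application, table, address) into a single 96-bit integer key."""
    fields = (("application", app, APP_BITS),
              ("table", table, TABLE_BITS),
              ("address", addr, ADDR_BITS))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (app << (TABLE_BITS + ADDR_BITS)) | (table << ADDR_BITS) | addr


def unpack(key):
    """Split a 96-bit key back into (application, table, address)."""
    addr = key & ((1 << ADDR_BITS) - 1)
    table = (key >> ADDR_BITS) & ((1 << TABLE_BITS) - 1)
    app = key >> (TABLE_BITS + ADDR_BITS)
    return app, table, addr


key = pack(3, 7, 2**48)
print(unpack(key))
```

With the application and table in the high bits, all objects of one table are contiguous in key order, which matters for range-based approaches but not for hashing.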

Approaches

Mapping

DHT

  • + Simple
  • + Natural replication
  • - Latency
    • Address-to-shard mapping takes O(log(# shards)) time in general
    • Can be mitigated, at the cost of index space, by using a radix tree or trie
    • How many levels is too deep? Even 2-3 in the face of cache misses?
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • We probably want to distribute related data intentionally (more network overhead, but lower latency because lookups happen on independent machines)
  • Extensible
  • Linear
  • Consistent
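
A minimal sketch of the address-to-shard lookup, assuming "Consistent" in the list above refers to consistent hashing. The Ring class, the shard names, and the choice of 64 virtual nodes per shard are all hypothetical. The lookup is a binary search over the ring, which is where the log(# shards) cost comes from.

```python
import bisect
import hashlib


def _hash64(data):
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring; each shard owns several virtual points."""

    def __init__(self, shards, vnodes=64):
        self._points = sorted(
            (_hash64(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def lookup(self, address):
        """Return the shard owning the first ring point after hash(address)."""
        h = _hash64(str(address))
        # Binary search: O(log(shards * vnodes)); wraps around at the end.
        i = bisect.bisect_right(self._keys, h) % len(self._keys)
        return self._points[i][1]


ring = Ring(["s0", "s1", "s2", "s3"])
print(ring.lookup(42))
```

Adding a shard to such a ring only moves addresses onto the new shard; addresses never move between the existing shards.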

B-Trees

  • + Supports range queries on totally ordered keys
  • +/- Allows several records from the same table to be returned with a single request
  • May cause a server to become a hot spot
    • Is this any more true than with hashing?
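
To make the range-query trade-off concrete, here is a sketch that uses a flat sorted list of split keys as a stand-in for the leaf level of a B-tree index. It is not a real B-tree, and RangeIndex, the split keys, and the server names are hypothetical. A narrow range lands on one server (a single request, but also a potential hot spot), while a wide range fans out.

```python
import bisect


class RangeIndex:
    """Maps totally ordered keys to servers by sorted split points.

    servers[i] owns keys k with split_keys[i-1] <= k < split_keys[i].
    """

    def __init__(self, split_keys, servers):
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one more server than split keys")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def server_for(self, key):
        """Server holding a single key."""
        return self.servers[bisect.bisect_right(self.split_keys, key)]

    def servers_for_range(self, lo, hi):
        """Servers holding any key in the inclusive range [lo, hi]."""
        first = bisect.bisect_right(self.split_keys, lo)
        last = bisect.bisect_right(self.split_keys, hi)
        return self.servers[first:last + 1]


index = RangeIndex([100, 200], ["a", "b", "c"])
print(index.servers_for_range(120, 180))  # narrow range: one server
print(index.servers_for_range(90, 210))   # wide range: several servers
```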

Replication

Effects

  • + Throughput
    • + Mitigates hot spots
  • + Latency
    • Eliminates cross data center requests
  • - Consistency
  • Is replication needed for performance reasons?
    • If a single server can handle 1M requests/second, is there any need to replicate?
    • If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
    • If there is a hot spot, data reduction may not solve overloading problems
    • A system without replication would be much easier to manage
  • Perhaps replicas are needed to reduce latency (e.g., East Coast datacenter and West Coast datacenter)
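
The hot-spot question above can be checked with a little arithmetic. The sketch below assumes read-only traffic that spreads evenly across all copies of a key; the request rates and the function max_server_load are made up for illustration. Writes, which would have to reach every copy, are ignored here, and that is where the consistency cost appears.

```python
def max_server_load(key_rates, placement):
    """Return the load (requests/second) on the busiest server.

    key_rates: key -> request rate
    placement: key -> list of servers holding a copy of that key
    """
    load = {}
    for key, rate in key_rates.items():
        copies = placement[key]
        for server in copies:
            load[server] = load.get(server, 0.0) + rate / len(copies)
    return max(load.values())


# Hypothetical workload with one hot key.
rates = {"hot": 900_000, "a": 50_000, "b": 50_000}

one_server = {"hot": ["s0"], "a": ["s0"], "b": ["s0"]}
partitioned = {"hot": ["s0"], "a": ["s1"], "b": ["s2"]}
replicated = {"hot": ["s0", "s1", "s2"], "a": ["s1"], "b": ["s2"]}

for name, placement in [("one server", one_server),
                        ("partitioned", partitioned),
                        ("hot key replicated", replicated)]:
    print(name, max_server_load(rates, placement))
```

Under these assumptions, moving data off the busy server barely helps, because the hot key's load stays with whichever single server holds it; only extra copies of that key bring the peak down.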

Locality

Effects

  • + Network traffic
  • + Latency for serial requests
  • + Performance isolation in multi-tenant environments
  • + Economy of metadata
    • For example, only access control information for data which resides on a host must be replicated to that host
  • Is there any locality in interesting database applications?
  • The most interesting form of locality is locality within a request: we would like to satisfy each request with a single call to a single server, if possible
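
Locality within a request can be measured as the number of distinct servers one request has to contact. The sketch below compares two hypothetical placement policies for keys of the form (table, id); the function names, the four-server cluster, and the "orders" table are assumptions for the example.

```python
import zlib

N_SERVERS = 4  # hypothetical cluster size


def place_by_id(key):
    """Scatter objects across servers by object id."""
    _table, obj_id = key
    return obj_id % N_SERVERS


def place_by_table(key):
    """Co-locate all objects of a table on one server."""
    table, _obj_id = key
    return zlib.crc32(table.encode()) % N_SERVERS


def servers_touched(request_keys, locate):
    """Distinct servers that must be contacted to satisfy one request."""
    return {locate(key) for key in request_keys}


request = [("orders", i) for i in range(8)]
print(len(servers_touched(request, place_by_id)))
print(len(servers_touched(request, place_by_table)))
```

Co-location satisfies the request with one call, at the price of concentrating that table's load on a single server.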