Addressing

Some possibilities

  • Structured, Unstructured
  • Random, Hashes, Sequential
  • User-specified, generated
  • Need capacity for at least 2^48 objects
    • Hence, unstructured addresses probably need to be at least 64 bits (2^64 addresses)
    • Rough sizing: 64 GB/machine (2^36 bytes) * 2^14 machines = 2^50 bytes; at 2^7 bytes/object that is 2^43 objects, so 2^48 leaves room to grow
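A quick back-of-the-envelope check of the sizing above (a sketch only; the per-machine memory, machine count, and average object size are just the assumptions from the bullets):

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Assumptions from the bullets above.
        const uint64_t bytesPerMachine = 64ULL << 30;  // 64 GB/machine = 2^36 bytes
        const uint64_t machines        = 1ULL << 14;   // 16K machines
        const uint64_t bytesPerObject  = 1ULL << 7;    // ~128-byte objects

        const uint64_t totalBytes   = bytesPerMachine * machines;   // 2^50 bytes
        const uint64_t totalObjects = totalBytes / bytesPerObject;  // 2^43 objects

        // Both totals are far beyond 2^32, so 32-bit object addresses are out;
        // 64-bit (or wider) unstructured addresses are the natural next step.
        printf("total bytes   = 2^%d\n", 63 - __builtin_clzll(totalBytes));
        printf("total objects = 2^%d\n", 63 - __builtin_clzll(totalObjects));
        return 0;
    }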

Sequential and Structured

  • Temporal id locality
  • Allocation could be tricky to make fast
  • How many tables does a typical (or large) RDBMS have?
  • How many applications do we expect to support in a single RAMCloud instance?
  • How much metadata space is needed for all tables/applications?
  • How does metadata replication occur and what is the frequency?

One example of a structured layout: application (16 bits) | table (16 bits) | address (64 bits)
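One way this layout could be represented in code (a sketch only; the field names and widths come straight from the breakdown above, everything else is illustrative):

    #include <cstdint>

    // Structured 96-bit identifier: 16-bit application id, 16-bit table id,
    // 64-bit per-table object address, per the layout above.
    struct StructuredAddress {
        uint16_t application;
        uint16_t table;
        uint64_t object;
    };

    // Pack application and table into a single 32-bit namespace key, e.g. for
    // looking up per-table metadata or access-control information.
    inline uint32_t namespaceId(const StructuredAddress& a) {
        return (static_cast<uint32_t>(a.application) << 16) | a.table;
    }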

Random

  • Smaller ids (64-bit?)
    • Not if we want these to look like capabilities
  • Simple to make fast
  • Not meaningful to client (both a plus and minus)
  • Indexing must be done by clients and stored in the cloud
    • Akin to FriendFeed's setup
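If random ids are also meant to behave like capabilities, they must be long enough that guessing a valid id is impractical; a minimal sketch of generating one (the 128-bit width is an assumption, and a real capability would want a cryptographically secure generator rather than the PRNG used here):

    #include <cstdint>
    #include <random>
    #include <utility>

    // Generate a random 128-bit object id as a (high, low) pair of 64-bit words.
    // 64 bits alone is likely too short to serve as an unguessable capability.
    std::pair<uint64_t, uint64_t> randomObjectId() {
        static std::random_device rd;       // entropy source for seeding
        static std::mt19937_64 gen(rd());   // NOT cryptographically secure; sketch only
        return { gen(), gen() };
    }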

Distribution

Effects

  • + Capacity
    • Strictly necessary for this reason alone
  • + Throughput
    • + Avoid hot spots
    • Does either of these matter if a node can handle 1M requests/sec?
  • - Consistency
  • How do durability and reliability tie in to the discussion?

Approaches

Mapping/Address Space Partitioning

Tradeoffs: capacity and throughput (via parallel requests) vs. latency (time for the client -> host and host -> address + metadata lookups)

Goal: map a RAMCloud address to its metadata as quickly as possible (i.e. with as few requests and as little processing as possible), where the metadata includes at least the physical address, size, and permissions.

Ideal: 0 network messages and O(1) address to host mapping time with high probability

This implies that all clients are aware of the mapping.
Complication: access control requires highly consistent replication of the mapping if control is attached to addresses (e.g. if the application/table is part of a structured address).

Objects may need to relocate due to failures, load, or capacity.
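One reading of the "ideal" above: each client caches the full address-to-host mapping locally and only contacts a coordinator when an entry turns out to be stale (e.g. after an object relocates). A rough sketch; the cache structure and the coordinator call are hypothetical, not part of this design:

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    // Hypothetical client-side cache of the shard -> host mapping. The common
    // case is a purely local lookup (0 network messages); a miss or a stale
    // entry falls back to a coordinator fetch.
    class HostMapCache {
      public:
        std::string hostFor(uint64_t shardId) {
            auto it = map_.find(shardId);
            if (it != map_.end())
                return it->second;        // local hit, no network traffic
            return refresh(shardId);      // miss: ask the coordinator
        }

        // Called when a server rejects a request because the data has moved.
        void invalidate(uint64_t shardId) { map_.erase(shardId); }

      private:
        std::string refresh(uint64_t shardId) {
            // Placeholder for an RPC to a coordinator holding the
            // authoritative mapping; returns a dummy value in this sketch.
            std::string host = "host-for-shard-" + std::to_string(shardId);
            map_[shardId] = host;
            return host;
        }
        std::unordered_map<uint64_t, std::string> map_;
    };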

Fixed Number of Objects per Shard

  • + O(1) lookup
    • Just keep a table: shift the address to get the shard number, then look up the host by shard id (sketched below)
  • - Size of a shard can never grow beyond capacity of the largest machine
    • Might not always be able to store even when there is excess capacity in the system
    • Could do something hackish: have a saturated host forward requests for entries it doesn't hold to another host holding a different part of the same shard
  • - Nearly impossible to determine address range chunk size initially
  • - Nightmare if we decide we need a new address range chunk size
    • Requires "rehashing" the whole thing

Some of these issues are addressable with linear hashing techniques, but doing so erodes the O(1) lookup time.
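A sketch of the O(1) lookup from the first bullet: the high bits of the address pick the shard, and a flat table maps shard to host. The 16-bit shard width here is arbitrary, which is exactly the "chunk size" problem noted above:

    #include <cstdint>
    #include <string>
    #include <utility>
    #include <vector>

    // Fixed-width sharding: the top kShardBits of an address select the shard,
    // and a direct-indexed table maps shard -> host. Lookup is O(1), but the
    // shard width is baked in and changing it means remapping everything.
    class ShardTable {
      public:
        static const unsigned kShardBits = 16;             // 2^16 shards (arbitrary)

        // hostForShard must have exactly 2^kShardBits entries.
        explicit ShardTable(std::vector<std::string> hostForShard)
            : hostForShard_(std::move(hostForShard)) {}

        const std::string& hostFor(uint64_t address) const {
            uint64_t shard = address >> (64 - kShardBits);  // top bits = shard id
            return hostForShard_[shard];                    // single table lookup
        }

      private:
        std::vector<std::string> hostForShard_;
    };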

DHT

  • + Simple
  • + Natural replication
    • Partition address space (needed for distribution anyway), multiple servers cover each partition
  • - Latency
    • Address to shard mapping has log(# shards) time in general, if shards are not fixed width
    • Can be mitigated, at the cost of index space, by using radix trees or tries
    • How many levels is too deep?
    • Even 2-3 in the face of cache misses? (what is cache miss cost)
      • Definitely; at 3.0 GHz there are 3,000 cycles per µs, and a miss to DRAM costs roughly 100 ns (~300 cycles)
  • + Load sharing
  • - More difficult to co-locate related data on a single machine
    • Probably the case that we want to intentionally distribute related data (more network overhead, but reduces latency because lookups can proceed in parallel on independent machines)
  • Hashing variants to consider: extendible, linear, consistent
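Of the variants just listed, consistent hashing is the one that keeps data movement small when servers come and go; a minimal sketch using a sorted ring (std::hash stands in for whatever hash function the system would actually use, and a real implementation would add virtual nodes for balance):

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>

    // Minimal consistent-hashing ring: hosts are hashed onto a 64-bit ring and
    // an address belongs to the first host at or after its hash (wrapping
    // around). Adding or removing a host only moves the keys adjacent to it.
    class ConsistentRing {
      public:
        void addHost(const std::string& host)    { ring_[hashOf(host)] = host; }
        void removeHost(const std::string& host) { ring_.erase(hashOf(host)); }

        // Assumes at least one host has been added.
        const std::string& hostFor(uint64_t address) const {
            auto it = ring_.lower_bound(hashOf(address));
            if (it == ring_.end())
                it = ring_.begin();                  // wrap around the ring
            return it->second;
        }

      private:
        static uint64_t hashOf(const std::string& s) { return std::hash<std::string>{}(s); }
        static uint64_t hashOf(uint64_t v)           { return std::hash<uint64_t>{}(v); }
        std::map<uint64_t, std::string> ring_;       // ring position -> host
    };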

B-Trees

  • + Supports range queries on totally ordered keys
  • +/- Allows several records from the same table to be returned with a single request
  • May cause a server to become a hot spot
    • Is this any more true than with hashing?
  • - Latency (at least as bad as DHT)
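The same ordered structure can back the address-to-host mapping itself: key each host by the first address it owns, so a lookup is a predecessor search and contiguous key ranges usually land on one host (enabling the single-request range query above, and also creating the hot-spot risk). A sketch, with std::map standing in for a real B-tree:

    #include <cstdint>
    #include <map>
    #include <string>

    // Range partitioning: each host owns a contiguous slice of the address
    // space, keyed by the lowest address in its slice.
    class RangeMap {
      public:
        void setOwner(uint64_t firstAddress, const std::string& host) {
            owners_[firstAddress] = host;
        }

        // Assumes an owner has been registered for address 0, so every
        // address has a predecessor entry.
        const std::string& hostFor(uint64_t address) const {
            auto it = owners_.upper_bound(address);  // first slice starting after address
            --it;                                    // predecessor owns this address
            return it->second;
        }

      private:
        std::map<uint64_t, std::string> owners_;     // first address -> host
    };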

Replication

Effects

  • - Capacity
  • + Throughput
    • + Mitigates hot spots
    • If there is a hot spot, data reduction may not solve overloading problems
    • If a single server can handle 1M requests/second, is there any need to replicate?
    • If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
  • + Latency
    • Eliminates cross-datacenter requests (e.g. East Coast to West Coast)
  • - Consistency
    • A system without replication would be much easier to manage

Locality

Effects

  • + Network traffic
  • + Latency for serial requests
  • + Performance isolation in multi-tenant environments
  • + Economy of metadata
    • For example, only the access-control information for the data residing on a host needs to be replicated to that host
  • Is there any locality in interesting database applications?
  • The most interesting form of locality is locality within a request: would like to satisfy each request with a single call to a single server, if possible