Addressing
Some possibilities
- Structured, Unstructured
- Random, Hashes, Sequential
- User-specified, generated
- Need at least 2^48 capacity for objects
- Hence, unstructured addresses probably need to be at least 2^64
Sequential and Structured
- Temporal id locality
- Allocation could be tricky to make fast
- How many tables does a typical (or large) RDBMS have?
- How many applications do we expect to support in a single RAMCloud instance?
- How much metadata space is needed for all tables/applications?
- How does metadata replication occur and what is the frequency?
application (16 bit) | table (16 bit) | address (64 bit)
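The three-field layout above can be sketched as a pair of pack/unpack helpers. This is a minimal illustration, assuming the fields are concatenated most-significant-first into a single 96-bit integer; the function names and the field order are hypothetical and are not specified in these notes.

```python
from typing import Tuple

# Field widths taken from the layout above.
APP_BITS, TABLE_BITS, ADDR_BITS = 16, 16, 64


def pack_address(app: int, table: int, addr: int) -> int:
    """Concatenate application | table | address into one 96-bit integer."""
    for name, value, bits in (("app", app, APP_BITS),
                              ("table", table, TABLE_BITS),
                              ("addr", addr, ADDR_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (app << (TABLE_BITS + ADDR_BITS)) | (table << ADDR_BITS) | addr


def unpack_address(packed: int) -> Tuple[int, int, int]:
    """Split a packed address back into (application, table, address)."""
    addr = packed & ((1 << ADDR_BITS) - 1)
    table = (packed >> ADDR_BITS) & ((1 << TABLE_BITS) - 1)
    app = packed >> (TABLE_BITS + ADDR_BITS)
    return app, table, addr
```

With this ordering, all objects of one application and table share a common prefix, which is what makes per-table metadata and access control on addresses possible.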
Random
- Smaller ids (64-bit?)
- Simple to make fast
- Not if we want these to look like capabilities
- Not meaningful to client (both a plus and minus)
- Indexing must be done by clients and stored in the cloud
- Akin to FriendFeed's setup
Distribution
Effects
- + Capacity
- + Throughput
- + Avoid hot spots
- - Consistency
- How do durability and reliability tie into the discussion?
Assumptions
Addresses
- Structured, Unstructured
- Random, Hashes, Sequential
- User-specified, generated
- Need at least 2^48 capacity for objects
- Hence, unstructured addresses probably need to be at least 2^64
application (16 bit) | table (16 bit) | address (64 bit)
Sequential and Structured
- Temporal id locality
- Allocation could be tricky to make fast
- How many tables does a typical (or large) RDBMS have?
- How many applications do we expect to support in a single RAMCloud instance?
- How much metadata space is needed for all tables/applications?
- How does metadata replication occur and what is the frequency?
Random
- Smaller ids (64-bit?)
- Simple to make fast
- Not if we want these to look like capabilities
- Not meaningful to client (both a plus and minus)
- Indexing must be done by clients and stored in the cloud
- Akin to FriendFeed's setup
Approaches
Mapping
Tradeoff: Capacity, Throughput (via parallel requests) vs Latency (lookup time)
Goal: map a RAMCloud address to its physical storage location as quickly as possible (i.e. with as few requests and as little processing as possible).
Ideal: 0 network messages and O(1) address to host mapping time with high probability
Implies all clients are aware of mapping.
Complication: access-control requires highly-consistent mapping replication if control is on addresses (e.g. the application/table is part of the structured address).
Objects may need to relocate due to failures, load, or capacity.
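One way to picture the ideal case (no network messages, O(1) lookup, every client aware of the mapping) is a client-side copy of a shard-to-host table. The sketch below is an assumption-laden illustration, not a design from these notes: the class name `ClientShardMap`, the fixed power-of-two shard count, the use of the low address bits to pick a shard, and the version number used to order relocation updates are all hypothetical.

```python
from typing import List


class ClientShardMap:
    """Hypothetical client-side copy of the shard-to-host table."""

    def __init__(self, version: int, shard_hosts: List[str]) -> None:
        n = len(shard_hosts)
        if n == 0 or n & (n - 1):
            raise ValueError("shard count must be a power of two")
        self.version = version
        self.shard_hosts = list(shard_hosts)
        self._mask = n - 1

    def host_for(self, address: int) -> str:
        # O(1) and purely local: mask the low bits, then index the table.
        return self.shard_hosts[address & self._mask]

    def apply_update(self, version: int, shard: int, host: str) -> bool:
        """Record that a shard moved; ignore stale or repeated updates."""
        if version <= self.version:
            return False
        self.shard_hosts[shard] = host
        self.version = version
        return True
```

Relocation then changes a single table entry, but every client copy has to learn about it, which is where the consistency requirement noted above comes from.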
DHT
- + Simple
- + Natural replication
- - Latency
- Address-to-shard mapping takes O(log(# shards)) time in general
- Can be mitigated at the cost of index space by using a radix tree or trie
- How many levels is too deep? Even 2-3 in the face of cache misses?
- + Load sharing
- - More difficult to co-locate related data on a single machine
- We probably want to intentionally distribute related data (more network overhead, but lower latency because lookups happen on independent machines)
- Extensible
- Linear
- Consistent
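The "Consistent" variant listed above can be sketched as a hash ring. This is a minimal illustration under stated assumptions: the `HashRing` class, the use of SHA-256 truncated to 64 bits, and the 64 virtual nodes per host are hypothetical choices, not part of these notes. The lookup uses binary search over the sorted ring points, which matches the logarithmic lookup cost mentioned in the list.

```python
import bisect
import hashlib
from typing import Dict, List


def _point(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> host

    def add_host(self, host: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{host}#{i}")
            if p not in self._owner:      # skip the (unlikely) collision
                bisect.insort(self._points, p)
                self._owner[p] = host

    def remove_host(self, host: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != host]
        self._owner = {p: h for p, h in self._owner.items() if h != host}

    def host_for(self, address: int) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring point clockwise from the address, wrapping around.
        i = bisect.bisect_right(self._points, _point(str(address)))
        return self._owner[self._points[i % len(self._points)]]
```

Adding a host only moves the addresses that the new host takes over; every other address keeps its previous owner, which limits relocation traffic when capacity changes.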
B-Trees
- + Supports range queries on totally ordered keys
- +/- Allows several records from the same table to be returned with a single request
- May cause a server to become a hot spot
- Is this any more true than with hashing?
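To make the range-query point concrete, the sketch below uses a flat, single-level index over totally ordered keys. It is a simplification standing in for a real B-tree, and the class name `RangePartitionMap`, the integer keys, and the split-key representation are assumptions made for illustration.

```python
import bisect
from typing import List


class RangePartitionMap:
    """Hypothetical one-level ordered index: split keys divide the key space."""

    def __init__(self, split_keys: List[int], hosts: List[str]) -> None:
        if len(hosts) != len(split_keys) + 1:
            raise ValueError("need exactly one more host than split keys")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.hosts = list(hosts)

    def host_for(self, key: int) -> str:
        # Keys below the first split go to hosts[0], and so on.
        return self.hosts[bisect.bisect_right(self.split_keys, key)]

    def hosts_for_range(self, lo: int, hi: int) -> List[str]:
        """Hosts that together cover every key in [lo, hi], in key order."""
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        first = bisect.bisect_right(self.split_keys, lo)
        last = bisect.bisect_right(self.split_keys, hi)
        return self.hosts[first:last + 1]
```

A narrow range usually lands on one host, so a single request can return several adjacent records; the same property is what lets a popular key range concentrate load on that host.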
Replication
Effects
- + Throughput
- + Mitigates hot spots
- + Latency
- Eliminates cross data center requests
- - Consistency
- Is replication needed for performance reasons?
- If a single server can handle 1M requests/second, is there any need to replicate?
- If the load gets too high, perhaps reduce the load by reducing the amount of data stored on a server, rather than replicating the data
- If there is a hot spot, data reduction may not solve overloading problems
- A system without replication would be much easier to manage
- Perhaps replicas are needed to reduce latency (e.g., East Coast datacenter and West Coast datacenter)
Locality
Effects
- + Network traffic
- + Latency for serial requests
- + Performance isolation in multi-tenant environments
- + Economy of metadata
- For example, only the access control information for data that resides on a host must be replicated to that host
- Is there any locality in interesting database applications?
- The most interesting form of locality is locality within a request: we would like to satisfy each request with a single call to a single server, if possible