Addressing

Some possibilities

Sequential and Structured

application (16 bit)

table (16 bit)

address (64 bit)

Random

Distribution

Effects

Evaluation

Questions

Approaches

addr mod servers

Mapping/Address Space Partitioning

RAMCloud Address -> Metadata as quickly as possible (e.g. with as few requests and as little processing as possible) where metadata includes physical address, size, and permissions at least.

Ideal: 0 network messages and O(1) address to host mapping time with high probability

Implies all clients are aware of mapping.

Possibly only update mapping on lookup failures. Requests to incorrect host reply with mapping correction. Except replicas must be notified quickly of failures.

Complication: access-control requires highly-consistent mapping replication if control is on addresses (e.g. the application/table is part of the structured address). Otherwise, missing entry causes denial, extra entry allows access after revocation.

Objects (likely as whole shards) may need to relocate due failures, load, or capacity.

Fixed Number of Objects per Shard

Assuming 64 bit addrs, 32 bit host addrs, 4 replicas (e.g. 128-bits stored per shard entry)

log_2 # shards

log_2 # addrs/shard

index size

30

34

16GB

40

24

16TB

DHT

Allows variable/fixed address space shards and/or variable/fixed space shards. Allows flexibility along both dimensions, for time and data structure code complexity. Could be hard in FPGA?

Example, create 1 node (shard) in the consistent hash map per GB in the system.
2^14 machines * 64 nodes = 2^20 nodes (shards)
Use 64 entry radix tree as ships with Linux for reverse address mappings and we have expected 3.3 accesses per lookup with index likely from 1 to 128 MB

Tried this with off-the-shelf radix tree that claims to be fast and cache optimized (Judy): with 2^20 nodes average lookup time of about 1000 cycles on an AMD64 and 520 cycles on a recent Core 2 Duo and a Xeon, allocated 15 MB of RAM during inserts. Even with some cache pressure (1000 lines touched randomly) still responds in about 1800 cycles.

B-Trees

Hard in FPGA?

Replication

Effects

Evaluation

Questions

Locality

Effects

Evaluation