...
- Structured, Unstructured
- Random, Hashes, Sequential
- User-specified, generated
- Need at least 2^48 capacity for objects
- Hence, unstructured addresses probably need to be at least 2^64
- 64 * 2^30 bytes/machine * 2^14 machines = 2^50 bytes; 2^50 bytes / 2^7 bytes/obj = 2^43 objects
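The capacity estimate can be checked with a few lines of arithmetic. This is only a sketch: the constant names are invented here, and the figures (64 GB per machine, 2^14 machines, 128-byte objects) are the ones given in the notes.

```python
# Back-of-the-envelope check of the capacity estimate in these notes.
BYTES_PER_MACHINE = 64 * 2**30   # 64 GB of storage per machine
MACHINES = 2**14                 # cluster size assumed in the notes
AVG_OBJECT_BYTES = 2**7          # 128 bytes per object

total_bytes = BYTES_PER_MACHINE * MACHINES
total_objects = total_bytes // AVG_OBJECT_BYTES

# For exact powers of two, bit_length() - 1 is the exponent.
print(f"total bytes   = 2^{total_bytes.bit_length() - 1}")
print(f"total objects = 2^{total_objects.bit_length() - 1}")
```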
Sequential and Structured
...
- Smaller ids (64-bit?)
- Not if we want these to look like capabilities
- Simple to make generation fast
- Not meaningful to client (both a plus and minus)
- Indexing must be done by clients and stored in the cloud
- Akin to FriendFeed's setup
- Content-based
- Can't share objects without references
- Less general
- Potential vulns if hashes have weaknesses (good 128-bit hashes?)
- Built-in de-duplication
- Which also poses a storage channel for multi-tenancy
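A minimal sketch of content-based addressing, to make the de-duplication point concrete. The `ContentStore` class is hypothetical, and SHA-256 is used only because it ships with the Python standard library; the notes leave the choice of hash (and whether a good 128-bit one exists) open.

```python
import hashlib


class ContentStore:
    """Toy in-memory store keyed by a hash of each object's bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical
        # objects map to the same address and are stored only once.
        addr = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        return self._objects[addr]

    def __len__(self):
        return len(self._objects)


store = ContentStore()
a = store.put(b"hello")
b = store.put(b"hello")   # duplicate content, same address
c = store.put(b"world")
```

Here the second `put` adds nothing to the store. That shared entry is also what the multi-tenancy concern refers to: if two tenants' identical objects collapse into one, the store's behavior can reveal that the content already existed.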
...
- How much metadata space is needed for all tables/applications?
- Object Level
- Up to 2^43 objects, size (4 bytes?), permissions or appid, tableid if not in address (4 bytes)
- (2^43) * 8 = 64 TB fully loaded (6.25% of capacity), not including the index size
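The object-level metadata estimate works out as follows. This is a sketch with made-up constant names; the two 4-byte fields are the tentative sizes from the notes, and total capacity is taken as 2^50 bytes.

```python
OBJECTS = 2**43               # object count from the capacity estimate
SIZE_FIELD_BYTES = 4          # per-object size field (tentative in the notes)
ID_FIELD_BYTES = 4            # permissions/appid, or tableid if not in the address
TOTAL_CAPACITY_BYTES = 2**50  # 64 GB/machine * 2^14 machines

metadata_bytes = OBJECTS * (SIZE_FIELD_BYTES + ID_FIELD_BYTES)
metadata_tb = metadata_bytes / 2**40
fraction = metadata_bytes / TOTAL_CAPACITY_BYTES

print(f"{metadata_tb:.0f} TB of metadata, {fraction:.2%} of capacity")
```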
- Object Level
- How does metadata replication occur and what is the frequency?
- On writes for object metadata
- Shard Mappings
- Lazily
- Not sufficient when a client discovers a host is down
- At a minimum, mappings on the new replicas must be updated very quickly
- May additionally want leases, heartbeats, or something similar (as in MapReduce) to ensure enough copies of shards are maintained on failure, even if the data is cold
Approaches
addr mod servers
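A minimal sketch of the "addr mod servers" approach, assuming addresses are plain integers and servers are numbered from 0. The function names are hypothetical and not taken from the notes.

```python
def server_for(addr: int, num_servers: int) -> int:
    """Return the index of the server responsible for addr."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    return addr % num_servers


def moved_fraction(addrs, old_n: int, new_n: int) -> float:
    """Fraction of addresses whose server changes when the count changes."""
    moved = sum(
        1 for a in addrs if server_for(a, old_n) != server_for(a, new_n)
    )
    return moved / len(addrs)


# Growing from 4 to 5 servers remaps most addresses.
fraction = moved_fraction(range(1000), 4, 5)
```

The mapping needs no lookup table, but as the second function illustrates, changing the number of servers reassigns most addresses, which is worth weighing against any other approaches.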
...