...
- Static Scalability
- New installations may come in many sizes: 1 machine, 10,000 machines, etc.
- Dynamic Scalability
- Existing installations must permit expansion - both incremental and explosive
- Need to scale up as quickly as user requires - may be orders of magnitude in a few days
- (Orran Krieger's Forum presentation - EC2 customer example)
- Scaling down may be as important as scaling up
- server consolidation may be important
- regular: reduce active nodes during off-peak times (assuming we can maintain the in-memory dataset)
- irregular: data center resources may be re-provisioned (to cut costs, handle reduced popularity, RAMCloud 2.0 is just too efficient, etc)
Addressing
- 10,000 nodes x 128 GB each ~= 1.25 PB of storage
- 1 PB = 2^50 bytes
- Assuming average object size is 128 bytes:
- 2^50 / 2^7 = 2^43 objects
- => need at least 43 bits of address space (rounds up to 64-bit)
- May want much larger key spaces, though:
- if keys are random, we want a lower probability of collision, better distribution, etc
- if structured keys, need bits for user, table, access rights, future/undefined fields, etc
- So, we probably want 128-bit addressing, at the minimum.
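The arithmetic above can be checked with a quick back-of-the-envelope script (a sketch; the node count and object size are the assumptions stated above). Note that the exact answer is 44 bits rather than 43, because 1.25 PB is slightly more than the 2^50 bytes used as an approximation above; either way it rounds up well past 43, so the conclusion is the same.

```python
import math

nodes = 10_000
bytes_per_node = 128 * 2**30      # 128 GiB of RAM per node
avg_object_size = 2**7            # assumed 128-byte average object

total_bytes = nodes * bytes_per_node          # ~1.25 PB
total_objects = total_bytes // avg_object_size
addr_bits = math.ceil(math.log2(total_objects))
print(addr_bits)                  # 44
```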
Heterogeneity
- Machines will remain in service long enough that newer additions will be faster, have more memory, or otherwise differ in important characteristics
- Need to be sure heterogeneity is taken into account in distribution
- e.g.: using DHTs and consistent hashing, we can vary the share of the key space a node is responsible for in proportion to its capabilities
- Is purposeful heterogeneity useful?
- Lots of big, dumb, efficient nodes with lots of RAM coupled with some smaller, very fast, expensive nodes for offloading serious hotspots?
- Or will high throughput/low latencies save us?
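The capacity-proportional consistent hashing mentioned above can be sketched with virtual nodes: each node places a number of points on the ring proportional to its capacity, so a 256 GB node owns roughly twice the key space of a 128 GB node. This is an illustrative sketch, not RAMCloud's API; the class, names, and one-virtual-node-per-GB scale factor are all assumptions.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Stable 64-bit hash point derived from SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class WeightedRing:
    """Consistent-hash ring where a node's share of the key space is
    proportional to its capacity, via proportional virtual-node counts."""

    def __init__(self):
        self._points = []   # sorted virtual-node hash points
        self._owner = {}    # hash point -> physical node

    def add_node(self, node: str, capacity_gb: int):
        # One virtual node per GB of capacity (illustrative scale factor).
        for i in range(capacity_gb):
            p = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def lookup(self, key: str) -> str:
        # A key belongs to the first virtual node at or after its hash,
        # wrapping around the ring.
        i = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

With a 256 GB node and a 128 GB node on the ring, lookups land on the larger node roughly twice as often, which is exactly the proportional-responsibility property described above.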
Sharding
- Likely want to shard RAMCloud into chunks
Virtualisation Interplay
- Is it reasonable to expect to run within a virtualised environment?
- could imply much greater dynamism than we might be anticipating
- high churn in joining/leaving DHT, lots of resultant swap in/out to maintain availability
- could also imply a larger number of nodes than we expect, e.g.:
- let a hypervisor worry about multiprocessors
- VMs may have significant latency penalties (though can be mitigated with PCI device pass-through, core pinning, etc)