...
- Machines will be in service long enough that new additions will be faster, have more memory, or otherwise differ in important characteristics
- Need to be sure heterogeneity is taken into account in distribution
- e.g.: using DHTs and consistent hashing we can vary the key space a node is responsible for in proportion to its capabilities (see the sketch after this list)
- Is purposeful heterogeneity useful?
- Lots of big, dumb, efficient nodes with lots of RAM coupled with some smaller, very fast, expensive nodes for offloading serious hotspots?
- Or will high throughput/low latencies save us?
- Alternative is to shrink the responsibility of an overloaded node so it concentrates on the hottest data
- Perhaps there's a performance, energy, or cost win to being specifically heterogeneous?
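A minimal sketch of capability-weighted consistent hashing, assuming a simple hash ring: each node gets a number of virtual points proportional to its capacity, so a bigger machine owns a proportionally larger slice of the key space. The `Ring` type, the weights, and the node names are illustrative assumptions, not RAMCloud's actual interface.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// Ring is a consistent-hash ring with capability-weighted virtual nodes.
type Ring struct {
	points []uint64          // sorted hash points on the ring
	owner  map[uint64]string // hash point -> node name
}

func hash(s string) uint64 {
	h := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(h[:8])
}

// AddNode places `weight` virtual points on the ring; weight should be
// proportional to the node's capacity (RAM, CPU, ...), so more capable
// nodes become responsible for a proportionally larger key range.
func (r *Ring) AddNode(name string, weight int) {
	if r.owner == nil {
		r.owner = make(map[uint64]string)
	}
	for i := 0; i < weight; i++ {
		p := hash(fmt.Sprintf("%s#%d", name, i))
		r.points = append(r.points, p)
		r.owner[p] = name
	}
	sort.Slice(r.points, func(a, b int) bool { return r.points[a] < r.points[b] })
}

// Lookup returns the node responsible for key: the owner of the first
// ring point at or clockwise past the key's hash.
func (r *Ring) Lookup(key string) string {
	h := hash(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	var r Ring
	r.AddNode("big-node", 160) // e.g. 4x the capacity of small-node
	r.AddNode("small-node", 40)
	fmt.Println(r.Lookup("object-42"))
}
```

Shrinking the responsibility of an overloaded node, as suggested above, would then amount to removing some of its virtual points from the ring.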
Sharding
...
- Likely want to shard RAMCloud into
...
- chunks
- Useful for heterogeneity: keep chunk size <= the smallest node's memory capacity, and make bigger servers responsible for more chunks
- Variable vs. static chunk sizes
- Variable complicates mapping addresses to chunks, but:
- permits squeezing an address range to break apart hot data (see the sketch after this list)
- hot spots may grow increasingly unlikely with scale, but should we consider the low end?
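One way the variable-size mapping could work, sketched below under assumed names (`ChunkTable`, `Split`, and the owner labels are hypothetical): a table of chunk start addresses sorted so a binary search resolves any address, plus a split operation that squeezes a new boundary into a hot range so the hot portion can be handed to another server.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk covers the address range [Start, next chunk's Start).
type Chunk struct {
	Start uint64 // first address covered by this chunk
	Owner string // server responsible for it
}

// ChunkTable maps addresses to variable-size chunks via a sorted table.
type ChunkTable struct {
	chunks []Chunk // sorted by Start; assumes a chunk starting at 0 exists
}

// Lookup finds the chunk covering addr: the last chunk whose Start <= addr.
func (t *ChunkTable) Lookup(addr uint64) Chunk {
	i := sort.Search(len(t.chunks), func(i int) bool { return t.chunks[i].Start > addr })
	return t.chunks[i-1]
}

// Split introduces a new chunk boundary at addr, assigning the upper part
// of the enclosing chunk to newOwner -- squeezing an address range to
// break apart hot data.
func (t *ChunkTable) Split(addr uint64, newOwner string) {
	i := sort.Search(len(t.chunks), func(i int) bool { return t.chunks[i].Start > addr })
	t.chunks = append(t.chunks, Chunk{}) // grow by one
	copy(t.chunks[i+1:], t.chunks[i:])   // shift the tail right
	t.chunks[i] = Chunk{Start: addr, Owner: newOwner}
}

func main() {
	t := ChunkTable{chunks: []Chunk{{0, "A"}, {1 << 20, "B"}}}
	t.Split(1<<19, "fast-node")          // carve a hot range out of A's chunk
	fmt.Println(t.Lookup((1 << 19) + 5)) // now owned by "fast-node"
}
```

The cost relative to static chunks is exactly the complication noted above: every address lookup goes through this table rather than simple address arithmetic.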
Virtualisation Interplay
- Is it reasonable to expect to run within a virtualised environment?
- could imply much greater dynamism than we might be anticipating
- high churn in joining/leaving DHT, lots of resultant swap in/out to maintain availability
- could also imply larger number of nodes than we expect, e.g.
- let a hypervisor worry about multiprocessors
- VMs may have significant latency penalties (though these can be mitigated with PCI device pass-through, core pinning, etc.)