...
- Must manage growth automatically
- ideally, just plug in more servers and the system automatically reconfigures itself to make use of the additional capacity
- at large scale, managing machines individually is impossible => need lights-out management (LOM)
- at small scale, manual management is probably feasible => we're assuming the extra complexity of automation is acceptable for small setups
- primarily, must automate network configuration (easy)
- what about autoconfiguring location?
- system will probably want to know what lives where (which machines share the same rack, etc., for availability)
- probably can be derived from the subnet? (rough sketch below)
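A minimal sketch of what "plug in a server and it joins automatically" could look like, assuming a central coordinator, one subnet per rack, and a throwaway text JOIN message; the names (infer_rack_id, register_with_coordinator, the coordinator address) are hypothetical, not existing RAMCloud code:

```python
import ipaddress
import socket

def infer_rack_id(ip: str, prefix_len: int = 24) -> str:
    """Assumes one /24 subnet per rack; the subnet address doubles as the rack id."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)

def register_with_coordinator(coordinator_addr, my_ip: str, capacity_mb: int) -> str:
    """Announce this server to the coordinator so the cluster can absorb it."""
    rack = infer_rack_id(my_ip)
    msg = f"JOIN {my_ip} rack={rack} capacity_mb={capacity_mb}\n"
    with socket.create_connection(coordinator_addr, timeout=5) as sock:
        sock.sendall(msg.encode())
        return sock.recv(64).decode()   # e.g. a server id assigned by the coordinator

print(infer_rack_id("10.1.7.42"))       # -> "10.1.7.0" (hypothetical rack id)
# register_with_coordinator(("coordinator.example.com", 4000), "10.1.7.42", 16384)
```

If racks don't map cleanly onto subnets, the location would have to come from somewhere else (switch or DHCP configuration, for example).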
- Scale affects:
- Network architecture
- Distribution/Replication/Reliability
- Fundamental algorithms - how well do they scale? Examples:
- Dynamo (DHT)
- paper doesn't say (probably 1000's, though each app has own Dynamo instance)
- DHT's generally scale well and deal well with churn
- i.e. popular with P2P apps in general
- uses consistent hashing (rough sketch after this list)
- various schemes tested
- focus is on availability and latency guarantees (milliseconds)
- BigTable (B+ like tree atop bunch of other distributed services)
- paper indicates generally < 500 servers in production
- diminishing returns due to GFS large block sizes
- interestingly, BigTable can run in all-RAM mode and is not decentralised
- What we choose depends largely on data model
- e.g. a DHT is good if we don't need range queries or care about key locality; otherwise a tree may be better
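Since Dynamo's placement hinges on consistent hashing, here is a small illustrative ring with virtual nodes (the names are made up, not Dynamo's actual code). It also shows why a DHT gives up key locality: adjacent keys hash to arbitrary servers, so range scans touch many machines, which is where a B+-tree layout like BigTable's is a better fit.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes: int = 100):
        self.ring = []                      # sorted list of (hash, server)
        self.vnodes = vnodes
        for s in servers:
            self.add(s)

    def _h(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._h(f"{server}#{i}"), server))

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        i = bisect.bisect(self.ring, (self._h(key), ""))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.lookup("table:foo/key42"))
# Adding "server-d" later only remaps roughly 1/4 of the keys, which is why
# DHTs tolerate churn well; but key42 and key43 may land on different servers,
# so a range scan would have to hit many machines.
```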
- Must have good system visibility/insight
- For health/system status/load/etc (managerial)
- For development/debugging/performance optimisation/etc (for us. Want an X-Trace/DTrace for RAMCloud, perhaps? Rough sketch below)
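A hedged sketch of what a per-server visibility hook might look like; the Metrics class and its methods are invented for illustration and are not an existing RAMCloud, X-Trace, or DTrace API:

```python
import collections
import threading
import time

class Metrics:
    """Per-server counters plus a bounded trace log, scraped by a collector."""

    def __init__(self, max_events: int = 10000):
        self._lock = threading.Lock()
        self.counters = collections.Counter()               # e.g. reads, writes, backups
        self.events = collections.deque(maxlen=max_events)  # (timestamp, op, latency_us)

    def record(self, op: str, latency_us: float):
        with self._lock:
            self.counters[op] += 1
            self.events.append((time.time(), op, latency_us))

    def snapshot(self):
        """What a coordinator, dashboard, or debugger would pull for health/load."""
        with self._lock:
            return dict(self.counters), list(self.events)

metrics = Metrics()
metrics.record("read", 4.7)          # a 4.7 microsecond read, say
print(metrics.snapshot())
```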
Scaling Down
The system should scale down as well as up:
...