...

  • Must manage growth automatically
    • ideally, just plug in more servers and the system automatically reconfigures itself to use the additional capacity
      • at large scale, managing machines individually is impossible => need lights-out management (LOM)
      • at small scale, individual management is probably feasible => we assume the extra complexity is acceptable for small setups
    • primarily, network configuration must be automated (easy)
      • what about autoconfiguring location?
        • system will probably want to know what lives where (which machines share the same rack, etc., for availability)
        • probably can be derived from the subnet? (see the address-to-rack sketch after this list)
    • Scale affects:
      • Network architecture
      • Distribution/Replication/Reliability
        • Fundamental algorithms - how well do they scale? Examples:
          • Dynamo (DHT)
            • the paper doesn't say (probably thousands, though each application has its own Dynamo instance)
              • DHTs generally scale well and deal well with churn
                • i.e. popular with P2P apps in general
              • uses consistent hashing (sketched after this list)
                • various schemes tested
            • focus is on availability and latency guarantees (milliseconds)
          • BigTable (B+-tree-like structure atop a bunch of other distributed services)
            • the paper indicates generally < 500 servers in production
            • diminishing returns due to GFS's large block sizes
            • interestingly, BigTable can run in all-RAM mode and is not decentralised
          • What we choose depends largely on the data model
            • e.g. a DHT is good if we don't need range queries or key locality; otherwise a tree may be better (see the range-scan comparison after this list)
  • Must have good system visibility/insight
    • For health/system status/load/etc (managerial)
    • For development/debugging/performance optimisation/etc (for us. Want an X-Trace/DTrace for RAMCloud, perhaps?)
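
As a sketch of deriving location from the network address (the one-rack-per-/24 convention and all names here are assumptions, not anything the notes or papers specify), two servers could be treated as rack-mates whenever their addresses fall in the same subnet:

    import ipaddress

    # Assumed convention: one /24 subnet per rack, so two servers share
    # a rack iff their addresses lie in the same /24 network.
    RACK_PREFIX_LEN = 24

    def rack_id(addr: str) -> ipaddress.IPv4Network:
        """Map a server's IP address to its (assumed) rack subnet."""
        return ipaddress.ip_network(f"{addr}/{RACK_PREFIX_LEN}", strict=False)

    def same_rack(a: str, b: str) -> bool:
        # Replica placement should avoid pairs for which this is True.
        return rack_id(a) == rack_id(b)

    assert same_rack("10.1.2.17", "10.1.2.201")
    assert not same_rack("10.1.2.17", "10.1.3.17")

This only helps if the subnetting actually mirrors the physical layout; otherwise location would have to be configured or probed some other way.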
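
A minimal sketch of the consistent hashing Dynamo uses (virtual-node count, hash choice, and class names are illustrative assumptions; Dynamo's real scheme also layers replication and preference lists on top): each server is hashed to several points ("tokens") on a ring, and a key belongs to the first token clockwise from the key's hash, so adding or removing one server only remaps the keys near its tokens.

    import bisect
    import hashlib

    class ConsistentHashRing:
        """Toy Dynamo-style hash ring with virtual nodes."""

        def __init__(self, vnodes: int = 64):
            self.vnodes = vnodes
            self._ring = []  # sorted list of (token, server) pairs

        @staticmethod
        def _hash(data: str) -> int:
            return int.from_bytes(hashlib.md5(data.encode()).digest()[:8], "big")

        def add_server(self, server: str) -> None:
            # One token per virtual node, spread around the ring.
            for i in range(self.vnodes):
                bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

        def remove_server(self, server: str) -> None:
            self._ring = [t for t in self._ring if t[1] != server]

        def owner(self, key: str) -> str:
            # First token at or clockwise of the key's hash, wrapping
            # past the top of the ring.
            i = bisect.bisect(self._ring, (self._hash(key), ""))
            return self._ring[i % len(self._ring)][1]

    ring = ConsistentHashRing()
    for s in ("server-a", "server-b", "server-c"):
        ring.add_server(s)
    print(ring.owner("some-object-key"))

Removing server-b here only reassigns the keys whose hashes fell just below server-b's tokens, which is why such DHTs cope well with churn.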
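
To make the data-model trade-off concrete (keys and helper are purely illustrative): an ordered structure such as a B+ tree keeps adjacent keys together, so a range scan touches one contiguous run, whereas consistent hashing scatters those same keys across the ring and a scan must ask every partition.

    import bisect

    # Keys kept sorted, as a B+-tree leaf chain would store them.
    sorted_keys = ["user:100", "user:105", "user:142", "user:200"]

    def range_scan(lo: str, hi: str) -> list:
        """Return all keys in [lo, hi] via two binary searches."""
        i = bisect.bisect_left(sorted_keys, lo)
        j = bisect.bisect_right(sorted_keys, hi)
        return sorted_keys[i:j]

    print(range_scan("user:100", "user:150"))  # ['user:100', 'user:105', 'user:142']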

Scaling Down

The system should scale down as well as up:

...