...

  • Must manage growth automatically
    • ideally, just plug in more servers and the system automatically reconfigures itself to use the additional capacity
      • at large scale, managing machines individually is impossible => need lights-out management (LOM)
      • at small scale, individual management is probably feasible => we assume the extra complexity is acceptable for small setups
    • primarily, network configuration must be automated (easy)
      • what about autoconfiguring location?
        • system will probably want to know what lives where (which machines share the same rack, etc., for availability)
        • probably can be derived from the subnet? (see the address-to-rack sketch after this list)
    • Scale affects:
      • Network architecture
      • Distribution/Replication/Reliability
        • Fundamental algorithms - how well do they scale? Examples:
          • Dynamo (DHT)
            • the paper doesn't say (probably thousands, though each application has its own Dynamo instance)
              • DHTs generally scale well and deal well with churn
                • i.e. popular with P2P apps in general
              • uses consistent hashing (sketched after this list)
                • various schemes tested
            • focus is on availability and latency guarantees (milliseconds)
          • BigTable (B+-tree-like structure atop a bunch of other distributed services)
            • the paper indicates generally < 500 servers in production
            • diminishing returns due to GFS's large block sizes
            • interestingly, BigTable can run in all-RAM mode and is not decentralised
          • What we choose depends largely on the data model
            • e.g. a DHT is good if we don't need range queries or key locality; otherwise a tree may be better (see the range-scan comparison after this list)
  • Must have good system visibility/insight
    • For health/system status/load/etc (managerial)
    • For development/debugging/performance optimisation/etc (for us. Want an X-Trace/DTrace for RAMCloud, perhaps?)
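
As a sketch of deriving location from the network address (the one-rack-per-/24 convention and all names here are assumptions, not anything the notes or papers specify), two servers could be treated as rack-mates whenever their addresses fall in the same subnet:

    import ipaddress

    # Assumed convention: one /24 subnet per rack, so two servers share
    # a rack iff their addresses lie in the same /24 network.
    RACK_PREFIX_LEN = 24

    def rack_id(addr: str) -> ipaddress.IPv4Network:
        """Map a server's IP address to its (assumed) rack subnet."""
        return ipaddress.ip_network(f"{addr}/{RACK_PREFIX_LEN}", strict=False)

    def same_rack(a: str, b: str) -> bool:
        # Replica placement should avoid pairs for which this is True.
        return rack_id(a) == rack_id(b)

    assert same_rack("10.1.2.17", "10.1.2.201")
    assert not same_rack("10.1.2.17", "10.1.3.17")

This only helps if the subnetting actually mirrors the physical layout; otherwise location would have to be configured or probed some other way.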
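
A minimal sketch of the consistent hashing Dynamo uses (virtual-node count, hash choice, and class names are illustrative assumptions; Dynamo's real scheme also layers replication and preference lists on top): each server is hashed to several points ("tokens") on a ring, and a key belongs to the first token clockwise from the key's hash, so adding or removing one server only remaps the keys near its tokens.

    import bisect
    import hashlib

    class ConsistentHashRing:
        """Toy Dynamo-style hash ring with virtual nodes."""

        def __init__(self, vnodes: int = 64):
            self.vnodes = vnodes
            self._ring = []  # sorted list of (token, server) pairs

        @staticmethod
        def _hash(data: str) -> int:
            return int.from_bytes(hashlib.md5(data.encode()).digest()[:8], "big")

        def add_server(self, server: str) -> None:
            # One token per virtual node, spread around the ring.
            for i in range(self.vnodes):
                bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

        def remove_server(self, server: str) -> None:
            self._ring = [t for t in self._ring if t[1] != server]

        def owner(self, key: str) -> str:
            # First token at or clockwise of the key's hash, wrapping
            # past the top of the ring.
            i = bisect.bisect(self._ring, (self._hash(key), ""))
            return self._ring[i % len(self._ring)][1]

    ring = ConsistentHashRing()
    for s in ("server-a", "server-b", "server-c"):
        ring.add_server(s)
    print(ring.owner("some-object-key"))

Removing server-b here only reassigns the keys whose hashes fell just below server-b's tokens, which is why such DHTs cope well with churn.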
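
To make the data-model trade-off concrete (keys and helper are purely illustrative): an ordered structure such as a B+ tree keeps adjacent keys together, so a range scan touches one contiguous run, whereas consistent hashing scatters those same keys across the ring and a scan must ask every partition.

    import bisect

    # Keys kept sorted, as a B+-tree leaf chain would store them.
    sorted_keys = ["user:100", "user:105", "user:142", "user:200"]

    def range_scan(lo: str, hi: str) -> list:
        """Return all keys in [lo, hi] via two binary searches."""
        i = bisect.bisect_left(sorted_keys, lo)
        j = bisect.bisect_right(sorted_keys, hi)
        return sorted_keys[i:j]

    print(range_scan("user:100", "user:150"))  # ['user:100', 'user:105', 'user:142']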

Scaling Down

The system should scale down as well as up:

...