Applications

Fundamental advantages of RAMCloud for an application

  • Any data access other than to local RAM will be faster with RAMCloud (even an SSD takes ~75-85us per access)
  • Enables very high query rates, at least an order of magnitude more queries per second per box than MySQL
    • People have reported O(20,000) queries/sec for MySQL, probably for really basic, cached queries
    • More complex queries will likely be significantly slower, and if anything has to come from disk you're of course hosed
  • As a consequence of the above, app writers can feel free to issue dependent sequential queries if needed, and the result will still be fast
    • A con: to get functionality similar to what we have today, you may actually have to write sequential dependent queries
  • _Potentially_ no need to name (choose a cache key for) every one of your queries and keep the cache synchronized, as you must when using a cache layer like Memcached
  • No performance dependency on locality
  • Persistence, which Memcached lacks
  • Easy scalability
  • Less complexity than MySQL + Memcached
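To make the dependent-query point concrete: when each query needs the previous query's result, the round trips cannot overlap, so total latency is roughly the per-access latency times the length of the chain. The rough Python sketch below does that arithmetic. Only the ~80us SSD figure comes from these notes; the 10us RAMCloud access time, the 10ms disk access time, the 100-query chain and the 50ms budget are assumptions chosen for illustration.

```python
# Hypothetical per-access latencies in microseconds. Only the SSD figure
# comes from the notes (~75-85us); the RAMCloud and disk figures are
# illustrative assumptions, not measurements.
LATENCY_US = {
    "ramcloud (assumed)": 10.0,
    "ssd": 80.0,
    "disk (assumed)": 10_000.0,
}


def chain_latency_us(num_dependent_queries: int, per_access_us: float) -> float:
    """Dependent queries cannot overlap (each needs the previous result),
    so the total latency is the sum of the individual round trips."""
    return num_dependent_queries * per_access_us


def max_chain_length(budget_us: float, per_access_us: float) -> int:
    """Number of dependent queries that fit within a latency budget."""
    return int(budget_us // per_access_us)


if __name__ == "__main__":
    budget_us = 50_000.0  # hypothetical 50 ms budget for a page's data access
    for name, per_access in LATENCY_US.items():
        total_ms = chain_latency_us(100, per_access) / 1000
        fits = max_chain_length(budget_us, per_access)
        print(f"{name:>20}: 100 dependent queries = {total_ms:.1f} ms; "
              f"{fits} fit in 50 ms")
```

With these assumed numbers, a chain of 100 dependent queries takes about 1ms against RAMCloud, 8ms against SSD, and a full second against disk, which is why sequential dependent queries only become reasonable once every access is a few microseconds.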

Let's consider some application categories that run in a datacenter/cloud setting:

  • Synthesizing Hardware (Cisco, Nvidia)
    • CPU bound?
    • Not much disk access?
    • Could it page memory to RAMCloud?
  • Rendering (Pixar, ILM, Disney)
    • CPU bound?
    • Dataset may be too large
    • Possible speedup for grabbing textures?
    • Would this help a client machine manipulating the scene?
  • Simulation (weather, nuclear)
    • CPU bound?
    • Any need for shared storage?
  • Transactional (stock exchange, banks, credit card processing)
    • There must be a 'to disk' component here, including sync; RAMCloud could be a win
    • There is also an online component here, e.g. fraud detection
  • MapReduce, batch processing
    • Could be interesting, depending on the dataset size
    • Is this an existing pain point?
    • Could allow the use of 'online' data
  • Web related
    • Content Delivery (CDN)
    • Pages requiring many low-locality queries returning small (define) sized data (Facebook, Myspace, Google, Yahoo, eBay, etc.)
    • Pages requiring many high-locality (or small dataset) queries returning small (define) sized data (CNN, Slashdot)
    • Pages consisting primarily of static content (Microsoft, IMDB, etc)
  • Raw Storage

Of the top 20 websites, these would likely benefit substantially from RAMCloud:

  • Google
  • Yahoo
  • YouTube (not the video part)
  • Facebook
  • Windows Live
  • MSN (how much is static and just cached?)
  • Wikipedia (maybe?)
  • Blogger.com
  • Myspace
  • Baidu (Chinese search engine)
  • Yahoo Japan
  • Google India
  • Google Germany
  • Google France
  • Google UK
  • WordPress.com

Consensus formed around the following applications:

  • Pages requiring many low-locality queries returning small (define) sized data (Facebook, Myspace, Google, Yahoo, eBay, etc.)
  • Something in the highly transactional space (Visa, PayPal, etc.), possibly involving live data, à la fraud detection
  • MapReduce, where you could run on live data and/or not need to worry about shuffling data around between machines to speed it up
  • Raw storage, as suggested by John