Applications
Fundamental advantages of RAMCloud for an application
- Any data access other than local RAM will be faster with RAMCloud (even an SSD access costs ~75-85us)
- Enables very high query rates, at least an order of magnitude higher than MySQL per box
  - People have reported on the order of 20,000 queries per second for MySQL, probably for really basic cached queries
  - More complex queries will likely be significantly slower, and if anything has to come from disk you're of course hosed
- As a consequence of the above, app writers can feel free to write dependent sequential queries if needed and it will still be fast
  - A con: to get functionality similar to what you have today, you may actually have to write sequential dependent queries
- _Potentially_ no need to name every one of your queries and deal with synchronization when interacting with a cache layer like Memcached
- No performance dependency on locality
- Persistence, when compared to Memcached
- Easy scalability
- Less complexity than MySQL + Memcached
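To make the point about dependent sequential queries concrete, here is a minimal back-of-envelope sketch. It assumes each query must wait for the previous result, so latencies simply add up. The 80us figure is the midpoint of the SSD range quoted above; the 10us RAMCloud figure and the 1000us page budget are hypothetical placeholders, not numbers from these notes, and the function names are made up for illustration.

```python
SSD_ACCESS_US = 80.0                 # midpoint of the ~75-85us SSD figure above
HYPOTHETICAL_RAMCLOUD_US = 10.0      # assumption for illustration only
HYPOTHETICAL_PAGE_BUDGET_US = 1000.0  # assumption: time a page may spend on storage


def sequential_latency_us(num_queries: int, per_query_us: float) -> float:
    """Total latency of a chain of dependent queries (no overlap possible)."""
    if num_queries < 0 or per_query_us <= 0:
        raise ValueError("num_queries must be >= 0 and per_query_us must be > 0")
    return num_queries * per_query_us


def max_chain_length(budget_us: float, per_query_us: float) -> int:
    """How many dependent queries fit in a latency budget."""
    if budget_us < 0 or per_query_us <= 0:
        raise ValueError("budget_us must be >= 0 and per_query_us must be > 0")
    return int(budget_us // per_query_us)


if __name__ == "__main__":
    for name, cost in [("SSD", SSD_ACCESS_US),
                       ("RAMCloud (assumed)", HYPOTHETICAL_RAMCLOUD_US)]:
        fits = max_chain_length(HYPOTHETICAL_PAGE_BUDGET_US, cost)
        print(f"{name}: {fits} dependent queries fit in the budget")
```

Under these assumed numbers, the lower the per-access latency, the longer the chain of dependent queries an application can afford, which is why sequential dependent queries become a reasonable design choice rather than something to avoid.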
Let's consider some application categories that run in a datacenter/cloud setting:
- Synthesizing hardware (Cisco, Nvidia)
  - CPU bound?
  - Not much disk access?
  - Could it page memory to RAMCloud?
- Rendering (Pixar, ILM, Disney)
  - CPU bound?
  - Dataset may be too large
  - Possible speedup for grabbing textures?
  - Would this help a client machine manipulating the scene?
- Simulation (weather, nuclear)
  - CPU bound?
  - Any need for shared storage?
- Transactional (stock exchange, banks, credit card processing)
  - Must be a 'to disk' component here, including sync; could be a win
  - Online component here, e.g. fraud detection
- MapReduce, batch processing
  - Could be interesting, depending on the dataset size
  - Is this an existing pain point?
  - Could allow the use of 'online' data
- Web related
  - Content delivery (CDN)
  - Pages requiring many low-locality queries returning small (define) sized data (Facebook, MySpace, Google, Yahoo, eBay, etc.)
  - Pages requiring many high-locality (or small dataset) queries returning small (define) sized data (CNN, Slashdot)
  - Pages consisting primarily of static content (Microsoft, IMDB, etc.)
- Raw storage
Of the top 20 websites, the following would likely benefit substantially from RAMCloud:
- Yahoo
- YouTube (not the video part)
- Windows Live
- MSN (how much is static and just cached?)
- Wikipedia (maybe?)
- Blogger.com
- MySpace
- Baidu (Chinese search engine)
- Yahoo Japan
- Google India
- Google Germany
- Google France
- Google UK
- WordPress.com
Consensus built around the following applications:
- Pages requiring many low-locality queries returning small (define) sized data (Facebook, MySpace, Google, Yahoo, eBay, etc.)
- Something in the highly transactional space (Visa, PayPal, etc.), possibly involving live data, such as fraud detection
- MapReduce, where you could run on live data and/or not need to worry about shuffling data between machines to speed it up
- Raw storage, as suggested by John