The main takeaway from the talk seemed to be that RAMCloud should be split into two separate layers:
It was even suggested that RAMCloud should simply expose the storage layer to a traditional RDBMS, which would use it in place of a disk.
A number of interesting points were mentioned:
E-mail follow-up from Paul Heymann (heymann@stanford.edu):
I liked your talk in InfoLunch, but had a few uninformed comments that might be useful:
(1) Is "RAMCloud" what Google and other companies that are really on top of engineering already do? For example, Jeff Dean just gave an excellent talk at WSDM2009 which detailed how their architecture works: http://videolectures.net/wsdm09_dean_cblirs/ with details like the fact that they moved their entire index into main memory a few years ago (requiring a search to hit thousands of machines, but shrinking latency massively). Similarly, I recall hearing a few years ago that SLAC was working on machines with a terabyte of main memory (the best link I can find for that is http://www.violin-memory.com/listing_detail.php?listing=62&id=Press_Releases).
(2) Have you looked at H-Store? There are more details here: http://db.cs.yale.edu/hstore/ . While there is almost certainly older work in main memory databases that explored the space, the H-Store project is/was looking at the results of almost all of the assumptions you were proposing within the last several years. A quote: "For example, all but the very largest OLTP applications can fit in main memory of a modern shared-nothing cluster of server machines." They also drop certain locking assumptions and other things, and generally are rebuilding everything based on modern assumptions. (H-Store also seems like a big project---MIT, Yale, Brown...) The only major difference seems to be that they are defaulting to the relational model whereas you haven't chosen yours yet.
(3) I was unclear on what you thought the difference was between data that would go in the RAMCloud and the data that wouldn't. For example, you claimed that Facebook could be completely in the RAMCloud, but said that certain data like photos and video wouldn't go in.
(4) Related to (3), I was unclear on how the RAMCloud would differ from a massive cache in what seemed to be your primary use case---web applications. In other words, if you had enough RAM to put all of Facebook's MySQL data into the RAMCloud, would it be any different performance-wise than using all of the RAM in a cache in front of the MySQL databases? (i.e., what's the difference if there are no cache misses?)
(5) It seemed pretty unclear to me how RAMCloud relates to Solid State Disks and other forms of flash and flash-like memories. I'm far from an expert in that area, but I would have liked to have seen an argument like: "Although SSDs will eliminate seeks, be accessible through PCI, and have storage capacity on par with hard disks in 5-10 years, we think DRAM will be better for large data sets because _________."
(6) I didn't buy your statements about the size of a $4M amount of RAM increasing to a size that was reasonable for data storage within 5-10 years. Or, at the very least, I'm not completely convinced. Every several years, I get a new laptop which has maybe 4 times the capacity of my previous laptop, yet my laptop hard disk always remains full. Similarly, companies seem to increase how much data they are gathering on their customers to the maximum extent that they can (e.g., Amazon monitoring every aspect of customer behavior on their website). If you're not convinced that RAMCloud would be useful now, I would need much more convincing that the ratio of "data produced by a company" to "data that can fit in DRAM economically" will be any different in 5-10 years (e.g., is there a fundamental reason why companies' ability to gather data about people, their logistics, and so on will hit a wall within the next 5-10 years?).
(7) It seemed like having a (clearer) motivating application would resolve many of the uncertainties you had when you were presenting. For example, one thing that (I think) makes the relational data model somewhat rigid (one of your complaints) is that it is difficult to change the schema. This isn't really a big issue if you're a bank, or probably even if you've got a mature web product, but if you're a web company doing rapid prototyping, you might need something where experimentation is easier. For example, FriendFeed uses this (somewhat bizarre) half relational, half sort-of-semi-structured data model to make rapid prototyping easier: http://bret.appspot.com/entry/how-friendfeed-uses-mysql . Is RAMCloud for startups? For Facebook? For Google? For IBM? For Bank of America?
Anyway, I would be interested to see a second presentation to the InfoLunch, given how far we got in the first one...
Cheers,
Paul