Overview Talk: Scribe Notes

In-talk Discussion

Dean: Can we co-locate apps on RAMCloud storage machines?
This would reduce network usage for apps with high locality.

Want to keep CPUs busy, if possible.

Anyone looking at more RAM per core? Unclear if this is the right
approach because of recovery performance.

NEC: Question about bisection bandwidth.

Keith Adams: Have scaled throughput. Adding machines can increase the number of data accesses possible.

Dean: Can possibly do well with locality.

Armbrust: But are there really 150-deep serial dependencies in data
accesses for Facebook?

Keith Adams: Yes. Plus parallel accesses in there as well.

Franklin: Skeptical about low-locality apps in the future.

Keith Adams: We think of disks in terms of $/IO at FB; it would be interesting to see that number for RAMCloud (RC).

Franklin: Why not flash?

Mogul: Talked to airline data people. They have huge log files that are
processed sequentially. Are we only looking at the part we care about?

Keith Adams: MapReduce clearly still has a place here.

Franklin: Zynga is getting 4 TB of clickstream data per day.

Aguilera: Are we really going to keep everyone's mail in RAM?
Dean: Just pull it into RC from disk.

Franklin: Shel's boss (the RAM DB guy) argues that even the largest
companies' data will fit in RAM.

John: What are the 10th to 90th percentile dataset sizes?
Dean: The 10th-percentile ones are small; you can specify per column
which columns are stored in memory; some datasets are several petabytes.

Aguilera: Why not RAM just attached to network?

If an operation involves a lot of data but produces a small result, it
is most efficient to move the computation to the data.
Dean: A thinner client makes it easier to support more languages.

Franklin: Why does there need to be one answer for the data model?
Someone is eventually going to want you to open up the lower-level
details if your assumptions are wrong, so you want to do that up
front.

It's more important to go down from the key-value store, i.e., you may
want lower-level access to the memory in the servers.

Shel: Is the assumption that you can partition apps cleanly, i.e., no
cross-workspace access? A simple security model is not sufficient for sharing.

Google: Concerned about security. Can every email user have their own
workspace?
Would separate read/write permissions per table fix it? Would Google
still want at least one table per user?

Use a two-part key and authorize on one part?

NetApp: Worried about opacity and indexing.

Franklin: Why not FS metaphor?

Aguilera: Server-side operations can help latency, but they are a
slippery slope; what is RC's position?

Keith Adams: Thoughts on graph style operations?

Dean: Any reasoning about IDs having locality? You might want a hint
that some IDs are on the same server.

MapReduce is going to want to fetch large sequential chunks.

Multiget?

Dean: Delete a whole table at once?

Are you thinking about how to diagnose failures and performance problems?

Mogul: Need good accounting if multitenant.

End-talk Discussion

Keith Adams: We denormalize to deal with latency; we would have smaller
objects if we could afford to follow pointers.

10,000 clients is not sufficient.