SEDCL Retreat 2012 - Industrial Feedback Session

Naoki Shiota, NEC Japan

Thanks
ramcloud approach to datacenters: interesting
I am new to datacenter technology
My research group is discussing how to construct future ITS (Intelligent Transportation Systems) - technology being supported by ramcloud could be interesting to discuss
Will take this information (things i heard about here) back and discuss with my group.

Sanjeev Datla - Emulex

Was here last time
This time significantly more people on RAMCloud compared to last time.
Our standpoint - want to look at DCTCP protocols - being worked on in Balaji's group.
Want to see how to fund underlying infrastructure for what you're working on here.
How to architect stuff to address some of the problems you're trying to address with the software stuff
Bill's talk - very useful
Heard a lot about a lot of very practical stuff - customers also talk about these issues - e.g., low latency
Interested in seeing use of building blocks from lower things to general systems

Sachin Kulkarni - Facebook

Retreat: loved it, learnt a lot, interesting
ramcloud:
would like to see multi tier architecture on core stuff - eg., as in databases
instead of relying completely on software, how we can get hardware to do some work
richer datamodel to key val store - expressive queries - richer interface with data
because we'll be replacing db with ramcloud
mincopyset - interesting. want to use with haystack. if other people in industry use - let me know
bill's talk - loved it.
rdma - good insight
berk - already knew about it

Venkat Venkataramani - Facebook

enjoyed; got to meet smart people
want more people (students) from here to come back to fb and talk to more engineers about work you're doing
couple ideas:
at FB, there's a growing use of state mgmt of transient data (e.g., how many people are looking at the photo i am looking at right now)
keeping track of this - next to impossible
we built in-memory datastore to serve this
there's a growing category of apps - ramcloud can not only replace db, but also help new use cases which are very write heavy
Backups + compaction: Can you do an offline backup at the same time as doing disk based log cleaning?
key val stores are great, but should provide some facilities for users to manage their own schema
wants features for schema migration. a problem they have is that applications have to update schemas on the fly. one idea is to register transformers to the server so that code doesn't have to live in the application.
not (just) coming with data model, but provide primitives where people can come up with data model.
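A minimal sketch of the "register transformers to the server" idea, assuming each stored value carries a schema version and is upgraded lazily when it is read. This is not RAMCloud's API; `VersionedStore`, `register_transformer`, `write`, and `read` are hypothetical names used only for illustration.

```python
# Hypothetical sketch: a toy in-memory key-value store where schema
# transformers live in the store rather than in the application.
class VersionedStore:
    def __init__(self):
        self._data = {}          # key -> (schema_version, value)
        self._transformers = {}  # from_version -> fn(value) giving version + 1
        self.current_version = 1

    def register_transformer(self, from_version, fn):
        """Register a function that upgrades a value by one schema version."""
        self._transformers[from_version] = fn
        self.current_version = max(self.current_version, from_version + 1)

    def write(self, key, value):
        """Store a value tagged with the current schema version."""
        self._data[key] = (self.current_version, value)

    def read(self, key):
        """Return the value, upgrading it step by step if it is stale."""
        version, value = self._data[key]
        while version < self.current_version:
            value = self._transformers[version](value)
            version += 1
        self._data[key] = (version, value)  # persist the upgraded form
        return value


store = VersionedStore()
store.write("user:1", {"name": "Ada Lovelace"})

# Schema change: split "name" into "first" and "last".
def split_name(v):
    first, last = v["name"].split(" ", 1)
    return {"first": first, "last": last}

store.register_transformer(1, split_name)
upgraded = store.read("user:1")
```

Upgrading on read keeps the migration out of the application and avoids rewriting every object at once. A real server would also have to decide how to sandbox and distribute the registered code, which this sketch ignores.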
we're hiring - a lot

Shinji Nakadai, NEC

we're doing research related to metadata for distributed systems like bigtable
in such systems coordinator generally becomes bottleneck
i believe that ramcloud coordinator can probably handle the requests, but would like to see the result
would like ramcloud to support triggers and stored procedures
afaik, distributed key val stores used largely by financial industry
diff from memcached, a key val store that handles replication, supports triggers
in international industry, sys that reduces 1 ms makes money
they want key val store, and trigger

Ali Ayoub, Mellanox

Thanks
interesting to see progress since last year, esp in ramcloud
looking forward to ramcloud 1.0 - looking forward to doing experiments
any project that has a kernel module - have more discussion in linux kernel community - they can contribute
ramcloud wiki page - very helpful, helped me learn a lot. things are added as they're happening - helpful. Continue keeping it updated
rdma - nice to see how it can improve performance a lot - this is technology that's not mainstream (and it's great to see that happen) - changes baseline
in general, seeing techniques you try and their analysis - was a good exposure to learn about your project, and for our own future projects
when latency is very imp - lot of attention is paid to overheads. any component that requires lot of sw - hw can help - when it is possible that hw can offload, go and show the vendor how hw can come into picture
i hope i can join next year to see updates on some projects from last year, like r2d2, wanted to see update, paper etc
would like to see status update at the retreat next year

Deepak Kenchammana - NetApp

Part that was most satisfying and want to see even more - experimentation - e.g., loved talks on rdma, and data from fb - even if it's still raw
systems that are driven by analysis of real data - are neat. we build lot of systems from intuition. so it's great to see this (systems being driven by analysis of data) happening. keep this up.
ramcloud: finally is going to include backend storage that is much slower - separation between access latencies upstream and cold, slow storage - how do you take care of these tiers - how do you warm up these caches, and working sets - how are you going to use high capacity, slow storage at the backend, and still provide high performance to apps
Basically, think beyond RAM - datasets are going to be much larger than would fit in RAM
dctcp: really want to see bsd implementation
data center scheduling problem (Christina): think about: caching is becoming imp for app perf - scheduling becomes very constrained - can you do something about moving things to places where there is common data that's already in cache - that will help improve data center scheduling.
logistics: great place, very well organized, perfect time, right distance for bay area people

Shel Finkelstein, SAP
Had fun, enjoyed discussions, location, game-playing.
Diego's talk: A lot of fear and loathing of Paxos. Wants to see an open-source version of Paxos that performs really well. Interesting work.
Alex's talk: Spoke to Alex offline, thinks use of RDMA is good.
Christina's talk: wants to see the QoS scheduling work go from simulation to measurement
Bill's talk was really exciting
Asaf's talk: interested in economic impact, mean time to repair issues, techniques for keeping the business running with intermittent failure vs 3TB loss.
Berk's talk: need to understand ideas better
Steve's talk: a little hard to understand memory utilization, would be interesting to see how larger memory sizes affect this
Elliot's talk: what about range queries? Would this be possible? Locality issues: how quickly can I enumerate a range?

Satya Nishtala, Cisco
First time at a retreat. Came because he was interested in RAMCloud.
Problems discussed: how do you access key-value pairs spread across the datacenter, what are the problems associated with it, and what are the solutions.
Wants to see more of this in the future.
Fundamentally a product guy: wants to see how this gets applied into applications and integrated into systems in the future.

Abdul Kabbani, Google
Definitely learned a lot. Sees applicability, especially of Christina's and Asaf's talks, to platforms at Google.
Priority queuing scheme and dropping talk was interesting.
Wants to see more about DCTCP.
The RAMCloud talks weren't really his area, but he learned a lot from them too.
DCTCP update: didn't fly in IETF because it's limited to datacenter. Need more lobbying in the IETF community. That would be the first step to have it happen. People at Stanford are still fixing bugs and verifying the implementation. Working with Scott Vimal to get DCTCP into Linux, not doing this on LKML yet.

Nathan Schrenk, Facebook
There wasn't much about DCTCP at this event, but he'd like to hear about why DCTCP hasn't been accepted into the Linux kernel. Acceptance could lead to wider adoption of that work.
Really enjoyed Bill's talk. Wants to know when he'll start a company.
The copysets talk was interesting. He wants a copy of the paper to share with people at Facebook.

Rajesh Nishtala, Facebook
Often see all-to-all communication pattern in a social network. Start getting into full bisection bandwidth and related concepts.
Would really encourage the kind of research about how to fetch data from a wide variety of sources.
Example: If I'm fetching my news feed within a front-end cluster, the contents of the stories throughout that newsfeed will be scattered throughout the entire cluster.
Encourages us to adopt the "move fast and break things" philosophy. Get into this model of rapid prototypes and see what sticks.
They've seen a lot of big stuff come out of this approach, trying out random things and seeing what sticks.

Rainer Brendle, SAP
This retreat succeeded at making him think.
We have low-latency networks and they've become available. What kind of impact does it have? What is driving that forward? How can I get my hands on it? How can I use it?
The best thing about bad, high latency is that I can cure[?] it by adding more layers [caching?].
Removing layers is a hard game. It involves a lot of people.
But it's something we will have to do, because physics drives this.
How does this happen? How does the industry build a low latency network become obsolete.
Everything becomes thinner or goes away. A lot of things change.

Shankar Pasupathy, NetApp
The value for people like us is that this kind of event makes you think.
It would have been great to have a 1 hour technical overview about RAMCloud. It's been two years for him.
Wants to have the poster session on the first day so that you get to meet the students and mull over the ideas.
Preview of hard problems to solve at the end.
A demo might have been nice.
Great location, right amount of time.
Wants to have the slides made available to share with people in his group.
Asaf's talk was fascinating. Is three copies the right number? What's the right number for RAMCloud?
What does this super-fast recovery enable you to do differently from a disk-based system?
Wants to hear our thoughts about an application.
Wants to use RAMCloud as a giant redirector. In storage, names and locations are tied together. For [how many?] objects, needs 32TB to do these lookups.
Look at Ceph for pseudo-random placement to not have to go to the coordinator.
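A minimal sketch of what coordinator-free placement can look like. It uses rendezvous (highest-random-weight) hashing as a simple stand-in; it is not Ceph's CRUSH algorithm and not how RAMCloud locates objects. The function names and server names are hypothetical.

```python
import hashlib

def _score(key, server):
    """Deterministic pseudo-random weight for a (key, server) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def place(key, servers, replicas=1):
    """Pick the `replicas` servers with the highest weight for this key.

    Any client that knows the server list computes the same answer,
    so no lookup at a central coordinator is needed.
    """
    ranked = sorted(servers, key=lambda s: _score(key, s), reverse=True)
    return ranked[:replicas]


servers = ["server-0", "server-1", "server-2", "server-3", "server-4"]
homes = place("object-42", servers, replicas=2)
```

Removing a server that does not hold a given object leaves that object's placement unchanged, which limits data movement when membership changes. Clients still need a consistent view of the server list, so some membership service remains.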
How do you understand that you've solved all the corner cases, and at scale?
What would you do with a lot of spare cores? Compress the RAM? Inline deduplicate?
Didn't understand if we do checksums or periodic scrubbing on the data on backups and/or in memory.
Believes persistent RAM will become real in a decade, with the same characteristics as DRAM.
Calculate if the flash we're using will even last three years.
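A back-of-the-envelope sketch of that calculation, assuming perfectly even wear leveling and a constant write rate. The parameter names are hypothetical, and the example figures are placeholders, not measurements of any RAMCloud backup device.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def flash_lifetime_years(capacity_gb, pe_cycles, write_mb_per_s,
                         write_amplification=1.0):
    """Estimate years until the rated program/erase cycles are used up.

    capacity_gb:         device capacity in GB (1 GB = 1024 MB here)
    pe_cycles:           rated program/erase cycles per cell
    write_mb_per_s:      sustained host write rate in MB/s
    write_amplification: physical bytes written per host byte written
    """
    host_writable_mb = capacity_gb * 1024 * pe_cycles / write_amplification
    return host_writable_mb / write_mb_per_s / SECONDS_PER_YEAR


# Placeholder numbers only: 256 GB device, 3000 cycles, 20 MB/s sustained,
# write amplification of 2.
estimate = flash_lifetime_years(256, 3000, 20, write_amplification=2.0)
```

Plugging in the real device rating and the measured backup write rate shows whether the three-year target is within reach.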
When you take server design for RAMCloud, do you look at 5 or 10 years down the line, when this will possibly become a product (in terms of trends in memory)?

Ed Bugnion, Stanford
The strength is these are different projects, faculty, interests operating at different levels.
Fascinated by the synergies, how the different projects motivate each other.
If some of the industry members want to give a talk, it'd be good to have one or two talks from industry members to share their perspective and speak to the students.

Someone suggested creating a forum (or facebook group) for ongoing discussions.