Versions Compared

Comment: add memory hierarchy research topic

...

  • How should tablets be assigned to servers when they are created?
  • What usage information needs to be collected in order to understand whether a particular server is overloaded?
  • What's the right way to reconfigure the system (e.g., splitting and/or moving tablets)?
  • Does locality matter? E.g., are there some tablets that should be on the same server?
  • Recovery tends to scatter tablets that used to be co-located; do they need to be collected together again?
  • How do the configuration decisions affect performance?

Memory Hierarchy

Storing all data in memory is expensive, but storing some on flash and disk raises a ton of questions.

  • Is it possible to serve data from disk without impacting the performance of data served from memory? I.e., how much hardware needs to be dedicated to the disk-based objects (disks? CPUs?)
  • What's the latency you can expect out of disk/flash when the system is under load? How quickly do applications lose all benefit of having any objects in memory?
  • Should metadata for all objects be kept in memory, even the disk-based ones? In that case, how big does an object have to be before there's any substantial savings from keeping it on disk?
  • How does this compare to a volatile cache in front of a disk-based system?
  • To what degree do you want both of these things in one system vs two independent systems, perhaps with a client library that can interact with both?
  • How would this change if the latency or IOPS of flash were different? (I assume disk will remain about the same over time.)
  • Should the system decide which objects go where, or the application, or some mixture of the two?
  • What are the characteristics of my application that would lead me to decide on a disk-based system vs a memory-based system vs a hybrid?
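
Two of the questions above (how quickly the benefit of memory is lost, and how big an object must be before moving it to disk saves much) can be framed with back-of-the-envelope arithmetic. The sketch below is only an illustration under simple assumptions: accesses are independent, latency is a weighted average of two fixed latencies, and each disk-based object keeps a fixed amount of metadata in memory. The function names and all numbers are hypothetical and are not measurements of any real system.

```python
def mean_latency_us(mem_fraction, mem_latency_us, slow_latency_us):
    """Mean access latency when mem_fraction of accesses are served from memory.

    Assumes every access costs either mem_latency_us or slow_latency_us,
    with no queueing effects (so it understates latency under load).
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")
    return mem_fraction * mem_latency_us + (1.0 - mem_fraction) * slow_latency_us


def memory_saved_fraction(object_bytes, metadata_bytes):
    """Fraction of an object's in-memory footprint freed by moving its
    data to disk while its metadata stays in memory.

    Assumes the in-memory footprint is object_bytes + metadata_bytes.
    """
    if object_bytes < 0 or metadata_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = object_bytes + metadata_bytes
    return object_bytes / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical latencies: 10 us from memory, 10,000 us (10 ms) from disk.
    for frac in (1.0, 0.999, 0.99, 0.9):
        print(frac, mean_latency_us(frac, 10.0, 10_000.0))

    # Hypothetical 100 bytes of in-memory metadata per object.
    for size in (100, 1_000, 10_000):
        print(size, memory_saved_fraction(size, 100))
```

With these made-up numbers, sending even 1% of accesses to disk raises the mean latency from 10 us to about 110 us, and a 100-byte object with 100 bytes of metadata frees only half of its footprint when moved to disk. Real answers depend on measured latencies under load and on the actual metadata size, which is exactly what the questions above are asking.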