...

Within the node, we'd need a highly parallel storage system with multiple Flash channels for both bandwidth and latency. PCIe is 1 GB/s per lane; ONFI is 200 MB/s per channel. We will have to make the FTL match our data model and access patterns. There is significant prior work here, so we'll probably need to adapt one of the known protocols to our requirements. We could have some of the cores in the processor chip run our FTL protocol directly, bypassing most of the logic in a separate Flash controller (see the UCSD Gordon project).
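As a rough sanity check (a sketch only: the 10 Gb/s network link is an assumption, the per-channel and per-lane rates are the figures above), here is the channel count needed for the storage system to keep up with the network:

```python
# Back-of-the-envelope: how many ONFI channels / PCIe lanes does it take to
# keep up with the node's network link?  The 10 Gb/s link is an assumption;
# the 200 MB/s per channel and 1 GB/s per lane figures are the ones above.
import math

onfi_channel_mb_s = 200          # MB/s per ONFI channel
pcie_lane_mb_s = 1000            # MB/s per PCIe lane
nic_mb_s = 10_000 / 8            # assumed 10 Gb/s link ~= 1250 MB/s

channels = math.ceil(nic_mb_s / onfi_channel_mb_s)   # -> 7 channels
lanes = math.ceil(nic_mb_s / pcie_lane_mb_s)         # -> 2 lanes
print(f"~{channels} ONFI channels or ~{lanes} PCIe lanes to match the NIC")
```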

...

  • What is flash latency?
    • SSDs:
      • Current claims for Intel X25-E: 75 µs read, 85 µs write
    • Typical ONFI:
      • Micron MT29H8G08ACAH1, 8, 16, 32 Gb
        • Read 30 µs, Write 160 µs, Erase 3 ms
  • Can it be made low enough that it doesn't impact RPC latency?
  • What is the latency of typical flash packaging today? 100 µs?

...

This would allow us to do the FTL over multiple channels, which helps with write bandwidth. Erases would, of course, be done in the background. Since the workload is read-dominated and data is mostly write-once, there should not be much erase activity.
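A minimal sketch of that structure, assuming a hypothetical channel count and queue abstraction; this is not a real FTL, just the striping and background-erase shape of it:

```python
# Hypothetical sketch: round-robin page striping across Flash channels,
# with erases queued per channel so they proceed in the background.
from collections import deque

NUM_CHANNELS = 8      # assumed channel count

class ChannelFTL:
    """Round-robin page striping across channels; erases queued per channel."""

    def __init__(self, num_channels=NUM_CHANNELS):
        self.write_queues = [deque() for _ in range(num_channels)]
        self.erase_queues = [deque() for _ in range(num_channels)]
        self.next_channel = 0

    def write_page(self, logical_page, data):
        # Stripe consecutive page writes across channels for write bandwidth.
        ch = self.next_channel
        self.next_channel = (self.next_channel + 1) % len(self.write_queues)
        self.write_queues[ch].append((logical_page, data))
        return ch

    def schedule_erase(self, channel, block):
        # Erases run in the background on their own channel; with a
        # read-dominated, mostly write-once workload they should be rare.
        self.erase_queues[channel].append(block)
```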

At the system level, we can now afford to replicate data across the data center. As long as the replication granularity is a multiple of the Flash page size, the replication scheme is orthogonal.

RPC latency: From our earlier discussion, our realistic target for RPC is currently 5 µs to 20 µs, depending on the choice of HW and SW. Flash would increase read RPC latency by at least 20 µs. The key question is whether this is acceptable for our application. Note that it will not affect the throughput goals, assuming a multi-channel Flash system.
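For a concrete feel of the numbers, a rough budget using only the figures above (the split into base RPC plus Flash read is illustrative):

```python
# Rough read-RPC budget using the figures above (the split is illustrative).
rpc_dram_us = (5, 20)   # current realistic RPC target without Flash
flash_read_us = 20      # added by a Flash read ("at least 20 us"; raw ONFI read ~30 us)

low, high = rpc_dram_us[0] + flash_read_us, rpc_dram_us[1] + flash_read_us
print(f"Flash-backed read RPC: ~{low}-{high} us vs {rpc_dram_us[0]}-{rpc_dram_us[1]} us DRAM-only")
```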

Eliminating the latency impact of erase events (2 ms latency): A read may go to the same bank as an ongoing erase. While this will be rare, we need a way to reduce its impact: if a node receives an RPC for a Flash bank with an ongoing erase, it can immediately forward the request to the replica node in the system. This will increase the latency by ~5 µs. Since the probability that both nodes are erasing the corresponding banks at the same time is extremely low, we should essentially never see anything worse than that.
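A rough estimate of how often forwarding would even be needed; the erase time is from the figures above, while the per-bank erase rate is an assumed parameter:

```python
# Rough erase-collision estimate.  Erase time is from the figures above;
# the erase rate per bank is an ASSUMED parameter for illustration.
erase_time_s = 3e-3           # ~2-3 ms per erase
erases_per_bank_per_s = 1.0   # ASSUMPTION: one erase per bank per second

p_bank_erasing = erase_time_s * erases_per_bank_per_s   # fraction of reads forwarded
p_both_replicas = p_bank_erasing ** 2                    # assuming independence
print(f"P(read hits an erasing bank) ~ {p_bank_erasing:.2%}")
print(f"P(both replicas erasing)     ~ {p_both_replicas:.4%}")
```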

Data persistence: Flash is non-volatile. However, Flash writes are too slow to perform synchronously on every RPC. To avoid this, we can exploit replication: on a block write, we wait for both nodes to write the data to DRAM, then do the actual Flash write in the background. Assuming sufficient control over the FTL algorithm, and given the multiple channels, we can sustain very high write bandwidth. There is no longer any need for a log protocol, with all of its related issues. Persistence across data centers is rather orthogonal to Flash.
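A minimal sketch of this write path, with hypothetical replica-RPC and FTL stubs (none of these names come from an existing API):

```python
# Hypothetical sketch of the replicated write path: a write is acknowledged
# once both replicas hold the data in DRAM; the Flash write happens later
# in the background, so no log protocol is needed.
import queue
import threading

def flash_write(key, data):
    """Placeholder for the multi-channel FTL write described above."""
    pass

class ReplicatedStore:
    def __init__(self, replica):
        self.dram = {}                 # DRAM copy; authoritative until flushed
        self.replica = replica         # hypothetical RPC stub to the replica node
        self.flush_q = queue.Queue()   # keys waiting for the background Flash write
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, key, data):
        self.dram[key] = data
        self.replica.write_dram(key, data)   # wait for the replica's DRAM ack
        self.flush_q.put(key)                # Flash write deferred to background
        return "ACK"                         # durable: two DRAM copies now exist

    def _flusher(self):
        while True:
            key = self.flush_q.get()
            flash_write(key, self.dram[key])  # background write, no log protocol
```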

Research approach: Assuming the RPC latency increase is acceptable, how would we do research for Flash-Cloud? My suggestion is to use DRAM-based systems, to avoid dependences on the performance characteristics, availability, and cost of Flash systems over the next few years. We can write a software layer that emulates the characteristics of Flash on top of DRAM in order to expose the performance issues. Nevertheless, we will still avoid implementing any complex log protocol for persistence. We can also avoid the emulation layer altogether by optimistically assuming that PCM will happen by the time we finish the project.
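A sketch of what such an emulation layer could look like: a DRAM-backed store that injects Flash-like delays, using the ONFI latencies above (all names are hypothetical, and the constants are easy to swap for PCM projections):

```python
# Sketch of the proposed emulation layer: DRAM-backed storage with injected
# Flash-like latencies (ONFI figures from above; swap in PCM numbers later).
import time

class FlashEmulator:
    """DRAM-backed store that injects Flash-like per-operation latencies."""

    READ_US, WRITE_US, ERASE_US = 30, 160, 3000

    def __init__(self):
        self.pages = {}   # page_number -> bytes, held in DRAM

    def _delay(self, microseconds):
        # time.sleep() granularity is too coarse for tens of microseconds on
        # most OSes, so this sketch busy-waits on a high-resolution clock.
        deadline = time.perf_counter() + microseconds / 1e6
        while time.perf_counter() < deadline:
            pass

    def read(self, page):
        self._delay(self.READ_US)     # emulate Flash page-read latency
        return self.pages.get(page)

    def write(self, page, data):
        self._delay(self.WRITE_US)    # emulate Flash page-program latency
        self.pages[page] = data
```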