
Some application assumptions feeding into this discussion:

  • most data are written once
  • read-dominated workload (writes may be less than 10%, or even less than 1%)
  • while there is little locality, most requests refer to recent data (useful for power management)

---------------------------------------------------------

Technology review

The following table compares the key characteristics of storage technologies. When possible, the values below focus on the raw potential of the underlying technology (ignoring packaging or other system issues). In other cases, I use representative numbers from current systems. 

 

                Capacity    BW             Read Latency   Write Latency       Cost        Power Consumption
Server Disk     1TB         100MB/s        5ms            5ms                 $0.2/GB     10W idle - 15W active
Low-end Disk    200GB       50MB/s         5ms            5ms                 $0.2/GB     1W idle - 3W active
Flash (NAND)    128GB       100-600MB/s    20us           200us (2ms erase)   $2/GB       1W active
DRAM (DIMM)     4GB         10GB/s         60ns           60ns                $30/GB      2W - 4W
PCM             50x DRAM?   DRAM-like?     100ns          200ns               disk-like?  1W idle - 4W active?

Caption: capacity refers to a system (not a chip), BW refers to max sequential bandwidth (or channel BW), read and write latencies assume random access (no locality).
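To put the table in perspective, the short sketch below prices out a fixed amount of storage using the capacity, cost, and active-power figures above. The 100TB target and the simplification of one fully active, controller-free device per unit of capacity are assumptions for illustration, not a real configuration.

```python
# Back-of-envelope cost/power to provision TARGET_TB of storage, using the
# table values above at face value (no controllers, no redundancy, all active).

TARGET_TB = 100  # hypothetical cluster capacity

techs = {
    #               $/GB   W/device  GB/device
    "server disk":  (0.2,  15,       1000),
    "flash (NAND)": (2.0,  1,        128),
    "DRAM (DIMM)":  (30.0, 4,        4),
}

for name, (cost_per_gb, watts, gb_per_dev) in techs.items():
    devices = TARGET_TB * 1000 / gb_per_dev
    cost = TARGET_TB * 1000 * cost_per_gb
    power = devices * watts
    print(f"{name:13s}: {devices:8.0f} devices  ${cost/1e3:8.0f}K  {power/1e3:6.1f} kW")
```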

Other interesting facts to keep in mind:

  • Flash can be read/written in ~2KB pages and erased in ~256KB blocks
  • Durability: 10^6 write cycles for Flash, 10^8 for PCM
  • Disks take seconds to come out of standby, while Flash and PCM take microseconds.
  • Flash can be packaged as a USB, SATA, PCIe, ONFI, or DIMM device. No clear winner yet.
  • The FTL (Flash Translation Layer) must be customized to match access patterns; this can make a 2-5x difference in access latency (see the sketch after this list).
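As a concrete picture of what an FTL does with the page/block sizes above, here is a minimal page-mapping sketch: writes go out of place into the current erase block and a logical-to-physical map is updated. The class name and allocation policy are hypothetical, and garbage collection and wear leveling are ignored.

```python
# Minimal page-mapping FTL sketch: out-of-place writes, 2KB pages, 256KB erase blocks.

PAGE = 2 * 1024           # ~2KB read/write unit
BLOCK = 256 * 1024        # ~256KB erase unit
PAGES_PER_BLOCK = BLOCK // PAGE

class TinyFTL:
    def __init__(self, num_blocks):
        self.map = {}                          # logical page -> (block, page offset)
        self.free_blocks = list(range(num_blocks))
        self.cur_block = self.free_blocks.pop(0)
        self.next_page = 0

    def write(self, logical_page, data):
        if self.next_page == PAGES_PER_BLOCK:          # current block is full
            self.cur_block = self.free_blocks.pop(0)   # real FTL would GC/erase here
            self.next_page = 0
        self.map[logical_page] = (self.cur_block, self.next_page)
        self.next_page += 1
        # flash_program(self.cur_block, self.next_page - 1, data) would go here

    def read(self, logical_page):
        return self.map.get(logical_page)      # physical location to read from flash

ftl = TinyFTL(num_blocks=16)
ftl.write(7, b"x" * PAGE)
print(ftl.read(7))   # -> (0, 0)
```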

---------------------------------------------------------

System-level Comparison (from the FAWN HotOS'09 paper)

What does this mean at the system level? The FAWN HotOS paper tries to quantify both performance and total cost of ownership (TCO) for servers using DRAM, disks, and Flash when servicing Seek (memcached) and Scan (Hadoop) queries. They also consider two processor scenarios: traditional CPUs (Xeon) and wimpy CPUs (Geode). The following graphs refer to Seek queries:

Some caveats about the FAWN graphs:

  • No consideration of query latency!
  • I think the choice of components is often suboptimal, or they underestimate the capabilities of some components (Flash & traditional CPUs).
  • The cost of SSDs will likely drop a lot in the near future.
  • No consideration of the network.
  • It's not clear that their design points are balanced in any way.
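To make the style of comparison concrete, the sketch below computes the two metrics such studies lean on: query rate per joule and a simple 3-year TCO (capital plus energy). All of the numbers in it are made-up placeholders, not figures from the FAWN paper.

```python
# FAWN-style metrics sketch: queries/joule and a simple 3-year TCO.

def tco_and_efficiency(capex_dollars, watts, queries_per_sec,
                       years=3, dollars_per_kwh=0.10):
    energy_kwh = watts / 1000.0 * 24 * 365 * years
    tco = capex_dollars + energy_kwh * dollars_per_kwh
    queries_per_joule = queries_per_sec / watts
    return tco, queries_per_joule

# hypothetical configurations for Seek-style (random lookup) queries
for name, capex, watts, qps in [
    ("Xeon + disk",   2000, 250,   300),   # seek-bound
    ("Xeon + flash",  4000, 260, 50000),
    ("wimpy + flash", 1000,  30, 20000),
]:
    tco, qpj = tco_and_efficiency(capex, watts, qps)
    print(f"{name:14s}: TCO ${tco:8.0f}, {qpj:7.1f} queries/joule")
```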

---------------------------------------------------------

Flash for RAMCloud

Two options

  1. Use Flash as a disk replacement for durability, servicing requests from DRAM (Flash-backup)
  2. Use Flash to replace DRAM as the main storage technology (Flash-cloud)

Flash-backup (1) can help reduce some of the problems of the log-based persistence protocol discussed earlier in the quarter, due to the better bandwidth characteristics of Flash, in particular for the recovery case. Nevertheless, given the higher cost of Flash compared to disks, if we can make the protocol work with disks, using Flash is a bad idea.
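A rough feel for the recovery-case difference, assuming a hypothetical 64GB of data per server and the sequential bandwidths from the table above:

```python
# Time to stream one server's data back from the backup device after a crash.
SERVER_GB = 64  # assumed per-server data size
for device, mb_per_s in [("server disk", 100), ("flash (NAND)", 600)]:
    seconds = SERVER_GB * 1024 / mb_per_s
    print(f"{device:13s}: {seconds/60:5.1f} minutes to stream {SERVER_GB}GB")
```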

Flash-cloud (2) is interesting for two reasons. First, it will significantly reduce the cost/bit of the overall system. Second, it will significantly reduce the power consumption of the system, especially in the idle case. A large fraction of our memory devices (DRAM or Flash) will be storing bits that are not accessed frequently, so minimizing the static power of these devices would be very useful for cost and scaling reasons.

The disadvantages of Flash-cloud are the following: 1) Unacceptable RPC latency in the tens of usec (more on this later). 2) While Flash chips are commodity, Flash systems are still not; system cost may stay high for a while due to low volume. 3) While the movement towards Flash in the enterprise is strong and will address some of the endurance and bandwidth issues of Flash, it may not be exactly what we want, as vendors will focus on competing with disk as opposed to DRAM.

Architecting Flash-cloud

Within the node, we'd need a highly parallel storage system with multiple Flash channels for bandwidth and latency purposes. PCIe is 1GB/s per lane; ONFI is 200MB/s per channel. We would have to make the FTL match our data model and access patterns.
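A quick sizing sketch for that parallelism, using the per-channel and per-lane figures above; the 2GB/s per-node target and the 20us read latency are assumptions for illustration.

```python
# How many ONFI channels or PCIe lanes reach a target per-node bandwidth,
# and roughly how many random reads/s that channel parallelism can sustain.

TARGET_GB_S = 2.0          # hypothetical per-node bandwidth target
ONFI_MB_S = 200            # per channel (figure above)
PCIE_MB_S = 1000           # per lane (figure above)
READ_LATENCY_US = 20       # random read latency from the technology table

onfi_channels = TARGET_GB_S * 1000 / ONFI_MB_S
pcie_lanes = TARGET_GB_S * 1000 / PCIE_MB_S
reads_per_sec = onfi_channels * (1_000_000 / READ_LATENCY_US)

print(f"{onfi_channels:.0f} ONFI channels or {pcie_lanes:.0f} PCIe lanes "
      f"for {TARGET_GB_S}GB/s; ~{reads_per_sec/1e6:.1f}M random reads/s "
      f"if all channels stay busy")
```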

  • Why not use flash instead of RAM as the main storage mechanism?
    • What is flash latency?
      • SSDs:
        • Current claims for the Intel X25-E: 75 µs read, 85 µs write
      • Typical ONFI:
        • Micron MT29H8G08ACAH1, 8, 16, 32 Gb
          • Read 30 µs, Write 160 µs, Erase 3 ms
    • Can it be made low enough that it doesn't impact RPC latency? (see the rough budget sketch after this list)
    • What is the latency of typical flash packaging today? 100 µs?
  • Does flash offer advantages over disk as the backup mechanism?
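A rough latency budget for the RPC question above, using the device latencies quoted in this list; the 10us of network plus RPC software overhead is an assumed placeholder.

```python
# What a flash read adds to a small read RPC, compared to serving from DRAM.
OVERHEAD_US = 10           # assumed network + RPC software cost
for medium, read_us in [("DRAM", 0.06), ("raw ONFI flash", 30), ("X25-E SSD", 75)]:
    total = OVERHEAD_US + read_us
    print(f"{medium:15s}: ~{total:6.1f} us per read RPC")
```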