Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 6 Next »

This page describes an unresolved (early 2014) performance issue that Diego is having with Intel 520 SSDs. Feedback and ideas are welcome: ongaro at cs stanford edu.

Summary

I'm optimizing a system in which synchronous disk writes are a key factor in performance. My current drives are too slow. I have a single Intel 520 drive that performs well enough (<1ms), but I need five drives. I bought five Intel 530 SSDs in hopes they would perform like the Intel 520, but instead they take 10ms to write a single byte to disk under various versions of Linux. Curiously, if I connect the Intel 530 drives over a USB-to-SATA adapter instead of using SATA directly, they're much faster (~200us per write). What's wrong?

Microbenchmark

Here's the microbenchmark, which calls write() and fdatasync() on a single byte of data 1000 times: https://gist.github.com/ongardie/9177853

I run this as "time ./bench" and divide the wall time by 1000 to get the approximate average time per write.

I normally run this on ext4. (I've tried it on a raw device as well; see negative results section below).

Machines

These are the machines I've tried. The Intel 530 over SATA takes 10ms per write on each of these, and other SSDs take <1ms per write on each of these.

nametypedescriptionmodeldistrokernelSATAUSBPurchased
rc66serverThis is the machine type I'd like to use (we have 80 of them).Colfax CX1180-X4 / Supermicro 5016TI-TFRHEL62.6.323gbps2.02011
rcmonsterserverThis is a newer, beefier machine with Intel 520 drives, but also stuck on RHEL6.Colfax CX1265i-X5RHEL62.6.326gbps3.02013
flygeckodesktopThis is a newer desktop machine with a newer version of Linux. Arch3.123gbps2.02013
x1laptopThis is a laptop with a newer version of Linux.Lenovo Thinkpad X1 CarbonDebian Jessie3.12mSATA 6gbps3.02012

Disks

These are the disks I've tried and their performance:

modelqtyperformance
Crucial M41603.7ms per write on rc66
Intel 520 (SSDSC2CW120A3)2<1ms per write on rc66, rcmonster (440us)
Intel X25-M (SSDSA2M120G2GC)1210us per write on flygecko
Intel 530 (SSDSC2BW120A4)510ms per write on rc66 (9.8ms), rcmonster (10.1ms), and flygecko (9.7ms)
Intel 530 attached over USB-to-SATA adapter1<1ms per write on rc66, rcmonster, flygecko, and x1

SanDisk X100 (SD5SG2128G1052E)

1830us per write on x1
Cheap USB thumb driven7.6ms per write on x1

My goal is to get the Intel 530 drives to run fast on rc66 and similar machines.

Negative Results

  • The first thing I did was update from the DC22 firmware to the current DC33 firmware. No effect.
  • I tried a machine with 6gbps SATA (rcmonster). No effect.
  • I tried a machine with a newer kernel (flygecko). No effect.
  • I tried disabling APM power saving with hdparm -B. No effect.
  • I wasn't able to change NCQ sizes with /sys/block/sdc/device, but it's stuck at 1 on flygecko and 31 on rcmonster.
  • I tried changing the I/O scheduler. This shouldn't have an effect since there's only one I/O outstanding at a time. No effect.
  • I tried running the benchmark on the raw block device rather than an ext4 partition. This helped but only slightly, reducing latency per write from 10ms to about 9ms.

Additional Questions

  • One of the differences between the Intel 520 and Intel 530 is that the 530 does more aggressive power saving. Is there a way to turn that off?
  • Are the software paths under Linux for the Intel 520 and the Intel 530 identical?
  • Would Intel be willing to trade my five 530s in exchange for five 520s or similarly performing drives?

blktrace measurements

I ran blktrace while running the benchmark on flygecko with the Intel X25-M and the Intel 530 SSDs. Here's the summary produced by btt:

Intel 530:

ALL         MIN         AVG         MAX     N
--- ----------- ----------- ----------- -----
Q2Q 0.000000943 0.010212746 0.736672437  1015
Q2G 0.000001300 0.000002340 0.000022269 16144
G2I 0.000001200 0.000002133 0.000020969 16144
Q2M 0.000000173 0.000000526 0.000000858 112
I2D 0.000006150 0.000068040 0.023831449 16144
M2D 0.000014449 0.000021722 0.000026137 112
D2C 0.000099355 0.002885890 0.160061067 1016
Q2C 0.000125742 0.002958056 0.160093994 1016

Intel X25-M:

ALL         MIN         AVG         MAX     N
--- ----------- ----------- ----------- -----
Q2Q 0.000001345 0.000214151 0.000508142  1015
Q2G 0.000000668 0.000001025 0.000005047 16080
G2I 0.000000720 0.000000966 0.000019082 16080
Q2M 0.000000237 0.000000374 0.000000968 176
I2D 0.000000320 0.000003889 0.000020007 16080
M2D 0.000015045 0.000021314 0.000028772 176
D2C 0.000078657 0.000084969 0.000281596 1016
Q2C 0.000083605 0.000091020 0.000317568 1016

I'm no blktrace expert, but I think the most interesting numbers are D2C (driver to completion) and Q2Q (queue to queue). It's safe to focus on just the average numbers (I looked at a plot of the D2C latencies and another of the per-write times in userspace, and both have little variation).

With the Intel 530, it looks like each write takes the driver and device 2.9ms to complete, but does not complete end-to-end for 10ms. With the Intel X25-M, each write takes the driver and device 85us to complete, and this completes end-to-end in 214us.

So that raises two questions: why does the Intel 530 take so much longer in the driver/device, and why does it take even longer end-to-end?

  • No labels