This page describes an unresolved (early 2014) performance issue that Diego is having with Intel 520 SSDs. Feedback and ideas are welcome: ongaro at cs stanford edu.
Update Tue, 25 Feb 2014 00:21:06 -0800: added "Effects of Write-Caching" section.
Update Tue, 25 Feb 2014 13:12:01 -0800: added "Larger Writes" section.
Update Sat, 01 Mar 2014 19:31:34 -0800: updated "Larger Writes" section with big graph, updated microbenchmark section to link to improved benchmark code.
Summary
I'm optimizing a system in which synchronous disk writes are a key factor in performance. My current drives are too slow. I have a single Intel 520 drive that performs well enough (<1ms), but I need five drives. I bought five Intel 530 SSDs in hopes they would perform like the Intel 520, but instead they take 10ms to write a single byte to disk under various versions of Linux. Curiously, if I connect the Intel 530 drives over a USB-to-SATA adapter instead of using SATA directly, they're much faster (~200us per write). If I turn off write-caching, on some machines they take 1.6ms per write. What's wrong?
Microbenchmark
Here's the original microbenchmark (https://gist.github.com/ongardie/9177853), which calls write() and fdatasync() on a single byte of data 1000 times. I run this as "time ./bench" and divide the wall time by 1000 to get the approximate average time per write.
A much improved and extended version of the benchmark is at https://github.com/ongardie/diskbenchmark .
I normally run this on ext4. (I've tried it on a raw device as well; see the negative results section below.)
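The measurement loop described above can be sketched roughly as follows. This is a Python approximation under stated assumptions, not the original benchmark code: the function name `bench`, the path argument, and the choice to append each byte rather than overwrite the same one are illustrative guesses. It also times the loop internally instead of relying on `time`.

```python
import os
import sys
import time


def bench(path, count=1000):
    """Return the average seconds per one-byte write() + fdatasync() pair.

    Assumption: each byte is appended to a freshly truncated file; the
    original benchmark may lay out its writes differently.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, b"x")   # one byte of data
            os.fdatasync(fd)     # block until the data reaches the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / count


# Only runs when a target file is given, e.g.: python bench.py /mnt/ssd/testfile
if __name__ == "__main__" and len(sys.argv) == 2:
    avg = bench(sys.argv[1])
    print("%.1f us per write" % (avg * 1e6))
```

The target file should live on the filesystem backed by the drive under test; a file on tmpfs or another disk would measure something else entirely.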
...
These are the disks I've tried and their performance:
model | qty | capacitor | performance |
---|---|---|---|
Crucial M4 | 160 | | 3.7ms per write on rc66 |
Intel 320 (SSDSC2CW120A3) | 1 | yes | ~200us per write on rcmonster |
Intel 520 (SSDSC2CW120A3) | 2 | no | <1ms per write on rc66, rcmonster (440us) |
Intel X25-M (SSDSA2M120G2GC) | 1 | | 210us per write on flygecko |
Intel 530 (SSDSC2BW120A4) | 5 | no | 10ms per write on rc66 (9.8ms), rcmonster (10.1ms), and flygecko (9.7ms) |
Intel 530 attached over USB-to-SATA adapter | 1 | no | <1ms per write on rc66, rcmonster, flygecko, and x1 |
SanDisk X100 (SD5SG2128G1052E) | 1 | | 830us per write on x1 |
Cheap USB thumb drive | n | | 7.6ms per write on x1 |
My goal is to get the Intel 530 drives to run fast on rc66 and similar machines.
...
- The first thing I did was update from the DC22 firmware to the current DC33 firmware. No effect.
- I tried a machine with 6 Gbps SATA (rcmonster). No effect.
- I tried a machine with a newer kernel (flygecko). No effect.
- I tried disabling APM power saving with hdparm -B. No effect.
- I wasn't able to change the NCQ queue depth through /sys/block/sdc/device; it's stuck at 1 on flygecko and 31 on rcmonster.
- I tried changing the I/O scheduler. This shouldn't have an effect since there's only one I/O outstanding at a time. No effect.
- I tried running the benchmark on the raw block device rather than an ext4 partition. This helped but only slightly, reducing latency per write from 10ms to about 9ms.
- I tried doing bigger writes to see if they would be faster. See "Larger Writes" section below. Didn't help.
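When comparing machines, it can help to record the scheduler and queue-depth settings mentioned in the list above alongside each benchmark run. The following is a minimal sketch, assuming the standard Linux sysfs attributes `queue/scheduler` and `device/queue_depth` under `/sys/block/<dev>`; the helper names (`read_attr`, `active_scheduler`, `describe`) are hypothetical and chosen for illustration only.

```python
import sys
from pathlib import Path


def read_attr(dev, attr, sysfs="/sys/block"):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (Path(sysfs) / dev / attr).read_text().strip()
    except OSError:
        return None


def active_scheduler(text):
    """Extract the bracketed name from a string such as 'noop deadline [cfq]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def describe(dev, sysfs="/sys/block"):
    """Collect the active I/O scheduler and the queue depth for one device."""
    sched = read_attr(dev, "queue/scheduler", sysfs)
    return {
        "scheduler": active_scheduler(sched) if sched else None,
        "queue_depth": read_attr(dev, "device/queue_depth", sysfs),
    }


# Only runs when a device name is given, e.g.: python describe.py sdc
if __name__ == "__main__" and len(sys.argv) == 2:
    print(describe(sys.argv[1]))
```

The `sysfs` parameter exists only so the helpers can be pointed at a fake directory tree for testing; on a real machine the default should be what you want.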
Additional Questions
- One of the differences between the Intel 520 and Intel 530 is that the 530 does more aggressive power saving. Is there a way to turn that off?
- Are the software paths under Linux for the Intel 520 and the Intel 530 identical?
- Would Intel be willing to trade my five 530s in exchange for five 520s or similarly performing drives?
...
So that raises two questions: why does the Intel 530 take so much longer in the driver/device, and why does it take even longer end-to-end?
Effects of Write-Caching
Tom Lyon suggested I try enabling write-caching (https://twitter.com/aka_pugs/status/438148846476996608). Write-caching was already enabled on all the machines I tried, but disabling it had some interesting effects. I used hdparm -W 0 /dev/sdx to disable write-caching and hdparm -W 1 /dev/sdx to re-enable it.
host | disk | write caching off | write caching on |
---|---|---|---|
rcmonster | 520 | 100us per write | 440us per write |
rcmonster | 530 | 10.4ms per write | 10.1ms per write |
rc66 | M4 | 1.5ms per write | 2.7ms per write |
rc66 | 530 | 1.7ms per write | 9.8ms per write |
flygecko | X25-M | 1.1ms per write | 210us per write |
flygecko | 530 | 1.5ms per write | 9.5ms per write |
Each drive seems to behave differently. For the 530, write-caching seems to have no effect on rcmonster, but disabling it improves latency for one-byte writes by a factor of about six on rc66 and flygecko. This opens up a lot of questions... See also the Larger Writes section below.
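To make the comparison concrete, the ratios can be computed directly from the table above. This small sketch only restates the table's numbers (converted to milliseconds); the variable and function names are illustrative.

```python
# (host, disk) -> (ms per write with caching off, ms per write with caching on),
# copied from the write-caching table.
results = {
    ("rcmonster", "520"): (0.100, 0.440),
    ("rcmonster", "530"): (10.4, 10.1),
    ("rc66", "M4"): (1.5, 2.7),
    ("rc66", "530"): (1.7, 9.8),
    ("flygecko", "X25-M"): (1.1, 0.210),
    ("flygecko", "530"): (1.5, 9.5),
}


def speedup_from_disabling(off_ms, on_ms):
    """Ratio > 1 means writes got faster with write-caching turned off."""
    return on_ms / off_ms


ratios = {key: speedup_from_disabling(*times) for key, times in results.items()}

for (host, disk), ratio in sorted(ratios.items()):
    print("%-10s %-6s %.2fx" % (host, disk, ratio))
```

For the 530 this works out to roughly 5.8x on rc66 and 6.3x on flygecko, versus about 0.97x (essentially unchanged) on rcmonster. The X25-M goes the other way, getting about five times slower with caching off.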
Larger Writes
Many people have suggested that writing 1 byte is not efficient and writing more bytes should be faster. I tried it both with write-caching on and off.
The writes were done at 0 and 512 byte offsets into the file, and the whole experiment was repeated 5 times. The best time is shown for each point. Each data point represents only a small number of writes, though, so the error may be high. Also, the "x1" machine is my laptop, so it wasn't always idle.
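The procedure described above (writes at a fixed offset, repeated five times, best time kept) might be sketched along these lines. This is an approximation under assumptions, not the actual experiment code: the function names, the number of writes per batch, and the sizes in the example loop are all hypothetical.

```python
import os
import sys
import time


def timed_write(fd, offset, size):
    """Time one pwrite() of `size` bytes at `offset`, followed by fdatasync()."""
    buf = b"\0" * size
    start = time.perf_counter()
    os.pwrite(fd, buf, offset)
    os.fdatasync(fd)
    return time.perf_counter() - start


def best_time(path, offset, size, writes=10, repeats=5):
    """Run `repeats` batches of `writes` timed writes each and return the
    lowest per-write average seen across the batches."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        averages = []
        for _ in range(repeats):
            total = sum(timed_write(fd, offset, size) for _ in range(writes))
            averages.append(total / writes)
        return min(averages)
    finally:
        os.close(fd)


# Only runs when a target file is given, e.g.: python larger.py /mnt/ssd/testfile
if __name__ == "__main__" and len(sys.argv) == 2:
    for offset in (0, 512):
        for size in (1, 512, 4096):  # example sizes, not the ones measured
            t = best_time(sys.argv[1], offset, size)
            print("offset=%d size=%d: %.1f us" % (offset, size, t * 1e6))
```

Taking the minimum across batches filters out interference from other activity on a machine that isn't idle, at the cost of hiding genuine variance in the drive's behavior.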