...
I'm optimizing a system in which synchronous disk writes are a key factor in performance. My current drives are too slow. I have a single Intel 520 drive that performs well enough (<1ms), but I need five drives. I bought five Intel 530 SSDs in hopes they would perform like the Intel 520, but instead they take 10ms to write a single byte to disk under various versions of Linux. Curiously, if I connect the Intel 530 drives over a USB-to-SATA adapter instead of using SATA directly, they're much faster (~200us per write). What's wrong?
Microbenchmark
Here's my microbenchmark, which calls write() and fdatasync() on a single byte of data 1000 times: https://gist.github.com/ongardie/9177853
I run this as "time ./bench" and divide the wall time by 1000 to get the approximate average time per write.
I normally run this on ext4. (I've tried it on a raw device as well; see the Negative Results section below.)
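For readers who don't want to open the gist, the measurement loop can be approximated as follows. This is a minimal Python sketch of the same idea, not the code from the gist: the function name `bench_sync_writes` and its parameters are made up for illustration, and it times only the loop rather than the whole process as "time ./bench" does.

```python
import os
import time


def bench_sync_writes(path, count=1000):
    """Return the mean seconds per 1-byte write() + fdatasync() on `path`."""
    # fdatasync is missing on some platforms; fall back to fsync there.
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, b"x")  # append a single byte
            sync(fd)            # block until the kernel reports the data flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Same arithmetic as dividing the wall time by the number of writes.
    return elapsed / count
```

Calling it with a file on the drive under test, for example `bench_sync_writes("/mnt/ssd/bench.dat")` (a hypothetical mount point), returns seconds per write; multiply by 1e6 to compare against the microsecond figures in this post.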
...
These are the disks I've tried and their performance:
model | qty | performance |
---|---|---|
Crucial M4 | 160 | 3.7ms per write on rc66 |
Intel 520 (SSDSC2CW120A3) | 2 | <1ms per write on rc66, rcmonster (440us) |
Intel X25-M (SSDSA2M120G2GC) | 1 | <1ms per write on flygecko (210us) |
Intel 530 (SSDSC2BW120A4) | 5 | 10ms per write on rc66 (9.8ms), rcmonster (10.1ms), and flygecko (9.7ms) |
Intel 530 attached over USB-to-SATA adapter | 1 | <1ms per write on rc66, rcmonster, flygecko, and x1 |
SanDisk X100 (SD5SG2128G1052E) | 1 | 830us per write on x1 |
Cheap USB thumb drive | n |
...
7.6ms per write on x1 |
My goal is to get the Intel 530 drives to run fast on rc66 and similar machines.
Negative Results
- The first thing I did was update from the DC22 firmware to the current DC33 firmware. No effect.
- I tried a machine with 6Gbps SATA (rcmonster). No effect.
- I tried a machine with a newer kernel (flygecko). No effect.
- I tried disabling APM power saving with hdparm -B. No effect.
- I tried different NCQ sizes. This doesn't work on some machines.
- I tried changing the I/O scheduler. This shouldn't have an effect since there's only one I/O outstanding at a time. No effect.
- I tried running the benchmark on the raw block device rather than an ext4 partition. This helped but only slightly, reducing latency per write from 10ms to about 9ms.
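When repeating the scheduler and NCQ experiments from the list above, it can help to record the settings actually in effect for each run. On Linux these are usually exposed through sysfs, at `/sys/block/<dev>/queue/scheduler` and, for SATA/SCSI disks, `/sys/block/<dev>/device/queue_depth`. The following is a rough sketch that assumes those files exist on the machine in question; the function names are invented for this example.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available); the active scheduler is the bracketed name,
    or None if no name is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_queue_settings(dev, sysfs_root="/sys/block"):
    """Read the I/O scheduler and queue depth for a device name like 'sda'."""
    base = Path(sysfs_root) / dev
    active, available = parse_scheduler((base / "queue" / "scheduler").read_text())
    depth = int((base / "device" / "queue_depth").read_text())
    return {"scheduler": active, "available": available, "queue_depth": depth}
```

For example, `read_queue_settings("sda")` returns a dictionary with the active scheduler, the available schedulers, and the current queue depth, which can be logged next to each latency measurement.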
...