Measuring Disk IO Performance on macOS
Over time and numerous hardware upgrades around the office, I have collected a vast number of 2.5″ HDDs in my "hardware junk" box. The other day I noticed two Kingston SSDNow V200 128GB SSDs just sitting there doing nothing, so I decided to make them usable again. I have a really BAD track record with non-SSD 2.5″ travelling external disks: 99% of them broke or started showing serious problems just after the first year of use (travelling with them and the notebook). I wanted to see how an SSD would behave under the same conditions.
I visited my local hardware store to get a USB3 2.5″ HDD enclosure. Being a geek, I did my homework and decided to get a no-name enclosure for 15 EUR with semi-rubber protection.
The good lady at the counter suggested that instead of the 15 EUR one, I get a 13 EUR no-name enclosure, since "it was better".
Sceptic that I am, I bought both and decided to run a test and prove her wrong. The one with the higher price had to be better. :)
After fitting the disks in the enclosures, the first issue I stumbled upon was the lack of a disk benchmarking tool on macOS. On Windows I used HD Tune for ages and was happy with it. On macOS, however, Blackmagic Disk Speed Test in the Mac App Store did not inspire confidence in me (black magic, c'mon?), nor did the 11-year-old Xbench or the jDiskMark beta (written in Java).
In Ubuntu/Debian/RHEL land I have benchmarked device IO before and had a good experience with fio, a popular tool for measuring IOPS on Linux servers.
Do not make the mistake of benchmarking (or running dd against, for example) the /dev/disk device.
On macOS you should always use the /dev/rdisk device.
/dev/disk – buffered access, for kernel filesystem calls; I/O is broken into 4 KB chunks and takes the more expensive buffered route.
/dev/rdisk – "raw" in the BSD sense, forcing block-aligned I/O. These devices are closer to the physical disk than the buffered ones.
If you do a read or write larger than one sector to /dev/rdisk, that request will be passed straight through. The lower layers may break it up (e.g., USB breaks it up into 128 KB pieces due to the maximum payload size in the USB protocol), but you generally get bigger and more efficient I/Os. When streaming, like via dd, 128 KB to 1 MB are pretty good sizes to get near-optimal performance on current non-RAID hardware. (source)
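You can see the difference yourself with a quick read-only dd comparison. A minimal sketch, assuming your external disk is disk2 (check with diskutil list first):

# Read-only, so safe to run; compare the transfer rates dd reports at the end.
sudo dd if=/dev/disk2 of=/dev/null bs=1m count=512    # buffered device
sudo dd if=/dev/rdisk2 of=/dev/null bs=1m count=512   # raw device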
1. Install FIO
brew install fio
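A quick sanity check that fio landed on your PATH:

fio --version

It should print the installed version, e.g. fio-2.18 (the version used in the results below).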
2. Check correct disk number
diskutil list
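The output looks roughly like this (illustrative listing; here the external enclosure shows up as disk2):

/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *500.3 GB   disk0
   ...
/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     FDisk_partition_scheme                        *128.0 GB   disk2
   1:                  Apple_HFS Untitled                128.0 GB   disk2s1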
Everything from this step forward can and will delete data on your disk, so BE VERY CAREFUL about which disk you use. You have been warned.
3. Precondition SSD
We precondition each drive the same way before each measurement, bringing it to the same performance state so that the test process is deterministic.
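If any volume on the disk is still mounted, writing to the raw device fails with a "Resource busy" error, so unmount it first (disk2 is again just a placeholder for your disk number):

diskutil unmountDisk /dev/disk2

Then zero-fill the whole device; dd will simply run until it hits the end of the disk: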
sudo dd if=/dev/zero of=/dev/rdisk2 bs=1m
4. Running tests
Random read/write performance
fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
Random read performance
fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread
Random write performance
fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
(On macOS we must use the posixaio ioengine. If you are running some different flavour of Unix, just replace --ioengine=posixaio with e.g. --ioengine=libaio on Ubuntu.)
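Note that the recipes above write to a 4G scratch file named test on the current (mounted) filesystem. To hit the raw device directly, as in the results below, point --filename at the rdisk node instead. A destructive sketch, again assuming your disk is disk2:

# Destructive: overwrites data on the target disk.
sudo fio --name=rawtest --filename=/dev/rdisk2 --ioengine=posixaio --direct=1 --gtod_reduce=1 --bs=4k --iodepth=64 --size=4G --readwrite=randread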
5. The results
The lady at the store was right! With the same disk inside, the cheaper enclosure gave better results: it was faster by almost 35%.
tray                 | read MiB/s | write MiB/s | read IOPS | write IOPS
ASMT (/dev/disk)     | 10.9       | 11.9        | 86        | 94
ASMT (/dev/rdisk)    | 69.7       | 72.8        | 552       | 576
PATRIOT (/dev/rdisk) | 92.4       | 93.5        | 738       | 747
If you are interested in the raw values I got, here they are.
The first set of benchmarks (run against the buffered /dev/disk device) revealed really poor performance [r=10.9MiB/s,w=11.9MiB/s][r=86,w=94 IOPS].
sudo fio --filename=/dev/disk2 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest

fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.5%][r=10.9MiB/s,w=11.9MiB/s][r=86,w=94 IOPS][eta 23h:52m:35s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 20:14:55 2017
  read: IOPS=94, BW=11.8MiB/s (12.4MB/s)(5234MiB/445379msec)
    slat (usec): min=0, max=303, avg= 0.40, stdev= 2.28
    clat (msec): min=47, max=228, avg=100.40, stdev=14.81
     lat (msec): min=47, max=228, avg=100.40, stdev=14.81
    clat percentiles (msec):
     |  1.00th=[   74],  5.00th=[   82], 10.00th=[   85], 20.00th=[   90],
     | 30.00th=[   93], 40.00th=[   96], 50.00th=[   98], 60.00th=[  102],
     | 70.00th=[  105], 80.00th=[  111], 90.00th=[  119], 95.00th=[  127],
     | 99.00th=[  151], 99.50th=[  161], 99.90th=[  184], 99.95th=[  192],
     | 99.99th=[  208]
  write: IOPS=94, BW=11.8MiB/s (12.4MB/s)(5237MiB/445379msec)
    slat (usec): min=0, max=296, avg= 0.53, stdev= 2.81
    clat (msec): min=25, max=177, avg=69.66, stdev= 9.52
     lat (msec): min=25, max=177, avg=69.66, stdev= 9.52
    clat percentiles (msec):
     |  1.00th=[   51],  5.00th=[   58], 10.00th=[   61], 20.00th=[   63],
     | 30.00th=[   66], 40.00th=[   68], 50.00th=[   69], 60.00th=[   71],
     | 70.00th=[   73], 80.00th=[   76], 90.00th=[   80], 95.00th=[   86],
     | 99.00th=[  105], 99.50th=[  114], 99.90th=[  133], 99.95th=[  137],
     | 99.99th=[  151]
    lat (msec) : 50=0.44%, 100=76.81%, 250=22.76%
  cpu          : usr=0.46%, sys=0.41%, ctx=283619, majf=3, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.3%, 8=1.7%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=41875,41894,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=11.8MiB/s (12.4MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=5234MiB (5489MB), run=445379-445379msec
  WRITE: bw=11.8MiB/s (12.4MB/s), 11.8MiB/s-11.8MiB/s (12.4MB/s-12.4MB/s), io=5237MiB (5491MB), run=445379-445379msec
Repeating the benchmark on the same enclosure, but against the raw device (/dev/rdisk), revealed much nicer numbers: roughly six times faster than the buffered device.
[m(1)][0.3%][r=69.7MiB/s,w=72.8MiB/s][r=552,w=576 IOPS][eta 23h:55m:54s]
sudo fio --filename=/dev/rdisk2 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest

fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.3%][r=69.7MiB/s,w=72.8MiB/s][r=552,w=576 IOPS][eta 23h:55m:54s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 21:13:39 2017
  read: IOPS=538, BW=67.3MiB/s (70.6MB/s)(16.2GiB/245308msec)
    slat (usec): min=0, max=47, avg= 0.45, stdev= 1.02
    clat (msec): min=8, max=45, avg=15.05, stdev= 2.70
     lat (msec): min=8, max=45, avg=15.05, stdev= 2.70
    clat percentiles (usec):
     |  1.00th=[11200],  5.00th=[12224], 10.00th=[12736], 20.00th=[13376],
     | 30.00th=[13888], 40.00th=[14400], 50.00th=[14784], 60.00th=[15168],
     | 70.00th=[15680], 80.00th=[16320], 90.00th=[17280], 95.00th=[18048],
     | 99.00th=[23936], 99.50th=[36608], 99.90th=[39680], 99.95th=[40192],
     | 99.99th=[42240]
  write: IOPS=538, BW=67.4MiB/s (70.7MB/s)(16.2GiB/245308msec)
    slat (usec): min=0, max=65, avg= 0.46, stdev= 0.67
    clat (msec): min=6, max=45, avg=14.56, stdev= 2.71
     lat (msec): min=6, max=45, avg=14.57, stdev= 2.71
    clat percentiles (usec):
     |  1.00th=[10560],  5.00th=[11712], 10.00th=[12224], 20.00th=[12864],
     | 30.00th=[13376], 40.00th=[13888], 50.00th=[14272], 60.00th=[14784],
     | 70.00th=[15168], 80.00th=[15808], 90.00th=[16768], 95.00th=[17536],
     | 99.00th=[23680], 99.50th=[36096], 99.90th=[39168], 99.95th=[40192],
     | 99.99th=[42240]
    lat (msec) : 10=0.22%, 20=98.34%, 50=1.44%
  cpu          : usr=3.48%, sys=2.40%, ctx=531264, majf=3, minf=5
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.9%, 8=1.8%, 16=0.3%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=132027,132160,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=67.3MiB/s (70.6MB/s), 67.3MiB/s-67.3MiB/s (70.6MB/s-70.6MB/s), io=16.2GiB (17.4GB), run=245308-245308msec
  WRITE: bw=67.4MiB/s (70.7MB/s), 67.4MiB/s-67.4MiB/s (70.7MB/s-70.7MB/s), io=16.2GiB (17.4GB), run=245308-245308msec
Finally, the second enclosure I benchmarked revealed the best results, almost 35% faster than the first one.
[m(1)][0.5%][r=92.4MiB/s,w=93.5MiB/s][r=738,w=747 IOPS][eta 23h:52m:50s]
sudo fio --filename=/dev/rdisk3 --direct=1 --rw=randrw --rwmixwrite=50 --refill_buffers --norandommap --randrepeat=0 --ioengine=posixaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 --time_based --runtime=86400 --group_reporting --name=benchtest

fio-2.18
Starting 1 thread
^Cbs: 1 (f=1), 0-2560 IOPS: [m(1)][0.5%][r=92.4MiB/s,w=93.5MiB/s][r=738,w=747 IOPS][eta 23h:52m:50s]
fio: terminating on signal 2

benchtest: (groupid=0, jobs=1): err= 0: pid=3075: Fri Mar 24 20:37:26 2017
  read: IOPS=761, BW=95.2MiB/s (99.8MB/s)(39.2GiB/430198msec)
    slat (usec): min=0, max=310, avg= 0.55, stdev= 2.23
    clat (msec): min=1, max=48, avg=11.43, stdev= 2.84
     lat (msec): min=1, max=48, avg=11.43, stdev= 2.84
    clat percentiles (usec):
     |  1.00th=[ 6880],  5.00th=[ 8256], 10.00th=[ 8896], 20.00th=[ 9536],
     | 30.00th=[10048], 40.00th=[10560], 50.00th=[11072], 60.00th=[11584],
     | 70.00th=[12224], 80.00th=[12864], 90.00th=[14016], 95.00th=[15296],
     | 99.00th=[22912], 99.50th=[28800], 99.90th=[35584], 99.95th=[37120],
     | 99.99th=[40704]
  write: IOPS=762, BW=95.3MiB/s (99.9MB/s)(40.3GiB/430198msec)
    slat (usec): min=0, max=767, avg= 0.96, stdev= 3.58
    clat (usec): min=492, max=45310, avg=9422.63, stdev=2869.71
     lat (usec): min=493, max=45311, avg=9423.59, stdev=2869.68
    clat percentiles (usec):
     |  1.00th=[ 5024],  5.00th=[ 6240], 10.00th=[ 6944], 20.00th=[ 7712],
     | 30.00th=[ 8256], 40.00th=[ 8640], 50.00th=[ 9024], 60.00th=[ 9536],
     | 70.00th=[10048], 80.00th=[10688], 90.00th=[11712], 95.00th=[13120],
     | 99.00th=[21888], 99.50th=[27264], 99.90th=[35072], 99.95th=[37120],
     | 99.99th=[40704]
    lat (usec) : 500=0.01%
    lat (msec) : 2=0.01%, 4=0.08%, 10=49.48%, 20=49.08%, 50=1.35%
  cpu          : usr=4.59%, sys=2.86%, ctx=1256049, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=57.4%, 16=42.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.8%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=327551,327861,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=95.2MiB/s (99.8MB/s), 95.2MiB/s-95.2MiB/s (99.8MB/s-99.8MB/s), io=39.2GiB (42.1GB), run=430198-430198msec
  WRITE: bw=95.3MiB/s (99.9MB/s), 95.3MiB/s-95.3MiB/s (99.9MB/s-99.9MB/s), io=40.3GiB (42.1GB), run=430198-430198msec
Conclusion
fio is a pretty robust utility for IO testing. Beware of the quality of the onboard electronics when buying HDD enclosures. Enclosures within the same price range can vary 15-30% in speed.