Minor Note on ‘dd’ Write Performance
Today, I was cleaning out some old logical volumes. Since they resided on rented hard disks, I chose to overwrite them with zeroes to avoid leaving traces of my data on someone else’s disks. The first thing that came to my mind was this:
dd if=/dev/zero of=/dev/vg/lv
Since I had ten logical volumes, I also ran ten instances of dd in parallel. They were on a RAID and I was decommissioning the server, so I didn’t really care about performance. Speaking of which, I like to spy on running processes; call me a techno-voyeur if you want!
Anyway, vmstat was telling me that the system was chugging along nicely, reading (sic!) and writing approximately 16 MB/s each. Something was clearly wrong. Note the “bi” and “bo” columns, denoting kB/s read and written:
david@hetz:~$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
[...]
0 12 6416 270676 1601220 9628 0 0 16628 16553 4481 7638 1 21 26 52
I had forgotten to specify the block size which dd should use for the transfer. And if the program is writing the zeroes in tiny chunks into the target file, the destination block has to be read before it is modified, since the kernel cannot (and should not!) guess that the rest of the block (which is all we’re talking about at this stage) will be overwritten too.
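A back-of-the-envelope count makes the overhead tangible. dd’s default block size is 512 bytes, so zeroing, say, a 100 GiB volume issues hundreds of millions of tiny writes, while 1 MiB blocks cut that number by a factor of 2048 (the 100 GiB figure is just an illustration, not the actual size of my volumes):

```shell
# Number of write() calls dd issues for 100 GiB of zeroes:
echo $((100 * 1024 * 1024 * 1024 / 512))      # default 512-byte blocks
echo $((100 * 1024 * 1024 * 1024 / 1048576))  # 1 MiB blocks
```

Every one of those 512-byte writes is smaller than the device’s block, which is what forces the read-modify-write cycle in the first place.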
To test my hypothesis, I restarted the processes and told dd to use nice 1 MB sized blocks. I had ten threads to keep the hardware busy, so that shouldn’t cause any problems.
dd if=/dev/zero of=/dev/vg/lv bs=1M
Indeed, in this configuration the system stabilized at a nice 55 MB/s of writes. The kernel was able to recognize that the 1 MB writes would cover complete blocks and that their content would be overwritten entirely. No need to load them beforehand:
david@hetz:~$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
[...]
0 12 6416 16308 1749688 5312 0 0 0 55644 551 1576 0 23 1 76
While I was waiting for the last process to finish, I noticed that throughput had risen to 60MB/s:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
[...]
1 3 6416 15336 1750492 6088 0 0 4 62892 545 232 0 28 32 39
To summarize: having only one process running is ~10% faster than having ten processes running, and using a non-trivial block size is more than three times faster than specifying none at all.
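For anyone who wants to reproduce the comparison without a spare logical volume, here is a rough sketch that writes 100 MiB to a scratch file at three block sizes. The absolute numbers will differ from mine (a file goes through the page cache, and your hardware is not my RAID), but the block-size effect should still be visible in the throughput figure GNU dd prints on completion:

```shell
#!/bin/sh
# Write the same 100 MiB with different block sizes and show dd's own
# throughput summary (printed on stderr, hence the 2>&1).
total=$((100 * 1024 * 1024))
target=$(mktemp)
for bs in 512 4096 1048576; do
    count=$((total / bs))
    printf 'bs=%-8s ' "$bs"
    dd if=/dev/zero of="$target" bs="$bs" count="$count" 2>&1 | tail -n 1
done
rm -f "$target"
```

Note that each iteration writes the same total amount of data; only the number of write() calls changes.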