Analyzing I/O bottlenecks in Unix
# sar -d
Linux 2.4.21-27.ELsmp (pw101) 12/04/2005
12:00:00 AM DEV tps rd_sec/s wr_sec/s
….
Average: dev8-128 7.16 5.37 75.07
Average: dev8-129 7.16 5.37 75.07
Average: dev8-130 0.00 0.00 0.00
….
The above command reports I/O activity per device; the device with the highest tps and sector rates is the busiest. To determine what that device is, do:
# more /proc/devices
Character devices:
1 mem
2 pty
3 ttyp
4 ttyS
5 cua
7 vcs
…
Block devices:
1 ramdisk
2 fd
3 ide0
7 loop
8 sd
…
71 sd
129 sd
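The above example indicates that block device major number 8 is ‘sd’, the SCSI disk driver. In
‘dev8-129’, 8 is that major number and 129 is the minor number, which selects one particular
SCSI disk or partition (it is unrelated to the ‘129 sd’ line above, which is another major
number). So ‘sar -d’ tells you that one of the SCSI disks, device ‘dev8-129’, is busy.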
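One way to turn a ‘sar -d’ label such as ‘dev8-129’ into a device name is to look up its major and minor numbers in /proc/partitions. The following is only a sketch: the function name dev_name is made up for this example, and it assumes the table has the usual ‘major minor #blocks name’ columns.

```shell
# Hypothetical helper (not a standard tool): translate a sar device label
# such as "dev8-129" into the kernel's device name by looking up the
# major/minor pair in a /proc/partitions-style table.
# Assumed column layout: major minor #blocks name
dev_name() {
    label=$1
    table=${2:-/proc/partitions}   # optional second argument: alternate table file
    nums=${label#dev}              # "dev8-129" -> "8-129"
    major=${nums%%-*}              # "8"
    minor=${nums#*-}               # "129"
    # Print the name column of the row whose major and minor both match.
    awk -v maj="$major" -v min="$minor" \
        '$1 == maj && $2 == min { print $4 }' "$table"
}
```

Running ‘dev_name dev8-129’ prints the matching name, or nothing if the pair is not listed. With the standard sd numbering of 16 minors per disk, minor 129 would be expected to show up as something like ‘sdi1’, but check the output on your own system.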
The following command is a bit busy, but you can always strip away the piped commands one at a
time to see what the output of each stage looks like. The example shows which program (along
with its pid) has the most open files on a given busy device. In our example, the busy device
we are interested in is ‘dev8-129’, which is represented as ‘8,129’ in the ‘lsof’ output.
# lsof | grep "8,129" | awk '{print $1" "$2}' | uniq -c | sort -n -r
[count program pid]
307 java 31916
307 java 31915
307 java 31914
307 java 31913
307 java 31912
307 java 31911
307 java 31910
307 java 31909
307 java 31908
307 java 31907
307 java 31906
307 java 31905
307 java 31904
307 java 31903
307 java 27645
55 db2sysc 32011
51 db2sysc 32012
46 httpd 32009
46 httpd 32008
46 httpd 32006
46 httpd 31901
46 httpd 31900
46 httpd 31899
46 httpd 31898
46 httpd 31897
46 httpd 31874
45 db2sysc 32019
40 db2fmp 31996
31 sshd 26713
29 db2sysc 32025
29 db2sysc 32024
…
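The grep in the pipeline above matches ‘8,129’ anywhere on a line, including inside a file name. A slightly stricter variant compares only the DEVICE column. This is a sketch under assumptions: the function name count_open_by_dev is made up for this example, and it assumes DEVICE is the 6th whitespace-separated field (the COMMAND PID USER FD TYPE DEVICE … layout), which you should confirm against the header your lsof prints.

```shell
# Hypothetical helper: read lsof output on stdin and count open files per
# (command, pid) pair whose DEVICE column equals the given "major,minor".
# Assumption: DEVICE is field 6; adjust $6 if your lsof header differs.
count_open_by_dev() {
    awk -v dev="$1" '$6 == dev { print $1, $2 }' |
        sort |        # group identical "command pid" pairs together
        uniq -c |     # count each pair
        sort -n -r    # largest count first
}
```

It would be used as ‘lsof | count_open_by_dev 8,129’. The extra ‘sort’ before ‘uniq -c’ is there because ‘uniq’ only merges adjacent duplicate lines.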
The above output gives a clue as to which process(es) to look into for disk I/O problems
(i.e., ‘java’). The ‘java’ processes have the largest number of open files, and chances are
they are doing the most disk I/O. Of course, an open file is not necessarily being read or
written, and some files may be memory-cached, so this method may not always work. I’d like to
hear from you if you have a better solution. Please contact me (see footer for contact info).