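I have a custom-built Ubuntu 11.04 server with a six-disk software RAID 10 as its primary drive. On it I mainly run PostgreSQL and a few other utilities that pull data from the web. Often, after a few hours of normal uptime, I find the server starts lagging on every kind of process. For example, after logging in it can take 10-15 seconds to get a shell prompt. top can take 5-10 seconds to appear. An ls can take a second or two.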
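When I look at top, there is almost no CPU usage. The PostgreSQL server uses a fair amount of memory, but not enough to spill into swap.
I don't know where to start, other than to suspect the RAID 10 (I have only ever run software RAID 1 before).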
Edit: output from top:
top - 11:56:03 up 1:46, 3 users, load average: 0.89, 0.73, 0.72
Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 93.5%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16325596k total, 3478248k used, 12847348k free, 20880k buffers
Swap: 19534176k total, 0k used, 19534176k free, 3041992k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1747 woodsp    20   0  109m  10m 4888 S    1  0.1   0:42.70 python
  357 root      20   0     0    0    0 S    0  0.0   0:00.40 jbd2/sda3-8
    1 root      20   0 24324 2284 1344 S    0  0.0   0:00.84 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:00.24 ksoftirqd/0
    6 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    7 root      RT   0     0    0    0 S    0  0.0   0:00.01 watchdog/0
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
   10 root      20   0     0    0    0 S    0  0.0   0:00.02 ksoftirqd/1
   12 root      RT   0     0    0    0 S    0  0.0   0:00.01 watchdog/1
   13 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/2
   14 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/2:0
   15 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2
   16 root      RT   0     0    0    0 S    0  0.0   0:00.01 watchdog/2
   17 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/3
   18 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/3:0
   19 root      20   0     0    0    0 S    0  0.0   0:00.02 ksoftirqd/3
   20 root      RT   0     0    0    0 S    0  0.0   0:00.01 watchdog/3
   21 root       0 -20     0    0    0 S    0  0.0   0:00.00 cpuset
   22 root       0 -20     0    0    0 S    0  0.0   0:00.00 khelper
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 kdevtmpfs
   24 root       0 -20     0    0    0 S    0  0.0   0:00.00 netns
   26 root      20   0     0    0    0 S    0  0.0   0:00.00 sync_supers
df -h
rpsharp@ncp-skookum:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       1.8T  549G  1.2T  32% /
udev            7.8G  4.0K  7.8G   1% /dev
tmpfs           3.2G  492K  3.2G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.8G     0  7.8G   0% /run/shm
/dev/sda2       952M  128K  952M   1% /boot/efi
/dev/md0        5.5T  562G  4.7T  11% /usr/local
free -m
psharp@ncp-skookum:~$ free -m
             total       used       free     shared    buffers     cached
Mem:         15942       3409      12533          0         20       2983
-/+ buffers/cache:        405      15537
Swap:        19076          0      19076
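To double-check my claim that memory is not the problem, here is a minimal Python sketch that redoes the arithmetic from the free -m output above: memory actually held by applications is "used" minus "buffers" minus "cached". The function name `app_memory_mb` is my own, and the parser assumes the older six-column layout of free shown here; newer versions of free print different columns.

```python
# Sample copied from the free -m output in this post.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         15942       3409      12533          0         20       2983
-/+ buffers/cache:        405      15537
Swap:        19076          0      19076
"""


def app_memory_mb(free_output):
    """Return used memory in MB, excluding buffers and page cache.

    Assumes the six-column 'Mem:' line of older free versions:
    total, used, free, shared, buffers, cached.
    """
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = (int(f) for f in fields[1:7])
            return used - buffers - cached
    raise ValueError("no 'Mem:' line found")


print(app_memory_mb(SAMPLE))  # 3409 - 20 - 2983 = 406
```

That is roughly 406 MB out of about 16 GB, in line with the 405 that free prints on its "-/+ buffers/cache" line (the one-megabyte difference is presumably rounding), so the machine is nowhere near swapping.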
tail -50 /var/log/syslog
Jul 3 06:31:32 ncp-skookum rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1070" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jul 3 06:39:01 ncp-skookum CRON[14211]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 06:40:01 ncp-skookum CRON[14223]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:00:01 ncp-skookum CRON[14328]: (woodsp) CMD (/home/woodsp/bin/mail_tweetupdate # email an update)
Jul 3 07:00:01 ncp-skookum CRON[14327]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:00:28 ncp-skookum sendmail[14356]: q63E0SoZ014356: from=woodsp,size=2328,class=0,nrcpts=2,msgid=<201207031400.q63E0SoZ014356@ncp-skookum.Stanford.EDU>,relay=woodsp@localhost
Jul 3 07:00:29 ncp-skookum sm-mta[14357]: q63E0Si6014357: from=<woodsp@ncp-skookum.Stanford.EDU>,size=2569,proto=ESMTP,daemon=MTA-v4,relay=localhost [127.0.0.1]
Jul 3 07:00:29 ncp-skookum sendmail[14356]: q63E0SoZ014356: to=Spencer Wood <woodsp@stanford.edu>,Martin Lacayo <mlacayo@stanford.edu>,ctladdr=woodsp (1004/1005),delay=00:00:01,xdelay=00:00:01,mailer=relay,pri=62328,relay=[127.0.0.1] [127.0.0.1],dsn=2.0.0,stat=Sent (q63E0Si6014357 Message accepted for delivery)
Jul 3 07:00:29 ncp-skookum sm-mta[14359]: STARTTLS=client,relay=mx3.stanford.edu.,version=TLSv1/SSLv3,verify=FAIL,cipher=DHE-RSA-AES256-SHA,bits=256/256
Jul 3 07:00:29 ncp-skookum sm-mta[14359]: q63E0Si6014357: to=<mlacayo@stanford.edu>,<woodsp@stanford.edu>,ctladdr=<woodsp@ncp-skookum.Stanford.EDU> (1004/1005),xdelay=00:00:00,mailer=esmtp,pri=152569,relay=mx3.stanford.edu. [171.67.219.73],stat=Sent (Ok: queued as 8F3505802AC)
Jul 3 07:09:08 ncp-skookum CRON[14396]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 07:17:01 ncp-skookum CRON[14438]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 3 07:20:01 ncp-skookum CRON[14453]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:39:01 ncp-skookum CRON[14551]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 07:40:01 ncp-skookum CRON[14562]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:00:01 ncp-skookum CRON[14668]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:09:01 ncp-skookum CRON[14724]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 08:17:01 ncp-skookum CRON[14766]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 3 08:20:01 ncp-skookum CRON[14781]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:39:01 ncp-skookum CRON[14881]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 08:40:01 ncp-skookum CRON[14892]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Output of hdparm -t /dev/sd{a,b,c,d,e,f}. Does this look suspicious?
/dev/sda:
 Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec
/dev/sdb:
 Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec
/dev/sdc:
 Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec
/dev/sdd:
 Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec
/dev/sde:
 Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec
/dev/sdf:
 Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec
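To make the comparison less of an eyeball judgment, here is a minimal Python sketch that parses the hdparm -t output above and flags any disk reading slower than half the median of the group. This is my own illustration, not part of hdparm: the names `parse_hdparm` and `slow_disks`, the 0.5 cutoff, and the 1024 factor used to convert kB to MB are all assumptions.

```python
import re
from statistics import median

# Sample copied from the hdparm -t output in this post.
SAMPLE = """\
/dev/sda:
 Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec
/dev/sdb:
 Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec
/dev/sdc:
 Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec
/dev/sdd:
 Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec
/dev/sde:
 Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec
/dev/sdf:
 Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec
"""

# Device name, then the rate and unit that follow the "=" sign.
RESULT = re.compile(
    r"(/dev/\w+):\s+Timing buffered disk reads:.*?=\s*([\d.]+)\s*(kB|MB|GB)/sec"
)

# Assumed conversion factors to MB/sec (binary multiples).
TO_MB = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def parse_hdparm(text):
    """Map each device to its buffered read rate in MB/sec."""
    return {dev: float(rate) * TO_MB[unit] for dev, rate, unit in RESULT.findall(text)}


def slow_disks(rates, fraction=0.5):
    """Return devices reading slower than `fraction` of the median rate."""
    typical = median(rates.values())
    return sorted(dev for dev, rate in rates.items() if rate < typical * fraction)


rates = parse_hdparm(SAMPLE)
print(slow_disks(rates))  # ['/dev/sda']
```

With the numbers I posted, only /dev/sda is flagged: it reads at well under 1 MB/sec while the other five disks all sit between roughly 130 and 140 MB/sec. According to the df output, sda is also the disk that holds the root filesystem.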