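If I run tar -pcvf files.tar /var/log on some random directory containing many files, or on a single large file, MySQL locks up completely and all MySQL connections are exhausted while tar is running.
My Nginx error.log fills up with: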
2011/04/01 04:29:11 [error] 15089#0: *39023131 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: www.domain.com, request: "GET /some.html HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.sock:", host: "www.domain.com", referrer: "http://www.domain.com/some-other.html"
If I run the following, I see many locked connections:
SHOW PROCESSLIST;
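To watch how the number of locked connections changes while tar runs, the process list can be counted from the shell. This is only a sketch: the function name count_locked is made up for illustration, it assumes the tab-separated output of the mysql client in batch mode (State is the 7th column), and it assumes the state is literally "Locked", as MySQL 5.1 reports it; newer versions use different wording.

```shell
#!/bin/sh
# Hypothetical helper: count threads whose State column is "Locked".
# Input: tab-separated SHOW PROCESSLIST output with a header row,
# columns Id, User, Host, db, Command, Time, State, Info.
count_locked() {
    awk -F'\t' 'NR > 1 && $7 == "Locked" { n++ } END { print n + 0 }'
}

# Usage (assumes a working mysql client and credentials):
#   mysql --batch -e 'SHOW PROCESSLIST' | count_locked
```

Running it once a second in a loop during the tar run would show whether the locks build up gradually or all at once.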
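My server has 4 CPUs with 8 cores each (32 cores, 64 threads) and 64 GB of RAM.
It has 6 SSD disks in RAID 10.
top shows 100% CPU on one core for tar, but right after tar finishes, MySQL CPU usage jumps to over 600% for a second or two.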
top - 04:48:29 up 37 days, 14:17,  4 users,  load average: 3.82, 1.37, 0.99
Tasks: 1035 total,   1 running, 1034 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.4%us,  7.4%sy,  0.0%ni, 89.1%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  65980076k total, 43154916k used, 22825160k free,   523560k buffers
Swap:  1052248k total,        0k used,  1052248k free, 37479984k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 9325 mysql     15   0 7624m 2.3g 4700 S 606.3  3.6   6861:35 mysqld
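> MySQL version is 5.1.56
> Linux 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
> MySQL has the binlog enabled
my.cnf is tuned according to the tuning-primer and mysqltuner recommendations, and neither reports any warnings (apart from connections being maxed out, which is caused by the tar problem).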
[mysqld]
server-id = 100
datadir = /var/lib/mysql
port = 3306
socket = /var/lib/mysql/mysql.sock
log-error = /var/log/mysql/mysql.err
log-bin = /var/log/mysql/mysql-bin
log-bin-index = /var/log/mysql/mysql-bin.index
expire_logs_days = 2
sync_binlog = 1
skip-external-locking
skip-innodb
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query.log
long_query_time = 10
max_connections = 768
key_buffer = 6G
table_cache = 15360
read_buffer_size = 2M
read_rnd_buffer_size = 2M
sort_buffer_size = 1M
tmp_table_size = 128M
max_heap_table_size = 128M
max_allowed_packet = 16M
bulk_insert_buffer_size = 16M
myisam_sort_buffer_size = 128M
thread_cache_size = 64
join_buffer_size = 1M
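I tried some other compression tools such as pigz and gzip, and everything works fine.
pigz is multithreaded, so it maxes out all the cores. top shows over 3000% CPU usage, and while it runs MySQL has no problems at all: not a single locked query or table.
Anyway, I don't know whether this is a tar problem or a MySQL problem, or how to fix it. I would appreciate any help.
Sorry for my bad English :)
Thanks!
Edit:
iostat 2 at its peak during tar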
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.20    0.00    1.31    7.81    0.00   90.68

Device:         tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda         1179.00       308.00    452244.00        616     904488
sda1           0.00         0.00         0.00          0          0
sda2        1179.00       308.00    452244.00        616     904488
sda3           0.00         0.00         0.00          0          0
top at its peak during tar
top - 05:26:07 up 37 days, 14:55, load average: 2.45, 1.70, 1.07
Tasks: 1045 total, 2 running, 1043 sleeping, 0 zombie
Cpu(s): 0.1%us, 1.7%sy, 91.7%id, 6.4%wa
Mem: 39148160k used, 26831916k free, 488752k buffers
Swap: 1052248k total, 33484548k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
27604 root      25   0 76192 1072  896 R  99.5  0.0   0:23.94 tar
vmstat at its peak during tar
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy id wa st
 1  5      0 21973424 474068 37700200    0    0     1    19    0    0  1  0 99  0  0
slabtop at its peak during tar
Active / Total Objects (% used)    : 9150253 / 12383252 (73.9%)
Active / Total Slabs (% used)      : 452818 / 453490 (99.9%)
Active / Total Caches (% used)     : 105 / 149 (70.5%)
Active / Total Size (% used)       : 1359015.74K / 1709422.53K (79.5%)
Minimum / Average / Maximum Object : 0.02K / 0.14K / 128.00K

   OBJS  ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
8161880 5170966  63%    0.09K  204047       40    816188K buffer_head
2796624 2795723  99%    0.21K  155368       18    621472K dentry_cache
 295320  292658  99%    0.09K    7383       40     29532K journal_head
 294665  215031  72%    0.52K   42095        7    168380K radix_tree_node
 136800  136770  99%    0.02K     950      144      3800K avtab_node
 132192   86357  65%    0.08K    2754       48     11016K selinux_inode_security
 127680  119472  93%    0.03K    1140      112      4560K size-32
  74565   69314  92%    0.74K   14913        5     59652K ext3_inode_cache
  64320   40789  63%    0.12K    2144       30      8576K inet_peer_cache
  59972   55193  92%    0.17K    2726       22     10904K vm_area_struct
Output of cat /proc/mdstat
Personalities :
unused devices: <none>
Output of mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
Output of df -i
Filesystem     Inodes   IUsed    IFree IUse% Mounted on
/dev/sda2    46497792  144610 46353182    1% /
/dev/sda1       26104      46    26058    1% /boot
tmpfs         8247509       1  8247508    1% /dev/shm
Solution
> HP DL180-G6 nearline server
> 4x 300 GB SAS 15k drives
> 2x 1 TB SATA 10k drives
> 2x Xeon 5340 2.53 GHz CPUs (8 cores total)
> 32 GB DDR3 1066 MHz
> HP Storageworks HBA P410 (PCI Express - 1 for all HDDs)
> HP Storageworks HBA P212 / Zero (PCI Express - 1 for the external tape drive)
> HP Ultrium LTO 4 external SAS tape drive (800/1600 MB)
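When we ran the daily tape backup with tar -options -source from /mnt/backup -destination to /dev/st0 (tape), it basically locked up the whole damn machine. The first service affected was MySQL, which became unreachable through its Unix socket (/var/lib/mysql/mysql.sock); then processes crashed one by one. Even the terminal (bash prompt) was unusable, and forget about opening anything in the GUI (Gnome desktop).
The solution is to use 'ionice', not 'nice'. This is not a CPU load problem but a disk load problem. The disks and processors are fast enough, but the backbone (disk adapter / PCI Express bus / etc.) cannot keep up.
So, here is the fix...
Old tar backup command: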
[root@somewhere]# /bin/tar -clpzvf /dev/st0 /mnt/backup
New tar backup command:
[root@somewhere]# /usr/bin/ionice -c2 -n5 /bin/tar -clpzvf /dev/st0 /mnt/backup
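The same idea can be wrapped in a small helper so that any heavy job gets the lower I/O priority. This is a minimal sketch, not part of the original fix: the name run_low_io is hypothetical, the class and level simply mirror the command above, and the helper falls back to running the command unchanged if ionice is missing or the priority change is refused.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command in the best-effort I/O class (-c2)
# at priority level 5 (-n5), as in the tar command above.
run_low_io() {
    # Probe first: ionice may be absent, or the kernel may refuse the call.
    if command -v ionice >/dev/null 2>&1 && ionice -c2 -n5 true 2>/dev/null; then
        ionice -c2 -n5 "$@"
    else
        # Fallback: run the command with its normal I/O priority.
        "$@"
    fi
}

# Usage, mirroring the backup command above:
#   run_low_io /bin/tar -clpzvf /dev/st0 /mnt/backup
```

The exit status of the wrapped command is passed through, so the helper can be used in cron jobs that check for failure.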
For your reference, here is the man page for the 'ionice' command... it is supported on kernel 2.6.13 and newer:
– http://linux.die.net/man/1/ionice
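– If you are trying to slow something down without making it take forever, "sane" values for the ionice priority in class 2 are between 3 and 5, where 3 is a moderate slowdown and 5 is very slow.
This effectively doubled the time the tape backup takes (it used to take half an hour and now takes about an hour), but who cares, it now runs the way it should.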
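One thing worth checking if ionice appears to have no effect, noted here as a general caveat rather than something observed on the machines above: on kernels of this generation, I/O classes and priorities are honored by the CFQ I/O scheduler. The sketch below extracts the active scheduler from the sysfs line; the function name active_scheduler and the device name sda are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the scheduler shown in [brackets] on stdin.
# The sysfs file lists every available scheduler and brackets the active
# one, for example: noop anticipatory deadline [cfq]
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Usage (sda is an assumed device name):
#   active_scheduler < /sys/block/sda/queue/scheduler
```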