After about 18 hours of uptime, the system is using ~10GB of memory, and the OOM killer fires when we run routine tasks:
# free -h
             total       used       free     shared    buffers     cached
Mem:           14G       9.4G       5.3G       400K        27M        59M
-/+ buffers/cache:       9.3G       5.4G
Swap:           0B         0B         0B

# cat /proc/meminfo
MemTotal:       15400928 kB
MemFree:         5567028 kB
Buffers:           28464 kB
Cached:            60816 kB
SwapCached:            0 kB
Active:           321464 kB
Inactive:          59156 kB
Active(anon):     291464 kB
Inactive(anon):      316 kB
Active(file):      30000 kB
Inactive(file):    58840 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                40 kB
Writeback:             0 kB
AnonPages:        291380 kB
Mapped:            14356 kB
Shmem:               400 kB
Slab:             364596 kB
SReclaimable:      18856 kB
SUnreclaim:       345740 kB
KernelStack:        1832 kB
PageTables:         3720 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7700464 kB
Committed_AS:     313224 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       35976 kB
VmallocChunk:   34359678732 kB
HardwareCorrupted:     0 kB
AnonHugePages:    231424 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     9598976 kB
DirectMap2M:     6260736 kB
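Not part of the original question, but the gap is easy to quantify: summing the kernel's own counters in /proc/meminfo (values copied from the output above, in kB) shows how much of the "used" memory no counter explains.

```python
# Sanity check: how much of used memory is accounted for by the
# usual /proc/meminfo counters?  All figures in kB, copied from above.
meminfo = {
    "MemTotal": 15400928, "MemFree": 5567028, "Buffers": 28464,
    "Cached": 60816, "Slab": 364596, "AnonPages": 291380,
    "PageTables": 3720, "KernelStack": 1832,
}

used = meminfo["MemTotal"] - meminfo["MemFree"]
accounted = sum(meminfo[k] for k in
                ("Buffers", "Cached", "Slab", "AnonPages",
                 "PageTables", "KernelStack"))
unaccounted = used - accounted

print(f"used:        {used / 2**20:.1f} GiB")         # ~9.4 GiB
print(f"accounted:   {accounted / 2**20:.1f} GiB")    # ~0.7 GiB
print(f"unaccounted: {unaccounted / 2**20:.1f} GiB")  # ~8.7 GiB
```

Roughly 8.7 GiB of the used memory is invisible to every per-process and per-subsystem counter, which is why top below shows nothing interesting.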
However, no process appears to be using much memory:
# top -o %MEM -n 1
top - 15:07:00 up 18:28,  1 user,  load average: 0.00, 0.01, 0.05
Tasks: 155 total,   1 running, 154 sleeping,   0 stopped,   0 zombie
%Cpu(s): 23.7 us,  4.8 sy,  0.0 ni, 71.4 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:  15400928 total,  9838560 used,  5562368 free,    29764 buffers
KiB Swap:        0 total,        0 used,        0 free.    62760 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1333 root      20   0 5763204 274132   5352 S   0.0  1.8   7:00.19 java
 1466 newrelic  20   0  251484   4884   2056 S   0.0  0.0   0:56.41 nrsysmond
16804 root      20   0  105636   4212   3224 S   0.0  0.0   0:00.00 sshd
16876 root      20   0   21420   3908   1764 S   0.0  0.0   0:00.03 bash
16858 ubuntu    20   0   21456   3828   1684 S   0.0  0.0   0:00.05 bash
  770 root      20   0   10216   2868    576 S   0.0  0.0   0:00.02 dhclient
    1 root      20   0   33700   2216    624 S   0.0  0.0   0:35.50 init
16875 root      20   0   63664   2084   1612 S   0.0  0.0   0:00.00 sudo
16857 ubuntu    20   0  105636   1860    880 S   0.0  0.0   0:00.01 sshd
16920 root      20   0   23688   1528   1064 R   0.0  0.0   0:00.00 top
16803 postfix   20   0   27400   1492   1216 S   0.0  0.0   0:00.00 pickup
  976 root      20   0   43444   1100    748 S   0.0  0.0   0:00.00 systemd-logind
  572 root      20   0   51480   1048    308 S   0.0  0.0   0:00.53 systemd-udevd
 1840 ntp       20   0   31448   1044    448 S   0.0  0.0   0:02.94 ntpd
  990 syslog    20   0  255836    924     76 S   0.0  0.0   0:00.13 rsyslogd
 1167 root      20   0   61372    828    148 S   0.0  0.0   0:00.00 sshd
  945 message+  20   0   39212    788    416 S   0.0  0.0   0:00.12 dbus-daemon
 1323 root      20   0   20692    676      0 S   0.0  0.0   0:40.92 wrapper
 1230 root      20   0   19320    588    244 S   0.0  0.0   0:04.57 irqbalance
 1538 root      20   0   25336    500    188 S   0.0  0.0   0:00.18 master
  567 root      20   0   19604    480     96 S   0.0  0.0   0:00.34 upstart-udev-br
 1175 root      20   0   23648    404    156 S   0.0  0.0   0:00.08 cron
 1005 root      20   0   15272    348     88 S   0.0  0.0   0:00.08 upstart-file-br
The tmpfs and shared-memory filesystems are essentially empty:
# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G   12K  7.4G   1% /dev
tmpfs           1.5G  384K  1.5G   1% /run
/dev/xvda1      9.8G  6.7G  2.7G  72% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            7.4G     0  7.4G   0% /run/shm
none            100M     0  100M   0% /run/user
/dev/xvda15     104M  4.7M   99M   5% /boot/efi
/dev/xvdb        64G  1.1G   60G   2% /mnt
smem says the kernel is using it:
# smem -tw
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory       9525544      92468    9433076
userspace memory             311064      15648     295416
free memory                 5564320    5564320          0
----------------------------------------------------------
                           15400928    5672436    9728492
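A side note of mine, not from the original post: as far as I understand, smem's "kernel dynamic memory" row is a residual, i.e. whatever part of MemTotal is neither free nor attributable to userspace ends up in that bucket, so it tracks the same unexplained memory rather than independently confirming a kernel leak. The figures below (in kB) are from the smem -tw output above.

```python
# smem's "kernel dynamic memory" reconstructed as the residual bucket.
mem_total = 15400928   # MemTotal
userspace = 311064     # smem "userspace memory"
free_mem = 5564320     # smem "free memory"

kernel_dynamic = mem_total - userspace - free_mem
print(kernel_dynamic)  # 9525544, matching smem's report exactly
```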
But slabtop is no help:
# slabtop -o -s c
 Active / Total Objects (% used)    : 2915263 / 2937006 (99.3%)
 Active / Total Slabs (% used)      : 60745 / 60745 (100.0%)
 Active / Total Caches (% used)     : 68 / 103 (66.0%)
 Active / Total Size (% used)       : 356086.71K / 360884.30K (98.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.12K / 14.00K
   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
2226784 2226784 100%    0.07K  39764       56    159056K Acpi-ParseExt
 273408  272598  99%    0.25K   8544       32     68352K kmalloc-256
   8568    8560  99%    4.00K   1071        8     34272K kmalloc-4096
  52320   52320 100%    0.50K   1635       32     26160K kmalloc-512
   1988    1975  99%    8.00K    497        4     15904K kmalloc-8192
  58044   53370  91%    0.19K   2764       21     11056K kmalloc-192
 150016  141356  94%    0.06K   2344       64      9376K kmalloc-64
   5016    3504  69%    0.96K    152       33      4864K ext4_inode_cache
   7280    6834  93%    0.57K    260       28      4160K inode_cache
  20265   20067  99%    0.19K    965       21      3860K dentry
   1760    1721  97%    2.00K    110       16      3520K kmalloc-2048
  19800   19800 100%    0.11K    550       36      2200K sysfs_dir_cache
   2112    1966  93%    1.00K     66       32      2112K kmalloc-1024
    305     260  85%    6.00K     61        5      1952K task_struct
  14616   14242  97%    0.09K    348       42      1392K kmalloc-96
   2125    2092  98%    0.63K     85       25      1360K proc_inode_cache
   2324    2324 100%    0.55K     83       28      1328K radix_tree_node
   9828    9828 100%    0.10K    252       39      1008K buffer_head
   1400    1400 100%    0.62K     56       25       896K sock_inode_cache
     54      39  72%   12.00K     27        2       864K nvidia_stack_cache
    975     975 100%    0.81K     25       39       800K task_xstate
    690     515  74%    1.06K     23       30       736K signal_cache
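Quick arithmetic of my own to make the "no help" concrete: slabtop's total cache size is only a few hundred MB, so the slab allocator explains just a sliver of what smem attributes to kernel dynamic memory.

```python
# How much of smem's "kernel dynamic memory" does the slab allocator cover?
slab_total_kb = 360884        # "Active / Total Size" from slabtop, in kB
kernel_dynamic_kb = 9525544   # from smem -tw, in kB

share = slab_total_kb / kernel_dynamic_kb
print(f"slab explains {share:.1%} of kernel dynamic memory")  # ~3.8%
```

Whatever is holding the remaining ~9 GB is kernel memory allocated outside the slab caches (e.g. direct page allocations), which none of these tools itemize.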
Solution
I'm running a box with 32GB of RAM, and the difference that stands out is the DirectMap4k value;
DirectMap4k:      493076 kB
DirectMap2M:     7862272 kB
DirectMap1G:    27262976 kB
compared with yours;
DirectMap4k:    11182080 kB
DirectMap2M:     4677632 kB
That may be a starting point. Googling suggests this value can be affected by how the host allocates memory to a VPS… are you running this machine on a virtual server?
It could be that the host server doesn't have enough RAM and is skewing the output of /proc/meminfo.
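A small caveat of my own on the DirectMap comparison: as I understand it, the DirectMap counters only describe which page sizes the kernel uses for its direct mapping of physical RAM, and the 4k/2M split drifts over time while the total stays fixed. Both snapshots of the asker's box (one in the question's /proc/meminfo, one quoted in this answer) sum to the same amount:

```python
# The DirectMap 4k/2M split differs between snapshots,
# but the total direct-mapped memory is identical.  Values in kB.
question_snapshot = {"DirectMap4k": 9598976, "DirectMap2M": 6260736}
answer_snapshot = {"DirectMap4k": 11182080, "DirectMap2M": 4677632}

total_q = sum(question_snapshot.values())
total_a = sum(answer_snapshot.values())
print(total_q, total_a)  # 15859712 15859712
```

So the DirectMap4k difference between the two boxes is more likely a symptom (lots of large pages being split) than the cause of the missing memory.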
Also, I would paste the output of smem -tw, since that can help pin down whether the memory leak is in the kernel or in an application; for comparison, mine looks like this:
# smem -tw
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory      11297432   10738716     558716
userspace memory            6144832    1182184    4962648
free memory                15470032   15470032          0
----------------------------------------------------------
                           32912296   27390932    5521364