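I have a server that keeps crashing.
I know a server can crash for several reasons.
But if the cause is that the system ran out of RAM before crashing,
how can I confirm it? Which log files should I look at? What lines or error messages should I look for?
I'm running CentOS, and I make heavy use of PHP to parse XML files of up to 2 GB.
The server has 16 GB of RAM.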
Edit 1
    [root@61540 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         16035       1526      14509          0         40       1002
    -/+ buffers/cache:        483      15552
    Swap:         8197          0       8197
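The "-/+ buffers/cache" row above is derived from the "Mem:" row: buffers and page cache can be reclaimed, so they are subtracted from "used" and added to "free". The following is a minimal Python sketch of that arithmetic, assuming the older procps column layout shown above; parse_mem_row and app_usage are hypothetical helper names, not part of any tool.

```python
def parse_mem_row(text):
    """Parse the 'Mem:' row of old-style `free -m` output into a dict of MB values."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Assumes the six-column layout: total used free shared buffers cached.
            return dict(zip(names, map(int, fields[1:7])))
    raise ValueError("no 'Mem:' row found")


def app_usage(mem):
    """Approximate the '-/+ buffers/cache' row: (used by processes, available)."""
    used = mem["used"] - mem["buffers"] - mem["cached"]
    free = mem["free"] + mem["buffers"] + mem["cached"]
    return used, free
```

With the figures above this gives (484, 15551), within a megabyte of the printed 483 / 15552, presumably because free does the subtraction on kilobyte counts before converting to megabytes.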
Edit 2
In /var/log/messages:
    Feb 17 20:38:26 61540 syslogd 1.4.1: restart.
    Feb 17 20:38:26 61540 proftpd[3896]: 66.90.101.85 - received SIGHUP -- master server reparsing configuration file
    Feb 17 22:23:06 61540 avahi-daemon[3984]: recvmsg(): Resource temporarily unavailable
    Feb 17 23:07:37 61540 proftpd[10620] - (Several lines of ftp session)
    Feb 18 23:03:48 61540 syslogd 1.4.1: restart.
    Feb 18 23:03:48 61540 kernel: klogd 1.4.1,log source = /proc/kmsg started.
    Feb 18 23:03:48 61540 kernel: Linux version 2.6.18-308.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Feb 21 20:06:06 EST 2012
    Feb 18 23:03:48 61540 kernel: Command line: ro root=LABEL=/
    Feb 18 23:03:48 61540 kernel: BIOS-provided physical RAM map:
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000000010000 - 000000000009a000 (usable)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000000100000 - 00000000cfda0000 (usable)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfda0000 - 00000000cfdd1000 (ACPI NVS)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfdd1000 - 00000000cfe00000 (ACPI data)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfe00000 - 00000000cff00000 (reserved)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
    Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000100000000 - 000000042f000000 (usable)
    Feb 18 23:03:48 61540 kernel: DMI 2.4 present.
    Feb 18 23:03:48 61540 kernel: No NUMA configuration found
    Feb 18 23:03:48 61540 kernel: Faking a node at 0000000000000000-000000042f000000
    Feb 18 23:03:48 61540 kernel: Bootmem setup node 0 0000000000000000-000000042f000000
    Feb 18 23:03:48 61540 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
    Feb 18 23:03:48 61540 kernel: disabling kdump
    Feb 18 23:03:48 61540 kernel: ACPI: PM-Timer IO Port: 0x808
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #0 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #1 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #2 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #3 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #4 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #5 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #6 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #7 5:1 APIC version 16
Solution
You should check /var/log/messages.
The dmesg command won't help in this case, because it only shows kernel messages since the last boot.
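"Running out of memory" is usually not enough to crash Linux outright: when memory runs low, the kernel starts killing processes (the OOM killer). So what you are probably looking for is a kernel panic. If you read the log with less, you can search it by pressing /.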
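If you would rather scan the whole file (or its rotated copies) than search interactively in less, a short Python sketch can list candidate lines. The patterns below are assumptions based on common kernel wording for OOM kills and panics; the exact text differs between kernel versions, so adjust them for your system. find_memory_events and scan_file are hypothetical helper names.

```python
import re

# Assumed signatures of OOM-killer activity and kernel panics.
# Kernel wording varies between versions, so treat these as a starting point.
MEMORY_PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"oom-killer"),
    re.compile(r"Killed process"),
    re.compile(r"Kernel panic"),
]


def find_memory_events(lines):
    """Return (line_number, text) for every line matching an OOM/panic pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in MEMORY_PATTERNS):
            hits.append((number, text))
    return hits


def scan_file(path):
    """Scan a syslog file such as /var/log/messages; undecodable bytes are replaced."""
    with open(path, errors="replace") as log:
        return find_memory_events(log)
```

For example, `for number, text in scan_file("/var/log/messages"): print(number, text)` prints each match with its line number, which you can then jump to in less.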
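But most importantly, read /var/log/messages first. It is sorted by time, so it is easy to find the moment the server last booted. Then check what happened just before that point; that is what caused the server to crash.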
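Finding the last boot and reading what came just before it can also be scripted. This Python sketch assumes, based on the excerpt in Edit 2, that a line containing "kernel: Linux version" marks a boot. A plain "syslogd 1.4.1: restart." line is not used as the marker, because syslogd also logs it when it receives SIGHUP (it apparently did so at Feb 17 20:38:26 without a reboot). lines_before_last_boot is a hypothetical helper name.

```python
# Assumption: the kernel banner marks a boot, as in the Edit 2 excerpt.
BOOT_MARKER = "kernel: Linux version"


def lines_before_last_boot(lines, context=50):
    """Return up to `context` log lines written just before the most recent boot.

    Returns an empty list when no boot marker is found.
    """
    lines = [line.rstrip("\n") for line in lines]
    boot_indexes = [i for i, line in enumerate(lines) if BOOT_MARKER in line]
    if not boot_indexes:
        return []
    start = boot_indexes[-1]
    # Step back over the syslogd/klogd start-up lines logged right before the banner,
    # so the result ends with the last entry written before the reboot.
    while start > 0 and ("syslogd" in lines[start - 1] or "klogd" in lines[start - 1]):
        start -= 1
    return lines[max(0, start - context):start]
```

Applied to the Edit 2 excerpt, the last line returned would be the Feb 17 23:07:37 proftpd entry (or the final line of that FTP session), since nothing else was logged before the Feb 18 reboot.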