On a semi-regular basis, I'm seeing GCE instances freeze with the following error message (from the serial console):
[1375589.784755] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[1375589.786206] IP: [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
[1375589.787341] PGD 5da04067 PUD db83067 PMD 0
[1375589.788607] Oops: 0000 [#1] SMP
[1375589.788705] Modules linked in: veth xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs(C) softdog crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 processor psmouse parport_pc parport i2c_piix4 i2c_core thermal_sys lrw virtio_net evdev pcspkr serio_raw gf128mul glue_helper ablk_helper cryptd button ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common virtio_scsi scsi_mod virtio_pci virtio virtio_ring
[1375589.788705] CPU: 1 PID: 1515 Comm: docker Tainted: G C 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1~bpo70+1
[1375589.788705] Hardware name: Google Google, BIOS Google 01/01/2011
[1375589.788705] task: ffff88006fffc110 ti: ffff880003ac4000 task.ti: ffff880003ac4000
[1375589.788705] RIP: 0010:[<ffffffff810a67d9>] [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
[1375589.788705] RSP: 0018:ffff880003ac7e30 EFLAGS: 00010002
[1375589.788705] RAX: 0000000000000001 RBX: ffff880073112ec0 RCX: 0000000000000002
[1375589.788705] RDX: 0000000000000001 RSI: ffff880009156d20 RDI: ffff880073112f38
[1375589.788705] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1375589.788705] R10: ffffffffffffffe0 R11: 0000000000000000 R12: ffff88006d2dcd00
[1375589.788705] R13: ffff88006fffc110 R14: 0000000000000000 R15: 0000000000000000
[1375589.788705] FS: 000000000323a880(0063) GS:ffff880073100000(0000) knlGS:0000000000000000
[1375589.788705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1375589.788705] CR2: 0000000000000078 CR3: 0000000034bff000 CR4: 00000000000406e0
[1375589.788705] Stack:
[1375589.788705] 0000000000000000 ffffffff00000000 ffff88000000006e ffff880073112ec0
[1375589.788705] ffff8800091573a4 0000000000000286 0000000000012ec0 ffff880073112ec0
[1375589.788705] 0000000000000002 ffffffff8109cef4 ffff880009156d20 ffffffff810a01a4
[1375589.788705] Call Trace:
[1375589.788705] [<ffffffff8109cef4>] ? check_preempt_curr+0x84/0xa0
[1375589.788705] [<ffffffff810a01a4>] ? wake_up_new_task+0xf4/0x1b0
[1375589.788705] [<ffffffff8118516d>] ? mprotect_fixup+0x15d/0x250
[1375589.788705] [<ffffffff8106d10f>] ? do_fork+0xcf/0x340
[1375589.788705] [<ffffffff8154b779>] ? stub_clone+0x69/0x90
[1375589.788705] [<ffffffff8154b40d>] ? system_call_fast_compare_end+0x10/0x15
[1375589.788705] Code: 00 00 83 e8 01 4d 8b 64 24 70 39 d0 7f f4 48 8b 7d 78 49 3b 7c 24 78 74 1d 66 0f 1f 84 00 00 00 00 00 48 8b 6d 70 4d 8b 64 24 70 <48> 8b 7d 78 49 3b 7c 24 78 75 ec 48 85 ff 74 e7 e8 f2 f9 ff ff
[1375589.788705] RIP [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
[1375589.788705] RSP <ffff880003ac7e30>
[1375589.788705] CR2: 0000000000000078
[1375589.788705] ---[ end trace 5fab7713cb2d171f ]---
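The only way I've been able to recover them is to log in to the web interface and reset them manually. Needless to say, that doesn't scale.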
I've already tried setting up a watchdog device and setting kernel.panic = 10, which in theory should reboot the VM.
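To double-check what is actually in effect on an instance, something like the following sketch could be used. It reads kernel.panic from /proc/sys and, as an assumption that goes beyond the setup described above, also reads kernel.panic_on_oops, since the trace is reported as an Oops rather than a panic. The function names are hypothetical, and this only inspects settings; it does not change them.

```python
from pathlib import Path

# Sysctls to inspect. kernel.panic is the one set above; kernel.panic_on_oops
# is an assumption about what else may matter, not part of the original setup.
SYSCTLS = ("kernel.panic", "kernel.panic_on_oops")


def read_sysctls(names=SYSCTLS, root="/proc/sys"):
    """Return {name: integer value, or None if the entry cannot be read}."""
    values = {}
    for name in names:
        # "kernel.panic" maps to <root>/kernel/panic
        path = Path(root, *name.split("."))
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def oops_should_reboot(values):
    """True only if an oops is escalated to a panic (panic_on_oops != 0)
    and a panic triggers a reboot (kernel.panic != 0)."""
    return bool(values.get("kernel.panic_on_oops")) and bool(values.get("kernel.panic"))


if __name__ == "__main__":
    current = read_sysctls()
    for name, value in current.items():
        print(f"{name} = {value}")
    print("oops should lead to a reboot:", oops_should_reboot(current))
```

If kernel.panic_on_oops turns out to be 0, the kernel.panic timeout would never be reached for an Oops like the one above, which might explain why the VM stays frozen.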
For these VMs I'm using 'container-vm' as the OS flavor (i.e., more or less Debian with Docker preinstalled).
Has anyone seen this before?