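I have been struggling with a problem that is not easy to reproduce. I'm running
Linux kernel v3.1.0, and sometimes routing to a few IP addresses does not work.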
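What seems to happen is that, instead of sending the packets to the gateway, the kernel treats the destination address as local and tries to get its MAC address via ARP.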
For example, my current IP address is 172.16.1.104/24 and the gateway is 172.16.1.254:
    # ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:1B:63:97:FC:DC
              inet addr:172.16.1.104  Bcast:172.16.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:230772 errors:0 dropped:0 overruns:0 frame:0
              TX packets:171013 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:191879370 (182.9 Mb)  TX bytes:47173253 (44.9 Mb)
              Interrupt:17

    # route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         172.16.1.254    0.0.0.0         UG    0      0        0 eth0
    172.16.1.0      0.0.0.0         255.255.255.0   U     1      0        0 eth0
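With this table, 172.16.0.59 lies outside 172.16.1.0/24, so only the default route should match and the kernel should hand the packet to 172.16.1.254 instead of ARPing for it directly. The minimal POSIX sh sketch below mirrors that prefix check for the two entries shown; `ip_to_int` and `in_subnet` are hypothetical helper names, and this is not the kernel's full route lookup. On a live system, `ip route get 172.16.0.59` (iproute2) asks the kernel which route it actually chooses.

```sh
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 is inside network $2 with prefix length $3.
in_subnet() {
  in_ip=$(ip_to_int "$1")
  in_net=$(ip_to_int "$2")
  in_mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( in_ip & in_mask )) -eq $(( in_net & in_mask )) ]
}

# Check destinations from the question against the connected route.
for dst in 172.16.1.254 172.16.0.1 172.16.0.59; do
  if in_subnet "$dst" 172.16.1.0 24; then
    echo "$dst: on-link via eth0 (ARP expected)"
  else
    echo "$dst: via gateway 172.16.1.254 (no ARP for $dst expected)"
  fi
done
```

Only 172.16.1.254 is on-link; both 172.16.0.x addresses should go through the gateway, which is why the ARP request for 172.16.0.59 is unexpected.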
I can ping some addresses, but not 172.16.0.59:
    # ping -c1 172.16.1.254
    PING 172.16.1.254 (172.16.1.254) 56(84) bytes of data.
    64 bytes from 172.16.1.254: icmp_seq=1 ttl=64 time=0.383 ms
    --- 172.16.1.254 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.383/0.383/0.383/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.1
    PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
    64 bytes from 172.16.0.1: icmp_seq=1 ttl=63 time=5.54 ms
    --- 172.16.0.1 ping statistics ---
    1 packets transmitted, time 0ms
    rtt min/avg/max/mdev = 5.545/5.545/5.545/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.2
    PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.
    64 bytes from 172.16.0.2: icmp_seq=1 ttl=62 time=7.92 ms
    --- 172.16.0.2 ping statistics ---
    1 packets transmitted, time 0ms
    rtt min/avg/max/mdev = 7.925/7.925/7.925/0.000 ms
    root@pozsybook:~# ping -c1 172.16.0.59
    PING 172.16.0.59 (172.16.0.59) 56(84) bytes of data.
    From 172.16.1.104 icmp_seq=1 Destination Host Unreachable
    --- 172.16.0.59 ping statistics ---
    1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
When I try to ping 172.16.0.59, I can see in tcpdump that an ARP request is sent:
    # tcpdump -n -i eth0 | grep ARP
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    15:25:16.671217 ARP, Request who-has 172.16.0.59 tell 172.16.1.104, length 28
and /proc/net/arp has an incomplete entry for 172.16.0.59:
    # grep 172.16.0.59 /proc/net/arp
    172.16.0.59      0x1         0x0         00:00:00:00:00:00     *        eth0
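In /proc/net/arp the third column holds the entry's flags, and bit 0x2 (ATF_COM) marks a completed entry, so the `0x0` above means the ARP request was never answered. A small POSIX sh sketch that lists every such incomplete entry; `incomplete_arp` is a hypothetical helper name, and the gateway line in the sample file is made-up illustrative data, not taken from the question.

```sh
#!/bin/sh
# Print neighbour entries whose ATF_COM flag (0x2) is not set.
# Reads /proc/net/arp unless another file is given as $1.
incomplete_arp() {
  # Skip the header line, then test the Flags column (3rd field).
  tail -n +2 "${1:-/proc/net/arp}" |
    while read -r ip hwtype flags hwaddr mask dev; do
      if [ $(( $flags & 0x2 )) -eq 0 ]; then
        echo "$ip on $dev is incomplete (flags $flags)"
      fi
    done
}

# Usage on a sample file: the 172.16.0.59 line is from the question,
# the 172.16.1.254 line (MAC 02:00:00:00:00:01) is hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
IP address       HW type     Flags       HW address            Mask     Device
172.16.1.254     0x1         0x2         02:00:00:00:00:01     *        eth0
172.16.0.59      0x1         0x0         00:00:00:00:00:00     *        eth0
EOF
incomplete_arp "$sample"   # prints: 172.16.0.59 on eth0 is incomplete (flags 0x0)
rm -f "$sample"
```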
Note that 172.16.0.59 is reachable from other machines on this LAN.
Does anyone know what is going on? Thanks.
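> There are no interfaces other than eth0 and lo.
>
> The ARP request is not visible on the other end, but that is how it should be. The main problem is that the ARP request should not be sent in the first place.
>
> The problem persists even if I add an explicit route with "route add -host 172.16.0.59 gw 172.16.1.254 dev eth0".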
Solution
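It is indeed a Linux kernel bug, probably present since version 2.6.39. I posted the issue to the lkml and netdev lists (see the thread at
https://lkml.org/lkml/2011/11/18/191), and it has just been discussed in another netdev thread at
http://www.spinics.net/lists/netdev/msg179687.html.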
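For now, the workaround is to reboot, or to flush all routes and wait 10 minutes for the ICMP redirects to expire. To prevent it from happening again, running

    echo 0 > /proc/sys/net/ipv4/conf/eth0/accept_redirects

helps.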
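The workaround above can be wrapped in a small POSIX sh sketch. Assumptions: the interface is eth0 (override with `IFACE`), the helper names `run`, `disable_redirects` and `sysctl_conf_lines` are illustrative only, and `DRY_RUN` defaults to 1 so the script just prints the commands. It also clears `conf/all`: per the kernel's ip-sysctl documentation, on a host with forwarding disabled, redirects are accepted if *either* `conf/all` or `conf/<iface>` `accept_redirects` is set, so clearing only eth0 may not be enough on every system. The route-cache flush is a hedge; as the answer says, a redirect that was already learned may persist until it expires or the machine reboots.

```sh
#!/bin/sh
IFACE=${IFACE:-eth0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands; 0 = execute (needs root)

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop accepting ICMP redirects on all interfaces and on $IFACE.
disable_redirects() {
  run sysctl -w net.ipv4.conf.all.accept_redirects=0
  run sysctl -w "net.ipv4.conf.$IFACE.accept_redirects=0"
  # May not clear an already-learned redirect on affected kernels.
  run ip route flush cache
}

# Lines to persist the setting across reboots (e.g. in /etc/sysctl.conf).
sysctl_conf_lines() {
  echo "net.ipv4.conf.all.accept_redirects = 0"
  echo "net.ipv4.conf.$IFACE.accept_redirects = 0"
}

disable_redirects
```

Run it once with the default to review the commands, then with `DRY_RUN=0` as root; `sysctl_conf_lines >> /etc/sysctl.conf` would make the change persistent on distributions that read that file.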