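I was testing DNS round robin with SSH and noticed a surprising result from the SSH client in my test environment. I'm using 3 nodes running RHEL 6.2 (openssh-5.3p1, bind-9.7.3-8.P3). Things like host keys have already been taken care of.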
My "problem":
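I want basic load balancing across several SSH servers by using multiple DNS entries (A records). I'm (almost) sure this is possible. But what I actually get is basic HA... It seems the OpenSSH client doesn't care about round robin: it always connects to the same node unless that node is down, in which case the client uses another record from the list of DNS entries and connects to it successfully. Is this normal/common behavior? Or is something wrong with my test?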
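For reference, here is a minimal Python sketch of the client behavior described above. It models what the logs suggest, not OpenSSH's actual source: resolve the name, then try each returned address in order and keep the first one that accepts the TCP connection. `connect_in_order` is a hypothetical helper name. Nothing in such a loop spreads load; which server you land on depends entirely on the order the resolver library hands back.

```python
import socket


def connect_in_order(host, port, timeout=3.0):
    """Resolve host and try each returned address in order.

    Returns (connected_socket, sockaddr) for the first address that
    accepts the TCP connection; raises OSError if every address fails.
    """
    last_error = None
    # getaddrinfo() yields candidates in the order the resolver library
    # returns them, which is not necessarily the order of the DNS answer.
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock, sockaddr  # first success wins: no load balancing
        except OSError as exc:
            sock.close()
            last_error = exc  # e.g. ECONNREFUSED: fall through to the next address
    raise last_error or OSError("no addresses returned for %r" % host)
```

Calling `connect_in_order("login", 22)` from node1 should pick the same node ssh does, assuming Python's `socket.getaddrinfo` goes through the same system resolver library (it wraps the C `getaddrinfo()`).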
Below are my straces and tcpdumps showing what happens in several scenarios. If you have any idea or explanation that could help, thanks in advance :)
login => 10.255.254.1 (node0), 10.255.254.3 (node2)
ssh client => 10.255.254.2 (node1)
DNS server on node0; RR has not been disabled.
login IN A 10.255.254.1
login IN A 10.255.254.3
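To make "round robin" concrete, here is a toy Python model of what the name server does with these two records: every answer returns the whole RRset, rotated by one position each time. Cyclic rotation is an assumption for illustration (BIND's actual ordering is governed by its `rrset-order` option), and `CyclicRRset` is a hypothetical class. It is at least consistent with the changing answer order visible in the tcpdump output of the tests.

```python
from collections import deque


class CyclicRRset:
    """Toy model of a name server rotating an A-record set per answer."""

    def __init__(self, addresses):
        self._ring = deque(addresses)

    def answer(self):
        # The client always receives every record; only the order changes.
        order = list(self._ring)
        self._ring.rotate(-1)  # the next answer starts with the next record
        return order
```

Note that rotation only changes which address is listed first; whether that spreads connections depends on the client actually using the first address.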
I confirmed that:
> host(1) lookups show round robin working;
> the ping(1) command looks fine:
[root@node1 ~]# ping login
PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=1.73 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.1) 56(84) bytes of data.
64 bytes from node0.node (10.255.254.1): icmp_seq=1 ttl=64 time=0.467 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=0.433 ms
^C
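One way to see where the ordering changes is to compare, on the client, the older gethostbyname-style lookup with `getaddrinfo()`, which is what ssh uses to resolve the host. A minimal Python sketch; treating ping as a gethostbyname-style caller is an assumption here, and `compare_lookup_orders` is a hypothetical helper.

```python
import socket


def compare_lookup_orders(name):
    """Return the IPv4 address order seen via gethostbyname_ex() and via getaddrinfo()."""
    # Classic lookup API; assumed here not to apply getaddrinfo()'s sorting.
    _, _, legacy = socket.gethostbyname_ex(name)
    # getaddrinfo() may re-sort its results; keep the first occurrence of each address.
    modern = []
    for *_, sockaddr in socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM):
        if sockaddr[0] not in modern:
            modern.append(sockaddr[0])
    return legacy, modern
```

Running `compare_lookup_orders("login")` several times on node1: if the first list alternates while the second always starts with the same address, that would suggest the reordering happens in the resolver library rather than in ssh or the DNS server.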
Test 1 (both SSH servers up and reachable)
[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
(...)
[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:03:04.875099 IP node1.node.53511 > node0.node.domain: 55904+ A? login.node. (29)
17:03:04.875417 IP node0.node.domain > node1.node.53511: 55904* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
17:03:04.875432 IP node1.node.53511 > node0.node.domain: 22271+ AAAA? login.node. (29)
17:03:04.875523 IP node0.node.domain > node1.node.53511: 22271* 0/1/0 (79)
=> Connected to node2 (10.255.254.3)
Test 2 (both SSH servers still reachable)
[root@node1 ~]# strace -e connect ssh login
connect(3, 16) = 0
(...)
[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:04:29.663664 IP node1.node.51950 > node0.node.domain: 4685+ A? login.node. (29)
17:04:29.663685 IP node1.node.51950 > node0.node.domain: 36559+ AAAA? login.node. (29)
17:04:29.664046 IP node0.node.domain > node1.node.51950: 4685* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:04:29.664110 IP node0.node.domain > node1.node.51950: 36559* 0/1/0 (79)
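=> Connected to node2
(Another run again confirmed the connection went to node2. It seems round robin only played a part in the ssh client's first test.)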
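Test 2 is the telling one: the server answered 10.255.254.1 first, yet ssh still went to 10.255.254.3. One hypothesis consistent with this (an assumption, not something these logs prove) is that glibc's `getaddrinfo()` re-sorts the addresses following RFC 3484 destination address selection, whose rule 9 prefers the destination sharing the longest leading-bit prefix with the source address. From node1 (10.255.254.2), 10.255.254.3 shares 31 bits and 10.255.254.1 only 30, so .3 would sort first whatever the DNS answer order. A minimal Python sketch of that single rule; the function names are hypothetical and the other RFC 3484 rules are assumed to tie in this all-IPv4, single-subnet setup.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()  # the highest differing bit ends the shared prefix


def sort_longest_prefix_first(source, candidates):
    """Order candidates by RFC 3484 rule 9 only: longest shared prefix with source first.

    sorted() is stable, so candidates with equal prefix length keep the
    order the DNS server sent them in.
    """
    return sorted(candidates, key=lambda dst: -common_prefix_len(source, dst))
```

Fed either answer order seen in tests 1 and 2, with source 10.255.254.2, this returns 10.255.254.3 first.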
Test 3 (SSH server on node2 stopped)
[root@node2 ~]# /etc/init.d/sshd stop
Stopping sshd: [ OK ]
[root@node1 ~]# strace -e connect ssh login
connect(3, 16) = -1 ECONNREFUSED (Connection refused)
connect(3, 16) = 0
[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:09:05.854022 IP node1.node.41233 > node0.node.domain: 63435+ A? login.node. (29)
17:09:05.854055 IP node1.node.41233 > node0.node.domain: 3015+ AAAA? login.node. (29)
17:09:05.854436 IP node0.node.domain > node1.node.41233: 63435* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:09:05.854531 IP node0.node.domain > node1.node.41233: 3015* 0/1/0 (79)
17:09:05.856764 IP node1.node.59579 > node0.node.ssh: Flags [S], seq 3025023931, win 14600, options [mss 1460,sackOK,TS val 9854496 ecr 0,nop,wscale 7], length 0
17:09:05.856806 IP node0.node.ssh > node1.node.59579: Flags [S.], seq 1105519762, ack 3025023932, win 14480, TS val 350907197 ecr 9854496, length 0
17:09:05.857106 IP node1.node.59579 > node0.node.ssh: Flags [.], ack 1, win 115, options [nop,TS val 9854496 ecr 350907197], length 0
17:09:05.865291 IP node0.node.ssh > node1.node.59579: Flags [P.], seq 1:22, win 114, TS val 350907205 ecr 9854496], length 21
(...)
=> Connected to node0 (failover?? Surprise!)
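The failover in test 3 fits a client that simply walks its ordered address list: the first connect() fails with ECONNREFUSED (presumably against 10.255.254.3, whose sshd is stopped), so it moves on to the next address, 10.255.254.1. A minimal Python sketch of that loop; `pick_server` is a hypothetical helper, and the `reachable` set is a stand-in for real connect() attempts.

```python
def pick_server(ordered_addresses, reachable):
    """Return the first address, in client order, whose sshd accepts connections.

    `reachable` stands in for the TCP connect attempt: an address not in it
    behaves like the ECONNREFUSED seen in test 3.
    """
    for addr in ordered_addresses:
        if addr in reachable:
            return addr
    raise ConnectionRefusedError("no address accepted the connection")
```

With the order fixed as 10.255.254.3 then 10.255.254.1, this returns node0 only while node2's sshd is down, which is failover rather than load balancing.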
Test 4 (same conditions)
[root@node1 ~]# strace -e connect ssh login
connect(3, 16) = 0
[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:11:44.154595 IP node1.node.56947 > node0.node.domain: 4602+ A? login.node. (29)
17:11:44.154862 IP node0.node.domain > node1.node.56947: 4602* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
(...)
=> Same result (connected to node0)
Test 5 (SSH server on node2 restarted)
[root@node2 ~]# /etc/init.d/sshd restart
Stopping sshd: [FAILED]
Starting sshd: [ OK ]
[root@node1 ~]# strace -e connect ssh login
connect(3, 16) = 0
[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:17:12.893633 IP node1.node.42432 > node0.node.domain: 7264+ A? login.node. (29)
17:17:12.893988 IP node0.node.domain > node1.node.42432: 7264* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
(...)
=> Connected to node2 again (failback)
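Putting it together, here is a small Python simulation of the five tests under two stated assumptions: the resolver library sorts the answer so that addresses sharing the longest prefix with the client come first (RFC 3484 rule 9), and the client then takes the first address whose sshd accepts the connection. Feeding in the answer orders captured by tcpdump reproduces the observed node2, node2, node0, node0, node2 sequence. This only shows the hypothesis is consistent with the data, not that it is the actual cause; all names are hypothetical.

```python
import ipaddress

CLIENT = "10.255.254.2"  # node1
NODE0, NODE2 = "10.255.254.1", "10.255.254.3"


def shared_prefix_bits(a, b):
    """Leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def chosen_server(answer, sshd_up):
    """Assumed client behavior: prefix-sort the answer, then take the first live sshd."""
    ordered = sorted(answer, key=lambda dst: -shared_prefix_bits(CLIENT, dst))
    for dst in ordered:
        if dst in sshd_up:
            return dst
    return None  # every connect() refused


# (DNS answer order from tcpdump, addresses with sshd running) for tests 1-5
SCENARIOS = [
    ([NODE2, NODE0], {NODE0, NODE2}),  # test 1
    ([NODE0, NODE2], {NODE0, NODE2}),  # test 2
    ([NODE0, NODE2], {NODE0}),         # test 3: sshd stopped on node2
    ([NODE2, NODE0], {NODE0}),         # test 4
    ([NODE0, NODE2], {NODE0, NODE2}),  # test 5: sshd restarted on node2
]

results = [chosen_server(answer, up) for answer, up in SCENARIOS]
```

Under these assumptions the DNS answer order never matters: only sshd availability does, which matches the HA-but-no-load-balancing behavior described at the top.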