Environment:
Hostname | Role | IP address | VIP |
heartbeat01.contoso.com | Heartbeat node 1 | eth0: 192.168.49.133, eth1: 172.16.49.133 (heartbeat link) | 172.16.49.100 |
heartbeat02.contoso.com | Heartbeat node 2 | eth0: 192.168.49.134, eth1: 172.16.49.134 (heartbeat link) | 172.16.49.100 |
I. Preparation
Unless stated otherwise, run the following steps on both servers.
# Stop the iptables firewall and disable SELinux
/etc/init.d/iptables stop
chkconfig iptables off
sed -i '/^SELINUX/s/enforcing/disabled/' /etc/selinux/config
setenforce 0
# Set up time synchronization
crontab -e # add the scheduled job below, then save and quit with :wq
0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov
# or
echo '0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov' >> /var/spool/cron/root # add the scheduled job
crontab -l # check that the scheduled job exists
0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov
# Set the hostname (heartbeat01 shown here; do the same on heartbeat02 with its own name)
sed -i '/^HOSTNAME/s/^/#/' /etc/sysconfig/network
sed -i '/#HOSTNAME/a HOSTNAME=heartbeat01.contoso.com' /etc/sysconfig/network
grep HOSTNAME /etc/sysconfig/network
hostname heartbeat01.contoso.com
# or
sed -i '/^HOSTNAME/d' /etc/sysconfig/network
echo 'HOSTNAME=heartbeat01.contoso.com' >> /etc/sysconfig/network
grep HOSTNAME /etc/sysconfig/network
hostname heartbeat01.contoso.com
# Edit /etc/hosts
echo -e '192.168.49.133 heartbeat01.contoso.com\n192.168.49.134 heartbeat02.contoso.com' >> /etc/hosts
tail -2 /etc/hosts
# Add a host route to the peer over the heartbeat link
/sbin/route add -host 172.16.49.134 dev eth1 # on heartbeat01
echo '/sbin/route add -host 172.16.49.134 dev eth1' >> /etc/rc.local # on heartbeat01
/sbin/route add -host 172.16.49.133 dev eth1 # on heartbeat02
echo '/sbin/route add -host 172.16.49.133 dev eth1' >> /etc/rc.local # on heartbeat02
route -n # after adding the routes, check on both heartbeat01 and heartbeat02
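The preparation steps append to /etc/hosts and /etc/rc.local with >>, so running them a second time leaves duplicate lines. A minimal sketch of an idempotent append follows; the helper name add_line_once is hypothetical and not part of any heartbeat tooling.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Hypothetical helper: append LINE to FILE only if an identical line
# is not already present, so the step can be re-run safely.
add_line_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root, on both nodes):
#   add_line_once /etc/hosts '192.168.49.133 heartbeat01.contoso.com'
#   add_line_once /etc/hosts '192.168.49.134 heartbeat02.contoso.com'
```

The same helper could be used for the route line in /etc/rc.local on each node.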
II. Install the heartbeat software
rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum -y install 'heartbeat*'
III. Edit the heartbeat configuration files
1) Copy the configuration files
cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,haresources,authkeys} /etc/ha.d/
ll /etc/ha.d/
cd /etc/ha.d/
2) Configure authkeys
[root@heartbeat01 ha.d]# egrep -v "#|^$" authkeys
auth 2
2 sha1 c6091592594cd14c
[root@heartbeat02 ha.d]# egrep -v "#|^$" authkeys
auth 2
2 sha1 c6091592594cd14c
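# The configuration is identical on both nodes
3) Configure ha.cf
The following uses unicast as an example and shows the configuration for both nodes: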
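The authkeys file holds the shared secret, and heartbeat expects it to be readable by root only (mode 600). The sketch below generates a random 16-character hex key in the same layout as the sample above and writes it with restrictive permissions. The function name write_authkeys and the key length are assumptions made for illustration.

```shell
#!/bin/sh
# write_authkeys DIR
# Hypothetical helper: create DIR/authkeys with a random sha1 key
# and mode 600.
write_authkeys() {
    dir=$1
    # 8 random bytes printed as 16 hex characters
    key=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    # The subshell keeps the restrictive umask from leaking to the caller
    ( umask 077; printf 'auth 2\n2 sha1 %s\n' "$key" > "$dir/authkeys" )
    chmod 600 "$dir/authkeys"
}

# Example (as root, on one node only):
#   write_authkeys /etc/ha.d
```

Generate the file on one node and copy that same file to the other (for example with scp), since both nodes need the same key.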
[root@heartbeat01 ha.d]# egrep -v "#|^$" ha.cf
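debugfile /var/log/ha-debug #location of the debug log file
logfile /var/log/ha-log #location of the log file
logfacility local1 #syslog facility used for logging
keepalive 2 #interval between heartbeat packets, in seconds
deadtime 30 #time without heartbeats after which the peer is declared dead
warntime 10 #time without heartbeats after which a "late heartbeat" warning is logged
initdead 60 #dead time used right after startup, to give the network time to come up
ucast eth1 172.16.49.134 #interface used for the heartbeat link and the IP address of the peer's interface
auto_failback on #enable automatic failback: when the preferred owner of a resource recovers, it takes the resource back
node heartbeat01.contoso.com #node 1; the node name must match the output of uname -n
node heartbeat02.contoso.com #node 2
ping 172.16.49.1 #ping node used as a third-party reference point
respawn hacluster /usr/lib64/heartbeat/ipfail #run ipfail as user hacluster and restart it if it exits; ipfail checks connectivity to the ping node (via ICMP)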
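The four timing directives depend on each other: keepalive should be shorter than warntime, warntime shorter than deadtime, and initdead is usually set to at least twice deadtime. A minimal sketch of a sanity check follows. It assumes plain integer seconds as in the listing above, and check_ha_timing is a hypothetical name, not a heartbeat command.

```shell
#!/bin/sh
# check_ha_timing FILE
# Hypothetical helper: read keepalive/warntime/deadtime/initdead from an
# ha.cf-style file and return non-zero if their ordering looks wrong.
check_ha_timing() {
    awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2 }
        END {
            ok = 1
            if (!(k + 0 < w + 0))  { print "keepalive should be below warntime"; ok = 0 }
            if (!(w + 0 < d + 0))  { print "warntime should be below deadtime"; ok = 0 }
            if (!(i + 0 >= 2 * d)) { print "initdead should be at least 2 x deadtime"; ok = 0 }
            exit (ok ? 0 : 1)
        }
    ' "$1"
}

# Example:
#   check_ha_timing /etc/ha.d/ha.cf && echo "timing looks consistent"
```

The values used in this article (2, 10, 30, 60) pass all three checks.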
[root@heartbeat02 ha.d]# egrep -v "#|^$" ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local1
keepalive 2
deadtime 30
warntime 10
initdead 60
ucast eth1 172.16.49.133
auto_failback on
node heartbeat01.contoso.com
node heartbeat02.contoso.com
ping 172.16.49.1
respawn hacluster /usr/lib64/heartbeat/ipfail
# The two nodes differ only in the unicast peer IP; everything else is identical
4) Configure haresources
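Because the two ha.cf files differ only in the ucast peer address, they can be produced from one template. The sketch below prints the configuration used in this article for a given peer IP. The function name render_ha_cf is hypothetical; the directive values are copied from the listings above.

```shell
#!/bin/sh
# render_ha_cf PEER_IP
# Hypothetical helper: print the ha.cf from this article with the
# ucast peer address filled in.
render_ha_cf() {
    peer_ip=$1
    cat <<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local1
keepalive 2
deadtime 30
warntime 10
initdead 60
ucast eth1 ${peer_ip}
auto_failback on
node heartbeat01.contoso.com
node heartbeat02.contoso.com
ping 172.16.49.1
respawn hacluster /usr/lib64/heartbeat/ipfail
EOF
}

# Example:
#   render_ha_cf 172.16.49.134 > /etc/ha.d/ha.cf   # on heartbeat01
#   render_ha_cf 172.16.49.133 > /etc/ha.d/ha.cf   # on heartbeat02
```

Keeping a single template avoids the two files drifting apart when a timing value is changed later.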
echo 'heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1' >>/etc/ha.d/haresources
[root@heartbeat01 ha.d]# egrep -v "#|^$" haresources
heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
[root@heartbeat02 ha.d]# egrep -v "#|^$" haresources
heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
# The configuration is identical on both nodes
IV. Start heartbeat and test
/etc/init.d/heartbeat start # run on both heartbeat01 and heartbeat02
Check for the VIP on each node:
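The first field of the haresources line names the preferred owner of the VIP, and it has to be one of the node names declared in ha.cf. A minimal sketch of that cross-check follows; check_preferred_node is a hypothetical helper name.

```shell
#!/bin/sh
# check_preferred_node HA_CF HARESOURCES
# Hypothetical helper: succeed only if the owner named on the first
# resource line of HARESOURCES appears in a "node" directive of HA_CF.
check_preferred_node() {
    ha_cf=$1
    haresources=$2
    # First field of the first line that is neither a comment nor empty
    owner=$(awk '!/^[[:space:]]*#/ && NF { print $1; exit }' "$haresources")
    [ -n "$owner" ] || return 1
    # A node directive may list one or more names after the keyword
    awk -v n="$owner" '
        $1 == "node" { for (f = 2; f <= NF; f++) if ($f == n) found = 1 }
        END { exit !found }
    ' "$ha_cf"
}

# Example:
#   check_preferred_node /etc/ha.d/ha.cf /etc/ha.d/haresources || echo "owner not in ha.cf"
```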
[root@heartbeat01 ha.d]# ip addr |grep 172.16.49
inet 172.16.49.133/24 brd 172.16.49.255 scope global eth1
inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
[root@heartbeat02 ha.d]# ip addr |grep 172.16.49
inet 172.16.49.134/24 brd 172.16.49.255 scope global eth1
Next, stop the heartbeat service on heartbeat01 and check again:
[root@heartbeat01 ha.d]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
[root@heartbeat01 ha.d]# ip addr |grep 172.16.49
inet 172.16.49.133/24 brd 172.16.49.255 scope global eth1
[root@heartbeat02 ha.d]# ip addr |grep 172.16.49
inet 172.16.49.134/24 brd 172.16.49.255 scope global eth1
inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
As shown, the VIP has moved from heartbeat01 to heartbeat02.
During the VIP switchover, pinging the VIP from another host shows only a very brief interruption.
V. Check the logs
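To put a rough number on that interruption, one option is to probe the VIP at a fixed interval from a third host while the failover is triggered and count the failed probes. This is a sketch under stated assumptions: the function name measure_outage is hypothetical, and the probe command is passed in by the caller.

```shell
#!/bin/sh
# measure_outage PROBES INTERVAL COMMAND [ARGS...]
# Hypothetical helper: run COMMAND PROBES times, INTERVAL seconds apart,
# and print how many runs failed.
measure_outage() {
    probes=$1
    interval=$2
    shift 2
    failed=0
    n=0
    while [ "$n" -lt "$probes" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        n=$((n + 1))
        sleep "$interval"
    done
    echo "$failed"
}

# Example (from a third host, while stopping heartbeat on heartbeat01):
#   measure_outage 60 1 ping -c 1 -W 1 172.16.49.100
```

The printed count multiplied by the interval gives an approximate upper bound on the outage in seconds, with a resolution no finer than the interval itself.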
/var/log/ha-log
================================================
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Comm_now_up(): updating status to active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Local status now set to: 'active'
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,499)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7312]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498 gid 499 (pid 7312)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Status update for node heartbeat02.contoso.com: status active
harc(default)[7315]: 2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: [7312]: info: Asking other side for ping node count.
Sep 22 05:44:36 heartbeat01.contoso.com ipfail: [7312]: info: No giveup timer to abort.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: Initial resource acquisition complete (T_RESOURCES(us))
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7368]: 2016/09/22_05:44:37 INFO: Resource is stopped
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7332]: info: Local Resource acquisition completed.
harc(default)[7451]: 2016/09/22_05:44:37 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[7451]: 2016/09/22_05:44:37 received ip-request-resp IPaddr::172.16.49.100/24/eth1 OK yes
ResourceManager(default)[7474]: 2016/09/22_05:44:37 info: Acquiring resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7502]: 2016/09/22_05:44:37 INFO: Resource is stopped
ResourceManager(default)[7474]: 2016/09/22_05:44:37 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 start
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: Adding inet address 172.16.49.100/24 with broadcast address 172.16.49.255 to device eth1
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: Bringing device eth1 up
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.49.100 eth1 172.16.49.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7601]: 2016/09/22_05:44:38 INFO: Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7284]: info: Heartbeat shutdown in progress. (7284)
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7716]: info: Giving up all HA resources.
ResourceManager(default)[7729]: 2016/09/22_05:44:40 info: Releasing resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
ResourceManager(default)[7729]: 2016/09/22_05:44:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 stop
IPaddr(IPaddr_172.16.49.100)[7792]: 2016/09/22_05:44:40 INFO: IP status = ok,IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7766]: 2016/09/22_05:44:40 INFO: Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7716]: info: All HA resources relinquished.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: WARN: 1 lost packet(s) for [heartbeat02.contoso.com] [20:22]
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: info: No pkts missing from heartbeat02.contoso.com!
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: info: killing /usr/lib64/heartbeat/ipfail process group 7312 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBFIFO process 7288 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBWRITE process 7289 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBREAD process 7290 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBWRITE process 7291 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBREAD process 7292 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7292 exited. 5 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7289 exited. 4 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7290 exited. 3 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7291 exited. 2 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7288 exited. 1 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: heartbeat01.contoso.com Heartbeat shutdown complete.
/var/log/ha-debug
================================================
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: [7283]: info: **************************
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: [7283]: info: Configuration validated. Starting heartbeat 3.0.4
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: [7284]: info: heartbeat: version 3.0.4
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: Heartbeat generation: 1474533038
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: bound send socket to device: eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: set SO_REUSEPORT(w)
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: bound receive socket to device: eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: set SO_REUSEPORT(w)
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ucast: started on port 694 interface eth1 to 172.16.49.134
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: glib: ping heartbeat started.
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: Local status now set to: 'up'
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: Link 172.16.49.1:172.16.49.1 up.
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: [7284]: info: Status update for node 172.16.49.1: status ping
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Link heartbeat02.contoso.com:eth1 up.
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Status update for node heartbeat02.contoso.com: status up
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7294]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[7294]: 2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Comm_now_up(): updating status to active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: debug: get_delnodelist: delnodelist=
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7312]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498 gid 499 (pid 7312)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7284]: info: Status update for node heartbeat02.contoso.com: status active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: [7315]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[7315]: 2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:26 heartbeat01.contoso.com ipfail: [7312]: debug: PID=7312
Sep 22 05:44:26 heartbeat01.contoso.com ipfail: [7312]: debug: Signing in with heartbeat
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: [7312]: debug: [We are heartbeat01.contoso.com]
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: [7312]: debug: auto_failback -> 1 (on)
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: [7312]: debug: Setting message filter mode
Sep 22 05:44:28 heartbeat01.contoso.com ipfail: [7312]: debug: Starting node walk
Sep 22 05:44:29 heartbeat01.contoso.com ipfail: [7312]: debug: Cluster node: 172.16.49.1: status: ping
Sep 22 05:44:29 heartbeat01.contoso.com ipfail: [7312]: debug: Cluster node: heartbeat02.contoso.com: status: active
Sep 22 05:44:30 heartbeat01.contoso.com ipfail: [7312]: debug: [They are heartbeat02.contoso.com]
Sep 22 05:44:30 heartbeat01.contoso.com ipfail: [7312]: debug: Cluster node: heartbeat01.contoso.com: status: active
Sep 22 05:44:31 heartbeat01.contoso.com ipfail: [7312]: debug: Setting message signal
Sep 22 05:44:31 heartbeat01.contoso.com ipfail: [7312]: debug: Waiting for messages...
Sep 22 05:44:32 heartbeat01.contoso.com ipfail: [7312]: debug: Got join message from another ipfail client. (heartbeat02.contoso.com)
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: [7312]: debug: Found ping node 172.16.49.1!
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: [7312]: info: Asking other side for ping node count.
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: [7312]: debug: Message [num_ping] sent.
Sep 22 05:44:36 heartbeat01.contoso.com ipfail: [7312]: info: No giveup timer to abort.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: info: Initial resource acquisition complete (T_RESOURCES(us))
Sep 22 05:44:37 heartbeat01.contoso.com ipfail: [7312]: debug: Other side is now stable.
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7368]: 2016/09/22_05:44:37 INFO: Resource is stopped
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7332]: info: Local Resource acquisition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7284]: debug: StartNextRemoteRscReq(): child count 1
Sep 22 05:44:37 heartbeat01.contoso.com ipfail: [7312]: debug: Other side is now stable.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: [7451]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[7451]: 2016/09/22_05:44:37 info: received ip-request-resp IPaddr::172.16.49.100/24/eth1 OK yes
ResourceManager(default)[7474]: 2016/09/22_05:44:37 info: Acquiring resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7502]: 2016/09/22_05:44:37 INFO: Resource is stopped
ResourceManager(default)[7474]: 2016/09/22_05:44:37 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 start
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: Adding inet address 172.16.49.100/24 with broadcast address 172.16.49.255 to device eth1
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: Bringing device eth1 up
IPaddr(IPaddr_172.16.49.100)[7627]: 2016/09/22_05:44:38 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.49.100 eth1 172.16.49.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7601]: 2016/09/22_05:44:38 INFO: Success
INFO: Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7284]: info: Heartbeat shutdown in progress. (7284)
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7716]: info: Giving up all HA resources.
ResourceManager(default)[7729]: 2016/09/22_05:44:40 info: Releasing resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
ResourceManager(default)[7729]: 2016/09/22_05:44:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 stop
IPaddr(IPaddr_172.16.49.100)[7792]: 2016/09/22_05:44:40 INFO: IP status = ok,IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100)[7766]: 2016/09/22_05:44:40 INFO: Success
INFO: Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: [7716]: info: All HA resources relinquished.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: WARN: 1 lost packet(s) for [heartbeat02.contoso.com] [20:22]
Sep 22 05:44:41 heartbeat01.contoso.com ipfail: [7312]: debug: Other side is now stable.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: info: No pkts missing from heartbeat02.contoso.com!
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: [7284]: info: killing /usr/lib64/heartbeat/ipfail process group 7312 with signal 15
ARPING 172.16.49.100 from 172.16.49.100 eth1
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBFIFO process 7288 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBWRITE process 7289 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBREAD process 7290 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBWRITE process 7291 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: killing HBREAD process 7292 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7292 exited. 5 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7289 exited. 4 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7290 exited. 3 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7291 exited. 2 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: Core process 7288 exited. 1 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: [7284]: info: heartbeat01.contoso.com Heartbeat shutdown complete.
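Most of the log lines above are routine startup and shutdown messages. When checking a failover, the lines of interest are the ones where the resource group is acquired or released. A minimal sketch for pulling them out follows; vip_events is a hypothetical name, and the patterns are taken from the log messages shown above.

```shell
#!/bin/sh
# vip_events LOGFILE
# Hypothetical helper: print only the log lines that record the VIP
# resource group being acquired, released or fully relinquished.
vip_events() {
    grep -E 'Acquiring resource group|Releasing resource group|All HA resources relinquished' "$1"
}

# Example:
#   vip_events /var/log/ha-log
```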