corosync + pacemaker + postgres_streaming_replication
Overview:
This document describes how to implement automatic failover for PostgreSQL streaming replication using corosync + pacemaker. It covers a summary of the relevant corosync/pacemaker concepts, the complete environment setup, and troubleshooting notes.
1. Introduction
Corosync
Corosync is a project that split off from the OpenAIS project: the part that implements HA message transport became Corosync, and roughly 60% of Corosync's code comes from OpenAIS.
Like Heartbeat (which split off in a similar way), Corosync is a cluster messaging layer: it delivers cluster membership information and heartbeat messages between nodes. A pure messaging layer has no resource-management capability of its own; it relies on a cluster resource manager (CRM) running on top of it. The best-known resource manager today is Pacemaker, and Corosync + Pacemaker has become the combination of choice for high-availability clusters.
Pacemaker
Pacemaker is the Cluster Resource Manager (CRM): it manages the whole HA stack, and clients manage and monitor the cluster through pacemaker.
Common cluster management tools:
(1) Command line:
crm shell / pcs
(2) Graphical:
pygui / hawk / lcmc / pcs
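For orientation, the read-only crmsh commands below (standard crmsh subcommands, not specific to this setup) are a quick way to inspect a running cluster once the stack is up:
# crm status           (summary of nodes and resources)
# crm configure show   (dump the current configuration)
# crm ra classes       (list the available resource-agent classes)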
Diagram of Pacemaker's internal components and modules: (image not reproduced here)
2. Environment
2.1 OS
# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m
# uname -a
Linux node1 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
2.2 IP
node1:
  eth0 192.168.100.201/24 GW 192.168.100.1 --- public address
  eth1 10.10.10.1/24                       --- heartbeat address
  eth2 192.168.1.1/24                      --- streaming-replication address
node2:
  eth0 192.168.100.202/24 GW 192.168.100.1 --- public address
  eth1 10.10.10.2/24                       --- heartbeat address
  eth2 192.168.1.2/24                      --- streaming-replication address
Virtual addresses:
  eth0:0 192.168.100.213/24 --- vip-master
  eth0:0 192.168.100.214/24 --- vip-slave
  eth2:0 192.168.1.3/24     --- vip-rep
2.3 Software Versions
# rpm -qa | grep corosync
corosync-1.4.5-2.3.x86_64
corosynclib-1.4.5-2.3.x86_64
# rpm -qa | grep pacemaker
pacemaker-libs-1.1.10-14.el6_5.2.x86_64
pacemaker-cli-1.1.10-14.el6_5.2.x86_64
pacemaker-1.1.10-14.el6_5.2.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.2.x86_64
# rpm -qa | grep crmsh
crmsh-1.2.6-6.1.x86_64
PostgreSQL version: 9.1.4
3. Installation
3.1 Configure the YUM Repository
# cat /etc/yum.repos.d/ha-clustering.repo
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
3.2 Install pacemaker/corosync/crmsh
# yum install pacemaker corosync crmsh
After installation, the corresponding OCF resource scripts are placed under /usr/lib/ocf/resource.d:
# cd /usr/lib/ocf/resource.d/
[root@node1 resource.d]# ls
heartbeat  pacemaker  redhat
List the resource scripts with:
[root@node1 resource.d]# crm ra list ocf
ASEHAagent.sh       AoEtarget           AudibleAlarm        CTDB                ClusterMon
Delay               Dummy               EvmsSCC             Evmsd               Filesystem
Healthcpu           HealthSMART         ICP                 IPaddr              IPaddr2
IPsrcaddr           IPv6addr            LVM                 LinuxSCSI           MailTo
ManageRAID          ManageVE            Pure-FTPd           Raid1               Route
SAPDatabase         SAPInstance         SendArp             ServeRAID           SphinxSearchDaemon
Squid               Stateful            SysInfo             SystemHealth        VIPArip
VirtualDomain       WAS                 WAS6                WinPopup            Xen
Xinetd              anything            apache              apache.sh           asterisk
clusterfs.sh        conntrackd          controld            db2                 dhcpd
drbd                drbd.sh             eDir88              ethmonitor          exportfs
fio                 fs.sh               iSCSILogicalUnit    iSCSITarget         ids
ip.sh               iscsi               jboss               ldirectord          lvm.sh
lvm_by_lv.sh        lvm_by_vg.sh        lxc                 mysql               mysql-proxy
mysql.sh            named               named.sh            netfs.sh            nfsclient.sh
nfsexport.sh        nfsserver           nfsserver.sh        nginx               ocf-shellfuncs
openldap.sh         oracle              oracledb.sh         orainstance.sh      oralistener.sh
oralsnr             pgsql               ping                pingd               portblock
postfix             postgres-8.sh       pound               proftpd             remote
rsyncd              rsyslog             samba.sh            script.sh           scsi2reservation
service.sh          sfex                slapd               smb.sh              svclib_nfslock
symlink             syslog-ng           tomcat              tomcat-5.sh         tomcat-6.sh
varnish             vm.sh               vmware              zabbixserver
Start corosync:
[root@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@node2 ~]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@node2 ~]# crm status
Last updated: Sat Jan 18 07:00:34 2014
Last change: Sat Jan 18 06:58:11 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1 node2 ]
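Besides crm status, corosync itself can confirm the heartbeat ring is healthy. This check uses the standard corosync-cfgtool shipped with corosync 1.x; the output sketched below is what this topology should report on node1 (node ID varies):
[root@node1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID <id>
RING ID 0
        id      = 10.10.10.1
        status  = ring 0 active with no faults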
If the following errors appear, they are caused by STONITH not being configured; STONITH can simply be disabled for now:
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Disable stonith (running this on one node is sufficient):
[root@node1 ~]# crm configure property stonith-enabled=false
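crm_verify, the tool that reported the errors above, can then re-check the live CIB to confirm they are gone:
[root@node1 ~]# crm_verify -L -V
(no output means the configuration is now valid)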
3.3 Install PostgreSQL
The installation directory is /opt/pgsql.
{installation steps omitted}
Configure environment variables for the postgres user:
[postgres@node1 ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
export PATH=/opt/pgsql/bin:$PATH:$HOME/bin
export PGDATA=/opt/pgsql/data
export PGUSER=postgres
export PGPORT=5432
export LD_LIBRARY_PATH=/opt/pgsql/lib:$LD_LIBRARY_PATH
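A quick sanity check (illustrative, matching the install paths above) that the new environment takes effect:
[postgres@node1 ~]$ source ~/.bash_profile
[postgres@node1 ~]$ which pg_ctl
/opt/pgsql/bin/pg_ctl
[postgres@node1 ~]$ echo $PGDATA
/opt/pgsql/data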
4. Configuration
4.1 Configure /etc/hosts
# vim /etc/hosts
192.168.100.201 node1
192.168.100.202 node2
4.2 Configure corosync
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# ls
corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# vim corosync.conf
# compatibility with older (whitetank) releases
compatibility: whitetank
# inter-node communication protocol
totem {
    version: 2
    secauth: on            # enable secure authentication
    threads: 0
    interface {            # heartbeat configuration
        ringnumber: 0
        bindnetaddr: 10.10.10.0   # network to bind to
        mcastaddr: 226.94.1.1     # outgoing multicast address
        mcastport: 5405           # multicast port
        ttl: 1
    }
}
# logging settings
logging {
    fileline: off
    to_stderr: no          # send errors to stderr?
    to_logfile: yes        # write to a log file?
    to_syslog: yes         # write to syslog?
    logfile: /var/log/cluster/corosync.log   # note: /var/log/cluster must exist
    debug: off
    timestamp: on          # timestamp log entries?
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
# start pacemaker
service {
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
4.3 Generate the Authentication Key
{By default the key is generated from /dev/random, but if the system's entropy pool is low this can take a very long time; in that case /dev/urandom can be substituted for /dev/random.}
[root@node1 corosync]# mv /dev/random /dev/random.bak
[root@node1 corosync]# ln -s /dev/urandom /dev/random
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
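A restoration step the original omits but that follows from the commands above: once the key has been written, undo the symlink so /dev/random is not left pointing at urandom:
[root@node1 corosync]# rm /dev/random
[root@node1 corosync]# mv /dev/random.bak /dev/random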
4.4 Set Up SSH Mutual Trust
node1 -> node2 :
[root@node1 ~]# cd .ssh/
[root@node1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
2c:ed:1e:a6:a7:cd:e3:b2:7c:de:aa:ff:63:28:9a:19 root@node1
(randomart image omitted)
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub node2
The authenticity of host 'node2 (192.168.100.202)' can't be established.
RSA key fingerprint is be:76:cd:29:af:59:76:11:6a:c7:7d:72:27:df:d1:02.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.100.202' (RSA) to the list of known hosts.
root@node2's password:
Now try logging into the machine, with "ssh 'node2'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@node1 .ssh]# ssh node2 date
Sat Jan 18 06:36:21 CST 2014
node2 -> node1 :
[root@node2 ~]# cd .ssh/
[root@node2 .ssh]# ssh-keygen -t rsa
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub node1
[root@node2 .ssh]# ssh node1 date
Sat Jan 18 06:37:31 CST 2014
4.5 Synchronize the Configuration
[root@node1 corosync]# scp authkey corosync.conf node2:/etc/corosync/
authkey                          100%  128     0.1KB/s   00:00
corosync.conf                    100% 2808     2.7KB/s   00:00
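To confirm both nodes now hold identical copies, comparing checksums is a simple check (any hash tool works; md5sum shown):
[root@node1 corosync]# md5sum authkey corosync.conf
[root@node1 corosync]# ssh node2 md5sum /etc/corosync/authkey /etc/corosync/corosync.conf
The two sets of hashes should match.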
4.6 Download the Replacement Resource Script
Although the packages installed above provide a pgsql resource script, it is outdated and cannot perform automatic failover. After installing pacemaker/corosync, download the newer script from:
https://github.com/ClusterLabs/resource-agents/tree/master/heartbeat
Download pgsql and ocf-shellfuncs.in, then replace the installed copies:
# cp pgsql /usr/lib/ocf/resource.d/heartbeat/
# cp ocf-shellfuncs.in /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
{Note: ocf-shellfuncs.in must be renamed to ocf-shellfuncs, otherwise pgsql may not find the functions it needs. The newly downloaded file defines additional helper functions, such as ocf_local_nodename.}
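To confirm pacemaker picks up the replacement agent, its metadata can be queried; the newer pgsql agent should list the replication parameters used later in this document (rep_mode, node_list, master_ip, etc.). Also make sure the copied script stays executable, since that is easy to lose with a plain download:
[root@node1 ~]# chmod 755 /usr/lib/ocf/resource.d/heartbeat/pgsql
[root@node1 ~]# crm ra info ocf:heartbeat:pgsql | head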
Features of the pgsql resource script:
● Failover of the primary node
When the master goes down, the RA detects the failure, marks the old master as stopped, and promotes the slave to become the new master.
● Switching between synchronous and asynchronous replication
If the slave goes down or the LAN is having problems, then under synchronous replication every transaction containing a write will block, which effectively stops the service. To prevent this, the RA dynamically switches replication from synchronous to asynchronous.
When pacemaker starts on two or more nodes at roughly the same time, the RA compares the latest replay location of each node to find the one with the newest data; that node is considered the master. Likewise, if pacemaker is started on only one node, or one node's pacemaker is the first to start, that node becomes the master. The RA decides based on the data state recorded before the last stop.
Since slave nodes can serve read-only transactions, a second virtual IP can be used to load-balance read operations.
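Concretely, in this topology read/write clients connect to vip-master while read-only clients can be pointed at vip-slave (addresses from section 2.2; an illustrative use, not one of the original setup steps):
$ psql -h 192.168.100.213 -U postgres    # read/write: always lands on the master
$ psql -h 192.168.100.214 -U postgres    # read-only: lands on the synchronous slave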
4.7 Start corosync
Start it on both nodes:
[root@node1 ~]# service corosync start
[root@node2 ~]# service corosync start
Check the status:
[root@node1 ~]# crm status
Last updated: Tue Jan 21 23:55:13 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1 node2 ]
{corosync started successfully}
4.8 Configure Streaming Replication
On node1/node2, configure postgresql.conf and pg_hba.conf:
listen_addresses = '*'
port = 5432
wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /opt/archivelog/%f && cp %p /opt/archivelog/%f'
max_wal_senders = 4
wal_keep_segments = 50
hot_standby = on
pg_hba.conf:
host    replication    postgres    192.168.1.0/24    trust
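One step implied but not shown: archive_command above copies WAL segments into /opt/archivelog, so that directory must exist and be writable by postgres on both nodes:
# mkdir -p /opt/archivelog
# chown postgres:postgres /opt/archivelog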
Take a base backup on node2:
[postgres@node2 data]$ pg_basebackup -h 192.168.1.1 -U postgres -D /opt/pgsql/data -P
To verify that streaming replication works, recovery.conf can be written by hand at this point (when corosync later starts the database this file is generated automatically, and an existing one will be overwritten):
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.1 port=5432 user=postgres application_name=node2 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /opt/archivelog/%f %p'
recovery_target_timeline = 'latest'
[postgres@node2 data]$ pg_ctl start
[postgres@node1 pgsql]$ psql
postgres=# select client_addr, sync_state from pg_stat_replication;
 client_addr | sync_state
-------------+------------
 192.168.1.2 | sync
(1 row)
Replication works, so stop both instances again; from here on pacemaker will manage them:
[postgres@node2 ~]$ pg_ctl stop -m f
[postgres@node1 ~]$ pg_ctl stop -m f
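While testing (i.e., before the pg_ctl stop above), replication progress can also be compared with the PostgreSQL 9.1 WAL-location functions; matching or nearly matching values mean the slave is keeping up:
[postgres@node1 ~]$ psql -c "select pg_current_xlog_location();"
[postgres@node2 ~]$ psql -c "select pg_last_xlog_receive_location(), pg_last_xlog_replay_location();"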
4.9 Configure pacemaker
{pacemaker can be configured in several ways, e.g. crmsh, hb_gui, or pcs; this walkthrough uses crmsh}
Write the crm configuration script:
[root@node1 ~]# cat pgsql.crm
# global cluster properties
property \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    crmd-transition-delay="0s"

# resource defaults: resource-stickiness is how strongly a resource prefers
# to stay where it is (INFINITY = never move voluntarily); migration-threshold
# is the number of failures after which a node may no longer run the resource
rsc_defaults \
    resource-stickiness="INFINITY" \
    migration-threshold="1"

# master/slave set for the pgsql resource
ms msPostgresql pgsql \
    meta \
        master-max="1" \
        master-node-max="1" \
        clone-max="2" \
        clone-node-max="1" \
        notify="true"

# clone resource
clone clnPingCheck pingCheck

# resource group
group master-group \
    vip-master \
    vip-rep

# vip-master resource
primitive vip-master ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.100.213" \
        nic="eth0" \
        cidr_netmask="24" \
    op start timeout="60s" interval="0s" on-fail="stop" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"

# vip-rep resource
primitive vip-rep ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.1.3" \
        nic="eth2" \
        cidr_netmask="24" \
    meta \
        migration-threshold="0" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"

# vip-slave resource
primitive vip-slave ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.100.214" \
        nic="eth0" \
        cidr_netmask="24" \
    meta \
        resource-stickiness="1" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"

# pgsql resource
primitive pgsql ocf:heartbeat:pgsql \
    params \
        pgctl="/opt/pgsql/bin/pg_ctl" \
        psql="/opt/pgsql/bin/psql" \
        pgdata="/opt/pgsql/data/" \
        start_opt="-p 5432" \
        rep_mode="sync" \
        node_list="node1 node2" \
        restore_command="cp /opt/archivelog/%f %p" \
        primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        master_ip="192.168.1.3" \
        stop_escalate="0" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="7s" on-fail="restart" \
    op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
    op promote timeout="60s" interval="0s" on-fail="restart" \
    op demote timeout="60s" interval="0s" on-fail="stop" \
    op stop timeout="60s" interval="0s" on-fail="block" \
    op notify timeout="60s" interval="0s"

# pingCheck resource
primitive pingCheck ocf:pacemaker:ping \
    params \
        name="default_ping_set" \
        host_list="192.168.100.1" \
        multiplier="100" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="ignore"

# where vip-slave may run: prefer the synchronous standby, then the primary
location rsc_location-1 vip-slave \
    rule 200: pgsql-status eq "HS:sync" \
    rule 100: pgsql-status eq "PRI" \
    rule -inf: not_defined pgsql-status \
    rule -inf: pgsql-status ne "HS:sync" and pgsql-status ne "PRI"

# where msPostgresql may run: only on nodes that can reach the gateway
location rsc_location-2 msPostgresql \
    rule -inf: not_defined default_ping_set or default_ping_set lt 100

# resources that must run on the same node
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
colocation rsc_colocation-2 inf: master-group msPostgresql:Master

# operation ordering
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
Note: this script makes small changes to the configuration commonly posted online, which targets pacemaker-1.0.*; this walkthrough uses pacemaker-1.1.10.
Load the configuration script:
[root@node1 ~]# crm configure load update pgsql.crm
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
After a short while, check the HA status:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]

Full list of resources:

vip-slave       (ocf::heartbeat:IPaddr2):       Started node2
Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):       Started node1
     vip-rep    (ocf::heartbeat:IPaddr2):       Started node1
Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]

Node Attributes:
* Node node1:
    + default_ping_set        : 100
    + master-pgsql            : 1000
    + pgsql-data-status       : LATEST
    + pgsql-master-baseline   : 0000000006000078
    + pgsql-status            : PRI
* Node node2:
    + default_ping_set        : 100
    + master-pgsql            : 100
    + pgsql-data-status       : STREAMING|SYNC
    + pgsql-status            : HS:sync

Migration summary:
* Node node2:
* Node node1:

Note: right after startup both nodes come up as slaves; after a short while node1 is automatically promoted to master.
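A failover smoke test (not part of the original walkthrough) is to kill PostgreSQL on the master and watch pacemaker promote the slave. Note that after a failover the pgsql RA leaves a lock file on the failed node that must be removed before it can rejoin; the default location is the agent's tmpdir (commonly /var/lib/pgsql/tmp/PGSQL.lock, but verify against your agent version):
[postgres@node1 ~]$ pg_ctl stop -m immediate
[root@node1 ~]# crm_mon -Afr -1              (node2 should soon show pgsql-status : PRI)
[root@node1 ~]# crm resource cleanup pgsql   (clear failcounts once node1 is rebuilt as a slave)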