corosync + pacemaker + postgres_streaming_replication
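Overview:
This document describes how to implement automatic failover for PostgreSQL streaming replication using corosync + pacemaker. It mainly covers a summary of corosync/pacemaker concepts, the full environment setup, and troubleshooting.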
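1. Introduction
Corosync
Corosync was split off from the OpenAIS project: the part that implements HA message transport became Corosync, which is why about 60% of Corosync's code comes from OpenAIS.
Like Heartbeat after its own split, Corosync belongs to the cluster messaging layer. It carries cluster messages and the heartbeats between nodes. HA software at this layer does not manage resources itself and depends on a CRM in the layer above. The best-known resource manager today is Pacemaker, and Corosync + Pacemaker is commonly regarded as the best combination for high-availability clusters.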
Pacemaker
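Pacemaker is the Cluster Resource Manager (CRM). It manages the whole HA cluster, and clients manage and monitor the cluster through Pacemaker.
Common cluster management tools:
(1) Command line
crm shell/pcs
(2) Graphical
pygui/hawk/lcmc/pcs
Diagram of Pacemaker's internal components and modules: [figure not included in this text]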
2. Environment
2.1 OS
# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m
# uname -a
Linux node1 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
2.2 IP
node1:
eth0 192.168.100.201/24 GW 192.168.100.1 --- real address
eth1 10.10.10.1/24 --- heartbeat address
eth2 192.168.1.1/24 --- streaming replication address
node2:
eth0 192.168.100.202/24 GW 192.168.100.1 --- real address
eth1 10.10.10.2/24 --- heartbeat address
eth2 192.168.1.2/24 --- streaming replication address
Virtual addresses:
eth0:0 192.168.100.213/24 --- vip-master
eth0:0 192.168.100.214/24 --- vip-slave
eth2:0 192.168.1.3/24 --- vip-rep
2.3 Software versions
# rpm -qa | grep corosync
corosync-1.4.5-2.3.x86_64
corosynclib-1.4.5-2.3.x86_64
# rpm -qa | grep pacemaker
pacemaker-libs-1.1.10-14.el6_5.2.x86_64
pacemaker-cli-1.1.10-14.el6_5.2.x86_64
pacemaker-1.1.10-14.el6_5.2.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.2.x86_64
# rpm -qa | grep crmsh
crmsh-1.2.6-6.1.x86_64
PostgreSQL version: 9.1.4
3. Installation
3.1 Configure the YUM repository
# cat /etc/yum.repos.d/ha-clustering.repo
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
3.2 Install pacemaker/corosync/crmsh
# yum install pacemaker corosync crmsh
After installation, the OCF resource agent scripts are placed under /usr/lib/ocf/resource.d:
# cd /usr/lib/ocf/resource.d/
[root@node1 resource.d]# ls
heartbeat  pacemaker  redhat
List the available resource agents with:
[root@node1 resource.d]# crm ra list ocf
ASEHAagent.sh AoEtarget AudibleAlarm CTDB ClusterMon Delay Dummy
EvmsSCC Evmsd Filesystem HealthCPU HealthSMART ICP IPaddr
IPaddr2 IPsrcaddr IPv6addr LVM LinuxSCSI MailTo ManageRAID
ManageVE Pure-FTPd Raid1 Route SAPDatabase SAPInstance SendArp
ServeRAID SphinxSearchDaemon Squid Stateful SysInfo SystemHealth VIPArip
VirtualDomain WAS WAS6 WinPopup Xen Xinetd anything
apache apache.sh asterisk clusterfs.sh conntrackd controld db2
dhcpd drbd drbd.sh eDir88 ethmonitor exportfs fio
fs.sh iSCSILogicalUnit iSCSITarget ids ip.sh iscsi jboss
ldirectord lvm.sh lvm_by_lv.sh lvm_by_vg.sh lxc mysql mysql-proxy
mysql.sh named named.sh netfs.sh nfsclient.sh nfsexport.sh nfsserver
nginx ocf-shellfuncs openldap.sh oracle oracledb.sh orainstance.sh
oralistener.sh oralsnr pgsql ping pingd portblock postfix
postgres-8.sh pound proftpd remote rsyncd rsyslog samba.sh
script.sh scsi2reservation service.sh sfex slapd smb.sh svclib_nfslock
symlink syslog-ng tomcat tomcat-5.sh tomcat-6.sh varnish vm.sh
vmware zabbixserver
Start corosync:
[root@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node2 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node2 ~]# crm status
Last updated: Sat Jan 18 07:00:34 2014
Last change: Sat Jan 18 06:58:11 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
0 Resources configured

Online: [ node1 node2 ]
If the following errors appear, they are caused by STONITH not being configured, and STONITH can be disabled for now:
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Disable STONITH (run on one node only):
[root@node1 ~]# crm configure property stonith-enabled=false
3.3 Install PostgreSQL
The installation directory is /opt/pgsql.
{Installation steps omitted}
Configure environment variables for the postgres user:
[postgres@node1 ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
export PATH=/opt/pgsql/bin:$PATH:$HOME/bin
export PGDATA=/opt/pgsql/data
export PGUSER=postgres
export PGPORT=5432
export LD_LIBRARY_PATH=/opt/pgsql/lib:$LD_LIBRARY_PATH
4. Configuration
4.1 hosts settings
# vim /etc/hosts
192.168.100.201 node1
192.168.100.202 node2
4.2 Configure corosync
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# ls
corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# vim corosync.conf
# Compatibility with older versions
compatibility: whitetank

# Inter-node communication protocol
totem {
    version: 2
    # Whether to enable authentication of cluster messages
    secauth: on
    threads: 0
    # Heartbeat interface
    interface {
        ringnumber: 0
        # Network to bind to
        bindnetaddr: 10.10.10.0
        # Multicast address to send to
        mcastaddr: 226.94.1.1
        # Multicast port
        mcastport: 5405
        ttl: 1
    }
}

# Logging
logging {
    fileline: off
    # Whether to send log output to standard error
    to_stderr: no
    # Whether to write to a log file
    to_logfile: yes
    # Whether to write to syslog
    to_syslog: yes
    # Log file; the /var/log/cluster directory must exist
    logfile: /var/log/cluster/corosync.log
    debug: off
    # Whether to timestamp log entries
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}

# Enable pacemaker
service {
    ver: 0
    name: pacemaker
}

aisexec {
    user: root
    group: root
}
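The bindnetaddr value is the network address of the heartbeat interface, not the host address: with eth1 at 10.10.10.1/24 it is 10.10.10.0. As a minimal sketch, the shell function below derives that value from an address and a prefix length. The name network_addr is a hypothetical helper for illustration and is not part of corosync.

```shell
#!/bin/sh
# network_addr IP PREFIX
# Hypothetical helper: prints the network address for IP/PREFIX,
# i.e. the value one would put in bindnetaddr for that interface.
network_addr() (
    ip=$1
    prefix=$2

    # Split the dotted quad into $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs

    # Pack the four octets into one 32-bit integer.
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Netmask: PREFIX one-bits followed by zero-bits.
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    net=$(( addr & mask ))

    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 ))  $(( net & 255 ))
)

# Heartbeat interface from section 2.2: eth1 10.10.10.1/24
network_addr 10.10.10.1 24    # prints 10.10.10.0
```

This matters mostly when the heartbeat subnet is not a plain /24, where the network address is less obvious.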
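4.3 Generate the authentication key
{By default the key is generated from /dev/random. If the system has not collected enough entropy this can take a long time, in which case urandom can be used in place of random}
[root@node1 corosync]# mv /dev/random /dev/random.bak
[root@node1 corosync]# ln -s /dev/urandom /dev/random
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
{Once the key has been written, the original device should be put back: rm /dev/random, then mv /dev/random.bak /dev/random}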
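Whether the urandom workaround is needed at all depends on how much entropy the kernel currently has. The following is a small sketch, assuming a Linux system that exposes the estimate at /proc/sys/kernel/random/entropy_avail. The function name entropy_low and its default threshold are illustrative assumptions, not part of corosync.

```shell
#!/bin/sh
# entropy_low [THRESHOLD] [FILE]
# Hypothetical helper: succeeds (exit 0) when the kernel's entropy estimate
# is below THRESHOLD bits. FILE defaults to the Linux procfs entry.
# Exit status 2 means the estimate could not be read.
entropy_low() (
    threshold=${1:-1024}
    file=${2:-/proc/sys/kernel/random/entropy_avail}
    [ -r "$file" ] || exit 2
    avail=$(cat "$file")
    [ "$avail" -lt "$threshold" ]
)

# Usage (corosync-keygen reports that it gathers 1024 bits):
#   if entropy_low 1024; then
#       echo "entropy pool is low; corosync-keygen may block on /dev/random"
#   fi
```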
4.4 SSH mutual trust
node1 -> node2:
[root@node1 ~]# cd .ssh/
[root@node1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
2c:ed:1e:a6:a7:cd:e3:b2:7c:de:aa:ff:63:28:9a:19 root@node1
The key's randomart image is:
[randomart image omitted]
[root@node1 .ssh]# ssh-copy-id -i id_rsa.pub node2
The authenticity of host 'node2 (192.168.100.202)' can't be established.
RSA key fingerprint is be:76:cd:29:af:59:76:11:6a:c7:7d:72:27:df:d1:02.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.100.202' (RSA) to the list of known hosts.
root@node2's password:
Now try logging into the machine, with "ssh 'node2'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@node1 .ssh]# ssh node2 date
Sat Jan 18 06:36:21 CST 2014
node2 -> node1:
[root@node2 ~]# cd .ssh/
[root@node2 .ssh]# ssh-keygen -t rsa
[root@node2 .ssh]# ssh-copy-id -i id_rsa.pub node1
[root@node2 .ssh]# ssh node1 date
Sat Jan 18 06:37:31 CST 2014
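4.5 Synchronize the configuration
[root@node1 corosync]# scp authkey corosync.conf node2:/etc/corosync/
authkey                100%  128     0.1KB/s   00:00
corosync.conf          100% 2808     2.7KB/s   00:00
4.6 Download and replace the resource agent script
Installing the packages above does provide a pgsql resource agent, but that version is too old and cannot perform automatic failover. After installing pacemaker/corosync, a newer version therefore has to be downloaded to replace it: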
https://github.com/ClusterLabs/resource-agents/tree/master/heartbeat
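Download pgsql and ocf-shellfuncs.in.
Replace:
# cp pgsql /usr/lib/ocf/resource.d/heartbeat/
# cp ocf-shellfuncs.in /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
{Note: ocf-shellfuncs.in must be renamed to ocf-shellfuncs, otherwise pgsql may not find the functions it needs. The newly downloaded function library adds some new functions, such as ocf_local_nodename}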
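A quick way to confirm that the replaced library is the new one is to check that it defines ocf_local_nodename. The sketch below assumes the function is declared in the usual `name() {` form at the start of a line. The helper name defines_function is hypothetical.

```shell
#!/bin/sh
# defines_function FILE NAME
# Hypothetical helper: succeeds when FILE contains a shell function
# definition of the form "NAME()" at the start of a line.
defines_function() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*\(\)" "$1"
}

# Usage on a cluster node, with the path from the cp command above:
#   if defines_function /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ocf_local_nodename; then
#       echo "ocf-shellfuncs provides ocf_local_nodename"
#   else
#       echo "ocf-shellfuncs looks outdated or was not replaced"
#   fi
```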
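Features of the pgsql resource agent:
● Failover of the master
When the master goes down, the RA detects the failure, marks the master as stopped, and then promotes the slave to be the new master.
● Switching between synchronous and asynchronous replication
If the slave goes down or there is a problem on the LAN, then under synchronous replication any transaction that contains writes is blocked, which means the service stops. To prevent this, the RA dynamically switches from synchronous to asynchronous replication.
● Automatic detection of old and new data at initial startup
When Pacemaker starts for the first time on two or more nodes simultaneously, the RA compares the most recent replay location of each node to find the node with the newest data, and that node becomes the master. If Pacemaker is started on only one node, or that node's Pacemaker is the first to start, that node also becomes the master. The RA decides based on the state of the data before the stop.
● Read load balancing
Because the slave can serve read-only transactions, read load can be balanced by adding another virtual IP for read traffic.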
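The "newest data" decision described above comes down to comparing WAL locations. As an illustrative sketch only (the RA's actual implementation may differ), the function below takes one "node location" pair per line, with the location in the hi/lo hexadecimal form PostgreSQL 9.1 reports (for example from pg_last_xlog_replay_location()), and prints the node that is furthest ahead. The name newest_node is hypothetical.

```shell
#!/bin/sh
# newest_node: reads "node hi/lo" pairs on stdin and prints the node whose
# location is furthest ahead. On a tie the first node listed wins.
# Hypothetical helper for illustration; not taken from the pgsql RA.
newest_node() (
    best_node=""
    best_hi=-1
    best_lo=-1
    while read -r node loc; do
        # Convert both hexadecimal halves of the location to decimal.
        hi=$(printf '%d' "0x${loc%%/*}")
        lo=$(printf '%d' "0x${loc##*/}")
        # Compare the high part first, then the low part.
        if [ "$hi" -gt "$best_hi" ] ||
           { [ "$hi" -eq "$best_hi" ] && [ "$lo" -gt "$best_lo" ]; }; then
            best_node=$node
            best_hi=$hi
            best_lo=$lo
        fi
    done
    printf '%s\n' "$best_node"
)

printf '%s\n' 'node1 0/6000078' 'node2 0/6000000' | newest_node    # prints node1
```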
4.7 Start corosync
Start:
[root@node1 ~]# service corosync start
[root@node2 ~]# service corosync start
Check the status:
[root@node1 ~]# crm status
Last updated: Tue Jan 21 23:55:13 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured,
{corosync started successfully}
4.8 Configure streaming replication
Configure postgresql.conf and pg_hba.conf on node1 and node2:
postgresql.conf :
listen_addresses = '*'
port = 5432
wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /opt/archivelog/%f && cp %p /opt/archivelog/%f'
max_wal_senders = 4
wal_keep_segments = 50
hot_standby = on
pg_hba.conf :
host replication postgres 192.168.1.0/24 trust
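In archive_command, PostgreSQL substitutes %p with the path of the WAL file to archive and %f with its file name. The command above therefore copies a segment only if the archive does not already contain a file of that name, and returns a non-zero status otherwise, so existing segments are never overwritten. A minimal sketch of the same logic as a shell function (archive_wal is a hypothetical name used only for illustration):

```shell
#!/bin/sh
# archive_wal SRC ARCHIVE_DIR NAME
# Mirrors: test ! -f ARCHIVE_DIR/NAME && cp SRC ARCHIVE_DIR/NAME
# SRC plays the role of %p and NAME the role of %f.
# Returns non-zero if the segment is already archived or the copy fails.
archive_wal() {
    test ! -f "$2/$3" && cp "$1" "$2/$3"
}

# Usage, with a made-up segment name:
#   archive_wal pg_xlog/000000010000000000000001 /opt/archivelog 000000010000000000000001
```

A non-zero status tells PostgreSQL that archiving failed, and it will retry the segment later.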
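Run the base backup on node2:
[postgres@node2 data]$ pg_basebackup -h 192.168.1.1 -U postgres -D /opt/pgsql/data -P
To test whether streaming replication works, recovery.conf can be created by hand at this point. (It is generated automatically when the database is started through corosync, and an existing file will be overwritten.)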
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.1 port=5432 user=postgres application_name=node2 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /opt/archivelog/%f %p'
recovery_target_timeline = 'latest'
[postgres@node2 data]$ pg_ctl start
[postgres@node1 pgsql]$ psql
postgres=# select client_addr,sync_state from pg_stat_replication;
 client_addr | sync_state
-------------+------------
 192.168.1.2 | sync
(1 row)
Stop the database:
[postgres@node2 ~]$ pg_ctl stop -m f
[postgres@node1 ~]$ pg_ctl stop -m f
4.9 Configure pacemaker
{Pacemaker can be configured in several ways, such as crmsh, hb_gui, or pcs. This experiment uses crmsh}
Write the crm configuration script:
[root@node1 ~]# cat pgsql.crm
# Global properties:
#   no-quorum-policy: ignore loss of quorum; enable a quorum policy when there are more nodes
#   stonith-enabled: disable the STONITH device check
property \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    crmd-transition-delay="0s"

# Resource defaults:
#   resource-stickiness: how strongly a resource prefers to stay where it is; INFINITY means always
#   migration-threshold: number of failures after which a node may no longer run the resource
rsc_defaults \
    resource-stickiness="INFINITY" \
    migration-threshold="1"

ms msPostgresql pgsql \
    meta \
        master-max="1" \
        master-node-max="1" \
        clone-max="2" \
        clone-node-max="1" \
        notify="true"

# Clone resource
clone clnPingCheck pingCheck

# Resource group
group master-group \
    vip-master \
    vip-rep

# vip-master resource
primitive vip-master ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.100.213" \
        nic="eth0" \
        cidr_netmask="24" \
    op start   timeout="60s" interval="0s"  on-fail="stop" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block"

# vip-rep resource
primitive vip-rep ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.1.3" \
        nic="eth2" \
        cidr_netmask="24" \
    meta \
        migration-threshold="0" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block"

# vip-slave resource
primitive vip-slave ocf:heartbeat:IPaddr2 \
    params \
        ip="192.168.100.214" \
        nic="eth0" \
        cidr_netmask="24" \
    meta \
        resource-stickiness="1" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block"

# pgsql resource and its parameters
primitive pgsql ocf:heartbeat:pgsql \
    params \
        pgctl="/opt/pgsql/bin/pg_ctl" \
        psql="/opt/pgsql/bin/psql" \
        pgdata="/opt/pgsql/data/" \
        start_opt="-p 5432" \
        rep_mode="sync" \
        node_list="node1 node2" \
        restore_command="cp /opt/archivelog/%f %p" \
        primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        master_ip="192.168.1.3" \
        stop_escalate="0" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="7s"  on-fail="restart" \
    op monitor timeout="60s" interval="2s"  on-fail="restart" role="Master" \
    op promote timeout="60s" interval="0s"  on-fail="restart" \
    op demote  timeout="60s" interval="0s"  on-fail="stop" \
    op stop    timeout="60s" interval="0s"  on-fail="block" \
    op notify  timeout="60s" interval="0s"

# pingCheck resource
primitive pingCheck ocf:pacemaker:ping \
    params \
        name="default_ping_set" \
        host_list="192.168.100.1" \
        multiplier="100" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="ignore"

# Placement of vip-slave
location rsc_location-1 vip-slave \
    rule  200: pgsql-status eq "HS:sync" \
    rule  100: pgsql-status eq "PRI" \
    rule -inf: not_defined pgsql-status \
    rule -inf: pgsql-status ne "HS:sync" and pgsql-status ne "PRI"

# Placement of msPostgresql
location rsc_location-2 msPostgresql \
    rule -inf: not_defined default_ping_set or default_ping_set lt 100

# Resources that must run on the same node
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
colocation rsc_colocation-2 inf: master-group msPostgresql:Master

# Ordering of resource operations
order rsc_order-1 0: clnPingCheck msPostgresql
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
Note: this script differs slightly from the configurations found online, because those target pacemaker-1.0.*, whereas this experiment uses pacemaker-1.1.10.
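The rsc_location-2 rule keeps msPostgresql off any node where default_ping_set is undefined or below 100. The ocf:pacemaker:ping agent sets that attribute to the number of reachable hosts in host_list multiplied by multiplier, so with a single gateway and multiplier="100" the value is either 100 or 0. The sketch below only restates that arithmetic. Both function names are hypothetical.

```shell
#!/bin/sh
# ping_score REACHABLE MULTIPLIER
# Value the ping agent would publish: reachable hosts times multiplier.
ping_score() {
    echo $(( $1 * $2 ))
}

# banned_by_ping_rule SCORE
# Mirrors: rule -inf: not_defined default_ping_set or default_ping_set lt 100
# An empty SCORE stands for "attribute not defined".
banned_by_ping_rule() {
    [ -z "$1" ] || [ "$1" -lt 100 ]
}

# One gateway in host_list and multiplier=100, as in pgsql.crm:
for reachable in 1 0; do
    score=$(ping_score "$reachable" 100)
    if banned_by_ping_rule "$score"; then
        echo "reachable=$reachable score=$score -> msPostgresql not allowed on this node"
    else
        echo "reachable=$reachable score=$score -> msPostgresql allowed on this node"
    fi
done
```

If more hosts were added to host_list, the threshold in the rule would need to be chosen with the new maximum score in mind.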
Load the configuration script:
[root@node1 ~]# crm configure load update pgsql.crm
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
After a while, check the HA status:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured

Online: [ node1 node2 ]

Full list of resources:

vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
    vip-master (ocf::heartbeat:IPaddr2): Started node1
    vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
    Masters: [ node1 ]
    Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
    Started: [ node1 node2 ]

Node Attributes:
* Node node1:
    + default_ping_set : 100
    + master-pgsql : 1000
    + pgsql-data-status : LATEST
    + pgsql-master-baseline : 0000000006000078
    + pgsql-status : PRI
* Node node2:
    + default_ping_set : 100
    + master-pgsql : 100
    + pgsql-data-status : STREAMING|SYNC
    + pgsql-status : HS:sync

Migration summary:
* Node node2:
* Node node1:
Note: right after startup both nodes are slaves. After a while node1 is automatically promoted to master.
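The pgsql-status attribute in this output is the quickest indicator of each node's role (PRI for the master, HS:sync for a synchronous hot standby). As a rough sketch for scripting that check, the function below extracts it per node. It assumes the "* Node <name>:" and "+ <attribute> : <value>" layout of the attribute section, which may differ between Pacemaker versions, and pgsql_status is a hypothetical helper name.

```shell
#!/bin/sh
# pgsql_status: reads `crm_mon -Afr -1` output on stdin and prints
# "<node> <pgsql-status>" for every node that has the attribute.
pgsql_status() {
    awk '
        # "* Node node1:" starts a per-node section; remember the node name.
        $1 == "*" && $2 == "Node" { node = $3; sub(/:$/, "", node) }
        # "+ pgsql-status : PRI" carries the value in the last field.
        $1 == "+" && $2 == "pgsql-status" { print node, $NF }
    '
}

# Demo on a trimmed copy of the node attribute section.
pgsql_status <<'EOF'
Node Attributes:
* Node node1:
    + master-pgsql : 1000
    + pgsql-status : PRI
* Node node2:
    + master-pgsql : 100
    + pgsql-status : HS:sync
EOF
# prints:
#   node1 PRI
#   node2 HS:sync
```

On a live cluster the same function would be fed with `crm_mon -Afr -1 | pgsql_status`.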