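A friend told me he followed http://www.mkyong.com/database/postgresql-point-in-time-recovery-incremental-backup/ to perform a point-in-time recovery of a database, but it failed. The procedure was roughly as follows:
Test environment: VMware 8
OS: CentOS 5.7 (Final)
PG: version 9.1.2
####His test log follows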
postgres=# create table testPITR1 as select * from pg_class,pg_description;
SELECT 936936
postgres=# select * from current_timestamp(0);
      timestamptz
------------------------
 2012-07-02 01:53:16-07
(1 row)

postgres=# select pg_start_backup('full_backup-testing_20120702');
 pg_start_backup
-----------------
 0/60000020
(1 row)

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 0/600000B0
(1 row)
-- Archive the data directory
tar -cf pgdata.tar ./pgdata
postgres=# select pg_stop_backup();
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------
 0/60000168
(1 row)

postgres=# create table testPITR2 as select * from pg_class,pg_description;
SELECT 946764
postgres=# select * from current_timestamp(0);
      timestamptz
------------------------
 2012-07-02 02:05:20-07
(1 row)

postgres=# create table testPITR3 as select * from pg_class,pg_description;
SELECT 956592
postgres=# select * from current_timestamp(0);
      timestamptz
------------------------
 2012-07-02 02:14:33-07
(1 row)

postgres=# create table testPITR4 as select * from pg_class,pg_description;
SELECT 966420
postgres=# select * from current_timestamp(0);
      timestamptz
------------------------
 2012-07-02 02:35:31-07
(1 row)

postgres=# \d
          List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | tesk        | table | postgres
 public | test        | table | postgres
 public | testpitr1   | table | postgres
 public | testpitr2   | table | postgres
 public | testpitr3   | table | postgres
 public | testpitr4   | table | postgres
(7 rows)

[postgres@localhost archive]$ pwd
/home/postgres/archive
[postgres@localhost archive]$ ls -lsh
total 1.1G
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:41 000000020000000000000013
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000014
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000015
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000016
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:56 000000020000000000000017
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:04 000000020000000000000018
4.0K -rw-------. 1 postgres postgres  295 Jul  2 02:04 000000020000000000000018.00000020.backup
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:04 000000020000000000000019
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:05 00000002000000000000001A
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:05 00000002000000000000001B
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001C
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001D
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001E
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001F
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000020
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000021
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000022
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000023

The original PGDATA directory was renamed to pgdata_bad with mv.

[postgres@localhost pg_xlog]$ pwd
/database/pgdata_bad/pg_xlog
[postgres@localhost pg_xlog]$ ls -lsh
total 1.9G
 64M -rw-------. 1 postgres postgres  64M Jun 14 22:33 00000001000000000000000C
 64M -rw-------. 1 postgres postgres  64M Jun 14 22:33 00000001000000000000000D
 64M -rw-------. 1 postgres postgres  64M Jun 14 21:05 00000001000000000000000E
 64M -rw-------. 1 postgres postgres  64M Jun 14 22:05 00000001000000000000000F
 64M -rw-------. 1 postgres postgres  64M Jun 15 03:40 00000002000000000000000D
 64M -rw-------. 1 postgres postgres  64M Jun 15 03:46 00000002000000000000000E
 64M -rw-------. 1 postgres postgres  64M Jun 15 03:51 00000002000000000000000F
 64M -rw-------. 1 postgres postgres  64M Jun 15 04:23 000000020000000000000010
 64M -rw-------. 1 postgres postgres  64M Jun 25 02:41 000000020000000000000011
 64M -rw-------. 1 postgres postgres  64M Jun 30 01:24 000000020000000000000012
 65M -rw-------. 1 postgres postgres  64M Jul  2 01:41 000000020000000000000013
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000014
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000015
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:52 000000020000000000000016
 64M -rw-------. 1 postgres postgres  64M Jul  2 01:56 000000020000000000000017
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:04 000000020000000000000018
4.0K -rw-------. 1 postgres postgres  295 Jul  2 02:04 000000020000000000000018.00000020.backup
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:04 000000020000000000000019
 65M -rw-------. 1 postgres postgres  64M Jul  2 02:05 00000002000000000000001A
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:05 00000002000000000000001B
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001C
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001D
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001E
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:07 00000002000000000000001F
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000020
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000021
 65M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000022
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:35 000000020000000000000023
 64M -rw-------. 1 postgres postgres  64M Jul  2 02:37 000000020000000000000024
4.0K -rw-------. 1 postgres postgres   56 Jun 14 22:34 00000002.history
4.0K drwx------. 2 postgres postgres 4.0K Jul  2 02:35 archive_status
 36M -rw-------. 1 postgres postgres  36M Jun 25 02:41 xlogtemp.2046
 24M -rw-------. 1 postgres postgres  24M Jun 30 01:24 xlogtemp.2077

The previously archived backup was then extracted to the pgdata location, the pg_xlog directory was recreated, and the server was started:
# rm -rf pg_xlog
# mkdir -p pg_xlog/archive_status
At this point the server started normally and psql could log in:
postgres=# \d
          List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | tesk        | table | postgres
 public | test        | table | postgres
 public | testpitr1   | table | postgres
(4 rows)

No recovery had been performed yet, so this result is expected.
He then shut the database down and created the recovery.conf file:
restore_command = 'cp /home/postgres/archive/%f %p'
recovery_target_time = '2012-07-02 02:10:31'
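A minimal sketch of preparing recovery.conf in the restored data directory before the server is started for the first time. The helper name `write_recovery_conf` and the temporary directory standing in for pgdata are assumptions made for illustration; the two settings are the ones shown above.

```python
import os
import tempfile


def write_recovery_conf(pgdata, restore_command, target_time):
    """Write a recovery.conf holding the two settings above into pgdata.

    Hypothetical helper for illustration: it only writes the file and
    does not start or stop PostgreSQL.
    """
    lines = [
        "restore_command = '%s'" % restore_command,
        "recovery_target_time = '%s'" % target_time,
    ]
    path = os.path.join(pgdata, "recovery.conf")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Demo against a throwaway directory standing in for the restored pgdata.
demo_pgdata = tempfile.mkdtemp()
conf_path = write_recovery_conf(
    demo_pgdata,
    "cp /home/postgres/archive/%f %p",
    "2012-07-02 02:10:31",
)
with open(conf_path) as f:
    print(f.read())
```

Writing the file as a separate step makes it easy to confirm that it exists in the data directory before the server is started.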
With that in place, the next start failed. The logs were as follows:
[root@localhost pg_log]# more postgresql-2012-07-03_014309.csv
2012-07-03 01:43:09.701 PDT,7621,4ff2b09d.1dc5,1,2012-07-03 01:43:09 PDT,LOG,00000,"database system was shut down at 2012-07-03 00:03:21 PDT",""
2012-07-03 01:43:09.764 PDT,2,"starting point-in-time recovery to 2012-07-02 02:10:31-07",""
2012-07-03 01:43:14.177 PDT,3,"restored log file ""000000020000000000000019"" from archive",4,"invalid resource manager ID in primary checkpoint record",""
2012-07-03 01:43:14.342 PDT,5,"restored log file ""000000020000000000000018"" from archive",6,"invalid xl_info in secondary checkpoint record",7,PANIC,XX000,"could not locate a valid checkpoint record",""
2012-07-03 01:43:18.500 PDT,7619,4ff2b09c.1dc3,2012-07-03 01:43:08 PDT,"startup process (PID 7621) was terminated by signal 6: Aborted","aborting startup due to startup process failure",""
[root@localhost pg_log]# more postgresql-2012-07-03_014309.log
cp: cannot stat `/home/postgres/archive/00000002.history': No such file or directory
[root@localhost archive]# more 00000002.history
1 00000001000000000000000D no recovery target specified

On inspection, /home/postgres/archive/00000002.history was indeed missing, so he copied one over from the old backup files and started the server again. This time the .log file was empty, but the .csv file reported the following:
[root@localhost pg_log]# more postgresql-2012-07-03_014413.csv
2012-07-03 01:44:13.159 PDT,7647,4ff2b0dd.1ddf,2012-07-03 01:44:13 PDT,""
2012-07-03 01:44:13.168 PDT,"restored log file ""00000002.history"" from archive",""
2012-07-03 01:44:13.300 PDT,""
2012-07-03 01:44:13.407 PDT,8,""
2012-07-03 01:44:13.811 PDT,7645,4ff2b0dc.1ddd,2012-07-03 01:44:12 PDT,"startup process (PID 7647) was terminated by signal 6: Aborted",""

The final pg_controldata output was:
[postgres@localhost pgdata]$ pg_controldata
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5735970894348214195
Database cluster state:               shut down
pg_control last modified:             Tue 03 Jul 2012 12:03:21 AM PDT
Latest checkpoint location:           0/64000020
Prior checkpoint location:            0/60000140
Latest checkpoint's REDO location:    0/64000020
Latest checkpoint's TimeLineID:       2
Latest checkpoint's NextXID:          0/1859
Latest checkpoint's NextOID:          40985
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1792
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Tue 03 Jul 2012 12:03:17 AM PDT
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      100
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 1048576
WAL block size:                       65536
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
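The checkpoint locations reported by pg_controldata can be mapped to WAL segment file names, which makes it easier to match them against the archive listing. A minimal sketch, assuming the naming scheme of PostgreSQL releases before 9.3 (timeline, log id, and segment number, each written as 8 hex digits) and the 64 MB segment size from the "Bytes per WAL segment" line; the function name `wal_segment_name` is hypothetical.

```python
def wal_segment_name(timeline, location, segment_bytes):
    """Return the WAL segment file name that holds an xlog location.

    location is text such as '0/64000020' (log id / byte offset, in hex).
    Assumes the file naming used by PostgreSQL releases before 9.3.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // segment_bytes
    return "%08X%08X%08X" % (timeline, log_id, segment)


SEGMENT_BYTES = 67108864  # "Bytes per WAL segment" from pg_controldata

latest = wal_segment_name(2, "0/64000020", SEGMENT_BYTES)  # latest checkpoint
prior = wal_segment_name(2, "0/60000140", SEGMENT_BYTES)   # prior checkpoint
print(latest)
print(prior)
```

With these inputs the two checkpoints fall in segments 000000020000000000000019 and 000000020000000000000018, the same two files that the failed startup restored from the archive before reporting invalid checkpoint records.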
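########Explanation##########
During the recovery above, the DB was started twice, and the first start happened without a recovery.conf file. This is an ordering problem: recovery.conf, including its recovery target time, must be configured first, and only then should the DB be started. The pg_controldata output shows the effect of that first run: the cluster state is a clean "shut down", and the latest checkpoint (0/64000020) was written when that run was shut down at 2012-07-03 00:03. Once that checkpoint exists, recovering to an earlier point in time is no longer possible. Of course, setting recovery_target_time in recovery.conf to a point later than the first start would also work. The check on my own machine went as follows: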
[postgres@localhost pgdata]$ psql
psql (9.1.2)
Type "help" for help.

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 0/94000078
(1 row)

postgres=# \q
[postgres@localhost pgdata]$ pg_stop
waiting for server to shut down................. done
server stopped
[postgres@localhost pgdata]$ pg_start
server starting
[postgres@localhost pgdata]$ psql
psql (9.1.2)
Type "help" for help.

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 0/98000078
(1 row)
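The two locations from this check can be compared by splitting each into a segment number and an offset within that segment. A minimal sketch, assuming my machine uses the same 64 MB WAL segment size as in the pg_controldata output earlier (the session above does not show it); the function name `split_location` is hypothetical.

```python
SEGMENT_BYTES = 67108864  # assumed: 64 MB WAL segments


def split_location(location, segment_bytes):
    """Split an xlog location such as '0/94000078' into
    (log id, segment number within that log id, offset inside the segment)."""
    log_id_hex, offset_hex = location.split("/")
    offset = int(offset_hex, 16)
    return int(log_id_hex, 16), offset // segment_bytes, offset % segment_bytes


before = split_location("0/94000078", SEGMENT_BYTES)  # before the restart
after = split_location("0/98000078", SEGMENT_BYTES)   # after the restart
print(before)
print(after)
```

Under that assumption the stop/start cycle alone moved the current location forward by exactly one segment, from segment 37 to segment 38, at the same offset within the segment. A plain restart therefore already writes new WAL that was not part of the original backup.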