转载:http://diegoball.iteye.com/blog/713826
1. version.
version ------------------------------------------------------------------------------------------------------------ Postgresql 8.4.4 on i686-pc-linux-gnu,compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10),32-bit
2. 和Oracle的分区一样,Postgressql(以下简称PG)的分区的意思是把逻辑上的一个大表分割成物理上的几块儿。分区不仅仅能带来访问速度的提升,关键的它带来的是管理和维护的方便。想想一张上T的表,管理起来是一件多么费劲的事情。小表分区不实际。表在多大情况下才考虑分区呢?Postgressql官方给出的建议是当表本身大小超过了机器物理内存的实际大小时,可以考虑分区。PG目前(8.4.4)仅支持范围分区和列表分区,尚未支持散列分区。
3. PG的分区是基于表继承的方式实现的,无论何时,都应保证父表里面是空的。例子:
--建父表 CREATE TABLE measurement ( city_id int not null,logdate date not null,peaktemp int,unitsales int );
--建分区表 CREATE TABLE measurement_y2006m01 ( CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m02 ( CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m03 ( CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m04 ( CHECK ( logdate >= DATE '2006-04-01' AND logdate < DATE '2006-05-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m05 ( CHECK ( logdate >= DATE '2006-05-01' AND logdate < DATE '2006-06-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m06 ( CHECK ( logdate >= DATE '2006-06-01' AND logdate < DATE '2006-07-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m07 ( CHECK ( logdate >= DATE '2006-07-01' AND logdate < DATE '2006-08-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m08 ( CHECK ( logdate >= DATE '2006-08-01' AND logdate < DATE '2006-09-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m09 ( CHECK ( logdate >= DATE '2006-09-01' AND logdate < DATE '2006-10-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m10 ( CHECK ( logdate >= DATE '2006-10-01' AND logdate < DATE '2006-11-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m11 ( CHECK ( logdate >= DATE '2006-11-01' AND logdate < DATE '2006-12-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2006m12 ( CHECK ( logdate >= DATE '2006-12-01' AND logdate < DATE '2007-01-01' ) ) INHERITS (measurement);
分区键上建索引:
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate); CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate); ...
4. 对于开发人员来说,希望数据库是透明的,只管insert into measurement.对于数据插向哪个分区,则希望由DB决定。这点,ORACLE实现了,但是PG不行,毕竟是免费的。需要前期人工处理下:
首先建立个函数
CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN IF ( NEW.logdate >= DATE '2006-01-01' AND NEW.logdate < DATE '2006-02-01' ) THEN INSERT INTO measurement_y2006m01 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' ) THEN INSERT INTO measurement_y2006m02 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN INSERT INTO measurement_y2006m03 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-04-01' AND NEW.logdate < DATE '2006-05-01' ) THEN INSERT INTO measurement_y2006m04 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-05-01' AND NEW.logdate < DATE '2006-06-01' ) THEN INSERT INTO measurement_y2006m05 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-06-01' AND NEW.logdate < DATE '2006-07-01' ) THEN INSERT INTO measurement_y2006m06 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-07-01' AND NEW.logdate < DATE '2006-08-01' ) THEN INSERT INTO measurement_y2006m07 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-08-01' AND NEW.logdate < DATE '2006-09-01' ) THEN INSERT INTO measurement_y2006m08 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-09-01' AND NEW.logdate < DATE '2006-10-01' ) THEN INSERT INTO measurement_y2006m09 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-10-01' AND NEW.logdate < DATE '2006-11-01' ) THEN INSERT INTO measurement_y2006m10 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-11-01' AND NEW.logdate < DATE '2006-12-01' ) THEN INSERT INTO measurement_y2006m11 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-12-01' AND NEW.logdate < DATE '2007-01-01' ) THEN INSERT INTO measurement_y2006m12 VALUES (NEW.*); ELSE RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!'; END IF; RETURN NULL; END; $$ LANGUAGE plpgsql;
建立触发器
CREATE TRIGGER insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
测试下:
mydb=> \d measurement Table "sch2.measurement" Column | Type | Modifiers -----------+---------+----------- city_id | integer | not null logdate | date | not null peaktemp | integer | unitsales | integer | Triggers: insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger() mydb=> insert into measurement values (1,date '2006-02-10',1,1); INSERT 0 0 mydb=> insert into measurement values (1,date '2006-03-10',date '2006-04-10',1); INSERT 0 0 mydb=> select * from measurement; city_id | logdate | peaktemp | unitsales ---------+------------+----------+----------- 1 | 2006-02-10 | 1 | 1 1 | 2006-03-10 | 1 | 1 1 | 2006-04-10 | 1 | 1 (3 rows) mydb=> select * from measurement_y2006m03; city_id | logdate | peaktemp | unitsales ---------+------------+----------+----------- 1 | 2006-03-10 | 1 | 1 (1 row)
mydb=> select * from measurement; city_id | logdate | peaktemp | unitsales ---------+------------+----------+----------- 1 | 2006-02-10 | 1 | 1 1 | 2006-03-10 | 1 | 1 1 | 2006-04-10 | 1 | 1 (3 rows) mydb=>
删除一个分区
DROP TABLE measurement_y2006m02;
将一个表从分区表里删除,但是保留表,使其继续可以被使用:
ALTER TABLE measurement_y2006m02 NO INHERIT measurement;
添加分区的最快方式:
CREATE TABLE measurement_y2006m02 (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS); ALTER TABLE measurement_y2006m02 ADD CONSTRAINT y2006m02 CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ); --从文件中拷贝数据 \copy measurement_y2006m02 from '/home/postgres/measurement_y2006m02' ALTER TABLE measurement_y2008m02 INHERIT measurement;
5.分区排除和查询优化。默认情况下,是不会开启分区排除的,索引针对一个基于分区键条件的检索,要扫描所有的分区:
mydb=> SET constraint_exclusion = off; SET mydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01'; QUERY PLAN ----------------------------------------------------------------------------------------------- Aggregate (cost=436.80..436.81 rows=1 width=0) -> Append (cost=0.00..417.62 rows=7670 width=0) -> Seq Scan on measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m01 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m02 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m03 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m04 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m05 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m06 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m07 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m08 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m09 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m10 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m11 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m12 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) (28 rows)
开启:
mydb=> SET constraint_exclusion = on; SET mydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01'; QUERY PLAN ----------------------------------------------------------------------------------------------- Aggregate (cost=100.80..100.81 rows=1 width=0) -> Append (cost=0.00..96.38 rows=1770 width=0) -> Seq Scan on measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m01 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) -> Seq Scan on measurement_y2006m02 measurement (cost=0.00..32.12 rows=590 width=0) Filter: (logdate <= '2006-02-01'::date) (8 rows)
还可以通过建立规则的方式来决定向哪个分区插入数据:
CREATE RULE measurement_insert_y2006m02 AS ON INSERT TO measurement WHERE ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ) DO INSTEAD INSERT INTO measurement_y2006m02 VALUES (NEW.*); ...
此外:
VACUUMorANALYZEmeasurement 只会对父表起作用,要想分析表,需要一次分析每个分区表。
熟悉ORACLE的同志应该觉得PG的分区表比ORACLE是要逊色很多吧。