转载:PostgreSQL的分区表

前端之家收集整理的这篇文章主要介绍了转载:PostgreSQL的分区表前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。

转载:http://diegoball.iteye.com/blog/713826

1. version.

                               version                                                   
------------------------------------------------------------------------------------------------------------

 Postgresql 8.4.4 on i686-pc-linux-gnu,compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10),32-bit

2. 和Oracle的分区一样,Postgressql(以下简称PG)的分区的意思是把逻辑上的一个大表分割成物理上的几块儿。分区不仅仅能带来访问速度的提升,关键的它带来的是管理和维护的方便。想想一张上T的表,管理起来是一件多么费劲的事情。小表分区不实际。表在多大情况下才考虑分区呢?Postgressql官方给出的建议是当表本身大小超过了机器物理内存的实际大小时,可以考虑分区。PG目前(8.4.4)仅支持范围分区和列表分区,尚未支持散列分区。

3. PG的分区是基于表继承的方式实现的,无论何时,都应保证父表里面是空的。例子:

--建父表

CREATE TABLE measurement (

    city_id         int not null,logdate         date not null,peaktemp        int,unitsales       int

);

--建分区表

CREATE TABLE measurement_y2006m01 (

    CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )

) INHERITS (measurement);


CREATE TABLE measurement_y2006m02 (

    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (

    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )

) INHERITS (measurement);



CREATE TABLE measurement_y2006m04 (

    CHECK ( logdate >= DATE '2006-04-01' AND logdate < DATE '2006-05-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m05 (

    CHECK ( logdate >= DATE '2006-05-01' AND logdate < DATE '2006-06-01' )

) INHERITS (measurement);


CREATE TABLE measurement_y2006m06 (

    CHECK ( logdate >= DATE '2006-06-01' AND logdate < DATE '2006-07-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m07 (

    CHECK ( logdate >= DATE '2006-07-01' AND logdate < DATE '2006-08-01' )

) INHERITS (measurement);



CREATE TABLE measurement_y2006m08 (
    CHECK ( logdate >= DATE '2006-08-01' AND logdate < DATE '2006-09-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m09 (

    CHECK ( logdate >= DATE '2006-09-01' AND logdate < DATE '2006-10-01' )

) INHERITS (measurement);


CREATE TABLE measurement_y2006m10 (

    CHECK ( logdate >= DATE '2006-10-01' AND logdate < DATE '2006-11-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m11 (

    CHECK ( logdate >= DATE '2006-11-01' AND logdate < DATE '2006-12-01' )

) INHERITS (measurement);

CREATE TABLE measurement_y2006m12 (

    CHECK ( logdate >= DATE '2006-12-01' AND logdate < DATE '2007-01-01' )

) INHERITS (measurement);

分区键上建索引:

CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);

CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);

...

4. 对于开发人员来说,希望数据库是透明的,只管insert into measurement.对于数据插向哪个分区,则希望由DB决定。这点,ORACLE实现了,但是PG不行,毕竟是免费的。需要前期人工处理下:

首先建立个函数


CREATE OR REPLACE FUNCTION measurement_insert_trigger()

RETURNS TRIGGER AS $$

BEGIN

    IF ( NEW.logdate >= DATE '2006-01-01' AND

         NEW.logdate < DATE '2006-02-01' ) THEN

        INSERT INTO measurement_y2006m01 VALUES (NEW.*);

    ELSIF ( NEW.logdate >= DATE '2006-02-01' AND

            NEW.logdate < DATE '2006-03-01' ) THEN

        INSERT INTO measurement_y2006m02 VALUES (NEW.*);

	 ELSIF ( NEW.logdate >= DATE '2006-03-01' AND

            NEW.logdate < DATE '2006-04-01' ) THEN

        INSERT INTO measurement_y2006m03 VALUES (NEW.*);

		 ELSIF ( NEW.logdate >= DATE '2006-04-01' AND

            NEW.logdate < DATE '2006-05-01' ) THEN

        INSERT INTO measurement_y2006m04 VALUES (NEW.*);

   		 ELSIF ( NEW.logdate >= DATE '2006-05-01' AND

            NEW.logdate < DATE '2006-06-01' ) THEN

        INSERT INTO measurement_y2006m05 VALUES (NEW.*);

		 ELSIF ( NEW.logdate >= DATE '2006-06-01' AND

            NEW.logdate < DATE '2006-07-01' ) THEN

        INSERT INTO measurement_y2006m06 VALUES (NEW.*);

		 ELSIF ( NEW.logdate >= DATE '2006-07-01' AND

            NEW.logdate < DATE '2006-08-01' ) THEN

        INSERT INTO measurement_y2006m07 VALUES (NEW.*);

   		 ELSIF ( NEW.logdate >= DATE '2006-08-01' AND

            NEW.logdate < DATE '2006-09-01' ) THEN

        INSERT INTO measurement_y2006m08 VALUES (NEW.*);

   
    ELSIF ( NEW.logdate >= DATE '2006-09-01' AND

            NEW.logdate < DATE '2006-10-01' ) THEN

        INSERT INTO measurement_y2006m09 VALUES (NEW.*);


    ELSIF ( NEW.logdate >= DATE '2006-10-01' AND

            NEW.logdate < DATE '2006-11-01' ) THEN

        INSERT INTO measurement_y2006m10 VALUES (NEW.*);

		    ELSIF ( NEW.logdate >= DATE '2006-11-01' AND

            NEW.logdate < DATE '2006-12-01' ) THEN

        INSERT INTO measurement_y2006m11 VALUES (NEW.*);

    ELSIF ( NEW.logdate >= DATE '2006-12-01' AND

            NEW.logdate < DATE '2007-01-01' ) THEN

        INSERT INTO measurement_y2006m12 VALUES (NEW.*);

    ELSE

        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';

    END IF;

    RETURN NULL;

END;

$$

LANGUAGE plpgsql;

建立触发器

CREATE TRIGGER insert_measurement_trigger

    BEFORE INSERT ON measurement

    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();

测试下:

mydb=> \d measurement

    Table "sch2.measurement"

  Column   |  Type   | Modifiers 

-----------+---------+-----------

 city_id   | integer | not null

 logdate   | date    | not null

 peaktemp  | integer | 

 unitsales | integer | 

Triggers:

    insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()



mydb=> insert into measurement values (1,date '2006-02-10',1,1);

INSERT 0 0

mydb=> insert into measurement values (1,date '2006-03-10',date '2006-04-10',1);

INSERT 0 0

mydb=> select * from measurement;

 city_id |  logdate   | peaktemp | unitsales 

---------+------------+----------+-----------

       1 | 2006-02-10 |        1 |         1

       1 | 2006-03-10 |        1 |         1

       1 | 2006-04-10 |        1 |         1

(3 rows)



mydb=> select * from measurement_y2006m03;

 city_id |  logdate   | peaktemp | unitsales 

---------+------------+----------+-----------

       1 | 2006-03-10 |        1 |         1

(1 row)

mydb=> select * from measurement;

 city_id |  logdate   | peaktemp | unitsales 

---------+------------+----------+-----------

       1 | 2006-02-10 |        1 |         1

       1 | 2006-03-10 |        1 |         1

       1 | 2006-04-10 |        1 |         1

(3 rows)



mydb=> 

删除一个分区

DROP TABLE measurement_y2006m02;

将一个表从分区表里删除,但是保留表,使其继续可以被使用:

ALTER TABLE measurement_y2006m02 NO INHERIT measurement;

添加分区的最快方式:

CREATE TABLE measurement_y2006m02

  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

ALTER TABLE measurement_y2006m02 ADD CONSTRAINT y2006m02

   CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' );

--从文件中拷贝数据

\copy measurement_y2006m02 from '/home/postgres/measurement_y2006m02'

ALTER TABLE measurement_y2008m02 INHERIT measurement;

5.分区排除和查询优化。默认情况下,是不会开启分区排除的,索引针对一个基于分区键条件的检索,要扫描所有的分区:
mydb=> SET constraint_exclusion = off;
SET
mydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01';
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=436.80..436.81 rows=1 width=0)
   ->  Append  (cost=0.00..417.62 rows=7670 width=0)
         ->  Seq Scan on measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m01 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m03 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m04 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m05 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m06 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m07 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m08 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m09 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m10 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m11 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m12 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
(28 rows)

开启:

mydb=> SET constraint_exclusion = on;
SET
mydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01';
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=100.80..100.81 rows=1 width=0)
   ->  Append  (cost=0.00..96.38 rows=1770 width=0)
         ->  Seq Scan on measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m01 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
         ->  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..32.12 rows=590 width=0)
               Filter: (logdate <= '2006-02-01'::date)
(8 rows)

还可以通过建立规则的方式来决定向哪个分区插入数据:

CREATE RULE measurement_insert_y2006m02 AS
ON INSERT TO measurement WHERE
    ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
DO INSTEAD
    INSERT INTO measurement_y2006m02 VALUES (NEW.*);
...

外:

VACUUMorANALYZEmeasurement 只会对父表起作用,要想分析表,需要一次分析每个分区表。

熟悉ORACLE的同志应该觉得PG的分区表比ORACLE是要逊色很多吧。

猜你在找的Postgre SQL相关文章