Repost: PostgreSQL-XC : Data Replication or Distribution

Reposted from: http://francs3.blog.163.com/blog/static/4057672720125453315201/

Postgresql-XC : Data Replication or Distribution

In Postgres-XC, table data can be spread across the datanodes in one of two ways: Replication or Distribution.
This post briefly describes both and then walks through a small experiment.

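1 Replication or Distribution explained

--1.1 Replication
Every row of the table is stored on all datanodes, so each datanode holds a complete copy of the table.

--1.2 Distribution
Every row of the table is stored on exactly one datanode, so each datanode holds only part of the table.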
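
To make the two definitions concrete, here is a minimal Python sketch of where rows end up under each mode. It is only a model, not Postgres-XC code: the function names are made up for illustration, and the `id % len(nodes)` rule is an assumed stand-in for whatever sharding rule the table really uses.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, name), mirroring the test tables in this post


def place_replicated(rows: List[Row], nodes: List[str]) -> Dict[str, List[Row]]:
    """Replication: every datanode receives a full copy of the rows."""
    return {node: list(rows) for node in nodes}


def place_distributed(rows: List[Row], nodes: List[str]) -> Dict[str, List[Row]]:
    """Distribution: each row is stored on exactly one datanode.

    The node is picked with id % len(nodes), a stand-in rule chosen for
    illustration only; it is not Postgres-XC's real hash function.
    """
    placement: Dict[str, List[Row]] = {node: [] for node in nodes}
    for row in rows:
        placement[nodes[row[0] % len(nodes)]].append(row)
    return placement


nodes = ["db_1", "db_2"]
rows = [(i, "x") for i in range(1, 11)]

replicated = place_replicated(rows, nodes)
distributed = place_distributed(rows, nodes)

print({n: len(r) for n, r in replicated.items()})   # {'db_1': 10, 'db_2': 10}
print({n: len(r) for n, r in distributed.items()})  # {'db_1': 5, 'db_2': 5}
```

With replication the per-node counts each equal the table size; with distribution they add up to it.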

2 Testing a replicated table
--2.1 Create a replicated table and insert data

francs=> create table test_replication (id int4 primary key,name varchar(32)) distribute by replication;
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_replication_pkey" for table "test_replication"
CREATE TABLE

francs=> insert into test_replication select generate_series(1,10000),'replication';
INSERT 0 10000

francs=> select count(*) from test_replication ;
count
-------
10000
(1 row)

--2.2 Verify the data on datanode 1

[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs
psql (PGXC 1.0beta2,based on PG 9.1.3)
Type "help" for help.

francs=> select count(*) from test_replication ;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
10000
(1 row)

--2.3 Verify the data on datanode 2

[pgxc@redhatB pg_root]$ psql -p 15432 francs francs
psql (PGXC 1.0beta2,based on PG 9.1.3)
Type "help" for help.

francs=> select count(*) from test_replication ;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
10000
(1 row)

Note: a replicated table holds a complete copy of its data on every datanode (unless the table was created on an explicitly specified subset of datanodes).

3 Testing a distributed table

A distributed table can be sharded in several ways, including ROUND ROBIN, HASH, and MODULO. The examples below use hash and round robin.

--3.1 Create a hash-distributed table and insert data

francs=> create table test_hash (id int4 primary key,name varchar(32)) distribute by hash(id);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_hash_pkey" for table "test_hash"
CREATE TABLE
francs=> insert into test_hash select generate_series(1,10000),'hash';
INSERT 0 10000
francs=> select count(*) from test_hash;
count
-------
10000
(1 row)

--3.2 Verify the data on datanode 1

[pgxc@redhatB gtm_standby]$ psql -p 15431 francs francs
psql (PGXC 1.0beta2,based on PG 9.1.3)
Type "help" for help.

francs=> select count(*) from test_hash;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
5039
(1 row)

--3.3 Verify the data on datanode 2

[pgxc@redhatB pg_root]$ psql -p 15432 francs francs
psql (PGXC 1.0beta2,based on PG 9.1.3)
Type "help" for help.

francs=> select count(*) from test_hash;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
4961
(1 row)

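Note: as shown above, each datanode stores only part of a distributed table's data (5039 + 4961 = 10000 rows). The datanodes to use can also be specified when the table is created.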
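
The split above is 5039 / 4961 rather than an exact 5000 / 5000. A rough Python sketch suggests why a hash-based split tends to look like that: a hash spreads consecutive ids pseudo-randomly, so the shares come out close to even but rarely identical. MD5 is used here purely as an assumed stand-in; it is not the hash function Postgres-XC uses, so the exact counts will differ from the ones in the experiment.

```python
import hashlib
from collections import Counter

NODES = ["db_1", "db_2"]  # node names as they appear in the plans in this post


def node_for(row_id: int) -> str:
    """Pick a node from a hash of the id (MD5 is an illustrative stand-in)."""
    digest = hashlib.md5(str(row_id).encode("ascii")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


# Place ids 1..10000, the same range inserted into test_hash.
counts = Counter(node_for(i) for i in range(1, 10001))

print(dict(counts))          # two counts close to, but usually not exactly, 5000
print(sum(counts.values()))  # 10000
```

Whatever the individual shares are, they always add up to the full row count, which is the same check as adding the two count(*) results above.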

4 Comparing execution plans

--4.1 Querying a single row

francs=> explain verbose select * from test_hash where id=1;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: test_hash.id,test_hash.name
Node/s: db_1
Remote query: SELECT id,name FROM test_hash WHERE (id = 1)
(4 rows)

francs=> explain verbose select * from test_replication where id=1;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: test_replication.id,test_replication.name
Node/s: db_1
Remote query: SELECT id,name FROM test_replication WHERE (id = 1)
(4 rows)

francs=> explain select * from test_hash where name='A';
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Node/s: db_1,db_2
(2 rows)

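Note: for a single-row lookup, a replicated table scans only one datanode. A distributed table scans only one node when the query filters on the distribution key, and has to scan multiple nodes when it filters on any other column.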
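
The routing behaviour seen in these plans can be summarised in a small Python sketch. This is a simplified model under stated assumptions, not the coordinator's real planner logic: `nodes_to_scan` is a hypothetical helper, the replicated case simply returns the first node, and `key_value % len(NODES)` stands in for the real hash, so the specific single node it picks need not match the plans above.

```python
from typing import List, Optional

NODES = ["db_1", "db_2"]  # node names taken from the plans in this post


def nodes_to_scan(strategy: str, dist_column: Optional[str],
                  filter_column: str, key_value: Optional[int] = None) -> List[str]:
    """Return the datanodes an equality lookup on filter_column must visit."""
    if strategy == "replication":
        # Every node has a full copy, so reading one node is enough.
        return NODES[:1]
    if (strategy in ("hash", "modulo")
            and filter_column == dist_column
            and key_value is not None):
        # The key value determines the node (stand-in rule, not the real hash).
        return [NODES[key_value % len(NODES)]]
    # Non-key filter, or round robin: the row could be on any node.
    return list(NODES)


print(len(nodes_to_scan("hash", "id", "id", 1)))    # 1
print(nodes_to_scan("hash", "id", "name"))          # ['db_1', 'db_2']
print(nodes_to_scan("replication", None, "id", 1))  # ['db_1']
print(nodes_to_scan("round robin", None, "id", 1))  # ['db_1', 'db_2']
```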

--4.2 count(*)

francs=> explain select count(*) from test_hash;
QUERY PLAN
---------------------------------------------------------------------------------------------
Aggregate (cost=2.50..2.51 rows=1 width=0)
-> Materialize (cost=0.00..0.00 rows=0 width=0)
-> Data Node Scan on "__REMOTE_GROUP_QUERY__" (cost=0.00..0.00 rows=1000 width=0)
Node/s: db_1,db_2
(4 rows)

Note: on a distributed table, count(*) scans all datanodes.

francs=> explain select count(*) from test_replication;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Node/s: db_1
(2 rows)

Note: on a replicated table, count(*) scans only one datanode.
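
The difference between the two count(*) plans can be modelled in a few lines of Python. The per-node numbers are the ones measured earlier in this post; the two helper functions are hypothetical and only mimic the shape of the plans (an Aggregate over results from every node versus a single-node scan), not the actual executor.

```python
from typing import Dict

# Per-node row counts observed on the two datanodes earlier in this post.
hash_counts: Dict[str, int] = {"db_1": 5039, "db_2": 4961}
replicated_counts: Dict[str, int] = {"db_1": 10000, "db_2": 10000}


def count_distributed(per_node: Dict[str, int]) -> int:
    """Distributed table: combine the partial counts from every datanode."""
    return sum(per_node.values())


def count_replicated(per_node: Dict[str, int]) -> int:
    """Replicated table: any single datanode already has the full answer."""
    return next(iter(per_node.values()))


print(count_distributed(hash_counts))       # 10000
print(count_replicated(replicated_counts))  # 10000
```

Both paths return the same total; the distributed one just has to contact every datanode to get it.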

--4.3 Round robin distribution

francs=> create table test_round (id int4,name varchar(32)) distribute by round robin;
CREATE TABLE

francs=> insert into test_round select generate_series(1,10000),'a';
INSERT 0 10000

francs=> explain select * from test_round where id=1;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Node/s: db_1,db_2
(2 rows)

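Note: round robin spreads the table's rows across the datanodes, but because a row's location cannot be derived from its values, a query has to visit all datanodes.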
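
A short Python sketch of why a round robin lookup cannot be routed: placement depends on the order in which rows arrive, not on any column value. The helper below is a simplified model that assumes a single stream of inserts handed out to the nodes in turn; it is not how Postgres-XC implements round robin internally.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def place_round_robin(ids: Iterable[int], nodes: List[str]) -> Dict[str, List[int]]:
    """Hand out rows to the nodes in turn, in arrival order."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for row_id, node in zip(ids, cycle(nodes)):
        placement[node].append(row_id)
    return placement


nodes = ["db_1", "db_2"]

# The same four ids, inserted in two different orders.
first = place_round_robin([1, 2, 3, 4], nodes)
second = place_round_robin([2, 1, 3, 4], nodes)

print(first)   # {'db_1': [1, 3], 'db_2': [2, 4]}
print(second)  # {'db_1': [2, 3], 'db_2': [1, 4]}
```

Row 1 lands on db_1 in the first run and on db_2 in the second, so nothing in `id = 1` tells the coordinator where to look.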

5 Summary
--5.1 Replicated tables

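1 A query on a replicated table only needs to read any one datanode;
2 A change to a replicated table must be applied to all datanodes at once, which is relatively expensive;
3 Replication suits mostly static tables with heavy read traffic.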
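
As a back-of-the-envelope illustration of the write cost in point 2, the sketch below counts row writes across the cluster for the 10000-row insert used in this post. `node_writes` is a hypothetical helper, and it deliberately ignores everything except the number of row copies (no network, GTM, or commit overhead).

```python
def node_writes(num_rows: int, num_nodes: int, replicated: bool) -> int:
    """Total row writes across the cluster for one bulk insert."""
    # Replicated: each row is written on every node.
    # Distributed: each row is written on exactly one node.
    return num_rows * num_nodes if replicated else num_rows


print(node_writes(10000, 2, replicated=True))   # 20000
print(node_writes(10000, 2, replicated=False))  # 10000
```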

--5.2 Distributed tables

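1 When reading or writing a single row by the distribution key, only one datanode needs to be scanned (except with round robin distribution);
2 When reading or writing a single row by a column other than the distribution key, all datanodes must be scanned;
3 Queries that have to scan multiple datanodes perform somewhat worse.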

6 From the manual

REPLICATION Each row of the table will be replicated into all the datanode of the Postgres-XC database cluster.
