PostgreSQL操作符与优化器详解

Postgresql 支持自定义操作符，本质上是调用函数来实现的。@H_404_1@

语法如下：@H_404_1@

例如创建一个求两个值的平均值的操作符：@H_404_1@

首选要创建函数@H_404_1@

postgres=# create function f_avg(numeric,numeric) returns numeric as $$@H_404_1@

postgres$# select ($1+$2)/2;@H_404_1@

postgres$# $$ language sql strict;@H_404_1@

CREATE FUNCTION@H_404_1@

验证函数

postgres=# select f_avg(1,null);@H_404_1@

f_avg@H_404_1@

-------@H_404_1@

@H_404_1@

(1 row)@H_404_1@

postgres=# select f_avg(1,2);@H_404_1@

f_avg @H_404_1@

--------------------@H_404_1@

1.5000000000000000@H_404_1@

(1 row)@H_404_1@

创建操作符，指定左右参数类型，调用的函数名，commutator是一个和优化器相关的选项，我后面会重点介绍：@H_404_1@

postgres=# create operator ## (procedure=f_avg,leftarg=numeric,rightarg=numeric,commutator='##');@H_404_1@

CREATE OPERATOR@H_404_1@

postgres=# select 1 ## 2;@H_404_1@

?column? @H_404_1@

--------------------@H_404_1@

1.5000000000000000@H_404_1@

(1 row)@H_404_1@

注意到在创建操作符的语法中有6个和优化器有关的关键字：@H_404_1@

[,COMMUTATOR=com_op][,NEGATOR=neg_op]@H_404_1@

[,RESTRICT=res_proc][,JOIN=join_proc]@H_404_1@

[,HASHES][,MERGES]@H_404_1@

介绍如下：@H_404_1@

假设x表示操作符左侧的参数，y表示操作符右侧的参数@H_404_1@

1. commutator，指明x op1 y等效于y op2 x，即操作数调换，返回的值一样。例如2>1 和1<2结果是一致的。那么>就是<的commutator或者反之。又例如1+2和2+1是等价的，那么+就是+的commutator。commutator只需要在创建其中一个操作符时指定，创建另一个对应的操作符时可以不需要指定，Postgresql会自动建立这个关系。例如创建>操作符时指定了它的commutator是<，那么在创建<操作符时可以不需要指定>是它的commutator。@H_404_1@

另外需要注意，有commutator操作符的操作符的左右两侧的参数类型必须一致，这样才能满足x op1 y等价于y op2 x。@H_404_1@

优化器如何利用commutator呢？例如索引扫描，必须列在操作符的左侧才能使用索引。1 > tbl.c这个条件，如果>没有commutator的话，是不能使用索引的。@H_404_1@

例子，以int4的>和<操作符为例，实验一下：@H_404_1@

>和<在Postgresql中是一对commutator@H_404_1@

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;@H_404_1@

oprcom @H_404_1@

--------------@H_404_1@

pg_catalog.<@H_404_1@

(1 row)@H_404_1@

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;@H_404_1@

oprcom @H_404_1@

--------------@H_404_1@

pg_catalog.>@H_404_1@

(1 row)@H_404_1@

记录他们的oprcom对应的OID@H_404_1@

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;@H_404_1@

ode | oprrest | oprjoin @H_404_1@

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----@H_404_1@

----+-------------+-----------------@H_404_1@

> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4@H_404_1@

gt | scalargtsel | scalargtjoinsel@H_404_1@

(1 row)@H_404_1@

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;@H_404_1@

ode | oprrest | oprjoin @H_404_1@

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----@H_404_1@

----+-------------+-----------------@H_404_1@

< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4@H_404_1@

lt | scalarltsel | scalarltjoinsel@H_404_1@

(1 row)@H_404_1@

接下来我要通过更新pg_operator解除他们的commutator关系，设置为0即可。@H_404_1@

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;@H_404_1@

UPDATE 1@H_404_1@

postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;@H_404_1@

UPDATE 1@H_404_1@

创建测试表，插入测试数据，创建索引：@H_404_1@

postgres=# create table tbl(id int);@H_404_1@

CREATE TABLE@H_404_1@

postgres=# insert into tbl select generate_series(1,100000);@H_404_1@

INSERT 0 100000@H_404_1@

postgres=# create index idx_tbl_id on tbl(id);@H_404_1@

CREATE INDEX@H_404_1@

将列放在条件的左边可以走索引，但是放在右边不走索引。因为优化器不能决定>,<是否为commutator@H_404_1@

postgres=# explain select * from tbl where id<10;@H_404_1@

QUERY PLAN @H_404_1@

---------------------------------------------------------------------------@H_404_1@

Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)@H_404_1@

Index Cond: (id < 10)@H_404_1@

(2 rows)@H_404_1@

postgres=# explain select * from tbl where 10>id;@H_404_1@

QUERY PLAN @H_404_1@

----------------------------------------------------------@H_404_1@

Seq Scan on tbl (cost=0.00..1361.00 rows=33333 width=4)@H_404_1@

Filter: (10 > id)@H_404_1@

(2 rows)@H_404_1@

重新建立这两个 operator的commutator关系后，优化器会自动将10>id转换为id<10，并且走索引了：@H_404_1@

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;@H_404_1@

UPDATE 1@H_404_1@

postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;@H_404_1@

UPDATE 1@H_404_1@

postgres=# explain select * from tbl where 10>id;@H_404_1@

QUERY PLAN @H_404_1@

---------------------------------------------------------------------------@H_404_1@

Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)@H_404_1@

Index Cond: (id < 10)@H_404_1@

(2 rows)@H_404_1@

2. negator，指x op1 y 等价于 not(y op2 x)，或者x op1等价于not( y op2)，或者op1 x 等价于not(op2 y)，因此negator支持一元和二元操作符。@H_404_1@

例子:@H_404_1@

如果=和<>是一对negator操作符，NOT (x = y) 可以简化为 x <> y。@H_404_1@

postgres=# explain select * from tbl where 10=id;@H_404_1@

QUERY PLAN @H_404_1@

---------------------------------------------------------------------------@H_404_1@

Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)@H_404_1@

Index Cond: (id = 10)@H_404_1@

(2 rows)@H_404_1@

postgres=# explain select * from tbl where not(10<>id);@H_404_1@

QUERY PLAN @H_404_1@

---------------------------------------------------------------------------@H_404_1@

Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)@H_404_1@

Index Cond: (id = 10)@H_404_1@

(2 rows)@H_404_1@

同样，操作符两侧参数x,y的类型必须一致。并且仅适用于返回布尔逻辑类型的操作符。@H_404_1@

3. restrict，是用于评估选择性的函数，仅适用于二元操作符，例如where col>100，这个查询条件，如何评估选择性呢？是通过操作符的restrict来指定的，选择性乘以pg_class.reltuples就可以评估得到这个查询条件的行数。@H_404_1@

选择性函数的代码在 src/backend/utils/adt/@H_404_1@

包括@H_404_1@

-rw-r--r--.11107110733191Jun1003:29array_selfuncs.c@H_404_1@

-rw-r--r--.1110711072316Jun1003:29geo_selfuncs.c@H_404_1@

-rw-r--r--.111071107 720Jun1003:29network_selfuncs.c@H_404_1@

-rw-r--r--.11107110733895Jun1003:29rangetypes_selfuncs.c@H_404_1@

-rw-r--r--.111071107218809Jun1003:29selfuncs.c@H_404_1@

选择性函数，还需要依赖数据库的统计信息，从而计算选择性，常见的选择性计算函数有：@H_404_1@

postgres=# select distinct oprrest from pg_operator order by 1;@H_404_1@

oprrest @H_404_1@

--------------@H_404_1@

-@H_404_1@

eqsel 相等@H_404_1@

neqsel 不相等@H_404_1@

scalarltsel 小于等于@H_404_1@

scalargtsel 大于等于@H_404_1@

areasel@H_404_1@

positionsel@H_404_1@

contsel@H_404_1@

iclikesel@H_404_1@

icnlikesel@H_404_1@

regexeqsel@H_404_1@

likesel@H_404_1@

icregexeqsel@H_404_1@

regexnesel@H_404_1@

nlikesel@H_404_1@

icregexnesel@H_404_1@

rangesel@H_404_1@

networksel@H_404_1@

tsmatchsel@H_404_1@

arraycontsel@H_404_1@

(20 rows)@H_404_1@

当然，用户如果自定义数据类型的话，也可以自定义选择性函数，或者使用以上标准的选择性函数，只是可能需要实现一下类型转换。@H_404_1@

源码中的介绍：@H_404_1@

src/backend/utils/adt/selfuncs.c@H_404_1@

/*----------@H_404_1@

* Operator selectivity estimation functions are called to estimate the@H_404_1@

* selectivity of WHERE clauses whose top-level operator is their operator.@H_404_1@

* We divide the problem into two cases:@H_404_1@

* Restriction clause estimation: the clause involves vars of just@H_404_1@

* one relation. 一种是符合WHERE条件的选择性（百分比）。@H_404_1@

* Join clause estimation: the clause involves vars of multiple rels.@H_404_1@

* Join selectivity estimation is far more difficult and usually less accurate@H_404_1@

* than restriction estimation. -- JOIN的选择性评估通常没有WHERE条件的选择性准确。@H_404_1@

*@H_404_1@

* When dealing with the inner scan of a nestloop join,we consider the@H_404_1@

* join's joinclauses as restriction clauses for the inner relation,and@H_404_1@

* treat vars of the outer relation as parameters (a/k/a constants of unknown@H_404_1@

* values). So,restriction estimators need to be able to accept an argument@H_404_1@

* telling which relation is to be treated as the variable.@H_404_1@

在使用nestloop JOIN时，一个表的字段将作为变量，另一个表的字段（及其统计信息）与操作符作为JOIN评估子句。@H_404_1@

*@H_404_1@

* The call convention for a restriction estimator (oprrest function) is@H_404_1@

*@H_404_1@

* Selectivity oprrest (PlannerInfo *root,@H_404_1@

* Oid operator,@H_404_1@

* List *args,@H_404_1@

* int varRelid);@H_404_1@

* 评估选择性需要4个参数:@H_404_1@

* root: general information about the query (rtable and RelOptInfo lists@H_404_1@

* are particularly important for the estimator). plannerinfo信息。@H_404_1@

* operator: OID of the specific operator in question. 操作符的OID@H_404_1@

* args: argument list from the operator clause. 操作符子句中的参数列表@H_404_1@

* varRelid: if not zero,the relid (rtable index) of the relation to@H_404_1@

* be treated as the variable relation. May be zero if the args list@H_404_1@

* is known to contain vars of only one relation. 表示where条件所包含的参数来自哪些relation。@H_404_1@

*@H_404_1@

* This is represented at the sql level (in pg_proc) as@H_404_1@

*@H_404_1@

* float8 oprrest (internal,oid,internal,int4); 在pg_proc数据字典中表示为oprrest指定的函数。@H_404_1@

*@H_404_1@

* The result is a selectivity,that is,a fraction (0 to 1) of the rows@H_404_1@

* of the relation that are expected to produce a TRUE result for the@H_404_1@

* given operator. 选择性函数的评估结果就是一个百分比。乘以pg_class.reltuples就可以得到记录数。@H_404_1@

*@H_404_1@

* The call convention for a join estimator (oprjoin function) is similar@H_404_1@

* except that varRelid is not needed,and instead join information is@H_404_1@

* supplied:@H_404_1@

* JOIN选择性的计算函数与WHERE选择性的计算函数参数有轻微差别，么有varRelid,增加了join信息的参数。@H_404_1@

* Selectivity oprjoin (PlannerInfo *root,@H_404_1@

* JoinType jointype,@H_404_1@

* SpecialJoinInfo *sjinfo);@H_404_1@

*@H_404_1@

* float8 oprjoin (internal,int2,internal);@H_404_1@

*@H_404_1@

* (Before Postgres 8.4,join estimators had only the first four of these@H_404_1@

* parameters. That signature is still allowed,but deprecated.) The@H_404_1@

* relationship between jointype and sjinfo is explained in the comments for@H_404_1@

* clause_selectivity() --- the short version is that jointype is usually@H_404_1@

* best ignored in favor of examining sjinfo.@H_404_1@

*@H_404_1@

* Join selectivity for regular inner and outer joins is defined as the@H_404_1@

* fraction (0 to 1) of the cross product of the relations that is expected@H_404_1@

* to produce a TRUE result for the given operator. For both semi and anti (半连接与预连接)@H_404_1@

* joins,however,the selectivity is defined as the fraction of the left-hand@H_404_1@

* side relation's rows that are expected to have a match (ie,at least one@H_404_1@

* row with a TRUE result) in the right-hand side.@H_404_1@

*@H_404_1@

* For both oprrest and oprjoin functions,the operator's input collation OID@H_404_1@

* (if any) is passed using the standard fmgr mechanism,so that the estimator@H_404_1@

* function can fetch it with PG_GET_COLLATION(). Note,that all@H_404_1@

* statistics in pg_statistic are currently built using the database's default@H_404_1@

* collation. Thus,in most cases where we are looking at statistics,we@H_404_1@

* should ignore the actual operator collation and use DEFAULT_COLLATION_OID.@H_404_1@

* We expect that the error induced by doing this is usually not large enough@H_404_1@

* to justify complicating matters.@H_404_1@

*----------@H_404_1@

4. join，是joinsel即join的选择性计算函数。@H_404_1@

对应pg_operator.oprjoin@H_404_1@

postgres=# select distinct oprjoin from pg_operator order by 1;@H_404_1@

oprjoin @H_404_1@

------------------@H_404_1@

-@H_404_1@

eqjoinsel@H_404_1@

neqjoinsel@H_404_1@

scalarltjoinsel@H_404_1@

scalargtjoinsel@H_404_1@

areajoinsel@H_404_1@

positionjoinsel@H_404_1@

contjoinsel@H_404_1@

iclikejoinsel@H_404_1@

icnlikejoinsel@H_404_1@

regexeqjoinsel@H_404_1@

likejoinsel@H_404_1@

icregexeqjoinsel@H_404_1@

regexnejoinsel@H_404_1@

nlikejoinsel@H_404_1@

icregexnejoinsel@H_404_1@

networkjoinsel@H_404_1@

tsmatchjoinsel@H_404_1@

arraycontjoinsel@H_404_1@

(19 rows)@H_404_1@

5. hashes@H_404_1@

6. merges@H_404_1@

hashes和merges表示该操作符是否允许hash join和merge join,只有返回布尔逻辑值的二元操作符满足这个要求。@H_404_1@

我们在pg_operator这个catalog中也可以查看到对应的介绍：@H_404_1@

Name	Type	References	Description
oid	oid		Row identifier (hidden attribute; must be explicitly selected)
oprname	name		Name of the operator
oprnamespace	oid	pg_namespace.oid	The OID of the namespace that contains this operator
oprowner	oid	pg_authid.oid	Owner of the operator
oprkind	char		b= infix ("between"),l= prefix ("left"),r= postfix ("right") 指定操作符在什么位置，例如中间，左侧，右侧
oprcanmerge	bool		This operator supports merge joins此操作符是否支持merge join
oprcanhash	bool		This operator supports hash joins此操作符是否支持hash join
oprleft	oid	pg_type.oid	Type of the left operand操作符左侧的数据类型
oprright	oid	pg_type.oid	Type of the right operand操作符右侧的数据类型
oprresult	oid	pg_type.oid	Type of the result返回结果的数据类型
oprcom	oid	pg_operator.oid	Commutator of this operator,if any
oprnegate	oid	pg_operator.oid	Negator of this operator,if any
oprcode	regproc	pg_proc.oid	Function that implements this operator
oprrest	regproc	pg_proc.oid	Restriction selectivity estimation function for this operator
oprjoin	regproc	pg_proc.oid	Join selectivity estimation function for this operator

PostgreSQL操作符与优化器详解

猜你在找的Postgre SQL相关文章