PostgreSQL 执行计划与实际成本的偏差

一、嵌套循环连接,外表的关联字段存在大量重复值时,总成本存在偏差.
二、合并连接,关联字段少部分相交,总成本远小于子节点的成本和. 因为任何一个表都只需要扫描到匹配结束.
三、使用LIMIT限制时,总成本也将小于节点成本.

1. 嵌套循环连接,总成本存在偏差.
①、建表

DROP TABLE
postgres=# create table tb11(id integer,info character varying);
CREATE TABLE
postgres=# create table tb12(id integer,info character varying);
CREATE TABLE

②、插入数据

postgres=# insert into tb11 select generate_series(1,100),'john' from generate_series(1,10);
INSERT 0 1000
postgres=# insert into tb12 select 1 from generate_series(1,25);
INSERT 0 25

③、查看执行计划

postgres=# explain (analyze,verbose,buffers) select * from tb11,tb12 where tb11.id=tb12.id and tb12.id=1;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..34.12 rows=250 width=45) (actual time=0.022..0.306 rows=250 loops=1)
   Output: tb11.id,tb11.info,tb12.id,tb12.info
   Buffers: shared hit=16
   ->  Seq Scan on public.tb11  (cost=0.00..18.50 rows=10 width=9) (actual time=0.016..0.155 rows=10 loops=1)
         Output: tb11.id,tb11.info
         Filter: (tb11.id = 1)
         Rows Removed by Filter: 990
         Buffers: shared hit=6
   ->  Seq Scan on public.tb12  (cost=0.00..1.31 rows=25 width=36) (actual time=0.001..0.007 rows=25 loops=10)
         Output: tb12.id,tb12.info
         Filter: (tb12.id = 1)
         Buffers: shared hit=10
 Total runtime: 0.361 ms
(13 rows)

执行计划的成本：34.12=18.5+10*1.31+联结的成本=31.60+联结的成本
实际的时间成本：0.306=0.155+10*0.007+联结的时间=0.255+联结的时间
0.306=0.155+0.007+联结的时间=0.162+联结的时间
看比值：18.5/0.155=119.354
34.12/0.306=111.503 比值差不多，看来实际的时间跟执行计划的成本一样，都是以循环来计算的。
因为自己的执行计划的实际时间还是按照循环次数来算的，借用德哥的教程中的执行计划来说明。

digoal=# explain (analyze,buffers,timing,costs) select * from n1,n2 where n1.id=n2.id and n2.id=1;
						QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..437.00 rows=250 width=45) (actual time=0.042..0.177 rows=10 loops=1) -- 计算嵌套循环总成本时,按照25次循环计算. 所以比实际成本大很多. 实际时间0.177仅仅比0.16+0.01略大. 因为实际只循环一次.
   Output: n1.id,n1.info,n2.id,n2.info
   Buffers: shared hit=3
   ->   Seq Scan on public.n2 (cost=0.00..72.00 rows=25 width=36) (actual time=0.010..0.010 rows=1 loops=1) -- 评估25行,实际1行
          Output: n2.id,n2.info
	  Filter: (n2.id = 1)
	  Buffers: shared hit=1
   -> Seq Scan on public.n1 (cost=0.00..14.50 rows=10 width=9) (actual time=0.027..0.160 rows=10 loops=1) -- 评估需要循环25次,实际循环1次
          Output: n1.id,n1.info
	  Filter: (n1.id = 1)
	  Rows Removed by Filter: 990
	  Buffers: shared hit=2
	  Total runtime: 0.214 ms
(13 rows)

2. 合并连接,总成本远小于子节点的成本和. 因为任何一个表都只需要扫描到匹配结束.
①、建表

postgres=# create table index_t1(id serial primary key,name character varying);
CREATE TABLE
postgres=# create table index_t2(id serial primary key,name character varying);
CREATE TABLE

②、插入数据

postgres=# insert into index_t1 select generate_series(1,10),'john';
INSERT 0 10
postgres=# insert into index_t2 select generate_series(1,100000),'john';
INSERT 0 100000

③、analyze两个表

postgres=# analyze index_t1;
ANALYZE
postgres=# analyze index_t2;
ANALYZE

④、查看执行计划

postgres=# explain (analyze,buffers) select * from index_t1 t1,index_t2 t2 where t1.id=t2.id;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.56..2.01 rows=10 width=18) (actual time=0.018..0.026 rows=10 loops=1)
   Output: t1.id,t1.name,t2.id,t2.name
   Merge Cond: (t2.id = t1.id)
   Buffers: shared hit=4
   ->  Index Scan using index_t2_pkey on public.index_t2 t2  (cost=0.29..3148.29 rows=100000 width=9) (actual time=0.006..0.009 rows=11 loops=1)
         Output: t2.id,t2.name
         Buffers: shared hit=3
   ->  Sort  (cost=1.27..1.29 rows=10 width=9) (actual time=0.008..0.008 rows=10 loops=1)
         Output: t1.id,t1.name
         Sort Key: t1.id
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1
         ->  Seq Scan on public.index_t1 t1  (cost=0.00..1.10 rows=10 width=9) (actual time=0.001..0.002 rows=10 loops=1)
               Output: t1.id,t1.name
               Buffers: shared hit=1
 Total runtime: 0.050 ms
(16 rows)

按照最前面说的成本的计算方法。

postgres=# select 1.29+4148.29*(10/100000);
 ?column? 
----------
     1.29
(1 row)

合并的成本=2.01-1.29=0.72。

来自http://www.postgresql.org/docs/current/static/using-explain.html的原话：

Merge joins also have measurement artifacts that can confuse the unwary. A merge join will stop reading one input if it's exhausted the other input and the next key value in the one input is greater than the last key value of the other input; in such a case there can be no more matches and so no need to scan the rest of the first input. This results in not reading all of one child,with results like those mentioned for LIMIT. Also,if the outer (first) child contains rows with duplicate key values,the inner (second) child is backed up and rescanned for the portion of its rows matching that key value. EXPLAIN ANALYZE counts these repeated emissions of the same inner rows as if they were real additional rows. When there are many outer duplicates,the reported actual row count for the inner child plan node can be significantly larger than the number of rows that are actually in the inner relation.

3. 使用LIMIT限制时,总成本也将小于节点成本.

postgres=# explain (analyze,buffers)select * from tb11 limit 1;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.02 rows=1 width=9) (actual time=0.013..0.013 rows=1 loops=1)
   Output: id,info
   Buffers: shared hit=1
   ->  Seq Scan on public.tb11  (cost=0.00..62.00 rows=4000 width=9) (actual time=0.012..0.012 rows=1 loops=1)
         Output: id,info
         Buffers: shared hit=1
 Total runtime: 0.031 ms
(7 rows)


Time: 0.411 ms

总成本 0.02=62.00*1/4000+limit操作的成本=0.0155+limit操作的成本。

PostgreSQL 执行计划与实际成本的偏差

猜你在找的Postgre SQL相关文章