PostgreSQL 执行计划与实际成本的偏差

前端之家收集整理的这篇文章主要介绍了PostgreSQL 执行计划与实际成本的偏差前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
一、 嵌套循环连接,外表的关联字段存在大量重复值时,总成本存在偏差.
二、 合并连接,关联字段少部分相交,总成本远小于子节点的成本和. 因为任何一个表都只需要扫描到匹配结束.
三、 使用LIMIT限制时,总成本也将小于节点成本.


1. 嵌套循环连接,总成本存在偏差.

①、建表
  1. DROP TABLE
  2. postgres=# create table tb11(id integer,info character varying);
  3. CREATE TABLE
  4. postgres=# create table tb12(id integer,info character varying);
  5. CREATE TABLE
②、插入数据
  1. postgres=# insert into tb11 select generate_series(1,100),'john' from generate_series(1,10);
  2. INSERT 0 1000
  3. postgres=# insert into tb12 select 1 from generate_series(1,25);
  4. INSERT 0 25
③、查看执行计划
  1. postgres=# explain (analyze,verbose,buffers) select * from tb11,tb12 where tb11.id=tb12.id and tb12.id=1;
  2. QUERY PLAN
  3. ---------------------------------------------------------------------------------------------------------------
  4. Nested Loop (cost=0.00..34.12 rows=250 width=45) (actual time=0.022..0.306 rows=250 loops=1)
  5. Output: tb11.id,tb11.info,tb12.id,tb12.info
  6. Buffers: shared hit=16
  7. -> Seq Scan on public.tb11 (cost=0.00..18.50 rows=10 width=9) (actual time=0.016..0.155 rows=10 loops=1)
  8. Output: tb11.id,tb11.info
  9. Filter: (tb11.id = 1)
  10. Rows Removed by Filter: 990
  11. Buffers: shared hit=6
  12. -> Seq Scan on public.tb12 (cost=0.00..1.31 rows=25 width=36) (actual time=0.001..0.007 rows=25 loops=10)
  13. Output: tb12.id,tb12.info
  14. Filter: (tb12.id = 1)
  15. Buffers: shared hit=10
  16. Total runtime: 0.361 ms
  17. (13 rows)
执行计划的成本:34.12=18.5+10*1.31+联结的成本=31.60+联结的成本
实际的时间成本:0.306=0.155+10*0.007+联结的时间=0.255+联结的时间
0.306=0.155+0.007+联结的时间=0.162+联结的时间
看比值:18.5/0.155=119.354
34.12/0.306=111.503 比值差不多,看来实际的时间跟执行计划的成本一样,都是以循环来计算的。
因为自己的执行计划的实际时间还是按照循环次数来算的,借用德哥的教程中的执行计划来说明。
  1. digoal=# explain (analyze,buffers,timing,costs) select * from n1,n2 where n1.id=n2.id and n2.id=1;
  2. QUERY PLAN
  3. ------------------------------------------------------------------------------------------------------------
  4. Nested Loop (cost=0.00..437.00 rows=250 width=45) (actual time=0.042..0.177 rows=10 loops=1) -- 计算嵌套循环总成本时,按照25次循环计算. 所以比实际成本大很多. 实际时间0.177仅仅比0.16+0.01略大. 因为实际只循环一次.
  5. Output: n1.id,n1.info,n2.id,n2.info
  6. Buffers: shared hit=3
  7. -> Seq Scan on public.n2 (cost=0.00..72.00 rows=25 width=36) (actual time=0.010..0.010 rows=1 loops=1) -- 评估25行,实际1
  8. Output: n2.id,n2.info
  9. Filter: (n2.id = 1)
  10. Buffers: shared hit=1
  11. -> Seq Scan on public.n1 (cost=0.00..14.50 rows=10 width=9) (actual time=0.027..0.160 rows=10 loops=1) -- 评估需要循环25次,实际循环1
  12. Output: n1.id,n1.info
  13. Filter: (n1.id = 1)
  14. Rows Removed by Filter: 990
  15. Buffers: shared hit=2
  16. Total runtime: 0.214 ms
  17. (13 rows)
2. 合并连接,总成本远小于子节点的成本和. 因为任何一个表都只需要扫描到匹配结束.
①、建表
  1. postgres=# create table index_t1(id serial primary key,name character varying);
  2. CREATE TABLE
  3. postgres=# create table index_t2(id serial primary key,name character varying);
  4. CREATE TABLE
②、插入数据
  1. postgres=# insert into index_t1 select generate_series(1,10),'john';
  2. INSERT 0 10
  3. postgres=# insert into index_t2 select generate_series(1,100000),'john';
  4. INSERT 0 100000
③、analyze两个表
  1. postgres=# analyze index_t1;
  2. ANALYZE
  3. postgres=# analyze index_t2;
  4. ANALYZE
④、查看执行计划
  1. postgres=# explain (analyze,buffers) select * from index_t1 t1,index_t2 t2 where t1.id=t2.id;
  2. QUERY PLAN
  3. -------------------------------------------------------------------------------------------------------------------------------------------------
  4. Merge Join (cost=1.56..2.01 rows=10 width=18) (actual time=0.018..0.026 rows=10 loops=1)
  5. Output: t1.id,t1.name,t2.id,t2.name
  6. Merge Cond: (t2.id = t1.id)
  7. Buffers: shared hit=4
  8. -> Index Scan using index_t2_pkey on public.index_t2 t2 (cost=0.29..3148.29 rows=100000 width=9) (actual time=0.006..0.009 rows=11 loops=1)
  9. Output: t2.id,t2.name
  10. Buffers: shared hit=3
  11. -> Sort (cost=1.27..1.29 rows=10 width=9) (actual time=0.008..0.008 rows=10 loops=1)
  12. Output: t1.id,t1.name
  13. Sort Key: t1.id
  14. Sort Method: quicksort Memory: 25kB
  15. Buffers: shared hit=1
  16. -> Seq Scan on public.index_t1 t1 (cost=0.00..1.10 rows=10 width=9) (actual time=0.001..0.002 rows=10 loops=1)
  17. Output: t1.id,t1.name
  18. Buffers: shared hit=1
  19. Total runtime: 0.050 ms
  20. (16 rows)
按照最前面说的成本的计算方法
  1. postgres=# select 1.29+4148.29*(10/100000);
  2. ?column?
  3. ----------
  4. 1.29
  5. (1 row)

合并的成本=2.01-1.29=0.72。

来自http://www.postgresql.org/docs/current/static/using-explain.html的原话:

Merge joins also have measurement artifacts that can confuse the unwary. A merge join will stop reading one input if it's exhausted the other input and the next key value in the one input is greater than the last key value of the other input; in such a case there can be no more matches and so no need to scan the rest of the first input. This results in not reading all of one child,with results like those mentioned for LIMIT. Also,if the outer (first) child contains rows with duplicate key values,the inner (second) child is backed up and rescanned for the portion of its rows matching that key value. EXPLAIN ANALYZE counts these repeated emissions of the same inner rows as if they were real additional rows. When there are many outer duplicates,the reported actual row count for the inner child plan node can be significantly larger than the number of rows that are actually in the inner relation.

3. 使用LIMIT限制时,总成本也将小于节点成本.
  1. postgres=# explain (analyze,buffers)select * from tb11 limit 1;
  2. QUERY PLAN
  3. ---------------------------------------------------------------------------------------------------------------
  4. Limit (cost=0.00..0.02 rows=1 width=9) (actual time=0.013..0.013 rows=1 loops=1)
  5. Output: id,info
  6. Buffers: shared hit=1
  7. -> Seq Scan on public.tb11 (cost=0.00..62.00 rows=4000 width=9) (actual time=0.012..0.012 rows=1 loops=1)
  8. Output: id,info
  9. Buffers: shared hit=1
  10. Total runtime: 0.031 ms
  11. (7 rows)
  12.  
  13.  
  14. Time: 0.411 ms

总成本 0.02=62.00*1/4000+limit操作的成本=0.0155+limit操作的成本。

猜你在找的Postgre SQL相关文章