顾名思义,签到循环联结就是一个循环嵌套在另一个循环当中。外层循环来说基本就是一个只使用WHERE子句中属于驱动表的条件对它进行的查询。当数据行经过了外层条件 筛选并被确认匹配条件后,这些行就会逐个进入到内层循环中。然后再基于联结列进行逐行检查看是否与被联结的表中的某一行相匹配。如果这一行与第二次的检查相匹配,就将会被传递到检查计划的下一步或者如果没有更多步骤的话直接被包含在最终的结果集中。
这种类型的联结的强大之处在于所使用的内存是非常少的。因为数据行集一次只加工一行,所需的开支也是非常小的。由于这个原因,除了使用一次加工一行这种方式来建立一个很大的数据集需要较长的时间这一点以外,它也是适合进行行大数据集加工的。这就是为什么前面提过的嵌套循环联结在结果集较小的时候是最好的。嵌套循环联结的基本度量就是为了准备最终结果集所需要访问的数据块数目。
1. 建表
postgres=# create table class(id serial primary key,class_name character varying); CREATE TABLE postgres=# postgres=# create table student(id serial primary key,name character varying,class_id integer); CREATE TABLE在student的class_id上创建索引
postgres=# create index idx_student_class_id on student(class_id); CREATE INDEX
postgres=# \d student Table "public.student" Column | Type | Modifiers ----------+-------------------+------------------------------------------------------ id | integer | not null default nextval('student_id_seq'::regclass) name | character varying | class_id | integer | Indexes: "student_pkey" PRIMARY KEY,btree (id) "idx_student_class_id" btree (class_id) postgres=# postgres=# \d class Table "public.class" Column | Type | Modifiers ------------+-------------------+---------------------------------------------------- id | integer | not null default nextval('class_id_seq'::regclass) class_name | character varying | Indexes: "class_pkey" PRIMARY KEY,btree (id)2. 插入测试数据:
postgres=# insert into class values(1,'aa'); INSERT 0 1 postgres=# insert into class values(2,'bb'); INSERT 0 1 postgres=# insert into class values(3,'cc'); INSERT 0 1
postgres=# insert into student values(1,'tom',1); INSERT 0 1 postgres=# insert into student values(2,'jim',1); INSERT 0 1 postgres=# insert into student values(3,'jack',2); INSERT 0 1 postgres=# insert into student(name,class_id) select name,class_id from student; INSERT 0 3 postgres=# insert into student(name,class_id from student; INSERT 0 6 postgres=# insert into student(name,class_id from student; INSERT 0 12 postgres=# insert into student(name,class_id from student; INSERT 0 24 postgres=# insert into student(name,class_id from student; INSERT 0 48 postgres=# insert into student(name,class_id from student; INSERT 0 96 postgres=# insert into student(name,class_id from student; INSERT 0 192
postgres=# select count(0) from student; count ------- 384 (1 row)
3. select s.*,c.class_name from student s,class c where s.class_id=c.id;
这个查询将会向下面的伪代码一样来处理结果:for each row in(select * from student) loop
for(select id,class_name from class where id=outer.class_id) loop
if match then pass the row on to the next step
if inner join and no match then discard the row
if outer join and no match then set inner column values to null and pass the row on to the next step
end loop;
end loop;
4. 查看执行计划
①、
postgres=# explain(analyze,verbose,buffers) select s.*,class c where s.class_id=c.id and c.id<=2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.15..125.88 rows=128 width=44) (actual time=0.021..1.076 rows=384 loops=1) Output: s.id,s.name,s.class_id,c.class_name Buffers: shared hit=771 —> Seq Scan on public.student s (cost=0.00..6.84 rows=384 width=12) (actual time=0.010..0.076 rows=384 loops=1) Output: s.id,s.class_id Buffers: shared hit=3 —> Index Scan using class_pkey on public.class c (cost=0.15..0.30 rows=1 width=36) (actual time=0.001..0.002 rows=1 <span style="color:#ff0000;">loops=384</span>) Output: c.id,c.class_name Index Cond: ((c.id = s.class_id) AND (c.id <= 2)) Buffers: shared hit=768 Total runtime: 1.168 ms (11 rows)可以看到内表循环了384次,相当于执行了384次 select c.id,c.class_name from class where c.id=outer.class_id,每次都是索引扫描,在计算的成本的时候当然该句sql的成本就要乘以384,所以总成本125.88=6.64+384*0.30+联结的成本=122.04+联结的成本。
②、
postgres=# explain(analyze,class c where s.class_id=c.id and c.id in(1,2); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=8.32..32.03 rows=1 width=44) (actual time=0.027..0.362 rows=384 loops=1) Output: s.id,c.class_name Join Filter: (s.class_id = c.id) Rows Removed by Join Filter: 384 Buffers: shared hit=6 —> Seq Scan on public.student s (cost=0.00..6.84 rows=384 width=12) (actual time=0.008..0.051 rows=384 loops=1) Output: s.id,s.class_id Buffers: shared hit=3 —> Materialize (cost=8.32..13.67 rows=2 width=36) (actual time=0.000..0.000 rows=2 <span style="color:#ff0000;">loops=384</span>) Output: c.class_name,c.id Buffers: shared hit=3 —> Bitmap Heap Scan on public.class c (cost=8.32..13.66 rows=2 width=36) (actual time=0.012..0.013 rows=2 loops=1) Output: c.class_name,c.id Recheck Cond: (c.id = ANY ('{1,2}'::integer[])) Buffers: shared hit=3 —> Bitmap Index Scan on class_pkey (cost=0.00..8.32 rows=2 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (c.id = ANY ('{1,2}'::integer[])) Buffers: shared hit=2 Total runtime: 0.415 ms (19 rows)Materialize: 物化节点,小于work_mem时全部塞内存,大时需要用到磁盘. 这里虽然是嵌套循环,但是下面的索引扫描只有一次,循环384次是在物化节点,物化节点计算循环成本时,只计算内存操作的成本,故这里的总成本32.03=13.66+6.84+联结的成本=20.50+联结的成本。
对Materialize物化节点的解释:
Q:
What does materialize do? I'm joining two tables,not views or anything like that.
A:
A materialize node means the output of whatever is below it in the tree (which can be a scan,or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.
So in your case,the planner is determining that the result of a scan on one of your tables will fit in memory,and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.