PostgreSQL SQL优化一列

开发咨询一个业务统计，场景类似于统计一个学生成绩表总成绩的第一名，有可能有多个值。提交上来的sql看了下性能较差，优化如下。

一、环境
Postgresql 9.2.4
OS CentOS 6.3

二、业务场景模拟
1.数据准备

CREATE TABLE tbl_student_record(
  stu_name varchar,stu_record int,stu_type int2,create_date date
);
INSERT INTO tbl_student_record VALUES('zhangsan',80,1,'2014/02/26');
INSERT INTO tbl_student_record VALUES('zhangsan',90,100,'2014/02/26');
INSERT INTO tbl_student_record VALUES('sili',95,'2014/02/26');
INSERT INTO tbl_student_record VALUES('wangwu',75,35,55,'2014/02/26');
INSERT INTO tbl_student_record VALUES('zhaoliu',70,'2014/02/26');

--需要统计出的结果是

stu_name | nn  
----------+-----
 zhangsan | 270
 sili     | 270
(2 rows)

2.优化前
因为没有另外的冗余表提供学生的总成绩，故一般的思路是先计算出第一名的总分是多少然后根据这个总分再去匹配，或者能提前知道总分然后按排序取其最大的值

优化前的脚本:

kenyon=# explain select stu_name,sum(stu_record)  nn from tbl_student_record d 
           group by stu_name having sum(stu_record)= (select t.n from(
              select b.stu_name,sum(b.stu_record)as n from tb l_studen t_record b 
                group by stu_name order by n desc)as t limit 1);
                                               QUERY PLAN                                                
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=66.38..68.88 rows=200 width=36)
   Filter: (sum(d.stu_record) = $0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=36.59..36.61 rows=1 width=8)
           ->  Subquery Scan on t  (cost=36.59..39.09 rows=200 width=8)
                 ->  Sort  (cost=36.59..37.09 rows=200 width=36)
                       Sort Key: (sum(b.stu_record))
                       ->  HashAggregate  (cost=26.95..28.95 rows=200 width=36)
                             ->  Seq Scan on tbl_student_record b  (cost=0.00..21.30 rows=1130 width=36)
   ->  Seq Scan on tbl_student_record d  (cost=0.00..21.30 rows=1130 width=36)
(10 rows)

3.优化后

--使用window function

kenyon=# explain select * from (with tmp as(
       select stu_name,sum(stu_record) as sum_score from tbl_student_record group by stu_name
       select rank() OVER ( order by tmp.sum_score desc) as cc,* from tmp) as tc where tc.cc = 1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Subquery Scan on tc  (cost=40.59..46.59 rows=1 width=48)
   Filter: (tc.cc = 1)
   ->  WindowAgg  (cost=40.59..44.09 rows=200 width=40)
         CTE tmp
           ->  HashAggregate  (cost=26.95..28.95 rows=200 width=36)
                 ->  Seq Scan on tbl_student_record  (cost=0.00..21.30 rows=1130 width=36)
         ->  Sort  (cost=11.64..12.14 rows=200 width=40)
               Sort Key: tmp.sum_score
               ->  CTE Scan on tmp  (cost=0.00..4.00 rows=200 width=40)
(9 rows)

三、分析

相比较之前的sql其COST值能下降下来，是因为少了一次group by分组统计，如果数据量较大其sql 性能差异更明显；当然对这种事实数据整理到一张冗余表统计其实对查询效率提升最大，可以使用物化视图或者定期程序更新。

四、参考 http://my.oschina.net/Kenyon/blog/79543

PostgreSQL SQL优化一列

猜你在找的Postgre SQL相关文章