Postgresql的窗口函数使用 文档中涉及的表的结构和数据: 1.表emp_detail: create table emp_detail( empno integer,ename varchar(10),sal numeric,dept_no integer,time_stamp date ); insert into emp_detail values(7369,'SMITH',100); insert into emp_detail values(7369,100,20,'2015-04-01'); insert into emp_detail values(7369,105,'2015-04-02'); insert into emp_detail values(7369,120,'2015-04-03'); insert into emp_detail values(7369,150,'2015-04-04'); insert into emp_detail values(7369,200,'2015-04-05'); insert into emp_detail values(7369,400,'2015-04-06'); insert into emp_detail values(7369,180,'2015-04-07'); 2.表Student: create table student( id int,stu_name varchar(50),chinese numeric,english numeric,math numeric ); insert into student values(1001,'小明',80,75,90); insert into student values(1002,'小红',70,85); insert into student values(1003,'小强',90,100); 3.表emp: CREATE TABLE public.emp ( empno INTEGER,ename VARCHAR(10),job VARCHAR(9),mgr INTEGER,hiredate TIMESTAMP(6) WITHOUT TIME ZONE,sal DOUBLE PRECISION,comm DOUBLE PRECISION,dept_no INTEGER ); INSERT INTO public.emp ("empno","ename","job","mgr","hiredate","sal","comm","dept_no") VALUES (7369,E'SMITH',E'CLERK',7902,E'1980-12-17 00:00:00',800,NULL,20); INSERT INTO public.emp ("empno","dept_no") VALUES (7499,E'ALLEN',E'SALESMAN',7698,E'1981-02-20 00:00:00',1600,306,30); INSERT INTO public.emp ("empno","dept_no") VALUES (7521,E'WARD',E'1981-02-22 00:00:00',1250,506,"dept_no") VALUES (7566,E'JONES',E'MANAGER',7839,E'1981-04-02 00:00:00',2975,"dept_no") VALUES (7654,E'MARTIN',E'1981-09-28 00:00:00',1406,"dept_no") VALUES (7698,E'BLAKE',E'1981-05-01 00:00:00',2850,"dept_no") VALUES (7782,E'CLARK',E'1981-06-09 00:00:00',2450,10); INSERT INTO public.emp ("empno","dept_no") VALUES (7788,E'SCOTT',E'ANALYST',7566,E'1987-04-19 00:00:00',3000,"dept_no") VALUES (7839,E'KING',E'PRESIDENT',E'1981-11-17 00:00:00',5000,"dept_no") VALUES (7844,E'TURNER',E'1981-09-08 00:00:00',1500,6,"dept_no") VALUES (7876,E'ADAMS',7788,E'1987-05-23 00:00:00',1100,"dept_no") VALUES (7900,E'JAMES',E'1981-12-03 00:00:00',950,"dept_no") VALUES (7902,E'FORD',"dept_no") VALUES (7934,E'MILLER',7782,E'1982-01-23 00:00:00',1300,10); 一、窗口函数的语法 窗口函数的基本结构: function_name ([expression [,expression ... ]]) OVER window_name function_name ([expression [,expression ... ]]) OVER ( window_definition ) function_name ( * ) OVER window_name function_name ( * ) OVER ( window_definition ) window_definition 的定义: [ existing_window_name ] [ PARTITION BY expression [,...] ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [,...] ] [ frame_clause ] 配置项frame_clause的选择: [ RANGE | ROWS ] frame_start [ RANGE | ROWS ] BETWEEN frame_start AND frame_end 窗口起始项 frame_start 和结束项 frame_end的选择: UNBOUNDED PRECEDING value PRECEDING CURRENT ROW value FOLLOWING UNBOUNDED FOLLOWING 这里, expression代表着任何的值表达式,但是不包含自身的窗口函数调用。 Window_name代表一个窗口名称,一个完整的窗口是用WINDOW关键字命名并且定义的,也可以使匿名的。 PARTITION BY 与group by是类似的都是用来分组,但是与group by不同的是PARTITION BY中就只是一个表达式,不像groupby其除了分组字段以外不能再select中显示,除非是聚合函数。如果没有PARTITION BY,那么每一行就是一个分组一个窗口。 ORDER BY作用是使PARTITION BY分组中的排序方式,支持与通常的排序是一致的可以有ASC,DESC,nulls first or last等等。如果没有指定order by那么就是没有指定排序方式。 frame_clause就是指窗口的大小,窗口如何移动等。 frame_start和frame_end就是来确定窗口大小的两个参数或者是窗口的边界。 frame_start不能使用unbounded following,同样frame_end也不能使用unbounded preceding UNBOUNDED PRECEDING 的意思是从第一行开始,但是只能用在frame_start的位置。 UNBOUNDED FOLLOWING 的意思更好与UNBOUNDED PRECEDING相反指的是到最后一行,其也只能用在frame_end的位置。 value PRECEDING和 value FOLLOWING都只能使用咋rows模式中不能使用在range模式中,value的值是一个整型的数值也可以使整型表达式,不能为变量,聚合函数,窗口函数等,value不能为空或者是不明确的,但是可以为0,为0的时候表示的就是当前行。 value PRECEDING是指从哪一行开始,value FOLLOWING指的是从哪一行结束。 Value = 1 时 value PRECEDING 指的是当前行的前一行开始,value FOLLOWING则为当前行的前一行为止。随着表中数据的扫描窗口会以这个尺寸一直走下去,执行相关的分析函数。 二、窗口函数示例: 1.从员工表(emp)中查询每个员工的信息,并且查询整个公司的工资总额。 select ename,sal,sum(sal)over(order by empno range between unbounded preceding and unbounded following) from emp; 2.从员工明细表中查询一个员工在前后三天所得的工资总和。 select empno,ename,dept_no,sum(sal) over( order by empno,time_stamp rows between 1 preceding and 1 following) from emp_detail; 3.从员工表(emp)中查询每个员工的信息,并且查询每个部门的工资总和。 select ename,sum(sal) over(partition by dept_no order by empno range between unbounded preceding and unbounded following) from emp; 三、分析函数的介绍 row_number():在一个结果集中,返回当前的行的号码。 rank()、dense_rank():在一个结果集中,用来排名,前者是完全差异后者是不完全差异,简言之前者是按阿拉伯数字顺序来,后者则会跳跃。 lag(value any)、lead(value any):用来对当前行对于指定的字段与下一行或者前一行的值进行比较。 first_value(value any)、last_value(value any):在一个窗口中,返回指定排序的第一个值和最后一个值。 其他类似与sum(),agv(),max(),min()也都是能与窗口函数配合使用,当做分析函数。 四、分析函数与窗口函数的混用示例 1.从员工表(emp)中按照员工被雇佣的时间大小,查询入职时间的先后顺序。 Select row_number() over( order by hiredate asc),empno,hiredate from emp; 2.从员工表中查询每个部门的工资排名,并且给工资最高的人加10%的奖金。 update emp set comm = comm + sal * 0.01 where empno in ( select * from ( select ename,dense_rank() over(partition by dept_no order by sal desc) as level_ from emp ) t where level_ = 1 ); 3.从员工表中查询每个部门的工资排名,并且给每个部门中工资排名在第三名的员工加20%的奖金。 update emp set comm = comm + sal * 0.02 where empno in ( select * from ( select ename,rank() over(partition by dept_no order by sal desc) as level_ from emp ) t where level_ = 1 ); 4.从员工表中查询每个部门的员工的工资从大到小排序,并且计算前后两名的工资差值。 select ename,lag(sal)over(partition by dept_no order by sal) as lag_end,sal - lag(sal)over(partition by dept_no order by sal) from emp order by dept_no,sal asc; 查询结果中存在null中,原因是每个分组的第一行没有前一行一次为空值。 那么假定第一行的值需要与最后一行来比较,那么应该怎么做: select ename,CASE when lag(sal)over(partition by dept_no order by sal) is null then max(sal)OVER(partition by dept_no order by sal desc) else lag(sal)over(partition by dept_no order by sal) end lag_end from emp order by dept_no,sal asc; 也可以是这样: select ename,CASE when lag(sal)over(partition by dept_no order by sal) is null then first_value(sal)OVER(partition by dept_no order by sal desc) else lag(sal)over(partition by dept_no order by sal) end lag_end from emp order by dept_no,sal asc; 5.查询学生表中每一个学生按照科目的分数排序。 select id,stu_name,course,point_,first_value(point_)over(partition by id order by point_ desc ) from ( with temp as ( select id,chinese,english,math from student ),t1 as (select id,chinese as point_,'语文'::text as course from temp ),t2 as (select id,english as point_,'英语'::text as course from temp ),t3 as (select id,math as point_,'数学'::text as course from temp) select * from t1 union all (select * from t2) union all (select * from t3)