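When I display a conversation (a set of messages) between two users, I'd like to group the messages by user_id, but in a tricky way:
Let's say there are some messages (sorted by created_at desc):
    id: 1, user_id: 1
    id: 2, user_id: 1
    id: 3, user_id: 2
    id: 4, user_id: 2
    id: 5, user_id: 1
I want to get 3 message groups in the below order:
[1,2], [3,4], [5]
It should group by *user_id* until it sees a different one, and then group by that one.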
Solution
@Igor presents a nice pure-SQL technique with window functions.
However:
"I want to get 3 message groups in the below order: [1,2], [3,4], [5]"
    SELECT array_agg(id) AS ids
    FROM  (
       SELECT id, user_id
            , row_number() OVER (ORDER BY id)
            - row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
       FROM   messages
       ORDER  BY id  -- for ordered arrays in result
       ) t
    GROUP  BY grp, user_id
    ORDER  BY min(id);
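To see why the difference of the two row numbers identifies the groups, it can help to look at the intermediate values. The following is a minimal sketch that inlines the sample rows from the question as a CTE, so no table is needed; the column aliases rn_all and rn_user are names chosen here for illustration only.

```sql
-- Sample rows from the question, inlined as a CTE (illustrative setup).
WITH messages(id, user_id) AS (
   VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 1)
)
SELECT id, user_id
     , row_number() OVER (ORDER BY id)                      AS rn_all   -- position overall
     , row_number() OVER (PARTITION BY user_id ORDER BY id) AS rn_user  -- position per user
     , row_number() OVER (ORDER BY id)
     - row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp      -- constant within a run
FROM   messages
ORDER  BY id;

--  id | user_id | rn_all | rn_user | grp
-- ----+---------+--------+---------+-----
--   1 |       1 |      1 |       1 |   0
--   2 |       1 |      2 |       2 |   0
--   3 |       2 |      3 |       1 |   2
--   4 |       2 |      4 |       2 |   2
--   5 |       1 |      5 |       3 |   2
```

Note that ids 3, 4 and 5 all end up with grp = 2 on this data, so grouping by grp alone would merge them. Grouping by grp and user_id together keeps [3,4] and [5] apart.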
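That addition would hardly warrant another answer. The more important issue is this:
Faster with PL/pgSQL
"I’m using Postgresql and would be happy to use something specific to it, whatever would give the best performance."
Pure SQL is all nice and shiny, but a procedural server-side function is much faster for this task. While processing rows procedurally is generally slower, plpgsql wins this competition by a wide margin, because it can make do with a single table scan and a single ORDER BY operation: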
    CREATE OR REPLACE FUNCTION f_msg_groups()
      RETURNS TABLE (ids int[]) AS
    $func$
    DECLARE
       _id    int;
       _uid   int;
       _id0   int;                         -- id of last row
       _uid0  int;                         -- user_id of last row
    BEGIN
       FOR _id, _uid IN
          SELECT id, user_id FROM messages ORDER BY id
       LOOP
          IF _uid <> _uid0 THEN
             RETURN QUERY VALUES (ids);    -- output row (never happens after 1 row)
             ids := ARRAY[_id];            -- start new array
          ELSE
             ids := ids || _id;            -- add to array
          END IF;

          _id0  := _id;
          _uid0 := _uid;                   -- remember last row
       END LOOP;

       RETURN QUERY VALUES (ids);          -- output last iteration
    END
    $func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_msg_groups();
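Benchmark and links
I ran a quick test with EXPLAIN ANALYZE on a similar real-life table with 60k rows (executed several times, picking the fastest result to exclude caching effects):
SQL:
Total runtime: 1009.549 ms
PL/pgSQL:
Total runtime: 336.971 ms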
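For anyone who wants to try a comparable measurement, here is a rough sketch. The test data is an assumption made for illustration (60k rows, user_id randomly 1 or 2), not the real-life table used above, so the timings will differ. It uses a temporary table named messages, which shadows any regular table of the same name for the current session.

```sql
-- Hypothetical test data: 60k rows, each assigned to user 1 or 2 at random.
CREATE TEMP TABLE messages AS
SELECT g AS id
     , 1 + (random() < 0.5)::int AS user_id
FROM   generate_series(1, 60000) g;

ANALYZE messages;  -- collect planner statistics for the new table

-- Time the pure SQL variant; run it several times and compare the fastest runs.
EXPLAIN ANALYZE
SELECT array_agg(id) AS ids
FROM  (
   SELECT id, user_id
        , row_number() OVER (ORDER BY id)
        - row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
   FROM   messages
   ORDER  BY id
   ) t
GROUP  BY grp, user_id
ORDER  BY min(id);
```

Once f_msg_groups() has been created, the function can be timed the same way with EXPLAIN ANALYZE SELECT * FROM f_msg_groups();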
Also consider these closely related questions:
> GROUP BY and aggregate sequential numeric values
> GROUP BY consecutive dates delimited by gaps
> Ordered count of consecutive repeats / duplicates