我正在使用Postgresql 9.1查询层次结构树结构数据,其中包含与节点连接的边(或元素).数据实际上是用于流网络,但是我将问题提炼为简单的数据类型.考虑示例树表.每个边缘具有长度和面积属性,用于确定网络中的一些有用的度量.
CREATE TEMP TABLE tree ( edge text PRIMARY KEY,from_node integer UNIQUE NOT NULL,-- can also act as PK to_node integer REFERENCES tree (from_node),mode character varying(5),-- redundant,but illustrative length numeric NOT NULL,area numeric NOT NULL,fwd_path text[],-- optional ordered sequence,useful for debugging fwd_search_depth integer,fwd_length numeric,rev_path text[],-- optional unordered set,useful for debugging rev_search_depth integer,rev_length numeric,rev_area numeric ); CREATE INDEX ON tree (to_node); INSERT INTO tree(edge,from_node,to_node,mode,length,area) VALUES ('A',1,4,'start',1.1,0.9),('B',2,1.2,1.3),('C',3,5,1.8,2.4),('D',NULL,('E','end',0.9);
下面可以说明,A-E所表示的边缘与节点1-5连接在一起. NULL to_node(Ø)表示结束节点. from_node始终是唯一的,因此它可以作为PK.如果这个网络像drainage basin一样流动,那么从顶部到底部,起始支路边缘是A,B,C,结束流出边缘是E.
documentation for WITH
提供了一个很好的例子,说明如何在递归查询中使用搜索图表.所以,要获得“转发”信息,查询从最后开始,向后退回:
WITH RECURSIVE search_graph AS ( -- Begin at ending nodes SELECT E.from_node,1 AS search_depth,E.length,ARRAY[E.edge] AS path -- optional FROM tree E WHERE E.to_node IS NULL UNION ALL -- Accumulate each edge,working backwards (upstream) SELECT o.from_node,sg.search_depth + 1,sg.length + o.length,o.edge|| sg.path -- optional FROM tree o,search_graph sg WHERE o.to_node = sg.from_node ) UPDATE tree SET fwd_path = sg.path,fwd_search_depth = sg.search_depth,fwd_length = sg.length FROM search_graph AS sg WHERE sg.from_node = tree.from_node; SELECT edge,fwd_path,fwd_search_depth,fwd_length FROM tree ORDER BY edge; edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length ------+-----------+---------+----------+------------------+------------ A | 1 | 4 | {A,D,E} | 3 | 3.4 B | 2 | 4 | {B,E} | 3 | 3.5 C | 3 | 5 | {C,E} | 2 | 2.9 D | 4 | 5 | {D,E} | 2 | 2.3 E | 5 | | {E} | 1 | 1.1
以上是有意义的,适合大型网络.例如,我可以看到边缘B是从末端的3个边,而前向路径是{B,E},从顶端到末端的总长度为3.5.
但是,我无法找出构建反向查询的好办法.也就是说,从每个边缘,累积的“上游”边缘,长度和面积是多少.使用WITH RECURSIVE,我最好的是:
WITH RECURSIVE search_graph AS ( -- Begin at starting nodes SELECT S.from_node,S.to_node,S.length,S.area,ARRAY[S.edge] AS path -- optional FROM tree S WHERE from_node IN ( -- Starting nodes have a from_node without any to_node SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree) UNION ALL -- Accumulate edges,working forwards SELECT c.from_node,c.to_node,sg.length + c.length,sg.area + c.area,c.edge || sg.path -- optional FROM tree c,search_graph sg WHERE c.from_node = sg.to_node ) UPDATE tree SET rev_path = sg.path,rev_search_depth = sg.search_depth,rev_length = sg.length,rev_area = sg.area FROM search_graph AS sg WHERE sg.from_node = tree.from_node; SELECT edge,rev_path,rev_search_depth,rev_length,rev_area FROM tree ORDER BY edge; edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area ------+-----------+---------+----------+------------------+------------+---------- A | 1 | 4 | {A} | 1 | 1.1 | 0.9 B | 2 | 4 | {B} | 1 | 1.2 | 1.3 C | 3 | 5 | {C} | 1 | 1.8 | 2.4 D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2 E | 5 | | {E,C} | 2 | 2.9 | 3.3
我想将聚合构建到递归查询的第二个术语中,因为每个下游边缘连接到1个或多个上游边缘,但是aggregates are not allowed with recursive queries.另外,我知道连接是粗糙的,因为递归结果具有多个连接边缘条件
反向/反向查询的预期结果是:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area ------+-----------+---------+-------------+------------------+------------+---------- A | 1 | 4 | {A} | 1 | 1.1 | 0.9 B | 2 | 4 | {B} | 1 | 1.2 | 1.3 C | 3 | 5 | {C} | 1 | 1.8 | 2.4 D | 4 | 5 | {A,D} | 3 | 3.5 | 3.5 E | 5 | | {A,E} | 5 | 6.4 | 6.8
如何写这个更新查询?
请注意,我最终更关心累积准确的长度和面积总计,路径属性用于调试.在我现实世界的情况下,转发路径最多可达二百,而我预计大型和复杂流域的反向路径将达数万.
更新2:
我重写了原始递归查询,以便所有累加/聚合都在递归部分之外完成.它应该比以前的这个答案更好.
对于类似的问题,来自@a_horse_with_no_name的 answer非常相似.
我重写了原始递归查询,以便所有累加/聚合都在递归部分之外完成.它应该比以前的这个答案更好.
对于类似的问题,来自@a_horse_with_no_name的 answer非常相似.
WITH RECURSIVE search_graph(edge,area,start_node) AS ( SELECT edge,from_node AS "start_node" FROM tree UNION ALL SELECT o.edge,o.from_node,o.to_node,o.length,o.area,p.start_node FROM tree o JOIN search_graph p ON p.from_node = o.to_node ) SELECT array_agg(edge) AS "edges" --,array_agg(from_node) AS "nodes",count(edge) AS "edge_count",sum(length) AS "length_sum",sum(area) AS "area_sum" FROM search_graph GROUP BY start_node ORDER BY start_node ;
结果如预期:
start_node | edges | edge_count | length_sum | area_sum ------------+-------------+------------+------------+------------ 1 | {A} | 1 | 1.1 | 0.9 2 | {B} | 1 | 1.2 | 1.3 3 | {C} | 1 | 1.8 | 2.4 4 | {D,A} | 3 | 3.5 | 3.5 5 | {E,A} | 5 | 6.4 | 6.8