示例表:
ID |Identifier1 | Identifier2 --------------------------------- 1 | a | c 2 | b | f 3 | a | g 4 | c | h 5 | b | j 6 | d | f 7 | e | k 8 | i | 9 | l | h
我想在两列之间对彼此相关的标识符进行分组,并分配唯一的组ID.
期望的输出:
Identifier | Gr_ID | Gr.Members --------------------------------------------------- a | 1 | (a,c,g,h,l) b | 2 | (b,d,f,j) c | 1 | (a,l) d | 2 | (b,j) e | 3 | (e,k) f | 2 | (b,j) g | 1 | (a,l) h | 1 | (a,l) j | 2 | (b,j) k | 3 | (e,k) l | 1 | (a,l) i | 4 | (i)
注意:Gr.Members列不是必需的,主要用于更清晰的视图.
So the definition for a group is: A row belongs to a group if it
shares at least one identifier with at least one row of this group
但是必须将组ID分配给每个标识符(由两列的并集选择)而不是行.
谢谢.
更新:下面是一些额外的样本集及其预期输出.
给定表:
Identifier1 | Identifier2 ---------------------------- a | f a | g a | NULL b | c b | a b | h b | j b | NULL b | NULL b | g c | k c | b d | l d | f d | g d | m d | a d | NULL d | a e | c e | b e | NULL
预期输出:所有记录应属于组ID = 1的同一组.
给定表:
Identifier1 | Identifier2 -------------------------- a | a b | b c | a c | b c | c
预期输出:记录应位于组ID = 1的同一组中.
解决方法
本质上,它将数据视为图中的边,并递归遍历图的所有边,在检测到循环时停止.然后它将所有找到的循环放入组中,并为每个组分配一个数字.
请参阅下面有关其工作原理的详细说明.我建议您运行查询CTE-by-CTE并检查每个中间结果以了解它的作用.
样品1
DECLARE @T TABLE (ID int,Ident1 char(1),Ident2 char(1)); INSERT INTO @T (ID,Ident1,Ident2) VALUES (1,'a','a'),(2,'b','b'),(3,'c',(4,(5,'c');
样本2
我添加了另一行z值,以使多行具有不成对的值.
DECLARE @T TABLE (ID int,(1,'c'),'f'),'g'),'h'),'j'),(6,'d',(7,'e','k'),(8,'i',NULL),(88,'z','z'),(9,'l','h');
样本3
DECLARE @T TABLE (ID int,(10,(11,(12,(13,'l'),(14,(15,(16,'m'),(17,(18,(19,(20,(21,(22,NULL);
询问
WITH CTE_Idents AS ( SELECT Ident1 AS Ident FROM @T UNION SELECT Ident2 AS Ident FROM @T ),CTE_Pairs AS ( SELECT Ident1,Ident2 FROM @T WHERE Ident1 <> Ident2 UNION SELECT Ident2 AS Ident1,Ident1 AS Ident2 FROM @T WHERE Ident1 <> Ident2 ),CTE_Recursive AS ( SELECT CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent,Ident2,CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath,1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1 UNION ALL SELECT CTE_Recursive.AnchorIdent,CTE_Pairs.Ident1,CTE_Pairs.Ident2,CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',CTE_Recursive.Lvl + 1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1 WHERE CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000)) ),CTE_RecursionResult AS ( SELECT AnchorIdent,Ident2 FROM CTE_Recursive ),CTE_CleanResult AS ( SELECT AnchorIdent,Ident1 AS Ident FROM CTE_RecursionResult UNION SELECT AnchorIdent,Ident2 AS Ident FROM CTE_RecursionResult ) SELECT CTE_Idents.Ident,CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers,DENSE_RANK() OVER(ORDER BY CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END ) AS GroupID FROM CTE_Idents CROSS APPLY ( SELECT CTE_CleanResult.Ident+',' FROM CTE_CleanResult WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident ORDER BY CTE_CleanResult.Ident FOR XML PATH(''),TYPE ) AS CA_XML(XML_Value) CROSS APPLY ( SELECT CA_XML.XML_Value.value('.','NVARCHAR(MAX)') ) AS CA_Data(XML_Value) WHERE CTE_Idents.Ident IS NOT NULL ORDER BY Ident;
结果1
+-------+--------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------+---------+ | a | a,b,| 1 | | b | a,| 1 | | c | a,| 1 | +-------+--------------+---------+
结果2
+-------+--------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------+---------+ | a | a,l,| 1 | | b | b,j,| 2 | | c | a,| 1 | | d | b,| 2 | | e | e,k,| 3 | | f | b,| 2 | | g | a,| 1 | | h | a,| 1 | | i | i | 4 | | j | b,| 2 | | k | e,| 3 | | l | a,| 1 | | z | z | 5 | +-------+--------------+---------+
结果3
+-------+--------------------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------------------+---------+ | a | a,e,m,| 1 | | d | a,| 1 | | e | a,| 1 | | f | a,| 1 | | g | a,| 1 | | j | a,| 1 | | k | a,| 1 | | l | a,| 1 | | m | a,| 1 | +-------+--------------------------+---------+
这个怎么运作
我将使用第二组样本数据进行解释.
CTE_Idents
CTE_Idents给出了Ident1和Ident2列中出现的所有标识符的列表.
由于它们可以按任何顺序出现,因此我们将两个列联合起来. UNION还删除任何重复项.
+-------+ | Ident | +-------+ | NULL | | a | | b | | c | | d | | e | | f | | g | | h | | i | | j | | k | | l | | z | +-------+
CTE_Pairs
CTE_Pairs给出了两个方向上图形的所有边的列表.同样,UNION用于删除任何重复项.
+--------+--------+ | Ident1 | Ident2 | +--------+--------+ | a | c | | a | g | | b | f | | b | j | | c | a | | c | h | | d | f | | e | k | | f | b | | f | d | | g | a | | h | c | | h | l | | j | b | | k | e | | l | h | +--------+--------+
CTE_Recursive
CTE_Recursive是查询的主要部分,它从每个唯一标识符开始递归遍历图形.
这些起始行由UNION ALL的第一部分生成.
UNION ALL的第二部分以递归方式连接自身,将Ident2链接到Ident1.
由于我们预先制作了CTE_Pairs,所有边都写在两个方向上,我们总是只能将Ident2链接到Ident1,我们将获得图中的所有路径.
同时,查询构建IdentPath – 一串到目前为止已遍历的逗号分隔的标识符.
它在WHERE过滤器中使用:
CTE_Recursive.IdentPath NOT LIKE CAST('%,%' AS varchar(8000))
一旦我们遇到之前包含在Path中的Identifier,递归就会在连接节点列表耗尽时停止.
AnchorIdent是递归的起始标识符,稍后将用于对结果进行分组.
Lvl并没有真正使用,我将其包含在内以便更好地了解正在发生的事情.
+-------------+--------+--------+-------------+-----+ | AnchorIdent | Ident1 | Ident2 | IdentPath | Lvl | +-------------+--------+--------+-------------+-----+ | a | a | c |,a,| 1 | | a | a | g |,| 1 | | b | b | f |,| 1 | | b | b | j |,| 1 | | c | c | a |,| 1 | | c | c | h |,| 1 | | d | d | f |,| 1 | | e | e | k |,| 1 | | f | f | b |,| 1 | | f | f | d |,| 1 | | g | g | a |,| 1 | | h | h | c |,| 1 | | h | h | l |,| 1 | | j | j | b |,| 1 | | k | k | e |,| 1 | | l | l | h |,| 1 | | l | h | c |,| 2 | | l | c | a |,| 3 | | l | a | g |,| 4 | | j | b | f |,| 2 | | j | f | d |,| 3 | | h | c | a |,| 2 | | h | a | g |,| 3 | | g | a | c |,| 2 | | g | c | h |,| 3 | | g | h | l |,| 4 | | f | b | j |,| 2 | | d | f | b |,| 2 | | d | b | j |,| 3 | | c | h | l |,| 2 | | c | a | g |,| 2 | | b | f | d |,| 2 | | a | c | h |,| 2 | | a | h | l |,| 3 | +-------------+--------+--------+-------------+-----+
CTE_CleanResult
CTE_CleanResult仅从CTE_Recursive中保留相关部分,并再次使用UNION合并Ident1和Ident2.
+-------------+-------+ | AnchorIdent | Ident | +-------------+-------+ | a | a | | a | c | | a | g | | a | h | | a | l | | b | b | | b | d | | b | f | | b | j | | c | a | | c | c | | c | g | | c | h | | c | l | | d | b | | d | d | | d | f | | d | j | | e | e | | e | k | | f | b | | f | d | | f | f | | f | j | | g | a | | g | c | | g | g | | g | h | | g | l | | h | a | | h | c | | h | g | | h | h | | h | l | | j | b | | j | d | | j | f | | j | j | | k | e | | k | k | | l | a | | l | c | | l | g | | l | h | | l | l | +-------------+-------+
最后的选择
现在我们需要为每个AnchorIdent构建一串逗号分隔的Ident值.用FOR XML交叉应用就可以了.DENSE_RANK()计算每个AnchorIdent的GroupID号.