想象一下,这里有3家公司.我们按姓名加入表格,因为并非每位员工都提供了他的PersonalNo. StringId只有专家,所以它也不能用于加入.同一名员工可以在多家公司工作.
问题
问题是可能存在具有相同名称的不同员工(具有相同的名字和姓氏,在示例中仅提供名字).
我需要的?
当数据有问题时返回1,如果正确则返回0.
检测问题的规则
>当有多个相同的名字(2个或更多)并且所有人都具有相同的PersonalNo而不是所有人都有StringId(如彼得)应该返回1(这是错误的)
>当有多个相同的名称(2个或更多)并且有NULL(参见John)时,但它们都具有相同的StringId它应该返回0(这是正确的,这意味着其中一个公司没有提供PersonalNo)
>当有多个相等的名称(2个或更多)并且所有PersonalNo相等且所有StringId都是equall时(参见Lisa)它应该返回0(正确)
>当有多个相同的名字(2个或更多)并且有多个不同的PersonalNo和所有StringId提供它应该是:我们看到这里有2个不同的人Jennifer与4805250141 PersonalNo和Jennifer与4920225088 PersonalNo,Jennifer与NULL PersonalNo有与Jennifer一样的StringId和4920225088 PersonalNo so它应该返回0(正确)并且不应该选择带有4805250141 PersonalNo的Jennifer,因为具有StringID并且只有1行具有相同的PersonalNo.
>如果只有1行并且没有提供StringId它根本不应出现在select中.
样本数据
Company Name PersonalNo StringId Comp1 Peter 3850342515 85426 ------------------------------------------------------------------- Comp2 Peter 3850342515 '' -- If have the same PersonalNo and there is no StringId - 1 (wrong) Comp1 John NULL 12345 ------------------------------------------------------------------ Comp2 John 3952525252 12345 -- If have the same StringId and 1 PersonalNo is NULL - 0 (correct) Comp1 Lisa 4951212581 52124 ---------------------------------------------------------------- Comp3 Lisa 4951212581 52124 -- If PersonalNo are equal and StringId are equal - 0 (correct) Comp1 Jennifer 4805250141 '' ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Comp1 Jennifer 4920225088 55443 -- If have 2 different PersonalNo and NULL PersonalNo,but where PersonalNo is NULL Comp3 Jennifer NULL 55443 -- Have the same StringId with other row where is provided PersonalNo it should be 0 (correct),with different PersonalNo where is no StringId shouldn't appear at all. Comp1 Ralph 3961212256 '' -- Shouldn't appear in select list,because only 1 row with this PersonalNo and there is no StringID
期望的输出
Peter 1 John 0 Lisa 0 Jennifer 0
QUERY
LEFT JOIN (SELECT Name,( SELECT CASE WHEN MIN(PersonalNo) <> MAX(d.PersonalNo) and MIN(CASE WHEN StringId IS NULL THEN '0' ELSE StringId END) <> MAX(CASE WHEN d.StringId IS NULL THEN '0' ELSE d.StringId END) -- this is wrong and MIN(PersonalNo) <> '' and MIN(PersonalNo) IS NOT NULL and MAX(rn) > 1 THEN 1 ELSE 0 END AS CheckPersonalNo FROM ( SELECT Name,PersonalNo,[StringId],ROW_NUMBER() OVER (PARTITION BY Name,PersonalNo ORDER BY Name) rn FROM TableEmp e1 WHERE Condition = 1 and e1.Name = d.Name ) sub2 GROUP BY Name ) CheckPersonalNo FROM [TableEmp] d WHERE Condition = 1 GROUP BY Name ) f ON f.Name = x.Name
查询的问题是我只能按名称分组,不能将PersonalNo添加到GROUP BY子句,所以我需要在选择列表中使用聚合.但是现在它仅比较MIN和MAX值,如果有超过2行具有相同的名称它没有按预期工作.
我需要做类似的事情,比较PARTITION BY Fullname,PersonalNo的值.它现在比较具有相同名称的值(不依赖于PersonalNo).
有任何想法吗?如果您有任何问题 – 请问我,我会尽力解释.
更新1
如果有2个条目具有不同的PersonalNo,但它们的StringId相等,则应为1(错误).
Company Name PersonalNo StringId Comp1 Anna 4805250141 88552 -- different PersonalNo and the same StringId for both should go as 1 (wrong) Comp1 Anna 4920225088 88552
现在回来像:
Anna 0 Anna 0
它应该是:
Anna 1
更新2
在Identifier列中UNION更新后返回StringId:55443(对于下面的数据),但在这种情况下,当1个条目具有PersonalNo时,其他为空,但它们都具有相同(相等)StringId它是正确的(应为0)
Comp1 Jennifer 4920225088 55443 Comp3 Jennifer '' 55443
解决方法
可能有其他方法可以做到这一点,但个人我可能会使用临时表进行临时工作,如果是我这样做..
--select data into a temp table that can be modified select * into #cleaned from table --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null --find all records with non matching string ids select name,count(*) as numIDs into #issues from( select name,stringID from #cleaned group by name,stringID ) as i group by name,PersonalNo having count(*) > 1 --select data for viewing. select distinct s.name,case when i.name is not null then 1 else 0 end as issue from #cleaned as s left outer join #issues as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc
sqlFiddle:http://sqlfiddle.com/#!3/f4aab/7
抱歉,如果这里有虫子,但我相信你会得到这个想法,它不是火箭科学,只是另一种方法
编辑:刚刚注意到你对没有字符串ID的行感兴趣..只是如果它是唯一的行,那么它不是问题.我修改了第一个select(into #cleaned)以获取所有行.
编辑:没有临时表现在你知道它在做什么,这里是没有任何临时表的相同的东西 – 但警告这更新源表分配丢失的personalNo的
update c set c.personalNo = s.personalNo from table1 as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null select distinct s.name,case when i.name is not null then 1 else 0 end as issue from table1 as s left outer join ( select name,count(*) as numIDs from( select name,stringID from table1 group by name,stringID ) as i group by name,PersonalNo having count(*) > 1 ) as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc
sqlFiddle:http://sqlfiddle.com/#!3/f4aab/8
分区我没看到我将如何在这里使用分区,因为你想要做的只是知道是否有多行,我使用更复杂的制表分区,或者如果我要对更新数据的判断调用的结果进行排名基于更复杂的规则..但无论如何这里是一个禁止分区的乌鸦:D
Select name,personalNo,case when numstrings > 1 then 1 else 0 end as issue from (select name,row_number() over (partition by name,personalNo order by name,stringID ) as numstrings from #cleaned group by name,stringid) as d order by issue desc
注意:这使用了如上所述的#cleaned表,因为我认为这使得难以解决的问题的关键是有时候缺少的个人诺.
没有临时表,没有更新
使用上面显然可以不使用任何临时表或更新任何东西,它只是一个可读性/可维护性的问题,以及它是否实际上甚至更快.这可以更稳定地处理具有多个personalNo分配的字符串ID:
select distinct s.name,count(*) as numIDs from( select a.name,coalesce(a.PersonalNo,b.PersonalNo) as PersonalNo,a.stringID from table1 as a left outer join table1 as b on a.name = b.name and a.stringid=b.stringid and a.personalNo != b.personalNo and b.personalNo Is Not Null group by a.name,a.PersonalNo,a.stringID,b.PersonalNo ) as i group by name,PersonalNo having count(*) > 1 ) as i on s.name = i.name and s.personalNo = i.personalNo order by issue desc
sqlFiddle:http://sqlfiddle.com/#!3/f4aab/9
编辑:寻找不一致的个人数字 – 这使用一个临时表,但你可以像上一个例子中所做的那样交换它.注意你要求的原始结构略有偏差,因为这更像是我会这样做任务,但这里有足够的代码供您重新进行任何您想要的方式.
--select data into a temp table that can be modified select * into #cleaned from table1 --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and c.personalNo is null and s.personalNo is not null Select IssueType,Name,Identifier from ( --find all records with non matching PersonalNos select name,cast('StringID: ' + stringID as nvarchar(400)) as Identifier,cast('Inconsistent PersonalNo' as nvarchar(400)) as issueType from( select name,stringID from #cleaned group by name,stringID ) as i group by name,StringId having count(*) > 1 UNION --find all records with non matching string ids select name,'PersonalNo: ' + PersonalNo,cast('Inconsistent String ID' as nvarchar(400)) as issueType from( select name,PersonalNo having count(*) > 1 ) as a
sqlFiddle:http://sqlfiddle.com/#!3/e9da2/18
更新:也想接受空字符串personalNo
这是另一个新要求..接受空字符串的方式与personalNo中的NULL相同
--select data into a temp table that can be modified select * into #cleaned from table1 --apply personal numbers based on other records with matching string id --you could take note of the records you are doing this to for data clean up update c set c.personalNo = s.personalNo from #cleaned as c inner join table1 as s on c.name = s.name and c.stringID = s.stringID and (c.personalNo IS NULL OR c.personalNo ='') and s.personalNo is not null and s.personalNo != '' Select IssueType,StringId having count(*) > 1 UNION --find all records with non matching string ids select name,PersonalNo having count(*) > 1 ) as a