sql – PARTITION BY Name,Id用于比较和检测问题

交代

想象一下,这里有3家公司.我们按姓名加入表格,因为并非每位员工都提供了他的PersonalNo. StringId只有专家,所以它也不能用于加入.同一名员工可以在多家公司工作.

问题

问题是可能存在具有相同名称的不同员工(具有相同的名字和姓氏,在示例中仅提供名字).

我需要的？

当数据有问题时返回1,如果正确则返回0.

检测问题的规则

>当有多个相同的名字(2个或更多)并且所有人都具有相同的PersonalNo而不是所有人都有StringId(如彼得)应该返回1(这是错误的)
>当有多个相同的名称(2个或更多)并且有NULL(参见John)时,但它们都具有相同的StringId它应该返回0(这是正确的,这意味着其中一个公司没有提供PersonalNo)
>当有多个相等的名称(2个或更多)并且所有PersonalNo相等且所有StringId都是equall时(参见Lisa)它应该返回0(正确)
>当有多个相同的名字(2个或更多)并且有多个不同的PersonalNo和所有StringId提供它应该是：我们看到这里有2个不同的人Jennifer与4805250141 PersonalNo和Jennifer与4920225088 PersonalNo,Jennifer与NULL PersonalNo有与Jennifer一样的StringId和4920225088 PersonalNo so它应该返回0(正确)并且不应该选择带有4805250141 PersonalNo的Jennifer,因为具有StringID并且只有1行具有相同的PersonalNo.
>如果只有1行并且没有提供StringId它根本不应出现在select中.

样本数据

Company     Name        PersonalNo   StringId 
Comp1       Peter       3850342515    85426 -------------------------------------------------------------------
Comp2       Peter       3850342515    ''    -- If have the same PersonalNo and there is no StringId - 1 (wrong)
Comp1       John        NULL          12345 ------------------------------------------------------------------
Comp2       John        3952525252    12345 -- If have the same StringId and 1 PersonalNo is NULL - 0 (correct)
Comp1       Lisa        4951212581    52124 ----------------------------------------------------------------
Comp3       Lisa        4951212581    52124 -- If PersonalNo are equal and StringId are equal - 0 (correct)
Comp1       Jennifer    4805250141    ''    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Comp1       Jennifer    4920225088    55443 -- If have 2 different PersonalNo and NULL PersonalNo,but where PersonalNo is NULL 
Comp3       Jennifer    NULL          55443 -- Have the same StringId with other row where is provided PersonalNo it should be 0 (correct),with different PersonalNo where is no StringId shouldn't appear at all.
Comp1       Ralph       3961212256    ''    -- Shouldn't appear in select list,because only 1 row with this PersonalNo and there is no StringID

期望的输出

Peter     1
John      0
Lisa      0
Jennifer  0

QUERY

LEFT JOIN (SELECT Name,(
                    SELECT CASE WHEN MIN(PersonalNo) <> MAX(d.PersonalNo) 
                                    and MIN(CASE WHEN StringId IS NULL THEN '0' ELSE StringId END) <> MAX(CASE WHEN d.StringId IS NULL THEN '0' ELSE d.StringId END) -- this is wrong                                                 
                                    and MIN(PersonalNo) <> ''
                                    and MIN(PersonalNo) IS NOT NULL                          
                                    and MAX(rn) > 1 THEN 1
                                 ELSE 0
                            END AS CheckPersonalNo 
                    FROM (                               
                        SELECT Name,PersonalNo,[StringId],ROW_NUMBER() OVER (PARTITION BY Name,PersonalNo ORDER BY Name) rn
                        FROM TableEmp e1 
                        WHERE Condition = 1 and e1.Name = d.Name                                 
                        ) sub2                              
                    GROUP BY Name
                    ) CheckPersonalNo                                                                                   
        FROM [TableEmp] d   
        WHERE Condition = 1
        GROUP BY Name        
        ) f ON f.Name = x.Name

查询的问题是我只能按名称分组,不能将PersonalNo添加到GROUP BY子句,所以我需要在选择列表中使用聚合.但是现在它仅比较MIN和MAX值,如果有超过2行具有相同的名称它没有按预期工作.

我需要做类似的事情,比较PARTITION BY Fullname,PersonalNo的值.它现在比较具有相同名称的值(不依赖于PersonalNo).

有任何想法吗？如果您有任何问题 – 请问我,我会尽力解释.

更新1

如果有2个条目具有不同的PersonalNo,但它们的StringId相等,则应为1(错误).

Company     Name    PersonalNo   StringId 
Comp1       Anna    4805250141    88552    -- different PersonalNo and the same StringId for both should go as 1 (wrong)
Comp1       Anna    4920225088    88552

现在回来像：

Anna    0
Anna    0

它应该是：

Anna    1

更新2

在Identifier列中UNION更新后返回StringId：55443(对于下面的数据),但在这种情况下,当1个条目具有PersonalNo时,其他为空,但它们都具有相同(相等)StringId它是正确的(应为0)

Comp1       Jennifer    4920225088    55443  
Comp3       Jennifer    ''            55443

解决方法

我希望我理解你的要求..

可能有其他方法可以做到这一点,但个人我可能会使用临时表进行临时工作,如果是我这样做..

--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table


--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null

--find all records with non matching string ids
select 
    name,count(*) as numIDs
    into #issues
from(
    select
        name,stringID
    from 
        #cleaned
    group by
        name,stringID
    ) as i
group by
    name,PersonalNo
having 
    count(*) > 1

--select data for viewing.
select
    distinct
    s.name,case
        when i.name is not null then 1
        else 0
    end as issue
from
    #cleaned as s
    left outer join #issues as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

sqlFiddle：http://sqlfiddle.com/#!3/f4aab/7

抱歉,如果这里有虫子,但我相信你会得到这个想法,它不是火箭科学,只是另一种方法

编辑：刚刚注意到你对没有字符串ID的行感兴趣..只是如果它是唯一的行,那么它不是问题.我修改了第一个select(into #cleaned)以获取所有行.

编辑：没有临时表现在你知道它在做什么,这里是没有任何临时表的相同的东西 – 但警告这更新源表分配丢失的personalNo的

update c
    set c.personalNo = s.personalNo
from table1 as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null


select
    distinct
    s.name,case
        when i.name is not null then 1
        else 0
    end as issue
from
    table1 as s
    left outer join (
                select 
                    name,count(*) as numIDs
                from(
                    select
                        name,stringID
                    from 
                        table1
                    group by
                        name,stringID
                    ) as i
                group by
                    name,PersonalNo
                having 
                    count(*) > 1
        )
        as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

sqlFiddle：http://sqlfiddle.com/#!3/f4aab/8

分区我没看到我将如何在这里使用分区,因为你想要做的只是知道是否有多行,我使用更复杂的制表分区,或者如果我要对更新数据的判断调用的结果进行排名基于更复杂的规则..但无论如何这里是一个禁止分区的乌鸦：D

Select
    name,personalNo,case
        when numstrings > 1 then 1
        else 0 end as issue
from
    (select
        name,row_number() over (partition by 
                                    name,personalNo 
                                order by 
                                    name,stringID
                                    ) as numstrings
    from
        #cleaned
    group by
        name,stringid) as d
order by
    issue desc

注意：这使用了如上所述的#cleaned表,因为我认为这使得难以解决的问题的关键是有时候缺少的个人诺.

没有临时表,没有更新

使用上面显然可以不使用任何临时表或更新任何东西,它只是一个可读性/可维护性的问题,以及它是否实际上甚至更快.这可以更稳定地处理具有多个personalNo分配的字符串ID：

select
    distinct
    s.name,count(*) as numIDs
                from(
                    select
                        a.name,coalesce(a.PersonalNo,b.PersonalNo) as PersonalNo,a.stringID
                    from 
                        table1 as a
                            left outer join table1 as b
                                on a.name = b.name
                                and a.stringid=b.stringid
                                and a.personalNo != b.personalNo
                                and b.personalNo Is Not Null
                    group by
                        a.name,a.PersonalNo,a.stringID,b.PersonalNo
                    ) as i
                group by
                    name,PersonalNo
                having 
                    count(*) > 1
        )
        as i
        on s.name = i.name
        and s.personalNo = i.personalNo
order by issue desc

sqlFiddle：http://sqlfiddle.com/#!3/f4aab/9

编辑：寻找不一致的个人数字 – 这使用一个临时表,但你可以像上一个例子中所做的那样交换它.注意你要求的原始结构略有偏差,因为这更像是我会这样做任务,但这里有足够的代码供您重新进行任何您想要的方式.

--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table1


--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and c.personalNo is null
        and s.personalNo is not null


Select
    IssueType,Name,Identifier
from 
    (
        --find all records with non matching PersonalNos
        select 
            name,cast('StringID: ' + stringID as nvarchar(400)) as Identifier,cast('Inconsistent  PersonalNo' as nvarchar(400)) as issueType
        from(
            select
                name,stringID
            from 
                #cleaned
            group by
                name,stringID
            ) as i
        group by
            name,StringId
        having 
            count(*) > 1

    UNION    
        --find all records with non matching string ids

        select 
            name,'PersonalNo: ' + PersonalNo,cast('Inconsistent String ID' as nvarchar(400)) as issueType
        from(
            select
                name,PersonalNo
        having 
            count(*) > 1
    ) as a

sqlFiddle：http://sqlfiddle.com/#!3/e9da2/18

更新：也想接受空字符串personalNo
这是另一个新要求..接受空字符串的方式与personalNo中的NULL相同

--select data into a temp table that can be modified
select
    *
    into #cleaned
from 
    table1

--apply personal numbers based on other records with matching string id
--you could take note of the records you are doing this to for data clean up
update c
    set c.personalNo = s.personalNo
from #cleaned as c
    inner join table1 as s
        on c.name = s.name
        and c.stringID = s.stringID
        and  (c.personalNo IS NULL OR c.personalNo ='')
        and s.personalNo is not null
        and s.personalNo != ''


Select
     IssueType,StringId
        having 
            count(*) > 1

  UNION    
        --find all records with non matching string ids
        select 
            name,PersonalNo
        having 
            count(*) > 1
    ) as a

sqlFiddle：http://sqlfiddle.com/#!3/412127/8

sql – PARTITION BY Name,Id用于比较和检测问题

解决方法

猜你在找的MsSQL相关文章