以下是一个假设的情况,这接近我真正的问题.表格1
recid firstname lastname company 1 A B AAA 2 D E DEF 3 G H IJK 4 A B ABC
我有一个看起来像这样的table2
recid firstname lastname company 10 A B ABC 20 D E DEF 30 M D DIM 40 A B CCC
现在如果我加入recid表,它会给出0的结果,不会有重复,因为recid是唯一的.但是如果我加入firstname和lastname列,这不是唯一的,有重复的,我在内部连接中得到重复.加入的列数越多,变得越差(创建更多的重复项).
在上述简单的情况下,如何在以下查询中删除重复项.我想比较firstname和lastname,如果匹配,我从table2返回firstname,lastname和recid
select distinct * from (select recid,first,last from table1) a inner join (select recid,last from table2) b on a.first = b.first
如果有人想要在未来玩它,脚本就在这里
create table table1 (recid int not null primary key,first varchar(20),last varchar(20),company varchar(20)) create table table2 (recid int not null primary key,company varchar(20)) insert into table1 values(1,'A','B','ABC') insert into table1 values(2,'D','E','DEF') insert into table1 values(3,'M','N','MNO') insert into table1 values(4,'ABC') insert into table2 values(10,'ABC') insert into table2 values(20,'DEF') insert into table2 values(30,'Q','R','QRS') insert into table2 values(40,'ABC')
解决方法
你不想加入本身,你只是测试存在/设置包含.
我不知道你现在编码的sql的当前风味,但这应该是正常的.
SELECT MAX(recid),firstname,lastname FROM table2 T2 WHERE EXISTS (SELECT * FROM table1 WHERE firstname = T2.firstame AND lastname = T2.lastname) GROUP BY lastname,firstname
如果你想实现一个连接,离开代码大致相同:
即
SELECT max(t2.recid),t2.firstame,t2.lastname FROM Table2 T2 INNER JOIN Table1 T1 ON T2.firstname = t1.firstname and t2.lastname = t1.lastname GROUP BY t2.firstname,t2.lastname
根据DBMS,内部连接的实现可能与Exists(半连接vs连接)不同,但优化器有时可以计算出来,并选择正确的运算符,无论您使用哪种方式编写它.