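I have two tables, each with roughly 200,000 rows. I ran the query below, and after more than an hour it still had not finished. What could explain this?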
SELECT dbo.[new].[colom1],
       dbo.[new].[colom2],
       dbo.[new].[colom3],
       dbo.[new].[colom4],
       dbo.[new].[Value] as 'nieuwe Value',
       dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old]
  ON dbo.[new].[colom1] = dbo.[old].[colom1]
 and dbo.[new].[colom2] = dbo.[old].[colom2]
 and dbo.[new].[colom3] = dbo.[old].[colom3]
 and dbo.[new].[colom4] = dbo.[old].[colom4]
where dbo.[new].[Value] <> dbo.[old].[Value]
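From the comments:
Solution
It appears that for an equality join on a single column, rows with a NULL value in the join key are filtered out, but this is not the case for joins on multiple columns.
As a result, the hash join's complexity changes from O(N) to O(N^2).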
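Before changing the query, it may be worth checking how many rows actually carry a NULL in one of the join columns. The following is a minimal sketch against the tables from the question; the output column names (table_name, total_rows, rows_with_null_key) are made up for illustration.

```sql
-- Count rows whose join key contains at least one NULL, per table.
SELECT 'new' AS table_name,
       COUNT(*) AS total_rows,
       SUM(CASE WHEN [colom1] IS NULL OR [colom2] IS NULL
                  OR [colom3] IS NULL OR [colom4] IS NULL
                THEN 1 ELSE 0 END) AS rows_with_null_key
FROM dbo.[new]
UNION ALL
SELECT 'old',
       COUNT(*),
       SUM(CASE WHEN [colom1] IS NULL OR [colom2] IS NULL
                  OR [colom3] IS NULL OR [colom4] IS NULL
                THEN 1 ELSE 0 END)
FROM dbo.[old];
```

If rows_with_null_key is large in both tables, the NULL-key explanation is a plausible cause of the slow join. If it is zero or very small, the cause probably lies elsewhere.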
======================================================================
On this subject, I'd like to recommend a great article by Paul White about a similar issue:
Hash Joins on Nullable Columns
======================================================================
I have put together a small simulation of this use case, and I encourage you to test your solutions against it.
create table mytab1 (c1 int null, c2 int null)
create table mytab2 (c1 int null, c2 int null)

;with t(n) as (select 1 union all select n+1 from t where n < 10)
insert into mytab1
select null, null
from t t0, t t1, t t2, t t3, t t4

insert into mytab2 select null, null from mytab1

insert into mytab1 values (111, 222);
insert into mytab2 values (111, 222);

select *
from mytab1 t1
join mytab2 t2
  on t1.c1 = t2.c1
 and t1.c2 = t2.c2
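To see the effect on the simulation, the join can be compared against a variant that discards NULL keys on both sides. Each table holds 10^5 rows of (NULL, NULL) plus one (111, 222) row. If the NULL-keyed rows all end up being compared against each other, the unfiltered join would do on the order of 10^5 x 10^5 = 10^10 comparisons without producing a single match. The sketch below assumes SQL Server and uses the session option SET STATISTICS TIME so the two runs can be timed; it is one way to test the idea, not a definitive benchmark.

```sql
-- Report CPU and elapsed time for each statement in this session.
set statistics time on;

-- Variant of the simulation join that filters out NULL join keys
-- on both inputs.
select *
from mytab1 t1
join mytab2 t2
  on t1.c1 = t2.c1
 and t1.c2 = t2.c2
where t1.c1 is not null and t1.c2 is not null
  and t2.c1 is not null and t2.c2 is not null;

set statistics time off;
```

Both variants return the same single row (111, 222, 111, 222), because a NULL key never satisfies an equality predicate. Only the amount of work done to get there should differ.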
For the OP's query, we should eliminate the rows that have a NULL value in any of the join key columns.
SELECT dbo.[new].[colom1],
       dbo.[new].[colom2],
       dbo.[new].[colom3],
       dbo.[new].[colom4],
       dbo.[new].[Value] as 'nieuwe Value',
       dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old]
  ON dbo.[new].[colom1] = dbo.[old].[colom1]
 and dbo.[new].[colom2] = dbo.[old].[colom2]
 and dbo.[new].[colom3] = dbo.[old].[colom3]
 and dbo.[new].[colom4] = dbo.[old].[colom4]
where dbo.[new].[Value] <> dbo.[old].[Value]
  and dbo.[new].[colom1] is not null
  and dbo.[new].[colom2] is not null
  and dbo.[new].[colom3] is not null
  and dbo.[new].[colom4] is not null
  and dbo.[old].[colom1] is not null
  and dbo.[old].[colom2] is not null
  and dbo.[old].[colom3] is not null
  and dbo.[old].[colom4] is not null