SELECT * FROM table1 AS a INNER JOIN table2 AS b ON (a.name LIKE '%' + b.name + '%')
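On my dataset this takes about 90 seconds to execute, so I have been looking for ways to speed it up. For no good reason, I thought I would try PATINDEX instead of LIKE...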
SELECT * FROM table1 AS a INNER JOIN table2 AS b ON (PATINDEX('%' + b.name + '%',a.name) > 0)
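On the same dataset, this executes in the blink of an eye and returns the same results.
Can anyone explain why LIKE is so much slower than PATINDEX? Given that LIKE just returns a BOOLEAN whereas PATINDEX returns the actual position, I would have expected the latter to be slower, if anything. Or is it just a question of how efficiently the two functions were written?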
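To check the claim that both forms return the same rows, something like the following sketch could be used. The temp tables #t1 and #t2 and their sample rows are hypothetical stand-ins for table1 and table2, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data; table names and rows are illustrative only.
CREATE TABLE #t1 (name varchar(100) NOT NULL);
CREATE TABLE #t2 (name varchar(100) NOT NULL);

INSERT INTO #t1 (name) VALUES ('JohnSmith'), ('MaryJones'), ('AlanBrown');
INSERT INTO #t2 (name) VALUES ('Smith'), ('Jones'), ('Green');

-- Pairs matched by LIKE but not by PATINDEX.
SELECT a.name AS a_name, b.name AS b_name
FROM #t1 AS a
INNER JOIN #t2 AS b ON a.name LIKE '%' + b.name + '%'
EXCEPT
SELECT a.name, b.name
FROM #t1 AS a
INNER JOIN #t2 AS b ON PATINDEX('%' + b.name + '%', a.name) > 0;

-- Pairs matched by PATINDEX but not by LIKE.
SELECT a.name AS a_name, b.name AS b_name
FROM #t1 AS a
INNER JOIN #t2 AS b ON PATINDEX('%' + b.name + '%', a.name) > 0
EXCEPT
SELECT a.name, b.name
FROM #t1 AS a
INNER JOIN #t2 AS b ON a.name LIKE '%' + b.name + '%';

DROP TABLE #t1, #t2;
```

If the two predicates are equivalent for the data in question, both EXCEPT queries should come back empty.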
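OK, here is each query in full, followed by its execution plan. "#StakeholderNames" is just a temp table of the possible names I am matching against.
I have pulled back the live data and run each query several times. The first takes about 17 seconds (somewhat less than the original 90 seconds on the live database), and the second takes less than 1 second...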
SELECT sh.StakeholderID,sh.HoldingID,i.AgencyCommissionImportID,1
FROM AgencyCommissionImport AS i
INNER JOIN #StakeholderNames AS sn ON REPLACE(REPLACE(i.ClientName,' ',''),','') LIKE '%' + sn.Name + '%'
INNER JOIN Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
INNER JOIN StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
WHERE i.AgencyCommissionFileID = @AgencyCommissionFileID AND (i.MatchTypeID = 0) AND ((i.MatchedHoldingID IS NULL) OR (i.MatchedStakeholderID IS NULL))

|--Table Insert(OBJECT:([tempdb].[dbo].[#Results]),SET:([#Results].[StakeholderID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[StakeholderID] as [sh].[StakeholderID],[#Results].[HoldingID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID],[#Results].[AgencyCommissionImportID] = [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionImportID] as [i].[AgencyCommissionImportID],[#Results].[MatchTypeID] = [Expr1014],[#Results].[indx] = [Expr1013]))
|--Compute Scalar(DEFINE:([Expr1014]=(1)))
|--Compute Scalar(DEFINE:([Expr1013]=getidentity((1835869607),(2),N'#Results')))
|--Top(ROWCOUNT est 0)
|--Hash Match(Inner Join,HASH:([h].[ProviderName],[h].[HoldingReference])=([i].[Provider],[i].[PlanNumber]),RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND [Expr1015] like [Expr1016]))
|--Nested Loops(Inner Join,OUTER REFERENCES:([sh].[HoldingID]))
| |--Nested Loops(Inner Join,OUTER REFERENCES:([sn].[StakeholderID]))
| | |--Compute Scalar(DEFINE:([Expr1016]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%',[Expr1017]=LikeRangeStart(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'),[Expr1018]=LikeRangeEnd(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'),[Expr1019]=LikeRangeInfo(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%')))
| | | |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
| | |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]),SEEK:([sh].[StakeholderID]=#StakeholderNames.[StakeholderID] as [sn].[StakeholderID]) ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[Holding].[PK_Holding] AS [h]),SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1015]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],'')))
|--Clustered Index Scan(OBJECT:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PK_AgencyCommissionImport] AS [i]),WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))

SELECT sh.StakeholderID,1
FROM AgencyCommissionImport AS i
INNER JOIN #StakeholderNames AS sn ON (PATINDEX('%' + sn.Name + '%',REPLACE(REPLACE(i.ClientName,'')) > 0)
INNER JOIN Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
INNER JOIN StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
WHERE i.AgencyCommissionFileID = @AgencyCommissionFileID AND (i.MatchTypeID = 0) AND ((i.MatchedHoldingID IS NULL) OR (i.MatchedStakeholderID IS NULL))

|--Table Insert(OBJECT:([tempdb].[dbo].[#Results]),[#Results].[indx] = [Expr1013]))
|--Compute Scalar(DEFINE:([Expr1014]=(1)))
|--Compute Scalar(DEFINE:([Expr1013]=getidentity((1867869721),RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND patindex([Expr1015],[Expr1016])>(0)))
|--Nested Loops(Inner Join,OUTER REFERENCES:([sn].[StakeholderID]))
| | |--Compute Scalar(DEFINE:([Expr1015]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'))
| | | |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
| | |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]),SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1016]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))
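Solution
Run each query with SQL Server returning the actual execution plan, and compare the two plans.
Also, when comparing the performance of two queries, run each query twice and throw out the timing for the first run. (The first run of a query may include a lot of heavy lifting, such as statement parsing and database I/O. The second run gives you an elapsed time that can be compared more fairly against the other query.)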
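One way to capture timings and the actual plan in a single session is sketched below, using the simplified table1/table2 query from the question as a placeholder for the real one. The SET options are standard T-SQL session settings; how the output is displayed depends on the client tool.

```sql
SET STATISTICS TIME ON;     -- report parse/compile and execution times
SET STATISTICS IO ON;       -- report logical and physical reads per table
SET STATISTICS PROFILE ON;  -- return the actual plan as an extra text result set

-- Execute this statement twice and ignore the timings from the first run,
-- which may include compilation and reads from disk into the buffer cache.
SELECT *
FROM table1 AS a
INNER JOIN table2 AS b ON (a.name LIKE '%' + b.name + '%');

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Repeating the same steps with the PATINDEX version of the join gives a second set of timings and plan text to compare line by line.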
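Can anyone explain why LIKE is so much slower than PATINDEX?
The execution plan for each query will likely explain the difference.
Is it just a question of how efficiently the two functions were written?
It is not really a question of how efficiently the functions are written. What really matters is the execution plan that gets generated: whether the predicate is sargable, and whether the optimizer chooses to use the available indexes.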
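As a rough illustration of what "sargable" means here, consider the sketch below. The temp table #people and the index IX_people_name are hypothetical, and the plan actually chosen depends on the data and statistics.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE #people (
    id   int IDENTITY(1,1) PRIMARY KEY,
    name varchar(100) NOT NULL
);
CREATE INDEX IX_people_name ON #people (name);

-- Sargable: the prefix is known, so the optimizer can seek
-- on a range of the index over name.
SELECT id, name FROM #people WHERE name LIKE 'Smi%';

-- Not sargable: a leading wildcard means every value has to be examined.
SELECT id, name FROM #people WHERE name LIKE '%Smi%';

-- Not sargable: the column is wrapped in a function call.
SELECT id, name FROM #people WHERE PATINDEX('%Smi%', name) > 0;

DROP TABLE #people;
```

Both join predicates in the question use a leading wildcard, so neither can be satisfied by an index seek on the name column; any difference between them has to come from elsewhere in the plan.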
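[EDIT]
In a quick test I ran, I do see a difference in the execution plans. With the LIKE operator in the join predicate, the plan includes a "Table Spool (Lazy Spool)" operation on table2 after the "Compute Scalar" operation. With the PATINDEX function, I do not see a "Table Spool" operation in the plan. But given differences in queries, tables, indexes and statistics, the plans I am getting may be significantly different from the plans you get.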
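[EDIT]
The only difference I see in the execution plan output for the two queries (aside from the expression placeholder names) is the calls to three internal functions (LikeRangeStart, LikeRangeEnd and LikeRangeInfo) in place of one call to the PATINDEX function. These functions appear to be called for each row in the result set, and the resulting expressions are used to scan the inner table in the nested loop.
So it does look as if the three function calls for the LIKE operator could be more expensive, in terms of elapsed time, than the single call to the PATINDEX function. (The explain plan shows those functions being called for each row in the outer result set of a nested loop join; for a large number of rows, even a slight difference in elapsed time could be multiplied enough times to produce a significant performance difference.)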
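After running some test cases on my system, I am still baffled by the results you are seeing.
It may be an issue with the performance of the call to the PATINDEX function versus the calls to the three internal functions (LikeRangeStart, LikeRangeEnd, LikeRangeInfo).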
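When those are executed over a "large" enough result set, a small difference in elapsed time per call could be multiplied into a significant difference.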
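To get a feel for how many times a per-row cost could be multiplied, the row counts on each side of the join can be checked with a sketch like this. It uses the table1 and table2 placeholders from the question, and the product is only an upper bound that assumes every pair of rows is compared.

```sql
-- Upper bound on the number of (a, b) pairs the join predicate may be
-- evaluated against, assuming every row of table1 meets every row of table2.
SELECT
    (SELECT COUNT_BIG(*) FROM table1) AS table1_rows,
    (SELECT COUNT_BIG(*) FROM table2) AS table2_rows,
    (SELECT COUNT_BIG(*) FROM table1)
        * (SELECT COUNT_BIG(*) FROM table2) AS candidate_pairs;
```

The actual number of evaluations is better read from the row and execution counts in the actual execution plan, since other join conditions and filters may reduce the pairs considered.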