我有一个带有字符串列的表和一个检查具有一定长度的行的谓词.在sql Server 2014中,无论我检查的长度如何,我都会看到1行的估计值.这产生了非常糟糕的计划,因为实际上有数千甚至数百万行,sql Server正在选择将此表放在嵌套循环的外侧.
是否有sql Server 2014的基数估计值为1.0003,而sql Server 2012估计为31,622行?有一个很好的解决方法吗?
以下是该问题的简短复制:
-- Create a table with 1MM rows of dummy data CREATE TABLE #customers (cust_nbr VARCHAR(10) NOT NULL) GO INSERT INTO #customers WITH (TABLOCK) (cust_nbr) SELECT TOP 1000000 CONVERT(VARCHAR(10),ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS cust_nbr FROM master..spt_values v1 CROSS JOIN master..spt_values v2 GO -- Looking for string of a certain length. -- While both CEs yield fairly poor estimates,the 2012 CE is much -- more conservative (higher estimate) and therefore much more likely -- to yield an okay plan rather than a drastically understimated loop join. -- 2012: 31,622 rows estimated,900K rows actual -- 2014: 1 row estimated,900K rows actual SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION (QUERYTRACEON 9481) -- Optionally,use 2012 CE GO
Here is a more complete script showing additional tests
我也读过whitepaper on the SQL Server 2014 Cardinality Estimator,但没有找到任何澄清情况的东西.
解决方法
对于遗留CE,我看到估计是行的3.16228% – 这是用于column = literal谓词的“幻数”启发式(还有其他基于谓词构造的启发式方法 – 但LEN包裹在列的周围遗留的CE结果与此猜测框架相匹配).你可以在Joe Sack的
Selectivity Guesses in absence of Statistics和Ian Jose的
Constant-Constant Comparison Estimation的帖子上看到这个例子.
-- Legacy CE: 31622.8 rows SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION ( QUERYTRACEON 9481); -- Legacy CE GO
现在,对于新的CE行为,它看起来现在对优化器可见(这意味着我们可以使用统计信息).我查看了下面的计算器输出,你可以查看相关的自动生成统计数据作为指针:
-- New CE: 1.00007 rows SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION ( QUERYTRACEON 2312 ); -- New CE GO -- View New CE behavior with 2363 (for supported option use XEvents) SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION (QUERYTRACEON 2312,QUERYTRACEON 2363,QUERYTRACEON 3604,RECOMPILE); -- New CE GO /* Loaded histogram for column QCOL: [tempdb].[dbo].[#customers].cust_nbr from stats with id 2 Using ambient cardinality 1e+006 to combine distinct counts: 999927 Combined distinct count: 999927 Selectivity: 1.00007e-006 Stats collection generated: CStCollFilter(ID=2,CARD=1.00007) CStCollBaseTable(ID=1,CARD=1e+006 TBL: #customers) End selectivity computation */ EXEC tempdb..sp_helpstats '#customers'; --Check out AVG_RANGE_ROWS values (for example - plenty of ~ 1) DBCC SHOW_STATISTICS('tempdb..#customers','_WA_Sys_00000001_B0368087'); --That's my Stats name yours is subject to change
遗憾的是,逻辑依赖于对不同值的数量的估计,而不是针对LEN函数的影响进行调整.
可能的解决方法
通过将LEN重写为LIKE,您可以在两种CE模型下获得基于trie的估计:
SELECT COUNT_BIG(*) FROM #customers AS C WHERE C.cust_nbr LIKE REPLICATE('_',6);
有关使用跟踪标志的信息: