我提出了一个multi-statement,schema-bound TVF,我认为它比运行标准功能更快.我也认为多语句TVF会被“内联”,虽然我错了,但你会在下面看到.这个问题是关于这两种风格的TVF的性能差异.首先,您需要查看代码.
这是多语句TVF:
IF OBJECT_ID('dbo.IsPalindrome') IS NOT NULL DROP FUNCTION dbo.IsPalindrome; GO CREATE FUNCTION dbo.IsPalindrome ( @Word NVARCHAR(500) ) RETURNS @t TABLE ( IsPalindrome BIT NOT NULL ) WITH SCHEMABINDING AS BEGIN DECLARE @IsPalindrome BIT; DECLARE @LeftChunk NVARCHAR(250); DECLARE @RightChunk NVARCHAR(250); DECLARE @StrLen INT; DECLARE @Pos INT; SET @RightChunk = ''; SET @IsPalindrome = 0; SET @StrLen = LEN(@Word) / 2; IF @StrLen % 2 = 1 SET @StrLen = @StrLen - 1; SET @Pos = LEN(@Word); SET @LeftChunk = LEFT(@Word,@StrLen); WHILE @Pos > (LEN(@Word) - @StrLen) BEGIN SET @RightChunk = @RightChunk + SUBSTRING(@Word,@Pos,1) SET @Pos = @Pos - 1; END IF @LeftChunk = @RightChunk SET @IsPalindrome = 1; INSERT INTO @t VALUES (@IsPalindrome); RETURN END GO
内联TVF:
IF OBJECT_ID('dbo.InlineIsPalindrome') IS NOT NULL DROP FUNCTION dbo.InlineIsPalindrome; GO CREATE FUNCTION dbo.InlineIsPalindrome ( @Word NVARCHAR(500) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN ( WITH Nums AS ( SELECT N = number FROM dbo.Numbers ) SELECT IsPalindrome = CASE WHEN EXISTS ( SELECT N FROM Nums WHERE N <= L / 2 AND SUBSTRING(S,N,1) <> SUBSTRING(S,1 + L - N,1) ) THEN 0 ELSE 1 END FROM (SELECT LTRIM(RTRIM(@Word)),LEN(@Word)) AS v (S,L) ); GO
上述函数中的Numbers表定义为:
CREATE TABLE dbo.Numbers ( Number INT NOT NULL );
注意:数字表没有任何索引,也没有主键,并且包含1,000,000行.
试验台临时表:
IF OBJECT_ID('tempdb.dbo.#Words') IS NOT NULL DROP TABLE #Words; GO CREATE TABLE #Words ( Word VARCHAR(500) NOT NULL ); INSERT INTO #Words(Word) SELECT o.name + REVERSE(w.name) FROM sys.objects o CROSS APPLY ( SELECT o.name FROM sys.objects o ) w;
在我的测试系统上,上面的INSERT导致将16900行插入到#Words表中.
为了测试这两种变化,我设置了STATISTICS IO,TIME ON;并使用以下内容:
SELECT w.Word,p.IsPalindrome FROM #Words w CROSS APPLY dbo.IsPalindrome(w.Word) p ORDER BY w.Word; SELECT w.Word,p.IsPalindrome FROM #Words w CROSS APPLY dbo.InlineIsPalindrome(w.Word) p ORDER BY w.Word;
我期望InlineIsPalindrome版本明显更快,但是以下结果不支持该假设.
多语句TVF:
Table ‘#A1CE04C3’. Scan count 16896,logical reads 16900,physical reads 0,read-ahead reads 0,lob logical reads 0,lob physical reads 0,lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0,logical reads 0,lob read-ahead reads 0.
Table ‘#Words’. Scan count 1,logical reads 88,lob read-ahead reads 0.sql Server Execution Times:
cpu time = 1700 ms,elapsed time = 2022 ms.
sql Server parse and compile time:
cpu time = 0 ms,elapsed time = 0 ms.
内联TVF:
Table ‘Numbers’. Scan count 1,logical reads 1272030,lob read-ahead reads 0.
sql Server Execution Times:
cpu time = 137874 ms,elapsed time = 139415 ms.
sql Server parse and compile time:
cpu time = 0 ms,elapsed time = 0 ms.
执行计划如下:
在这种情况下,为什么内联变量比多语句变量慢得多?
在回应@AaronBertrand的评论时,我修改了dbo.InlineIsPalindrome函数,以限制CTE返回的行与输入字的长度相匹配:
CREATE FUNCTION dbo.InlineIsPalindrome ( @Word NVARCHAR(500) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN ( WITH Nums AS ( SELECT N = number FROM dbo.Numbers WHERE number <= LEN(@Word) ) SELECT IsPalindrome = CASE WHEN EXISTS ( SELECT N FROM Nums WHERE N <= L / 2 AND SUBSTRING(S,L) );
正如@MartinSmith建议的那样,我已经在dbo.Numbers表中添加了一个主键和聚簇索引,这肯定有助于并且更接近人们期望在生产环境中看到的内容.
现在重新运行上面的测试会产生以下统计信息:
交叉申请dbo.IsPalindrome(w.Word)p:
(17424 row(s) affected)
Table ‘#B1104853’. Scan count 17420,logical reads 17424,logical reads 90,lob read-ahead reads 0.sql Server Execution Times:
cpu time = 1763 ms,elapsed time = 2192 ms.
dbo.FunctionIsPalindrome(w.Word):
(17424 row(s) affected)
Table ‘Worktable’. Scan count 0,lob read-ahead reads 0.sql Server Execution Times:
cpu time = 328 ms,elapsed time = 424 ms.
交叉申请dbo.InlineIsPalindrome(w.Word)p:
(17424 row(s) affected)
Table ‘Numbers’. Scan count 1,logical reads 237100,lob read-ahead reads 0.sql Server Execution Times:
cpu time = 17737 ms,elapsed time = 17946 ms.
我在sql Server 2012 SP3,v11.0.6020,Developer Edition上测试了这个.
这是我的数字表的定义,主键和聚集索引:
CREATE TABLE dbo.Numbers ( Number INT NOT NULL CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED ); ;WITH n AS ( SELECT v.n FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) v(n) ) INSERT INTO dbo.Numbers(Number) SELECT ROW_NUMBER() OVER (ORDER BY n1.n) FROM n n1,n n2,n n3,n n4,n n5,n n6;
解决方法
在Number上添加一个聚簇主键,并使用forceseek提示尝试以下操作以获得所需的搜索.
据我所知,这个提示是必需的,因为sql Server只估计该表的27%将与谓词匹配(< = 30%和<>减少到27%).因此,它只需要读取3-4行才能找到匹配的行,并且可以退出半连接.因此,扫描选项的成本非常低廉.但事实上,如果确实存在任何回文,那么它将不得不阅读整个表格,因此这不是一个好的计划.
CREATE FUNCTION dbo.InlineIsPalindrome ( @Word NVARCHAR(500) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN ( WITH Nums AS ( SELECT N = number FROM dbo.Numbers WITH(FORCESEEK) ) SELECT IsPalindrome = CASE WHEN EXISTS ( SELECT N FROM Nums WHERE N <= L / 2 AND SUBSTRING(S,L) ); GO
随着这些变化,它飞了我(需要228ms)