这是表格(为简洁起见而简化):
CREATE TABLE TestDuration (VALIDATION_TIMESTAMP datetime,ID_TICKET bigint,ID_PLACE bigint)
和数据:
INSERT INTO TestDuration(VALIDATION_TIMESTAMP,ID_TICKET,ID_PLACE) VALUES ('2012-07-25 19:24:05.700',1,1) INSERT INTO TestDuration(VALIDATION_TIMESTAMP,ID_PLACE) VALUES ('2012-07-25 20:08:04.250',2,2) INSERT INTO TestDuration(VALIDATION_TIMESTAMP,ID_PLACE) VALUES ('2012-07-26 10:18:13.040',3,3) INSERT INTO TestDuration(VALIDATION_TIMESTAMP,ID_PLACE) VALUES ('2012-07-26 10:18:20.990',ID_PLACE) VALUES ('2012-07-26 10:18:29.290',4) INSERT INTO TestDuration(VALIDATION_TIMESTAMP,ID_PLACE) VALUES ('2012-07-26 10:25:37.040',4)
这是聚合查询:
SELECT VisitDurationCalcTable.ID_PLACE AS ID_PLACE_IN,VisitDurationCalcTable.ID_NEXT_VISIT_PLACE AS ID_PLACE_OUT,COUNT(visitduration) AS NUMBER_OF_VISITS,AVG(visitduration) AS AVERAGE_VISIT_DURATION FROM ( SELECT EntryData.VALIDATION_TIMESTAMP,EntryData.ID_TICKET,EntryData.ID_PLACE,( SELECT TOP 1 ID_PLACE FROM TestDuration WHERE ID_TICKET=EntryData.ID_TICKET AND VALIDATION_TIMESTAMP>EntryData.VALIDATION_TIMESTAMP ORDER BY VALIDATION_TIMESTAMP ASC ) AS ID_NEXT_VISIT_PLACE,DATEDIFF(n,EntryData.VALIDATION_TIMESTAMP,( SELECT TOP 1 VALIDATION_TIMESTAMP FROM TestDuration WHERE ID_TICKET=EntryData.ID_TICKET and VALIDATION_TIMESTAMP>EntryData.VALIDATION_TIMESTAMP ORDER BY VALIDATION_TIMESTAMP ASC ) ) AS visitduration FROM TestDuration EntryData) AS VisitDurationCalcTable WHERE VisitDurationCalcTable.ID_NEXT_VISIT_PLACE IS NOT NULL GROUP BY VisitDurationCalcTable.ID_PLACE,VisitDurationCalcTable.ID_NEXT_VISIT_PLACE
该查询有效,但我很快遇到了性能问题.对于表中的40K行,查询执行时间约为3分钟.我不是sql大师,所以无法真正看到如何将查询转换为更快的工作.这不是一个关键的报告,每月只做一次,但它使我的应用看起来很糟糕.我有一种感觉,我在这里缺少一些简单的东西.
解决方法
@H_301_18@ TLDR版本您显然缺少有助于此查询的索引.添加缺失的索引可能会导致其自身的数量级改进.
如果你在sql Server 2012上使用LEAD重写查询也会这样做(尽管这也会从缺失的索引中受益).
如果您仍然在2005/2008,那么您可以对现有查询进行一些改进,但与索引更改相比,效果相对较小.
版本更长
为了花费3分钟,我假设您根本没有有用的索引,最大的胜利就是简单地添加索引(对于每月运行一次的报告,只需将三列中的数据复制到适当索引的#temp表中如果您不想创建永久索引,可能就足够了).
你说为了清晰起见你简化了表格,它有40K行.假设有以下测试数据
CREATE TABLE TestDuration ( Id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,VALIDATION_TIMESTAMP DATETIME,ID_TICKET BIGINT,ID_PLACE BIGINT,OtherColumns CHAR(100) NULL ) INSERT INTO TestDuration (VALIDATION_TIMESTAMP,ID_PLACE) SELECT TOP 40000 DATEADD(minute,ROW_NUMBER() OVER (ORDER BY (SELECT 0)),GETDATE()),ABS(CHECKSUM(NEWID())) % 10,ABS(CHECKSUM(NEWID())) % 100 FROM master..spt_values v1,master..spt_values v2
您的原始查询在MAXDOP 1上的机器上需要51秒,以及以下IO统计信息
Table 'Worktable'. Scan count 79990,logical reads 1167573,physical reads 0 Table 'TestDuration'. Scan count 3,logical reads 2472,physical reads 0.
对于表中40,000行中的每一行,它执行两种所有匹配的ID_TICKET行,以便按VALIDATION_TIMESTAMP的顺序识别下一行
简单地添加如下索引会使经过的时间减少到406毫秒,改进超过100倍(此答案中的后续查询假定此索引现已到位).
CREATE NONCLUSTERED INDEX IX ON TestDuration(ID_TICKET,VALIDATION_TIMESTAMP) INCLUDE (ID_PLACE)
该计划现在看起来如下,80,000种类型和假脱机操作被索引搜索取代.
Table 'Worktable'. Scan count 0,logical reads 0,physical reads 0 Table 'TestDuration'. Scan count 79991,logical reads 255707,physical reads 0
然而,它仍在为每一行寻找2次.使用CROSS APPLY进行重写可以将它们组合在一起.
SELECT VisitDurationCalcTable.ID_PLACE AS ID_PLACE_IN,COUNT(visitduration) AS NUMBER_OF_VISITS,AVG(visitduration) AS AVERAGE_VISIT_DURATION FROM (SELECT EntryData.VALIDATION_TIMESTAMP,CA.ID_PLACE AS ID_NEXT_VISIT_PLACE,CA.VALIDATION_TIMESTAMP) AS visitduration FROM TestDuration EntryData CROSS APPLY (SELECT TOP 1 ID_PLACE,VALIDATION_TIMESTAMP FROM TestDuration WHERE ID_TICKET = EntryData.ID_TICKET AND VALIDATION_TIMESTAMP > EntryData.VALIDATION_TIMESTAMP ORDER BY VALIDATION_TIMESTAMP ASC) CA) AS VisitDurationCalcTable GROUP BY VisitDurationCalcTable.ID_PLACE,VisitDurationCalcTable.ID_NEXT_VISIT_PLACE
这给了我269毫秒的经过时间
Table 'Worktable'. Scan count 0,physical reads 0 Table 'TestDuration'. Scan count 40001,logical reads 127988,physical reads 0
虽然读取的数量仍然很高,但是搜索都是刚刚被扫描读取的读取页面,因此它们都是缓存中的所有页面.使用表变量可以减少读取次数.
DECLARE @T TABLE ( VALIDATION_TIMESTAMP DATETIME,RN INT PRIMARY KEY(ID_TICKET,RN) ) INSERT INTO @T SELECT VALIDATION_TIMESTAMP,ID_PLACE,ROW_NUMBER() OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP) AS RN FROM TestDuration SELECT T1.ID_PLACE AS ID_PLACE_IN,T2.ID_PLACE AS ID_PLACE_OUT,COUNT(*) AS NUMBER_OF_VISITS,AVG(DATEDIFF(n,T1.VALIDATION_TIMESTAMP,T2.VALIDATION_TIMESTAMP)) AS AVERAGE_VISIT_DURATION FROM @T T1 INNER MERGE JOIN @T T2 ON T1.ID_TICKET = T2.ID_TICKET AND T2.RN = T1.RN + 1 GROUP BY T1.ID_PLACE,T2.ID_PLACE
但是对于我来说,至少将经过的时间略微增加到301毫秒(对于选择插入258毫秒为43毫秒),但这仍然是一个很好的选择,而不是创建一个永久索引.
(Insert) Table 'TestDuration'. Scan count 1,logical reads 233,physical reads 0 (Select) Table 'Worktable'. Scan count 0,physical reads 0 Table '#0C50D423'. Scan count 2,logical reads 372,physical reads 0
最后,如果您使用的是sql Server 2012,则可以使用LEAD(SQL Fiddle)
WITH CTE AS (SELECT ID_PLACE AS ID_PLACE_IN,LEAD(ID_PLACE) OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP) AS ID_PLACE_OUT,VALIDATION_TIMESTAMP,LEAD(VALIDATION_TIMESTAMP) OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP)) AS VISIT_DURATION FROM TestDuration) SELECT ID_PLACE_IN,ID_PLACE_OUT,COUNT(*) AS NUMBER_OF_VISITS,AVG(VISIT_DURATION) AS AVERAGE_VISIT_DURATION FROM CTE WHERE ID_PLACE_OUT IS NOT NULL GROUP BY ID_PLACE_IN,ID_PLACE_OUT
这给了我249毫秒的经过时间
Table 'Worktable'. Scan count 0,physical reads 0 Table 'TestDuration'. Scan count 1,physical reads 0
LEAD版本在没有索引的情况下也表现良好.省略最佳索引会为计划添加一个额外的SORT,这意味着它必须读取我的测试表上更宽的聚簇索引,但它仍然在293毫秒的经过时间内完成.
Table 'Worktable'. Scan count 0,logical reads 824,physical reads 0