I'm working with some data that is currently stored in 1-minute intervals and looks like this:

    CREATE TABLE #MinuteData
        (
          [Id] INT,
          [MinuteBar] DATETIME,
          [Open] NUMERIC(12,6),
          [High] NUMERIC(12,6),
          [Low] NUMERIC(12,6),
          [Close] NUMERIC(12,6)
        );

    INSERT INTO #MinuteData
            ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
    VALUES  ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557880 ),
            ( 2, '2015-01-01 17:01:00', 1.557900,
            ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.558040 ),
            ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
            ( 5, '2015-01-01 17:04:00', 1.558050, 1.558020, 1.558030 ),
            ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557950 ),
            ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557990 ),
            ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.558170 ),
            ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558120 ),
            ( 10, '2015-01-01 17:09:00', 1.558110, 1.557970, 1.557970 );

    SELECT * FROM #MinuteData;

    DROP TABLE #MinuteData;

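The values track currency exchange rates, so for each minute interval (bar) there is an Open price at the start of the minute and a Close price at the end of the minute. The High and Low values are the highest and lowest rates during each individual minute.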
Desired Output
I'd like to reformat this data into 5-minute intervals to produce the following output:

    MinuteBar                Open      Close     Low       High
    2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
    2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710

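This takes the Open value from the first minute of the 5 and the Close value from the last minute of the 5. The High and Low values are the highest High and the lowest Low across the 5-minute period.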
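The 5-minute grouping comes down to integer arithmetic on minute offsets from a fixed anchor date. Here is a minimal T-SQL sketch of that arithmetic, assuming the same anchor date that the queries in this post use. The three timestamps and the `BucketStart` column are illustrative and not part of the original data.

```sql
-- Illustrative timestamps only; the anchor matches the date used in this post's queries.
DECLARE @Anchor DATETIME = '2015-01-01T00:00:00';

SELECT  v.MinuteBar,
        -- DATEDIFF returns an INT, so "/ 5" is integer division:
        -- 17:03 and 17:04 give 204, 17:05 gives 205.
        DATEDIFF(MINUTE, @Anchor, v.MinuteBar) / 5 AS Interval,
        -- Multiplying back by 5 recovers the start of the bucket (17:00, 17:00, 17:05).
        DATEADD(MINUTE, (DATEDIFF(MINUTE, @Anchor, v.MinuteBar) / 5) * 5, @Anchor) AS BucketStart
FROM    (VALUES (CAST('2015-01-01T17:03:00' AS DATETIME)),
                (CAST('2015-01-01T17:04:00' AS DATETIME)),
                (CAST('2015-01-01T17:05:00' AS DATETIME))) AS v (MinuteBar);
```

Any anchor that falls on a 5-minute boundary gives the same grouping; only the bucket numbers change.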
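Current Solution
I have a solution that does this (below), but it feels inelegant because it relies on Id values and self joins. I also intend to run it on much larger datasets, so I'd like to do it more efficiently if possible: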

    -- Create a column to allow grouping in 5 minute intervals
    SELECT  Id, MinuteBar, [Open], High, Low, [Close],
            DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS Interval
    INTO    #5MinuteData
    FROM    #MinuteData
    ORDER BY MinuteBar;

    -- Group by interval and aggregate prior to self join
    SELECT  Interval,
            MIN(MinuteBar) AS MinuteBar,
            MIN(Id) AS OpenId,
            MAX(Id) AS CloseId,
            MIN(Low) AS Low,
            MAX(High) AS High
    INTO    #DataMinMax
    FROM    #5MinuteData
    GROUP BY Interval;

    -- Self join to get the Open and Close values
    SELECT  t1.Interval, t1.MinuteBar, tOpen.[Open], tClose.[Close], t1.Low, t1.High
    FROM    #DataMinMax t1
            INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
            INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

    DROP TABLE #DataMinMax;
    DROP TABLE #5MinuteData;

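Rework Attempt
Instead of the queries above, I've been looking at FIRST_VALUE and LAST_VALUE, since they seem to be what I'm after, but I can't get them to work with the grouping I'm doing. There may be a better approach than the one I'm attempting, so I'm open to suggestions. This is what I'm currently trying: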

    SELECT  MIN(MinuteBar) MinuteBar5,
            FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS opening,
            MAX(High) AS High,
            LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing,
            DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
    FROM    #MinuteData
    GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

This gives me the error below. It is caused by the FIRST_VALUE and LAST_VALUE lines, because the query runs if I remove them:

    Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

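In T-SQL, window functions are evaluated over the result of GROUP BY, so any column referenced inside FIRST_VALUE(...) or OVER (...) in a grouped query must itself be grouped or aggregated. Here is a minimal sketch of that behaviour. The table variable `@T`, its columns `Grp`, `Seq` and `Val`, and its three rows are hypothetical and unrelated to the data above.

```sql
-- Hypothetical three-row table, used only to illustrate evaluation order.
DECLARE @T TABLE (Grp INT, Seq INT, Val INT);
INSERT INTO @T (Grp, Seq, Val) VALUES (1, 1, 10), (1, 2, 20), (2, 1, 30);

-- Left commented out because it fails with the same kind of error:
-- Seq and Val are neither grouped nor aggregated once GROUP BY Grp has been applied.
-- SELECT Grp, FIRST_VALUE(Val) OVER (ORDER BY Seq) AS FirstVal
-- FROM   @T
-- GROUP BY Grp;

-- Computing the window function in a derived table first, then grouping, is accepted.
SELECT  Grp,
        MIN(FirstVal) AS FirstVal   -- FirstVal is constant within each Grp
FROM    ( SELECT Grp,
                 FIRST_VALUE(Val) OVER (PARTITION BY Grp ORDER BY Seq) AS FirstVal
          FROM   @T ) AS d
GROUP BY Grp;
-- Grp 1 -> 10, Grp 2 -> 30
```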
Solution

    SELECT  MIN(MinuteBar) AS MinuteBar5,
            opening,
            Closing,
            Interval
    FROM    (
              SELECT  FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS opening,
                      FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
                      DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
                      *
              FROM    #MinuteData
            ) AS T
    GROUP BY Interval, opening, Closing

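> FIRST_VALUE and LAST_VALUE are analytic functions: they operate on a window or partition, not on a group. You can run the nested query on its own and look at its result.
> LAST_VALUE returns the last value of the current window frame. Your query does not specify a frame, and when ORDER BY is present the default frame runs from the first row of the current partition up to the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). You can either use FIRST_VALUE with a descending order, or specify the frame explicitly: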

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
                              ORDER BY MinuteBar
                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,

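To see the frame difference side by side and how the explicit frame fits into a complete 5-minute aggregation, here is a sketch. The temp table `#Bars` and its three rows are hypothetical values chosen for readability, not the data from the question. The High and Low aggregates are added here to match the desired output and are not part of the answer's original query.

```sql
-- Hypothetical sample rows for illustration (not the data from the question).
CREATE TABLE #Bars
    ( MinuteBar DATETIME, [Open] NUMERIC(12,6), [High] NUMERIC(12,6),
      [Low] NUMERIC(12,6), [Close] NUMERIC(12,6) );

INSERT INTO #Bars (MinuteBar, [Open], [High], [Low], [Close])
VALUES ('2015-01-01T17:00:00', 1.10, 1.13, 1.09, 1.11),
       ('2015-01-01T17:01:00', 1.11, 1.14, 1.10, 1.12),
       ('2015-01-01T17:05:00', 1.20, 1.22, 1.19, 1.21);

-- 1) Default frame vs. explicit frame for LAST_VALUE.
SELECT  MinuteBar,
        [Close],
        -- Default frame stops at the current row; with unique timestamps
        -- this simply repeats each row's own Close.
        LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5
                                  ORDER BY MinuteBar) AS ClosingDefaultFrame,
        -- Explicit frame covers the whole partition, so every row in a bucket
        -- sees that bucket's final Close.
        LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5
                                  ORDER BY MinuteBar
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS ClosingFullFrame
FROM    #Bars;

-- 2) Complete 5-minute aggregation using the explicit frame.
SELECT  MIN(MinuteBar) AS MinuteBar5,
        Opening,
        Closing,
        MIN([Low])  AS Low,
        MAX([High]) AS High
FROM    ( SELECT MinuteBar, [High], [Low],
                 DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS Interval,
                 FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5
                                           ORDER BY MinuteBar) AS Opening,
                 LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5
                                           ORDER BY MinuteBar
                                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing
          FROM   #Bars ) AS T
GROUP BY Interval, Opening, Closing
ORDER BY MinuteBar5;
-- 17:00 bucket: Opening 1.10, Closing 1.12, Low 1.09, High 1.14
-- 17:05 bucket: Opening 1.20, Closing 1.21, Low 1.19, High 1.22

DROP TABLE #Bars;
```

Because Opening and Closing are constant within each partition, adding them to the GROUP BY does not split any interval into extra rows.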