http://blogs.msdn.com/b/psssql/archive/2010/09/28/case-of-using-filtered-statistics.aspx
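The data is heavily skewed: one region has almost no rows in Sales (a single row in the script below), and all the rest belong to the other region.
Below is the full code to reproduce the issue: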
    create table Region(id int, name nvarchar(100))
    go
    create table Sales(id int, detail int)
    go
    create clustered index d1 on Region(id)
    go
    create index ix_Region_name on Region(name)
    go
    create statistics ix_Region_id_name on Region(id, name)
    go
    create clustered index ix_Sales_id_detail on Sales(id, detail)
    go

    -- only two values in this table as lookup or dim table
    insert Region values(0, 'Dallas')
    insert Region values(1, 'New York')
    go

    set nocount on
    -- Sales is skewed
    insert Sales values(0, 0)
    declare @i int
    set @i = 1
    while @i <= 1000
    begin
        insert Sales values (1, @i)
        set @i = @i + 1
    end
    go

    update statistics Region with fullscan
    update statistics Sales with fullscan
    go

    set statistics profile on
    go
    -- note that this query will overestimate:
    -- it estimates there will be 500.5 rows
    select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)
    -- this query will underestimate:
    -- it also estimates 500.5 rows, but in fact 1000 rows are returned
    select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile)
    go
    set statistics profile off
    go

    create statistics Region_stats_id on Region (id) where name = 'Dallas'
    go
    create statistics Region_stats_id2 on Region (id) where name = 'New York'
    go

    set statistics profile on
    go
    -- now the estimate becomes accurate (1 row) because stats Region_stats_id is used to evaluate
    select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)
    -- the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate
    select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile)
    go
    set statistics profile off
My question: these are the statistics available on the two tables:
    sp_helpstats 'region','all'
    sp_helpstats 'sales','all'
Table Region:
    statistics_name      statistics_keys
    d1                   id
    ix_Region_id_name    id,name
    ix_Region_name       name
Table Sales:
    statistics_name      statistics_keys
    ix_Sales_id_detail   id,detail
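As a side note, sp_helpstats does not show whether a statistics object is filtered. A minimal sketch of an alternative that reads the catalog views sys.stats, sys.stats_columns and sys.columns instead (the column aliases are my own choice), which also lists the filter predicate once filtered statistics exist:

```sql
-- List every statistics object on Region and Sales with its key columns
-- and, for filtered statistics, the filter predicate.
select object_name(s.object_id) as table_name,
       s.name                   as statistics_name,
       c.name                   as column_name,
       sc.stats_column_id       as key_position,
       s.has_filter,
       s.filter_definition
from sys.stats as s
join sys.stats_columns as sc
  on sc.object_id = s.object_id
 and sc.stats_id  = s.stats_id
join sys.columns as c
  on c.object_id = sc.object_id
 and c.column_id = sc.column_id
where s.object_id in (object_id('Region'), object_id('Sales'))
order by table_name, statistics_name, key_position;
```

After the two create statistics ... where statements in the repro script have run, Region_stats_id and Region_stats_id2 should appear here with has_filter = 1.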
    select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)
    --the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate
    select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile)
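2. When I create the filtered statistics as the author does, I can see correct estimates. But why do we need filtered statistics at all, and how can I tell that I need them? Even when I create simple statistics, I get the same result.
The best resources I have come across so far:
Kimberly Tripp's skewed statistics video
The technical white paper on statistics
But I still can't understand why filtered statistics make a difference here.
Thanks in advance.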
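Update: 7/4
Follow-up questions after Martin's and James's answers:
1. Is there any way to avoid data skew?
Apart from Kimberly's script, is there another way to work out the number of rows for a value?
2. Have you run into any problems related to data skew in your own experience? I assume it depends on how large the tables are, but I am looking for a detailed answer.
3. We have to pay the cost of SQL Server scanning the table, plus some blocking that sometimes shows up when a query triggers a statistics update. Do you see any overhead beyond this when maintaining statistics?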
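On the first follow-up question, one rough way to see skew without any external script is to count the rows per value and compare each count with the average. A minimal sketch against the Sales table from the repro; the ratio_to_average column is my own illustration, not a standard metric:

```sql
-- Rows per join key, plus how far each key is from the average.
-- A ratio far from 1 marks a value whose row count the average misrepresents.
select id,
       count(*) as rows_for_id,
       count(*) * 1.0 / avg(count(*) * 1.0) over () as ratio_to_average
from Sales
group by id
order by rows_for_id desc;
```

On a large table this scans the whole index, so it carries the same kind of cost as a fullscan statistics update.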
Thanks again.
Solution
If you run dbcc show_statistics('Region','ix_Region_id_name'), the result is:
    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    0             0           1        0                    1
    1             0           1        0                    1
But once you create the statistics Region_stats_id (for Dallas), dbcc show_statistics('Region','Region_stats_id') shows:
    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    0             0           1        0                    1
So SQL Server knows that there is only 1 row and that its id is 0.
Similarly for Region_stats_id2:
    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    1             0           1        0                    1
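If you only want the parts shown here, DBCC SHOW_STATISTICS accepts WITH options that restrict the output. A small sketch for the two filtered statistics from the question; as far as I know the header result set also reports the filter expression and the unfiltered row count:

```sql
-- Header only: includes the filter expression for filtered statistics.
dbcc show_statistics('Region', 'Region_stats_id') with stat_header;

-- Histogram only, for each of the two filtered statistics.
dbcc show_statistics('Region', 'Region_stats_id') with histogram;
dbcc show_statistics('Region', 'Region_stats_id2') with histogram;
```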
The row counts for Sales in ix_Sales_id_detail then help determine the number of rows for each id:
    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
    0             0           1        0                    1
    1             0           1000     0                    1
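For contrast, here is where the 500.5 estimate from the question appears to come from. This is my simplified reading, not a full description of the optimizer: without filtered statistics it cannot tell which id survives the filter on name, so it falls back on the average number of Sales rows per distinct id. A sketch that reproduces that arithmetic by hand:

```sql
-- Average rows per distinct id in Sales: 1001 rows / 2 distinct ids = 500.5
select count(*) * 1.0 / count(distinct id) as avg_rows_per_id
from Sales;

-- The same information as SQL Server stores it: the "All density" value
-- for the leading column id, multiplied by the table's row count.
dbcc show_statistics('Sales', 'ix_Sales_id_detail') with density_vector;
```

With the filtered statistics in place, the id is known to be 0 or 1, so the histogram row for that id (1 or 1000) can be used instead of the average.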
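Info: this is now a copy of an answer that was deleted by @MartijnPieters, since this is the question I intended to answer, and I can't seem to do anything with the deleted answer. I accidentally posted it first on TheGameiswar's other statistics question today, but I have deleted that one myself.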