看来CHECKDB在检查我的一个数据库时正在读取用户数据库的事务日志文件 – 特别是一个使用内存中OLTP表的数据库.
这个数据库的CHECKDB仍然在合理的时间内完成,所以我只是对这种行为感到好奇;但它肯定是此实例上所有数据库的CHECKDB最长持续时间.
在查看Paul Randal的史诗“CHECKDB From Every Angle: Complete description of all CHECKDB stages,”时,我看到前sql 2005 CHECKDB用于读取日志以获得数据库的一致视图.但由于这是2016年,它使用an internal database snapshot.
但是,其中一个是prerequisites for snapshots:
The source database must not contain a MEMORY_OPTIMIZED_DATA filegroup
我的用户数据库中有一个这样的文件组,所以看起来快照不在桌面上.
If a snapshot cannot be created,or TABLOCK is specified,DBCC CHECKDB acquires locks to obtain the required consistency. In this case,an exclusive database lock is required to perform the allocation checks,and shared table locks are required to perform the table checks.
好的,我们正在进行数据库和表锁定而不是快照.但这仍然无法解释为什么必须阅读事务日志.什么赋予了什么?
我在下面提供了一个脚本来重现场景.它使用sys.dm_io_virtual_file_stats来标识日志文件读取.
请注意,大多数情况下它会读取一小部分日志(480 KB),但偶尔会读取更多(48.2 MB).在我的生产场景中,当我们运行CHECKDB时,它每天晚上都会读取大部分日志文件(2 GB文件中约1.3 GB).
这是我到目前为止使用脚本获得的输出示例:
collection_time num_of_reads num_of_bytes_read 2018-04-04 15:12:29.203 106 50545664
或这个:
collection_time num_of_reads num_of_bytes_read 2018-04-04 15:25:14.227 1 491520
如果我用常规表替换内存优化对象,输出如下所示:
collection_time num_of_reads num_of_bytes_read 2018-04-04 15:21:03.207 0 0
为什么CHECKDB读取日志文件?特别是,为什么偶尔会读取更大部分的日志文件?
这是实际的脚本:
-- let's have a fresh DB USE [master]; IF (DB_ID(N'LogFileRead_Test') IS NOT NULL) BEGIN ALTER DATABASE [LogFileRead_Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [LogFileRead_Test]; END GO CREATE DATABASE [LogFileRead_Test] GO ALTER DATABASE [LogFileRead_Test] MODIFY FILE ( NAME = LogFileRead_Test_log,SIZE = 128MB ); -- Hekaton-yeah,I want memory optimized data GO ALTER DATABASE [LogFileRead_Test] ADD FILEGROUP [LatencyTestInMemoryFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA; GO ALTER DATABASE [LogFileRead_Test] ADD FILE ( NAME = [LatencyTestInMemoryFile],FILENAME = 'C:\Program Files\Microsoft sql Server\MSsql13.sql2016\MSsql\DATA\LogFileRead_Test_SessionStateInMemoryFile' ) TO FILEGROUP [LatencyTestInMemoryFileGroup]; GO USE [LogFileRead_Test] GO CREATE TYPE [dbo].[InMemoryIdTable] AS TABLE ( [InMemoryId] NVARCHAR (88) COLLATE Latin1_General_100_BIN2 NOT NULL,PRIMARY KEY NONCLUSTERED HASH ([InMemoryId]) WITH (BUCKET_COUNT = 240)) WITH (MEMORY_OPTIMIZED = ON); GO CREATE TABLE [dbo].[InMemoryStuff] ( [InMemoryId] NVARCHAR (88) COLLATE Latin1_General_100_BIN2 NOT NULL,[Created] DATETIME2 (7) NOT NULL,CONSTRAINT [PK_InMemoryStuff_InMemoryId] PRIMARY KEY NONCLUSTERED HASH ([InMemoryId]) WITH (BUCKET_COUNT = 240) ) WITH (MEMORY_OPTIMIZED = ON); GO -- RBAR is the new black (we need some logs to read) declare @j int = 0; while @j < 100000 begin INSERT INTO [dbo].[InMemoryStuff](InMemoryId,Created) VALUES ('Description' + CAST(@j as varchar),GETDATE()); set @j = @j + 1; end -- grab a baseline of virtual file stats to be diff'd later select f.num_of_reads,f.num_of_bytes_read into #dm_io_virtual_file_stats from sys.dm_io_virtual_file_stats(default,default) f where database_id = db_id('LogFileRead_Test') and file_id = FILE_IDEX('LogFileRead_Test_log'); -- hands off my log file,CHECKDB! GO DBCC CHECKDB ([LogFileRead_Test]) WITH NO_INFOMSGS,ALL_ERRORMSGS,DATA_PURITY; -- grab the latest virtual file stats,and compare with the prevIoUs capture GO select f.num_of_reads,f.num_of_bytes_read into #checkdb_stats from sys.dm_io_virtual_file_stats(default,default) f where database_id = db_id('LogFileRead_Test') and file_id = FILE_IDEX('LogFileRead_Test_log'); select collection_time = GETDATE(),num_of_reads = - f.num_of_reads + t.num_of_reads,num_of_bytes_read = - f.num_of_bytes_read + t.num_of_bytes_read into #dm_io_virtual_file_stats_diff from #dm_io_virtual_file_stats f,#checkdb_stats t; drop table #checkdb_stats; drop table #dm_io_virtual_file_stats; -- CHECKDB ignored my comment select collection_time,num_of_reads,num_of_bytes_read from #dm_io_virtual_file_stats_diff d order by d.collection_time; drop table #dm_io_virtual_file_stats_diff; -- I was *not* raised in a barn USE [master]; ALTER DATABASE [LogFileRead_Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [LogFileRead_Test];
由于这个repro通常只生成1或106个日志文件读取,我以为我会用file_read和file_read_completed扩展事件会话挖掘到1.
name timestamp mode offset database_id file_id size duration file_read 2018-04-06 10:51:11.1098141 Contiguous 72704 9 2 0 NULL file_read_completed 2018-04-06 10:51:11.1113345 Contiguous 72704 9 2 491520 1
这里是关于这些偏移的上下文的VLF细节(DBCC LOGINFO()),例如:
RecoveryUnitId FileId FileSize StartOffset FSeqNo Status Parity CreateLSN 0 2 2031616 8192 34 2 64 0 0 2 2031616 2039808 35 2 64 0 0 2 2031616 4071424 36 2 64 0 0 2 2285568 6103040 37 2 64 0 0 2 15728640 8388608 38 2 64 34000000005200001 0 2 15728640 24117248 39 2 64 34000000005200001 0 2 15728640 39845888 40 2 64 34000000005200001 0 2 15728640 55574528 0 0 0 34000000005200001 0 2 15728640 71303168 0 0 0 34000000005200001 0 2 15728640 87031808 0 0 0 34000000005200001 0 2 15728640 102760448 0 0 0 34000000005200001 0 2 15728640 118489088 0 0 0 34000000005200001
那么,CHECKDB操作:
>开始将63 KB(64,512字节)读入第一个VLF,
>读取480 KB(491,520字节),和
>没有读取VLF的最后1441 KB(1,475,584字节)
我还捕获了callstacks,以防它们有用.
file_read callstack:
(00007ffd`999a0860) sqlmin!XesqlPkg::file_read::Publish+0x1dc | (00007ffd`999a0b40) sqlmin!XesqlPkg::file_read_enqueued::Publish (00007ffd`9a825e30) sqlmin!FireReadEvent+0x118 | (00007ffd`9a825f60) sqlmin!FireReadEnqueuedEvent (00007ffd`9980b500) sqlmin!FCB::AsyncRead+0x74d | (00007ffd`9980b800) sqlmin!FCB::AsyncReadInternal (00007ffd`9970e9d0) sqlmin!sqlServerLogMgr::LogBlockReadAheadAsync+0x6a6 | (00007ffd`9970ec00) sqlmin!LBH::Destuff (00007ffd`9970a6d0) sqlmin!LogConsumer::GetNextLogBlock+0x1591 | (00007ffd`9970ab70) sqlmin!LogPoolPrivateCacheBufferMgr::Lookup (00007ffd`9a9fcbd0) sqlmin!sqlServerLogIterForward::GetNext+0x258 | (00007ffd`9a9fd2d0) sqlmin!sqlServerLogIterForward::GetNextBlock (00007ffd`9aa417f0) sqlmin!sqlServerCOWLogIterForward::GetNext+0x2b | (00007ffd`9aa418c0) sqlmin!sqlServerCOWLogIterForward::StartScan (00007ffd`9aa64210) sqlmin!RecoveryMgr::AnalysisPass+0x83b | (00007ffd`9aa65100) sqlmin!RecoveryMgr::AnalyzeLogRecord (00007ffd`9aa5ed50) sqlmin!RecoveryMgr::PhysicalRedo+0x233 | (00007ffd`9aa5f790) sqlmin!RecoveryMgr::PhysicalCompletion (00007ffd`9aa7fd90) sqlmin!RecoveryUnit::PhysicalRecovery+0x358 | (00007ffd`9aa802c0) sqlmin!RecoveryUnit::CompletePhysical (00007ffd`9a538b90) sqlmin!StartupCoordinator::NotifyPhaseStart+0x3a | (00007ffd`9a538bf0) sqlmin!StartupCoordinator::NotifyPhaseEnd (00007ffd`9a80c430) sqlmin!DBTABLE::ReplicaCreateStartup+0x2f4 | (00007ffd`9a80c820) sqlmin!DBTABLE::RefreshPostRecovery (00007ffd`9a7ed0b0) sqlmin!DBMgr::SyncAndLinkReplicaRecoveryPhase+0x890 | (00007ffd`9a7edff0) sqlmin!DBMgr::DetachDB (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica+0x869 | (00007ffd`9a7f3630) sqlmin!DBMgr::StrandTransientReplica (00007ffd`9a7f2ae0) sqlmin!DBMgr::CreateTransientReplica+0x118 | (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica (00007ffd`99ec6d30) sqlmin!DBDDLAgent::CreateReplica+0x1b5 | (00007ffd`99ec6f90) sqlmin!FSystemDatabase (00007ffd`9abaaeb0) sqlmin!UtilDbccCreateReplica+0x82 | (00007ffd`9abab000) sqlmin!UtilDbccDestroyReplica (00007ffd`9ab0d7e0) sqlmin!UtilDbccCheckDatabase+0x994 | (00007ffd`9ab0ffd0) sqlmin!UtilDbccRetainReplica (00007ffd`9ab0cfc0) sqlmin!DbccCheckDB+0x22d | (00007ffd`9ab0d380) sqlmin!DbccCheckFilegroup (00007ffd`777379c0) sqllang!DbccCommand::Execute+0x193 | (00007ffd`77737d70) sqllang!DbccHelp (00007ffd`777e58d0) sqllang!CStmtDbcc::XretExecute+0x889 | (00007ffd`777e6250) sqllang!UtilDbccSetPermissionFailure (00007ffd`76b02eb0) sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x40d | (00007ffd`76b03410) sqllang!CsqlSource::CleanupCompileXactState (00007ffd`76b03a60) sqllang!CMsqlExecContext::FExecute+0xa9e | (00007ffd`76b043d0) sqllang!CCacheObject::Release (00007ffd`76b03430) sqllang!CsqlSource::Execute+0x981 | (00007ffd`76b039b0) sqllang!CsqlLock::Cleanup
file_read_completed callstack:
(00007ffd`99995cc0) sqlmin!XesqlPkg::file_read_completed::Publish+0x1fc | (00007ffd`99995fe0) sqlmin!XesqlPkg::file_write_completed::Publish (00007ffd`9a826630) sqlmin!FireIoCompletionEventLong+0x227 | (00007ffd`9a8269c0) sqlmin!IoRequestDispenser::Dump (00007ffd`9969bee0) sqlmin!FCB::IoCompletion+0x8e | (00007ffd`9969c180) sqlmin!IoRequestDispenser::Put (00007ffd`beaa11e0) sqldk!IOQueue::CheckForIOCompletion+0x426 | (00007ffd`beaa1240) sqldk!SystemThread::GetCurrentId (00007ffd`beaa15b0) sqldk!SOS_Scheduler::SwitchContext+0x173 | (00007ffd`beaa18a0) sqldk!SOS_Scheduler::Switch (00007ffd`beaa1d00) sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd3 | (00007ffd`beaa1db0) sqldk!SOS_Scheduler::ResumeNoCuzz (00007ffd`99641720) sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1e7 | (00007ffd`99641ae0) sqlmin!SOS_DispatcherPool<DispatcherWorkItem,DispatcherWorkItem,SOS_DispatcherQueue<DispatcherWorkItem,DispatcherWorkItem>,DispatcherPoolConfig,void * __ptr64>::GetDispatchers (00007ffd`9aa437c0) sqlmin!sqlServerLogMgr::CheckLogBlockReadComplete+0x1e6 | (00007ffd`9aa44670) sqlmin!sqlServerLogMgr::ValidateBlock (00007ffd`9970a6d0) sqlmin!LogConsumer::GetNextLogBlock+0x1b37 | (00007ffd`9970ab70) sqlmin!LogPoolPrivateCacheBufferMgr::Lookup (00007ffd`9a9fcbd0) sqlmin!sqlServerLogIterForward::GetNext+0x258 | (00007ffd`9a9fd2d0) sqlmin!sqlServerLogIterForward::GetNextBlock (00007ffd`9aa417f0) sqlmin!sqlServerCOWLogIterForward::GetNext+0x2b | (00007ffd`9aa418c0) sqlmin!sqlServerCOWLogIterForward::StartScan (00007ffd`9aa64210) sqlmin!RecoveryMgr::AnalysisPass+0x83b | (00007ffd`9aa65100) sqlmin!RecoveryMgr::AnalyzeLogRecord (00007ffd`9aa5ed50) sqlmin!RecoveryMgr::PhysicalRedo+0x233 | (00007ffd`9aa5f790) sqlmin!RecoveryMgr::PhysicalCompletion (00007ffd`9aa7fd90) sqlmin!RecoveryUnit::PhysicalRecovery+0x358 | (00007ffd`9aa802c0) sqlmin!RecoveryUnit::CompletePhysical (00007ffd`9a538b90) sqlmin!StartupCoordinator::NotifyPhaseStart+0x3a | (00007ffd`9a538bf0) sqlmin!StartupCoordinator::NotifyPhaseEnd (00007ffd`9a80c430) sqlmin!DBTABLE::ReplicaCreateStartup+0x2f4 | (00007ffd`9a80c820) sqlmin!DBTABLE::RefreshPostRecovery (00007ffd`9a7ed0b0) sqlmin!DBMgr::SyncAndLinkReplicaRecoveryPhase+0x890 | (00007ffd`9a7edff0) sqlmin!DBMgr::DetachDB (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica+0x869 | (00007ffd`9a7f3630) sqlmin!DBMgr::StrandTransientReplica (00007ffd`9a7f2ae0) sqlmin!DBMgr::CreateTransientReplica+0x118 | (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica (00007ffd`99ec6d30) sqlmin!DBDDLAgent::CreateReplica+0x1b5 | (00007ffd`99ec6f90) sqlmin!FSystemDatabase (00007ffd`9abaaeb0) sqlmin!UtilDbccCreateReplica+0x82 | (00007ffd`9abab000) sqlmin!UtilDbccDestroyReplica (00007ffd`9ab0d7e0) sqlmin!UtilDbccCheckDatabase+0x994 | (00007ffd`9ab0ffd0) sqlmin!UtilDbccRetainReplica (00007ffd`9ab0cfc0) sqlmin!DbccCheckDB+0x22d | (00007ffd`9ab0d380) sqlmin!DbccCheckFilegroup (00007ffd`777379c0) sqllang!DbccCommand::Execute+0x193 | (00007ffd`77737d70) sqllang!DbccHelp
这些堆栈跟踪与Max’s answer相关联,表明尽管存在Hekaton表,CHECKDB仍在使用内部快照.
我已经读过快照执行恢复到undo uncommitted transactions:
Uncommitted transactions are rolled back in a newly created database snapshot because the Database Engine runs recovery after the snapshot has been created (transactions in the database are not affected).
但是这仍然无法解释为什么在我的生产场景中经常会读取大块日志文件(偶尔也会在此处提供的repro中读取).我不认为我的应用程序在给定时间内有那么多的飞行中交易,而且这里的repro肯定没有.
解决方法
据推测,微软选择在具有内存中表的数据库上防止用户创建的快照,因为他们需要复制内存中的结构,以使快照真正成为正常的,以用户为中心的完整快照.复制快照的内存表可能很容易使服务器内存不足,这不是A Good Thing™
您可以通过在运行DBCC CHECKDB时观察主数据库数据文件所在的数据文件夹来自行证明正在创建内部DBCC快照.如果创建了内部快照,您将看到名为LogFileRead_Test.mdf_MSsql_DBCC7的文件(7可能不同 – 它表示数据库的数据库ID).
创建快照文件后,sql Server必须在数据库上运行恢复,以使其进入DBCC CHECKDB运行所需的一致状态.您看到的任何日志读取操作都可能是该恢复过程的结果.我构建了一个快速装备来检查多个DBCC CHECKDB操作的输出,这证明如果checkdbs之间没有事务,则没有日志文件读取.
USE master; SET IMPLICIT_TRANSACTIONS OFF; USE [master]; IF (DB_ID(N'LogFileRead_Test') IS NOT NULL) BEGIN ALTER DATABASE [LogFileRead_Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [LogFileRead_Test]; END CREATE DATABASE [LogFileRead_Test] ALTER DATABASE [LogFileRead_Test] MODIFY FILE ( NAME = LogFileRead_Test_log,SIZE = 128MB ); ALTER DATABASE [LogFileRead_Test] ADD FILEGROUP [LatencyTestInMemoryFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA; ALTER DATABASE [LogFileRead_Test] ADD FILE ( NAME = [LatencyTestInMemoryFile],FILENAME = 'C:\temp\LogFileRead_Test_SessionStateInMemoryFile' ) TO FILEGROUP [LatencyTestInMemoryFileGroup]; GO USE LogFileRead_Test; CREATE TABLE [dbo].[InMemoryStuff] ( [InMemoryId] NVARCHAR (88) COLLATE Latin1_General_100_BIN2 NOT NULL,CONSTRAINT [PK_InMemoryStuff_InMemoryId] PRIMARY KEY NONCLUSTERED HASH ([InMemoryId]) WITH (BUCKET_COUNT = 240) ) WITH (MEMORY_OPTIMIZED = ON); ;WITH src AS ( SELECT n.Num FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9))n(Num) ) INSERT INTO [dbo].[InMemoryStuff] (InMemoryId,Created) SELECT 'Description' + CONVERT(varchar(30),((s1.Num * 10000) + (s2.Num * 1000) + (s3.Num * 100) + (s4.Num * 10) + (s5.Num))),GETDATE() FROM src s1 CROSS JOIN src s2 CROSS JOIN src s3 CROSS JOIN src s4 CROSS JOIN src s5; USE master; DECLARE @cmd nvarchar(max); DECLARE @msg nvarchar(1000); DECLARE @l int; DECLARE @m int; SET @m = 10; SET @l = 1; IF OBJECT_ID(N'tempdb..#vfs',N'U') IS NOT NULL DROP TABLE #vfs; CREATE TABLE #vfs ( vfs_run int NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,collection_time datetime2(7),num_of_reads bigint,num_of_bytes_read bigint ); WHILE @l <= @m BEGIN SET @msg = N'loop ' + CONVERT(nvarchar(10),@l); RAISERROR (@msg,1) WITH NOWAIT; SET @cmd = 'USE [LogFileRead_Test]; -- grab a baseline of virtual file stats to be diff''d later select f.num_of_reads,default) f where database_id = db_id(''LogFileRead_Test'') and file_id = FILE_IDEX(''LogFileRead_Test_log''); DBCC CHECKDB ([LogFileRead_Test]) WITH NO_INFOMSGS,and compare with the prevIoUs capture select f.num_of_reads,default) f where database_id = db_id(''LogFileRead_Test'') and file_id = FILE_IDEX(''LogFileRead_Test_log''); select collection_time = GETDATE(),#checkdb_stats t; --drop table #checkdb_stats; --drop table #dm_io_virtual_file_stats; -- CHECKDB ignored my comment select collection_time,num_of_bytes_read from #dm_io_virtual_file_stats_diff d order by d.collection_time; --drop table #dm_io_virtual_file_stats_diff; '; INSERT INTO #vfs (collection_time,num_of_bytes_read) EXEC sys.sp_executesql @cmd; SET @l += 1; END USE master; SET @cmd = 'USE [master]; ALTER DATABASE [LogFileRead_Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [LogFileRead_Test]; '; EXEC sys.sp_executesql @cmd; SELECT * FROM #vfs ORDER BY vfs_run;
结果:
╔═════════╦═════════════════════════════╦══════════════╦═══════════════════╗ ║ vfs_run ║ collection_time ║ num_of_reads ║ num_of_bytes_read ║ ╠═════════╬═════════════════════════════╬══════════════╬═══════════════════╣ ║ 1 ║ 2018-04-06 15:53:37.6566667 ║ 1 ║ 491520 ║ ║ 2 ║ 2018-04-06 15:53:37.8300000 ║ 0 ║ 0 ║ ║ 3 ║ 2018-04-06 15:53:38.0166667 ║ 0 ║ 0 ║ ║ 4 ║ 2018-04-06 15:53:38.1866667 ║ 0 ║ 0 ║ ║ 5 ║ 2018-04-06 15:53:38.3766667 ║ 0 ║ 0 ║ ║ 6 ║ 2018-04-06 15:53:38.5633333 ║ 0 ║ 0 ║ ║ 7 ║ 2018-04-06 15:53:38.7333333 ║ 0 ║ 0 ║ ║ 8 ║ 2018-04-06 15:53:38.9066667 ║ 0 ║ 0 ║ ║ 9 ║ 2018-04-06 15:53:39.0933333 ║ 0 ║ 0 ║ ║ 10 ║ 2018-04-06 15:53:39.2800000 ║ 0 ║ 0 ║ ╚═════════╩═════════════════════════════╩══════════════╩═══════════════════╝
此外,您可能希望使用简单的基于集合的方法,而不是使用RBAR方法将数据插入到测试表中,如下所示:
;WITH src AS ( SELECT n.Num FROM (VALUES (0),((s1.Num * 10000) + (s2.Num * 1000) + (s3.Num * 100) + (s4.Num * 10) + (s5.Num))),GETDATE() FROM src s1 CROSS JOIN src s2 CROSS JOIN src s3 CROSS JOIN src s4 CROSS JOIN src s5;