Because of the volume, I have broken the delete into a loop similar to this:
    -- Do not block if records are locked.
    SET LOCK_TIMEOUT 0
    -- This process should be chosen as a deadlock victim in the case of a deadlock.
    SET DEADLOCK_PRIORITY LOW
    SET NOCOUNT ON

    DECLARE @Count INT
    SET @Count = 1
    WHILE @Count > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION -- added per comment below
            DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST) WHERE MyField = SomeValue
            SET @Count = @@ROWCOUNT
            COMMIT
        END TRY
        BEGIN CATCH
            exec sp_lock -- added to display the open locks after the timeout
            exec sp_who2 -- shows the active processes
            IF @@TRANCOUNT > 0
                ROLLBACK
            RETURN -- ignoring this error for brevity
        END CATCH
    END
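MyTable is a clustered table. MyField is the first column of the clustered index. It indicates a logical grouping of records, so MyField = SomeValue often selects many records. I don't care in which order they are deleted, as long as one group is processed at a time. There are no other indexes on this table.
I added the ROWLOCK hint to try to avoid the lock escalations we have seen in production. I added the READPAST hint to avoid deleting records locked by other processes. That should never happen, but I am trying to be safe.
Problem: sometimes this loop hits lock timeout 1222, "Lock request time out period exceeded", even when it is the only thing running.
I am positive there is no other activity on this system while I am testing this process, because it is my own developer box, nobody else is connected, no other processes are running on it, and the profiler shows no activity.
I can re-run the same script a second later and it will pick up where it left off, happily deleting records, until the next lock timeout.
I tried a BEGIN TRY / BEGIN CATCH to ignore the 1222 error and retry the delete, but it fails again immediately with the same lock timeout error. It also fails again if I add a short delay before retrying.
I assume the lock timeouts are caused by something like a page split, but I am not sure why that would conflict with the current loop iteration. The previous delete statement should already have completed, and I thought that meant any page splits had finished as well.
Why is the DELETE loop hitting a lock timeout against itself?
Is there a way for the process to avoid this lock timeout, or to detect that it is safe to resume?
This is on SQL Server 2005.
– EDIT –
I added the Lock:Timeout event to the profiler. During the delete, it times out on a PAGELOCK: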
    Event Class: Lock:Timeout
    TextData:    1:15634 (one example of several)
    Mode:        7 - IU
    Type:        6 - PAGE
DBCC PAGE reports that these pages are outside the range of the master database (ID 1).
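For reference, a rough sketch of how such a page can be inspected. A page lock resource like 1:15634 is in file:page form, so the leading 1 is a file ID and not a database ID. The database ID used below (2, tempdb) is an assumption taken from the dbid column of the sp_lock output shown later in this post. DBCC PAGE is undocumented, so treat this as a diagnostic aid only.

```sql
-- Send DBCC output to the client session instead of the error log.
DBCC TRACEON (3604);

-- DBCC PAGE (database id or name, file id, page id, print option)
-- Assumption: the page lives in database ID 2 (tempdb).
-- File 1 and page 15634 come from the Lock:Timeout TextData value 1:15634.
DBCC PAGE (2, 1, 15634, 0);  -- print option 0 = page header only
```

The header output includes the object ID that owns the page, which can be compared against OBJECT_ID('MyTable') in that database.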
– EDIT 2 –
I added a BEGIN TRY / BEGIN CATCH and ran exec sp_lock in the catch block. Here is what I saw:
    spid  dbid  ObjId       IndId  Type  Resource  Mode  Status
    19    2     1401108082  1      PAG   1:52841   X     GRANT   (tempdb.dbo.MyTable)
    19    2     1401108082  0      TAB             IX    GRANT   (tempdb.dbo.MyTable)
    Me    2     1401108082  0      TAB             IX    GRANT   (tempdb.dbo.MyTable)
    Me    1     1115151018  0      TAB             IS    GRANT   (master..spt_values) (?)
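Solution
Using Nicholas's suggestion, I added a BEGIN TRANSACTION and a COMMIT. I wrapped the delete loop in a BEGIN TRY / BEGIN CATCH. In the BEGIN CATCH, right before the ROLLBACK, I ran sp_lock and sp_who2. (I added the code changes to the question above.)
When my process was blocked, I saw the following output: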
    spid   dbid   ObjId       IndId  Type Resource                         Mode     Status
    ------ ------ ----------- ------ ---- -------------------------------- -------- ------
    20     2      1401108082  0      TAB                                   IX       GRANT
    20     2      1401108082  1      PAG  1:102368                         X        GRANT

    SPID  Status      Login  HostName  BlkBy  DBName  Command        CPUTime  DiskIO
    ----  ----------  -----  --------  -----  ------  -------------  -------  ------
    20    BACKGROUND  sa     .         .      tempdb  GHOST CLEANUP  31       0
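The same information can probably be pulled in one query from the dynamic management views that SQL Server 2005 introduced, instead of running sp_lock and sp_who2 separately. This is only a sketch of that alternative, not what I actually ran. It lists object and page locks held or requested by other sessions, together with the command each session is executing.

```sql
-- Locks from other sessions, with the command each session is running.
-- Background tasks such as GHOST CLEANUP appear in sys.dm_exec_requests too.
SELECT  l.request_session_id             AS spid,
        DB_NAME(l.resource_database_id)  AS database_name,
        l.resource_type,                 -- OBJECT, PAGE, KEY, ...
        l.resource_description,          -- file:page for PAGE resources
        l.request_mode,                  -- IX, X, IU, ...
        l.request_status,                -- GRANT, WAIT, CONVERT
        r.command
FROM    sys.dm_tran_locks AS l
LEFT JOIN sys.dm_exec_requests AS r
        ON r.session_id = l.request_session_id
WHERE   l.request_session_id <> @@SPID
  AND   l.resource_type IN ('OBJECT', 'PAGE')
ORDER BY l.request_session_id, l.resource_type;
```

Run inside the CATCH block before the ROLLBACK, this should show a row like the X page lock held by the GHOST CLEANUP task above.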
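For future reference: when SQL Server deletes records, it only sets a bit on them to mark them as "ghost records". Every few minutes, an internal process called ghost cleanup runs to reclaim pages whose records have been completely deleted (i.e. all records on the page are ghost records).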
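One way to watch this happen is to count the ghost records that are still waiting for cleanup. The sketch below uses sys.dm_db_index_physical_stats, which reports ghost_record_count only in DETAILED (or SAMPLED) mode. The table name dbo.MyTable is the placeholder used throughout this post, and the query assumes it is run in the database that holds the table.

```sql
-- Assumption: dbo.MyTable is the placeholder table from this post,
-- and the current database is the one that contains it.
DECLARE @DbId SMALLINT, @ObjId INT;
SET @DbId  = DB_ID();
SET @ObjId = OBJECT_ID(N'dbo.MyTable');

SELECT  index_id,
        index_level,
        page_count,
        record_count,
        ghost_record_count   -- deleted rows not yet removed by ghost cleanup
FROM    sys.dm_db_index_physical_stats(@DbId, @ObjId, NULL, NULL, 'DETAILED');
```

Running this right after a batch of deletes and again a little later should show the ghost count dropping as the cleanup task catches up. DETAILED mode scans every page, so it can be slow on a large table.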
The ghost cleanup process was discussed on ServerFault in this question.
Here is Paul S. Randal’s explanation of the ghost cleanup process.
It is possible to disable the ghost cleanup process with a trace flag, but I did not need to do so in this case.
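For completeness, a sketch of what that would look like. The flag number is my assumption about which flag is meant: trace flag 661 is the one commonly described as turning off the ghost record removal task. While it is on, space from deleted records is not reclaimed in the background, so it is not something to leave enabled casually.

```sql
-- Assumption: trace flag 661 is the flag referred to above.
-- The -1 argument applies the flag globally rather than to this session only.
DBCC TRACEON (661, -1);

-- ... run the delete job ...

-- Turn ghost cleanup back on afterwards.
DBCC TRACEOFF (661, -1);
```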
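I ended up adding a lock wait timeout of 100 ms. This still causes an occasional lock wait timeout against the ghost record cleanup process, but that is acceptable. I also added a loop that retries lock timeouts up to 5 times. With these two changes, my process now usually completes. It now only times out when a very long-running process is pushing a lot of data around and takes table or page locks on the data my process needs to clean up.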
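A minimal sketch of that idea, under some assumptions: the names @MaxRetries, @Retries and @SomeValue are made up for illustration, the value is assumed to be an INT, and the 100 ms pause between attempts is a choice for this sketch. It is written for SQL Server 2005, so it uses RAISERROR rather than THROW.

```sql
SET LOCK_TIMEOUT 100;          -- wait at most 100 ms for any lock
SET DEADLOCK_PRIORITY LOW;

-- Hypothetical names and types, for illustration only.
DECLARE @SomeValue INT, @RowCount INT, @Retries INT, @MaxRetries INT;
DECLARE @Msg NVARCHAR(2048);
SET @SomeValue  = 42;
SET @RowCount   = 1;
SET @Retries    = 0;
SET @MaxRetries = 5;

WHILE @RowCount > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST)
        WHERE MyField = @SomeValue;
        SET @RowCount = @@ROWCOUNT;   -- must directly follow the DELETE
        COMMIT;
        SET @Retries = 0;             -- a successful batch resets the counter
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;

        IF ERROR_NUMBER() = 1222      -- lock request time out period exceeded
        BEGIN
            SET @Retries = @Retries + 1;
            IF @Retries > @MaxRetries
                BREAK;                -- give up; a later run can pick up the rest
            WAITFOR DELAY '00:00:00.100';
        END
        ELSE
        BEGIN
            -- Anything other than a lock timeout is reported and ends the loop.
            SET @Msg = ERROR_MESSAGE();
            RAISERROR(@Msg, 16, 1);
            BREAK;
        END
    END CATCH
END
```

Resetting the counter after each successful batch means the limit applies to consecutive timeouts, so one slow spot early in the run does not use up the retry budget for the whole job.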
EDIT 2016-07-20
The final code looks like this:
    -- Do not block long if records are locked.
    SET LOCK_TIMEOUT 100
    -- This process volunteers to be a deadlock victim in the case of a deadlock.
    SET DEADLOCK_PRIORITY LOW

    DECLARE @Error BIT
    SET @Error = 0
    DECLARE @ErrMsg VARCHAR(1000)
    DECLARE @DeletedCount INT
    SELECT @DeletedCount = 0
    DECLARE @LockTimeoutCount INT
    SET @LockTimeoutCount = 0
    DECLARE @ContinueDeleting BIT, @LastDeleteSuccessful BIT
    SET @ContinueDeleting = 1
    SET @LastDeleteSuccessful = 1
    -- Declared here so the excerpt parses; the actual script sets these dynamically.
    DECLARE @sql NVARCHAR(1000)
    DECLARE @ProcGUID UNIQUEIDENTIFIER

    WHILE @ContinueDeleting = 1
    BEGIN
        DECLARE @RowCount INT
        SET @RowCount = 0
        BEGIN TRY
            BEGIN TRANSACTION
            -- The READPAST below attempts to skip over locked records.
            -- However, it might still cause a lock wait error (1222) if a page or index is locked,
            -- because the delete has to modify indexes.
            -- The threshold for row lock escalation to table locks is around 5,000 records,
            -- so keep the deleted number smaller than this limit in case we are deleting a large chunk of data.
            -- Table name, field, and value are all set dynamically in the actual script.
            -- SomeValue is passed in as the @ProcGuid parameter.
            SET @sql = N'DELETE TOP (1000) MyTable WITH (ROWLOCK, READPAST) WHERE MyField = @ProcGuid'
            EXEC sp_executesql @sql, N'@ProcGuid uniqueidentifier', @ProcGUID
            SET @RowCount = @@ROWCOUNT
            COMMIT

            SET @LastDeleteSuccessful = 1
            SET @DeletedCount = @DeletedCount + @RowCount
            IF @RowCount = 0
            BEGIN
                SET @ContinueDeleting = 0
            END
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0
                ROLLBACK

            IF ERROR_NUMBER() = 1222 -- Lock timeout
            BEGIN
                IF @LastDeleteSuccessful = 1
                BEGIN
                    -- If we hit a lock timeout, and we had already deleted something successfully, try again.
                    SET @LastDeleteSuccessful = 0
                END
                ELSE
                BEGIN
                    -- The last delete failed, too. Give up for now. The job will run again shortly.
                    SET @ContinueDeleting = 0
                END
            END
            ELSE -- On anything other than a lock timeout, report an error.
            BEGIN
                SET @ErrMsg = 'An error occurred cleaning up data. Table: MyTable Column: MyColumn Value: SomeValue.'
                    + CHAR(13) + CHAR(10)
                    + 'Message: ' + ERROR_MESSAGE()
                    + ' Error Number: ' + CONVERT(VARCHAR(20), ERROR_NUMBER())
                    + ' Line: ' + CONVERT(VARCHAR(20), ERROR_LINE())
                PRINT @ErrMsg -- this error message will be included in the SQL Server job history
                SET @Error = 1
                SET @ContinueDeleting = 0
            END
        END CATCH
    END

    IF @Error <> 0
        RAISERROR('Not all data could be cleaned up. See previous messages.', 16, 1)