It is important to specify only the columns from the target table that are used for matching purposes. That is,specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause,such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
但这正是我为了让我的MERGE工作所必须做的事情.
我拥有的数据是项目与类别的标准多对多连接表(例如,哪些项目包含在哪些类别中),如下所示:
CategoryId ItemId ========== ====== 1 1 1 2 1 3 2 1 2 3 3 5 3 6 4 5
我需要做的是用一个新的项目列表有效地替换特定类别中的所有行.我最初尝试这样做看起来像这样:
MERGE INTO CategoryItem AS TARGET USING ( SELECT ItemId FROM SomeExternalDataSource WHERE CategoryId = 2 ) AS SOURCE ON SOURCE.ItemId = TARGET.ItemId AND TARGET.CategoryId = 2 WHEN NOT MATCHED BY TARGET THEN INSERT ( CategoryId,ItemId ) VALUES ( 2,ItemId ) WHEN NOT MATCHED BY SOURCE AND TARGET.CategoryId = 2 THEN DELETE ;
这似乎是在我的测试中工作,但我正在做MSDN明确警告我不要做的事情.这让我担心以后会遇到意想不到的问题,但我看不到任何其他方法让我的MERGE只影响具有特定字段值的行(CategoryId = 2)并忽略其他类别的行.
是否有“更正确”的方法来实现同样的结果?什么是MSDN警告我的“意外或不正确的结果”?
解决方法
DECLARE @CategoryItem AS TABLE ( CategoryId integer NOT NULL,ItemId integer NOT NULL,PRIMARY KEY (CategoryId,ItemId),UNIQUE (ItemId,CategoryId) ); DECLARE @DataSource AS TABLE ( CategoryId integer NOT NULL,ItemId integer NOT NULL PRIMARY KEY (CategoryId,ItemId) ); INSERT @CategoryItem (CategoryId,ItemId) VALUES (1,1),(1,2),3),(2,(3,5),6),(4,5); INSERT @DataSource (CategoryId,ItemId) VALUES (2,2);
目标
╔════════════╦════════╗ ║ CategoryId ║ ItemId ║ ╠════════════╬════════╣ ║ 1 ║ 1 ║ ║ 2 ║ 1 ║ ║ 1 ║ 2 ║ ║ 1 ║ 3 ║ ║ 2 ║ 3 ║ ║ 3 ║ 5 ║ ║ 4 ║ 5 ║ ║ 3 ║ 6 ║ ╚════════════╩════════╝
资源
╔════════════╦════════╗ ║ CategoryId ║ ItemId ║ ╠════════════╬════════╣ ║ 2 ║ 2 ║ ╚════════════╩════════╝
期望的结果是用来自源的数据替换目标中的数据,但仅针对CategoryId = 2.按照上面给出的MERGE的描述,我们应该编写一个仅在键上连接源和目标的查询,并过滤行仅在WHEN条款中:
MERGE INTO @CategoryItem AS TARGET USING @DataSource AS SOURCE ON SOURCE.ItemId = TARGET.ItemId AND SOURCE.CategoryId = TARGET.CategoryId WHEN NOT MATCHED BY SOURCE AND TARGET.CategoryId = 2 THEN DELETE WHEN NOT MATCHED BY TARGET AND SOURCE.CategoryId = 2 THEN INSERT (CategoryId,ItemId) VALUES (CategoryId,ItemId) OUTPUT $ACTION,ISNULL(INSERTED.CategoryId,DELETED.CategoryId) AS CategoryId,ISNULL(INSERTED.ItemId,DELETED.ItemId) AS ItemId ;
这给出了以下结果:
╔═════════╦════════════╦════════╗ ║ $ACTION ║ CategoryId ║ ItemId ║ ╠═════════╬════════════╬════════╣ ║ DELETE ║ 2 ║ 1 ║ ║ INSERT ║ 2 ║ 2 ║ ║ DELETE ║ 2 ║ 3 ║ ╚═════════╩════════════╩════════╝ ╔════════════╦════════╗ ║ CategoryId ║ ItemId ║ ╠════════════╬════════╣ ║ 1 ║ 1 ║ ║ 1 ║ 2 ║ ║ 1 ║ 3 ║ ║ 2 ║ 2 ║ ║ 3 ║ 5 ║ ║ 3 ║ 6 ║ ║ 4 ║ 5 ║ ╚════════════╩════════╝
执行计划是:
请注意,两个表都已完全扫描.我们可能认为这样效率低,因为只有CategoryId = 2的行才会在目标表中受到影响.这就是联机丛书中的警告进入的地方.一个错误的尝试优化以仅触摸目标中的必要行:
MERGE INTO @CategoryItem AS TARGET USING ( SELECT CategoryId,ItemId FROM @DataSource AS ds WHERE CategoryId = 2 ) AS SOURCE ON SOURCE.ItemId = TARGET.ItemId AND TARGET.CategoryId = 2 WHEN NOT MATCHED BY TARGET THEN INSERT (CategoryId,ItemId) VALUES (CategoryId,ItemId) WHEN NOT MATCHED BY SOURCE THEN DELETE OUTPUT $ACTION,DELETED.ItemId) AS ItemId ;
ON子句中的逻辑作为连接的一部分应用.在这种情况下,连接是完全外连接(有关原因,请参阅this Books Online entry).在目标行上应用类别2的检查作为外部联接的一部分最终会导致删除具有不同值的行(因为它们与源不匹配):
╔═════════╦════════════╦════════╗ ║ $ACTION ║ CategoryId ║ ItemId ║ ╠═════════╬════════════╬════════╣ ║ DELETE ║ 1 ║ 1 ║ ║ DELETE ║ 1 ║ 2 ║ ║ DELETE ║ 1 ║ 3 ║ ║ DELETE ║ 2 ║ 1 ║ ║ INSERT ║ 2 ║ 2 ║ ║ DELETE ║ 2 ║ 3 ║ ║ DELETE ║ 3 ║ 5 ║ ║ DELETE ║ 3 ║ 6 ║ ║ DELETE ║ 4 ║ 5 ║ ╚═════════╩════════════╩════════╝ ╔════════════╦════════╗ ║ CategoryId ║ ItemId ║ ╠════════════╬════════╣ ║ 2 ║ 2 ║ ╚════════════╩════════╝
根本原因与谓词在外连接ON子句中的行为方式相同,而不是在WHERE子句中指定的原因. MERGE语法(以及取决于指定的子句的连接实现)只是让人们更难以看到这是如此.
guidance in Books Online(在Optimizing Performance条目中扩展)提供了指导,确保使用MERGE语法表达正确的语义,而无需用户必须了解所有实现细节,或考虑优化器可能合法地重新排列要执行的内容的方式效率原因.
该文档提供了三种实现早期过滤的潜在方法:
在WHEN子句中指定过滤条件可以保证正确的结果,但可能意味着从源表和目标表中读取和处理的行数比严格必要的更多(如第一个示例所示).
通过包含过滤条件的视图进行更新也可以保证正确的结果(因为更改的行必须可以通过视图进行更新),但这确实需要专用视图,并且需要遵循奇怪的条件来更新视图.
使用公用表表达式会为ON子句添加谓词带来类似的风险,但原因略有不同.在许多情况下,这将是安全的,但它需要专家分析执行计划以确认这一点(以及广泛的实际测试).例如:
WITH TARGET AS ( SELECT * FROM @CategoryItem WHERE CategoryId = 2 ) MERGE INTO TARGET USING ( SELECT CategoryId,ItemId FROM @DataSource WHERE CategoryId = 2 ) AS SOURCE ON SOURCE.ItemId = TARGET.ItemId AND SOURCE.CategoryId = TARGET.CategoryId WHEN NOT MATCHED BY TARGET THEN INSERT (CategoryId,DELETED.ItemId) AS ItemId ;
通过更优化的计划,可以产生正确的结果(不重复):
该计划仅从目标表中读取类别2的行.如果目标表很大,这可能是一个重要的性能考虑因素,但使用MERGE语法很容易出错.
有时,将MERGE编写为单独的DML操作更容易.这种方法甚至可以比单一的MERGE表现更好,这一事实常常让人惊讶.
DELETE ci FROM @CategoryItem AS ci WHERE ci.CategoryId = 2 AND NOT EXISTS ( SELECT 1 FROM @DataSource AS ds WHERE ds.ItemId = ci.ItemId AND ds.CategoryId = ci.CategoryId ); INSERT @CategoryItem SELECT ds.CategoryId,ds.ItemId FROM @DataSource AS ds WHERE ds.CategoryId = 2;