UPDATE TABLE_A SET COL_1=B.COL_1 FROM TABLE_A A INNER JOIN TABLE_B B ON A.KEY_1=B.KEY_1 WHERE B.COL_2 IS NOT NULL AND A.COL_1=91216599
By mistake, I ran the following statement instead:
UPDATE TABLE_A SET COL_1=B.COL_1 FROM TABLE_A_COPY A INNER JOIN TABLE_B B ON A.KEY_1=B.KEY_1 WHERE B.COL_2 is not NULL AND A.COL_1=91216599
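Note that in the second statement (the wrong one), the FROM clause specifies TABLE_A_COPY rather than TABLE_A. Both tables have exactly the same schema (that is, the same columns) and, before any UPDATE is executed, the same data.
TABLE_A and TABLE_A_COPY each hold about 100 million records, and the update affects about 500,000 records. The second statement (the wrong one) ran for several hours and then failed, whereas the first statement (the correct one) ran for 40 seconds and succeeded.
Clearly, both statements are syntactically correct, but I'm not sure what exactly I asked SQL Server to do with the second statement.
My questions are:
> What was SQL Server trying to do in the second statement? Because of my mistake, I specified no link between the records of TABLE_A and TABLE_A_COPY, so did it attempt a CROSS JOIN between the two and then update every record in TABLE_A billions of times?
> If the question is not too broad: what is a valid use case for an UPDATE statement whose FROM / JOIN clause never mentions the table being updated? Why would anyone do that, and why does SQL Server even allow it?
I did try searching for an answer to my question, but Google seems to think I'm asking about the UPDATE FROM syntax in general.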
Solution
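CREATE TABLE #TABLE_A (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_A_COPY (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #TABLE_B (KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

-- Sample rows. The listing was damaged here: the COL_1 values for keys 2 to 5
-- and the missing keys in #TABLE_B are assumed, not taken from the source.
INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY VALUES (1,91216599),(2,91216599),(3,91216599),(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20),(3,30,30);

/* The intended statement:
UPDATE #TABLE_A SET COL_1=B.COL_1
--SELECT *
FROM #TABLE_A A INNER JOIN #TABLE_B B ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL AND A.COL_1=91216599;
*/

-- The mistaken statement: the FROM clause never mentions #TABLE_A.
UPDATE #TABLE_A SET COL_1=B.COL_1
FROM #TABLE_A_COPY A INNER JOIN #TABLE_B B ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL AND A.COL_1=91216599;

SELECT * FROM #TABLE_A;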
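Check how the #TABLE_A record with KEY_1 = 6 changes in the code above. Its COL_1 is not 91216599, yet it is still updated, because the filter applies to #TABLE_A_COPY and nothing links the update target to the rows in the FROM clause.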
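To make that pairing visible, the row set behind the mistaken UPDATE can be written out as a SELECT. This is only a sketch: the #DEMO_ tables, their values, and the alias T (standing in for the update target that the FROM clause never mentions) are hypothetical, and the real execution plan may be shaped differently.

```sql
-- Hypothetical miniature tables; names and values are illustrative only.
CREATE TABLE #DEMO_A      (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #DEMO_A_COPY (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #DEMO_B      (KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);

INSERT INTO #DEMO_A      VALUES (1, 100), (2, 100), (3, 7);
INSERT INTO #DEMO_A_COPY VALUES (1, 100), (2, 100), (3, 7);
INSERT INTO #DEMO_B      VALUES (1, 10, 10), (2, 20, 20);

-- The FROM clause of the mistaken UPDATE, taken on its own:
-- keys 1 and 2 qualify, so it yields 2 rows.
SELECT COUNT(*) AS from_rows
FROM #DEMO_A_COPY A
INNER JOIN #DEMO_B B ON A.KEY_1 = B.KEY_1
WHERE B.COL_2 IS NOT NULL AND A.COL_1 = 100;

-- Nothing ties the update target to those rows, so every target row is
-- paired with each of them: 3 target rows x 2 FROM rows = 6 candidates.
-- Target key 3 appears too, although its own COL_1 is 7, not 100.
SELECT T.KEY_1 AS target_key,
       T.COL_1 AS old_value,
       B.COL_1 AS candidate_value
FROM #DEMO_A T
CROSS JOIN #DEMO_A_COPY A
INNER JOIN #DEMO_B B ON A.KEY_1 = B.KEY_1
WHERE B.COL_2 IS NOT NULL AND A.COL_1 = 100
ORDER BY T.KEY_1, B.KEY_1;
```

At the sizes in the question (about 100 million target rows and, assuming the copy matches the original, about 500,000 qualifying FROM rows), the same pairing would be on the order of 5 x 10^13 combinations, which would be consistent with a run of several hours that never finishes.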
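2) SQL Server's UPDATE FROM / DELETE FROM syntax is much broader than the ANSI standard, and the problem you ran into boils down to updating the same row multiple times. With UPDATE, you get no error or warning: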
From "Let's deprecate UPDATE FROM!" and "Deprecate UPDATE FROM and DELETE FROM":
Correctness? Bah, who cares?
Well, most do. That’s why we test.
If I mess up the join criteria in a SELECT query so that too many rows
from the second table match, I’ll see it as soon as I test, because I
get more rows back than expected. If I mess up the subquery criteria
in an ANSI standard UPDATE query in a similar way, I see it even
sooner, because SQL Server will return an error if the subquery
returns more than a single value. But with the proprietary UPDATE FROM
syntax, I can mess up the join and never notice – SQL Server will
happily update the same row over and over again if it matches more
than one row in the joined table, with only the result of the last of
those updates sticking. And there is no way of knowing which row that
will be, since that depends on the query execution plan that happens
to be chosen. A worst case scenario would be one where the execution
plan just happens to result in the expected outcome during all tests
on the single-processor development server – and then, after
deployment to the four-way dual-core production server, our precious
data suddenly hits the fan…
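The contrast drawn in the quote can be sketched with two tiny tables. The names #TARGET and #SOURCE and all values are hypothetical; the point is only that the ANSI-style subquery form should fail when a target row has more than one match, while the UPDATE FROM form completes without complaint.

```sql
-- Hypothetical tables: KEY_1 = 2 has two matching source rows.
CREATE TABLE #TARGET (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #SOURCE (ID INT PRIMARY KEY, KEY_1 INT, COL_1 INT);

INSERT INTO #TARGET VALUES (1, 0), (2, 0);
INSERT INTO #SOURCE VALUES (1, 1, 10), (2, 2, 20), (3, 2, 21);

-- ANSI-style UPDATE with a correlated subquery. For KEY_1 = 2 the subquery
-- returns two values, so SQL Server is expected to reject the statement
-- ("Subquery returned more than 1 value ...") and change nothing.
BEGIN TRY
    UPDATE #TARGET
    SET COL_1 = (SELECT S.COL_1 FROM #SOURCE S WHERE S.KEY_1 = #TARGET.KEY_1)
    WHERE EXISTS (SELECT 1 FROM #SOURCE S WHERE S.KEY_1 = #TARGET.KEY_1);
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;

-- Proprietary UPDATE FROM with the same ambiguous join: no error, no warning.
-- Which of the two candidate values lands in KEY_1 = 2 is not defined.
UPDATE T
SET COL_1 = S.COL_1
FROM #TARGET T
INNER JOIN #SOURCE S ON S.KEY_1 = T.KEY_1;

SELECT KEY_1, COL_1 FROM #TARGET ORDER BY KEY_1;
```

The TRY/CATCH wrapper is there only so the script keeps running and shows the message; it is not part of either technique.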
If you use MERGE, for example, you get an error message stating:
The MERGE statement attempted to UPDATE or DELETE the same row more
than once. This happens when a target row matches more than one source
row. A MERGE statement cannot UPDATE/DELETE the same row of the target
table multiple times. Refine the ON clause to ensure a target row
matches at most one source row, or use the GROUP BY clause to group
the source rows.
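A minimal sketch of that behaviour, using hypothetical tables #M_TARGET and #M_SOURCE: the first MERGE should raise the error quoted above because one target row matches two source rows, and the second follows the message's own suggestion of grouping the source. Picking MAX(COL_1) is an arbitrary choice made for the illustration, not a recommendation.

```sql
-- Hypothetical tables: KEY_1 = 2 has two matching source rows.
CREATE TABLE #M_TARGET (KEY_1 INT PRIMARY KEY, COL_1 INT);
CREATE TABLE #M_SOURCE (ID INT PRIMARY KEY, KEY_1 INT, COL_1 INT);

INSERT INTO #M_TARGET VALUES (1, 0), (2, 0);
INSERT INTO #M_SOURCE VALUES (1, 1, 10), (2, 2, 20), (3, 2, 21);

-- Ambiguous match: expected to fail instead of silently picking a row.
BEGIN TRY
    MERGE #M_TARGET AS T
    USING #M_SOURCE AS S
        ON T.KEY_1 = S.KEY_1
    WHEN MATCHED THEN
        UPDATE SET COL_1 = S.COL_1;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS error_message;
END CATCH;

-- Source collapsed to one row per key, so each target row matches at most once.
MERGE #M_TARGET AS T
USING (
    SELECT KEY_1, MAX(COL_1) AS COL_1
    FROM #M_SOURCE
    GROUP BY KEY_1
) AS S
    ON T.KEY_1 = S.KEY_1
WHEN MATCHED THEN
    UPDATE SET COL_1 = S.COL_1;

SELECT KEY_1, COL_1 FROM #M_TARGET ORDER BY KEY_1;
```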
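So you need to be more careful and check your code. I would also prefer to get an error, but as you can see in the Connect item linked above, that is not going to happen.
One way to avoid this is to put the alias in the UPDATE clause, so that you can be sure you are updating a table that takes part in the FROM ... JOIN and not some other table: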
UPDATE A SET COL_1=B.COL_1 FROM #TABLE_A A INNER JOIN #TABLE_B B ON A.KEY_1=B.KEY_1 WHERE B.COL_2 IS NOT NULL AND A.COL_1=91216599;