sql – 使用包含具有空值的列的WHERE子句的Update语句

我正在使用另一个表中的数据更新一个表上的列. WHERE子句基于多个列,一些列为空.从我的想法来看,这个null是什么抛出你的标准UPDATE TABLE SET X = Y WHERE A = B语句.

请参阅我们根据table_two的数据更新table_one的两个表中的this SQL Fiddle.
我的查询目前看起来像这样：

UPDATE table_one SET table_one.x = table_two.y 
FROM table_two
WHERE 
table_one.invoice_number = table_two.invoice_number AND
table_one.submitted_by = table_two.submitted_by AND
table_one.passport_number = table_two.passport_number AND
table_one.driving_license_number = table_two.driving_license_number AND
table_one.national_id_number = table_two.national_id_number AND
table_one.tax_pin_identification_number = table_two.tax_pin_identification_number AND
table_one.vat_number = table_two.vat_number AND
table_one.ggcg_number = table_two.ggcg_number AND
table_one.national_association_number = table_two.national_association_number

该表中的某些行的查询失败,表中的任何列为空时,table_one.x未更新.即只有当所有列都有一些数据时才会更新.

这个问题与我之前的一个here on SO有关,我从使用Distinct On的大型数据集中获取不同的值.现在我想要的是使用具有唯一字段的表格中的值来填充大数据集.

UPDATE

我使用了@binotenary提供的第一个更新语句.对于小桌子,它会闪光.示例有一个具有20,000个记录的表,更新在20秒内完成.但是到目前为止,还有一个有900万条记录的表已经运行了20个小时！ EXPLAIN功能的输出见下方

Update on table_one  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
  ->  Nested Loop  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
    Join Filter: ((((my_update_statement_here))))
    ->  Seq Scan on table_one  (cost=0.00..610872.62 rows=9661262 width=1986)
    ->  Seq Scan on table_two  (cost=0.00..6051.98 rows=299998 width=148)

EXPLAIN ANALYZE选项也是永远的,所以我取消了它.

关于如何使这种类型的更新更快的任何想法？即使它意味着使用不同的更新语句,甚至使用自定义函数循环并执行更新.

解决方法

由于null = null计算结果为false,除了进行相等检查外,还需要检查两个字段是否都为空：

UPDATE table_one SET table_one.x = table_two.y 
FROM table_two
WHERE 
    (table_one.invoice_number = table_two.invoice_number 
        OR (table_one.invoice_number is null AND table_two.invoice_number is null))
    AND
    (table_one.submitted_by = table_two.submitted_by 
        OR (table_one.submitted_by is null AND table_two.submitted_by is null))
    AND 
    -- etc

您也可以使用更可读的coalesce功能：

UPDATE table_one SET table_one.x = table_two.y 
FROM table_two
WHERE 
    coalesce(table_one.invoice_number,'') = coalesce(table_two.invoice_number,'')
    AND coalesce(table_one.submitted_by,'') = coalesce(table_two.submitted_by,'')
    AND -- etc

但是您需要注意默认值(最后一个参数合并).
它的数据类型应该与列类型匹配(以便您不会将日期与数字进行比较),默认值应该不会出现在数据中
E.g coalesce(null,1)= coalesce(1,1)是你想避免的情况.

更新(关于性能)：

在table_two上的Seq Scan – 这表示您没有table_two上的任何索引.
因此,如果您更新table_one中的一行,则在table_two中找到匹配的行,数据库基本上必须逐个扫描所有行,直到找到匹配项.
如果相关列被编入索引,可以找到匹配行的速度更快.

如果table_one有任何索引,那么这样做会减慢更新速度.
根据this performance guide：

Table constraints and indexes heavily delay every write. If possible,you should drop all the indexes,triggers and foreign keys while the update runs and recreate them at the end.

来自同一指南的另一个可能有帮助的建议是：

If you can segment your data using,for example,sequential IDs,you can update rows incrementally in batches.

所以例如,如果table_one是一个id列,你可以添加类似的东西

and table_one.id between x and y

到where条件并运行查询多次更改x和y的值,以便覆盖所有行.

The EXPLAIN ANALYZE option took also forever

在处理带有副作用的语句时,您可能需要注意在使用EXPLAIN时使用ANALYZE选项.
根据documentation：

Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return,other side effects of the statement will happen as usual.

sql – 使用包含具有空值的列的WHERE子句的Update语句

解决方法

猜你在找的MsSQL相关文章