我有一个填充了规范化地址的客户数据库.有重复.
每个用户都创建了自己的记录,并输入了自己的地址.因此,我们在用户和地址之间建立了一对一的关系:
CREATE TABLE `users` (
`UserID` INT UNSIGNED NOT NULL AUTO_INCREMENT,`Name` VARCHAR(63),`Email` VARCHAR(63),`AddressID` INT UNSIGNED,PRIMARY KEY (`UserID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `addresses` (
`AddressID` INT UNSIGNED NOT NULL AUTO_INCREMENT,`Duplicate` VARCHAR(1),`Address1` VARCHAR(63) DEFAULT NULL,`Address2` VARCHAR(63) DEFAULT NULL,`City` VARCHAR(63) DEFAULT NULL,`State` VARCHAR(2) DEFAULT NULL,`ZIP` VARCHAR(10) DEFAULT NULL,PRIMARY KEY (`AddressID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
而数据:
INSERT INTO `users` VALUES
(1,'Michael','michael@email.com',1),(2,'Steve','steve@email.com',2),(3,'Judy','judy@email.com',3),(4,'Kathy','kathy@email.com',4),(5,'Mark','mark@email.com',5),(6,'Robert','robert@email.com',6),(7,'Susan','susan@email.com',7),(8,'Paul','paul@email.com',8),(9,'Patrick','patrick@email.com',9),(10,'Mary','mary@email.com',10),(11,'James','james@email.com',11),(12,'Barbara','barbara@email.com',12),(13,'Peter','peter@email.com',13);
INSERT INTO `addresses` VALUES
(1,'','1234 Main Street','Springfield','KS','54321'),'Y','5678 Sycamore Lane','Upstate','NY','50000'),'1000 State Street','Apt C','Sunnydale','OH','Apt A','1000 Main Street','9999 Valleyview','54321');
哦,是的,让我加上那个外键关系:
ALTER TABLE `users` ADD CONSTRAINT `AddressID`
FOREIGN KEY `AddressID` (`AddressID`)
REFERENCES `addresses` (`AddressID`);
我们通过第三方服务清理了我们的地址列表,该服务对数据进行了规范化并指出了我们重复的位置.这是Duplicate列的来源.如果有’Y’,则它是另一个地址的副本.主要地址未标记为重复,如示例数据中所示.
我显然想要删除所有重复记录,但有用户记录指向它们.我需要它们指向不重复的地址版本.
那么如何更新用户的AddressID以匹配非重复地址?
我能想到的唯一方法就是使用高级语言迭代所有数据,但我很确定MysqL拥有以更好的方式做这样的事情所需的所有工具.
这是我尝试过的:
SELECT COUNT(*) as cnt,GROUP_CONCAT(AddressID ORDER BY AddressID) AS ids
FROM addresses
GROUP BY Address1,Address2,City,State,ZIP
HAVING cnt > 1;
+-----+--------------+
| cnt | ids |
+-----+--------------+
| 2 | 5,7 |
| 6 | 1,2,3,6,8,10 |
| 2 | 4,11 |
+-----+--------------+
3 rows in set (0.00 sec)
从那里,我可以遍历每个结果行并执行此操作:
UPDATE `users` SET `AddressID` = 1 WHERE `AddressID` IN (2,10);
但是必须有一个更好的MysqL方式,不应该吗?
一切都说完了,数据应该是这样的:
SELECT * FROM `users`;
+--------+---------+-------------------+-----------+
| UserID | Name | Email | AddressID |
+--------+---------+-------------------+-----------+
| 1 | Michael | michael@email.com | 1 |
| 2 | Steve | steve@email.com | 1 |
| 3 | Judy | judy@email.com | 1 |
| 4 | Kathy | kathy@email.com | 4 |
| 5 | Mark | mark@email.com | 5 |
| 6 | Robert | robert@email.com | 1 |
| 7 | Susan | susan@email.com | 5 |
| 8 | Paul | paul@email.com | 1 |
| 9 | Patrick | patrick@email.com | 9 |
| 10 | Mary | mary@email.com | 1 |
| 11 | James | james@email.com | 4 |
| 12 | Barbara | barbara@email.com | 1 |
| 13 | Peter | peter@email.com | 13 |
+--------+---------+-------------------+-----------+
13 rows in set (0.00 sec)
SELECT * FROM `addresses`;
+-----------+-----------+--------------------+----------+-------------+-------+-------+
| AddressID | Duplicate | Address1 | Address2 | City | State | ZIP |
+-----------+-----------+--------------------+----------+-------------+-------+-------+
| 1 | | 1234 Main Street | | Springfield | KS | 54321 |
| 4 | | 5678 Sycamore Lane | | Upstate | NY | 50000 |
| 5 | | 1000 State Street | Apt C | Sunnydale | OH | 54321 |
| 9 | | 1000 State Street | Apt A | Sunnydale | OH | 54321 |
| 13 | | 9999 Valleyview | | Springfield | KS | 54321 |
+-----------+-----------+--------------------+----------+-------------+-------+-------+
5 rows in set (0.00 sec)
救命?
最佳答案
要选择要查看的结果:
SELECT a.UserID,a.Name,a.Email,(
SELECT addressID
FROM addresses c
WHERE c.Address1 = b.Address1
AND c.Address2 = b.Address2
AND c.City = b.City
AND c.State = b.State
AND c.ZIP = b.ZIP
AND DUPLICATE != 'Y'
) as AddressID
FROM users a
JOIN addresses b
ON a.AddressID = b.AddressID
UPDATE users a
JOIN addresses b
ON a.AddressID = b.AddressID
SET a.addressID =
(
SELECT addressID
FROM addresses c
WHERE c.Address1 = b.Address1
AND c.Address2 = b.Address2
AND c.City = b.City
AND c.State = b.State
AND c.ZIP = b.ZIP
AND Duplicate != 'Y'
)
WHERE Duplicate = 'Y'
请注意,对于您提供的示例数据,#12 Barbara的ID在SELECT查询中为空,因为她的地址被标记为重复,而实际上它对于提供的列表是唯一的.它与地址1不匹配,如“它应该如何看”结果中所示.
编辑
为了处理不正确的重复标记,例如#12 Barbara,或者可能是其他未标记的重复标记,您可以跳过重复标记列检查并使用ORDER BY&对子查询进行LIMIT,以便它将返回第一个最低匹配地址ID,而不管重复标记如何:
UPDATE users a
JOIN addresses b
ON a.AddressID = b.AddressID
SET a.addressID =
(
SELECT addressID
FROM addresses c
WHERE c.Address1 = b.Address1
AND c.Address2 = b.Address2
AND c.City = b.City
AND c.State = b.State
AND c.ZIP = b.ZIP
ORDER BY c.addressID ASC
LIMIT 1
)