Hello, any ideas on how to speed up this query?
Input
    EXPLAIN
    SELECT entityid
    FROM entity e
    LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
    LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
    WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
      AND (entityid NOT IN (1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids) )
Output
    Nested Loop  (cost=0.00..1452373.79 rows=3865 width=8)
      ->  Nested Loop  (cost=0.00..8.58 rows=1 width=8)
            Join Filter: (l1.level2_level2id = l2.level2id)
            ->  Seq Scan on level2entity l2  (cost=0.00..3.17 rows=1 width=8)
                  Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
            ->  Seq Scan on level1entity l1  (cost=0.00..4.07 rows=107 width=16)
      ->  Index Scan using fk_fk18edb1cfb2a41235_idx on entity e  (cost=0.00..1452086.09 rows=22329 width=16)
            Index Cond: (level1_level1id = l1.level1id)
OK, here is a simplified version; the joins are not the bottleneck.
    SELECT entityid
    FROM (SELECT entityid FROM entity e LIMIT 5000) a
    WHERE (entityid NOT IN (1377776,... 50000 ids) )
The problem is finding the rows that do not have any of these IDs.
Explain
    Subquery Scan on a  (cost=0.00..312667.76 rows=1 width=8)
      Filter: (e.entityid <> ALL ('{1377776,... 50000 ids}'::bigint[]))
      ->  Limit  (cost=0.00..111.51 rows=5000 width=8)
            ->  Seq Scan on entity e  (cost=0.00..29015.26 rows=1301026 width=8)
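Solution
A huge IN list is very inefficient. PostgreSQL should ideally recognize it and turn it into a relation that it anti-joins against, but at this point the query planner does not know how to do that. The planning time needed to detect this case would also be paid by every query that uses NOT IN sensibly, so the check would have to be very cheap. See this earlier, much more detailed answer on the topic.
As David Aldridge wrote, this is best solved by turning it into an anti-join. I would write it as a join over a VALUES list simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same: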
    SELECT entityid
    FROM entity e
    LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
    LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
    LEFT OUTER JOIN (
        VALUES (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
    ) ex(ex_entityid) ON (entityid = ex_entityid)
    WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
      AND ex_entityid IS NULL;
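The same anti-join can also be spelled with NOT EXISTS, which PostgreSQL can plan as an anti-join as well. The following is only a sketch against the simplified query from the question (without its LIMIT subquery): the six IDs stand in for the full list, and the aliases ex and ex_entityid are reused from the query above. It assumes entityid is never NULL and that the list contains no NULLs, since NOT IN and NOT EXISTS behave differently when NULLs are involved.

```sql
-- Anti-join written with NOT EXISTS over a VALUES list.
-- Assumption: entity.entityid is NOT NULL (for example, a primary key).
SELECT e.entityid
FROM entity e
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1377776), (1377792), (1377793),
                 (1377794), (1377795), (1377796)) AS ex(ex_entityid)
    WHERE ex.ex_entityid = e.entityid
);
```

Prefixing the statement with EXPLAIN is a quick way to check whether the planner actually chose an anti-join for your data.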
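For a big enough set of values you may be even better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and then joining against it.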
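A rough sketch of the temporary-table approach, written as a psql script. The table name excluded_ids is hypothetical, the bigint column type is taken from the bigint[] cast in the EXPLAIN output in the question, and only two sample IDs are shown where the full list would go. The ANALYZE step is an addition beyond what the text describes; it is there so the planner has row estimates for the new table.

```sql
BEGIN;

-- Hypothetical temp table holding the IDs to exclude; dropped at COMMIT.
CREATE TEMPORARY TABLE excluded_ids (
    ex_entityid bigint NOT NULL
) ON COMMIT DROP;

-- Bulk-load the IDs, one per line (psql reads the inline data up to "\.").
COPY excluded_ids (ex_entityid) FROM STDIN;
1377776
1377792
\.

-- Build the primary key after loading, then gather statistics.
ALTER TABLE excluded_ids ADD PRIMARY KEY (ex_entityid);
ANALYZE excluded_ids;

-- Anti-join: keep only entities with no match in excluded_ids.
SELECT e.entityid
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT JOIN excluded_ids ex ON ex.ex_entityid = e.entityid
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
  AND ex.ex_entityid IS NULL;

COMMIT;
```

From application code, the same load would normally go through the driver's COPY support rather than inline data in a script.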
More possibilities are explored here: