我计划提出另一个问题来解决这个问题,但现在我正在专门讲述如何阅读这些类型的计划。请不要指向任何通用教程,除非专门解决这个问题,在查询计划下方突出显示。
QUERY PLAN -------------------------------------------------------------------------------------------- Nested Loop Anti Join (cost=47680.88..169413.12 rows=1 width=77) Join Filter: ((co.fkey_style = v.chrome_styleid) AND (co.name = o.name)) -> Nested Loop (cost=5301.58..31738.10 rows=1 width=81) -> Hash Join (cost=5301.58..29722.32 rows=229 width=40) Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text)) -> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36) Filter: (name IS NULL) -> Hash (cost=4547.33..4547.33 rows=36150 width=24) -> Seq Scan on vehicles iv (cost=0.00..4547.33 rows=36150 width=24) Filter: (date_sold IS NULL) -> Index Scan using options_pkey on options co (cost=0.00..8.79 rows=1 width=49) Index Cond: ((co.fkey_style = iv.chrome_styleid) AND (co.code = io.code)) -> Hash Join (cost=42379.30..137424.09 rows=16729 width=26) Hash Cond: ((v.lot_id = o.lot_id) AND ((v.vin)::text = (o.vin)::text)) -> Seq Scan on vehicles v (cost=0.00..4547.33 rows=65233 width=24) -> Hash (cost=20223.32..20223.32 rows=931332 width=44) -> Seq Scan on options o (cost=0.00..20223.32 rows=931332 width=44) (17 rows)
这个查询计划的问题 – 我相信我明白了 – 可能最好由RhodiumToad说(他肯定比这更好,所以我会打赌他的解释更好)irc://irc.freenode.net/#postgresql :
oh,that plan is potentially disasterous
the problem with that plan is that it’s running a hugely expensive hashjoin for each row
the problem is the rows=1 estimate from the other join and
the planner thinks it’s ok to put a hugely expensive query in the inner path of a nestloop where the outer path is estimated to return only one row.
since,obvIoUsly,by the planner’s estimate the expensive part will only be run once
but this has an obvIoUs tendency to really mess up in practice
the problem is that the planner believes its own estimates
ideally,the planner needs to know the difference between “estimated to return 1 row” and “not possible to return more than 1 row”
but it’s not at all clear how to incorporate that into the existing code
他继续说:
it can affect any join,but usually joins against subqueries are the most likely
现在,当我看到这个计划时,我注意到的第一件事是嵌套循环反加注,这个成本为169,413(我会坚持上限)。此反连接分为嵌套循环的结果,成本为31,738,而哈希加入的结果为137,424。现在,137,424,远远大于31,738,所以我知道问题是哈希加入。然后我继续EXPLAIN AALLYZE查询之外的哈希连接分段。它在7秒内执行我确实有(lot_id,vin)和(co.code和v.code)的索引 – 有。我单独禁用seq_scan和hashjoin,并注意速度增加不到2秒。不足以说明为什么一小时后没有进展。
但是,毕竟这完全错了!是的,这是查询的较慢部分,但是因为rows =“1”位(我认为它是在嵌套循环反加入)。有没有一个教程可以帮助我识别这些类型的问题。这是一个错误(缺乏能力)在计划者中误导行的数量?我应该如何看待这一点呢来得到铑芋做的同样的结论?
这是简单的行=“1”,应该触发我弄清楚这一点吗?
我在所有涉及的表上都运行了VACUUM FULL ANALYZE,这是Postgresql 8.4。
在您的例子中,您将首先:
-> Seq Scan on options io (cost=0.00..20223.32 rows=23004 width=36) Filter: (name IS NULL)
运行EXPLAIN ANALYZE SELECT * FROM options WHERE name IS NULL;并查看返回的行是否符合估计。 2关系通常不是问题,您正在尝试发现数量级的差异。
然后查看EXPLAIN ANALYZE SELECT * FROM车辆WHERE date_sold IS NULL;返回预期的行数。
然后上升一级到哈希连接:
-> Hash Join (cost=5301.58..29722.32 rows=229 width=40) Hash Cond: ((io.lot_id = iv.lot_id) AND ((io.vin)::text = (iv.vin)::text))
查看EXPLAIN ANALYZE SELECT * FROM车辆AS iv INNER JOIN选项io ON(io.lot_id = iv.lot_id)AND((io.vin):: text =(iv.vin):: text)WHERE iv.date_sold IS NULL AND io.name IS NULL;导致229行。
再一个级别增加INNER JOIN选项co ON(co.fkey_style = iv.chrome_styleid)AND(co.code = io.code),并且预期只返回一行。这可能是因为如果行的实际数目从1到100,那么遍历包含嵌套循环的内部循环的总成本估计就被关闭了100。
计划者正在制定的基本错误可能是,它预计加入公司的两个谓词是相互独立的,并且增加了他们的选择性。虽然实际上它们可能存在很大的相关性,但选择性更接近于MIN(s1,s2)而不是s1 * s2。