这是pgBadger的输出:
Hour Count Duration Avg duration 00 9,990 10m3s 60ms <---ignore this hour 02 1 60ms 60ms <---ignore this hour 03 4,638 1m54s 24ms <---queries begin with table empty 04 30,991 55m49s 108ms <---first full hour of queries running 05 13,497 58m3s 258ms 06 9,904 58m32s 354ms 07 10,542 58m25s 332ms 08 8,599 58m42s 409ms 09 7,360 58m52s 479ms 10 6,661 58m57s 531ms 11 6,133 59m2s 577ms 12 5,601 59m6s 633ms 13 5,327 59m9s 666ms 14 4,964 59m12s 715ms 15 4,759 59m14s 746ms 16 4,531 59m17s 785ms 17 4,330 59m18s 821ms 18 939 13m16s 848ms
表结构如下所示:
CREATE TABLE "Parent" ( "ParentID" SERIAL PRIMARY KEY,"Details1" VARCHAR );
表“Parent”与表“Foo”具有一对多的关系:
CREATE TABLE "Foo" ( "FooID" SERIAL PRIMARY KEY,"ParentID" int4 NOT NULL REFERENCES "Parent" ("ParentID"),"Details1" VARCHAR );
表“Foo”与表“Bar”具有一对多的关系:
CREATE TABLE "Bar" ( "FooID" int8 NOT NULL REFERENCES "Foo" ("FooID"),"Timerange" tstzrange NOT NULL,"Detail1" VARCHAR,"Detail2" VARCHAR,CONSTRAINT "Bar_pkey" PRIMARY KEY ("FooID","Timerange") ); CREATE INDEX "Bar_FooID_Timerange_idx" ON "Bar" USING gist("FooID","Timerange");
另外,表“Bar”可能不包含相同“FooID”或“ParentID”的重叠“Timespan”值.我创建了一个触发器,在任何INSERT,UPDATE或DELETE之后触发,以防止重叠范围.
触发器包含一个看起来类似于此的部分:
WITH "cte" AS ( SELECT "Foo"."FooID","Foo"."ParentID","Foo"."Details1","Bar"."Timespan" FROM "Foo" JOIN "Bar" ON "Foo"."FooID" = "Bar"."FooID" WHERE "Foo"."FooID" = 1234 ) SELECT "Foo"."FooID","Bar"."Timespan" FROM "cte" JOIN "Foo" ON "cte"."ParentID" = "Foo"."ParentID" AND "cte"."FooID" <> "Foo"."FooID" JOIN "Bar" ON "Foo"."FooID" = "Bar"."FooID" AND "cte"."Timespan" && "Bar"."Timespan";
EXPLAIN ANALYZE的结果:
Nested Loop (cost=7258.08..15540.26 rows=1 width=130) (actual time=8.052..147.792 rows=1 loops=1) Join Filter: ((cte."FooID" <> "Foo"."FooID") AND (cte."ParentID" = "Foo"."ParentID")) Rows Removed by Join Filter: 76 CTE cte -> Nested Loop (cost=0.68..7257.25 rows=1000 width=160) (actual time=1.727..1.735 rows=1 loops=1) -> Function Scan on "fn_Bar" (cost=0.25..10.25 rows=1000 width=104) (actual time=1.699..1.701 rows=1 loops=1) -> Index Scan using "Foo_pkey" on "Foo" "Foo_1" (cost=0.42..7.24 rows=1 width=64) (actual time=0.023..0.025 rows=1 loops=1) Index Cond: ("FooID" = "fn_Bar"."FooID") -> Nested Loop (cost=0.41..8256.00 rows=50 width=86) (actual time=1.828..147.188 rows=77 loops=1) -> CTE Scan on cte (cost=0.00..20.00 rows=1000 width=108) (actual time=1.730..1.740 rows=1 loops=1) **** -> Index Scan using "Bar_FooID_Timerange_idx" on "Bar" (cost=0.41..8.23 rows=1 width=74) (actual time=0.093..145.314 rows=77 loops=1) Index Cond: ((cte."Timespan" && "Timespan")) -> Index Scan using "Foo_pkey" on "Foo" (cost=0.42..0.53 rows=1 width=64) (actual time=0.004..0.005 rows=1 loops=77) Index Cond: ("FooID" = "Bar"."FooID") Planning time: 1.490 ms Execution time: 147.869 ms
(****强调我的)
这似乎表明99%的工作正在从“cte”到“Bar”(通过“Foo”)加入…但它已经使用了适当的索引……它仍然太慢了.
所以我跑了:
SELECT pg_size_pretty(pg_relation_size('"Bar"')) AS "Table",pg_size_pretty(pg_relation_size('"Bar_FooID_Timerange_idx"')) AS "Index";
结果:
Table | Index -------------|------------- 283 MB | 90 MB
这个大小的索引(相对于表格)在读取性能方面提供了多少?我正在考虑一个sudo-partition,其中索引被几个部分索引替换……也许部分将更少维护(和读取)并且性能会提高.我从未见过这样做,只是一个想法.如果这是一个选项,我想不出有任何好的方法来限制段,因为这将是一个TSTZRANGE值.
我还认为将“ParentID”添加到“Bar”会加快速度,但我不想反规范化.
我还有什么其他选择?
Erwin Brandstetter推荐的变更影响
在最高性能(18:00时),该过程每秒增加14.5条记录,从每秒1.15条记录增加.
这是以下结果:
>将“ParentID”添加到表“Bar”
>将外键约束添加到“Foo”(“ParentID”,“FooID”)
>使用gist添加EXCLUDE(“ParentID”WITH =,“Timerange”WITH&&)DEFERRABLE INITIALLY DEFERRED(已经安装了btree_gist模块)
Additionally,table
"Bar"
may not contain overlapping"Timespan"
values for the same"FooID"
or"ParentID"
. I have created a trigger
that fires after anyINSERT
,UPDATE
,orDELETE
that prevents
overlapping ranges.
我建议您使用排除约束,这样更简单,更安全,更快捷:
您需要先安装附加模块btree_gist
.请参阅相关答案中的说明和解释:
> Store the day of the week and time?
并且您需要在“Bar”表中冗余地包含“ParentID”,这将是一个很小的代价.表定义可能如下所示:
CREATE TABLE "Foo" ( "FooID" serial PRIMARY KEY "ParentID" int4 NOT NULL REFERENCES "Parent" "Details1" varchar CONSTRAINT foo_parent_foo_uni UNIQUE ("ParentID","FooID") -- required for FK ); CREATE TABLE "Bar" ( "ParentID" int4 NOT NULL,"FooID" int4 NOT NULL REFERENCES "Foo" ("FooID"),"Detail1" varchar,"Detail2" varchar,"Timerange"),CONSTRAINT bar_foo_fk FOREIGN KEY ("ParentID","FooID") REFERENCES "Foo" ("ParentID","FooID"),CONSTRAINT bar_parent_timerange_excl EXCLUDE USING gist ("ParentID" WITH =,"Timerange" WITH &&) );
我还将“Bar”.“FooID”的数据类型从int8更改为int4.它引用了“Foo”.“FooID”,它是一个序列,即int4.使用匹配类型int4(或只是整数)有几个原因,其中一个是性能.
您不再需要触发器(至少不需要此任务),并且您不再创建索引“Bar_FooID_Timerange_idx”,因为它是由排除约束隐式创建的.
btree索引(“ParentID”,“FooID”)最有可能是有用的,但是:
CREATE INDEX bar_parentid_fooid_idx ON "Bar" ("ParentID","FooID");
有关:
> Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL
我选择了UNIQUE(“ParentID”,“FooID”)而不是相反的原因,因为在任一个表中都有另一个带有前导“FooID”的索引:
> Is a composite index also good for queries on the first field?
旁白:I never use double-quoted CaMeL-case identifiers在Postgres.我这里只是为了遵守你的布局.
避免冗余列
如果你不能或不会包含“Bar”.冗余的“ParentID”,还有另一种流氓方式 – 条件是“Foo”.“ParentID”永远不会更新.例如,使用触发器确保这一点.
你可以伪造一个IMMUTABLE函数:
CREATE OR REPLACE FUNCTION f_parent_of_foo(int) RETURNS int AS 'SELECT "ParentID" FROM public."Foo" WHERE "FooID" = $1' LANGUAGE sql IMMUTABLE;
我对表名进行了模式限定,以确保公开.适应您的架构.
更多:
> CONSTRAINT to check values from a remotely related table (via join etc.)
> Does PostgreSQL support “accent insensitive” collations?
然后在排除约束中使用它:
CONSTRAINT bar_parent_timerange_excl EXCLUDE USING gist (f_parent_of_foo("FooID") WITH =,"Timerange" WITH &&)
在保存一个冗余的int4列时,验证的约束将更加昂贵,整个解决方案依赖于更多的前提条件.
处理冲突
您可以将INSERT和UPDATE包装到plpgsql函数中,并从排除约束(23P01 exclusion_violation)中捕获可能的异常以便以某种方式处理它.
INSERT ... EXCEPTION WHEN exclusion_violation THEN -- handle conflict
完整的代码示例:
> Handling EXCEPTION and return result from function
处理Postgres 9.5中的冲突
在Postgres 9.5中,您可以使用新的“UPSERT”实现直接处理INSERT. The documentation:
The optional
ON CONFLICT
clause specifies an alternative action to
raising a unique violation or exclusion constraint violation error.
For each individual row proposed for insertion,either the insertion
proceeds,or,if an arbiter constraint or index specified by
conflict_target
is violated,the alternativeconflict_action
is
taken.ON CONFLICT DO NOTHING
simply avoids inserting a row as its
alternative action.ON CONFLICT DO UPDATE
updates the existing row
that conflicts with the row proposed for insertion as its alternative action.
然而:
Note that exclusion constraints are not supported with
ON CONFLICT DO UPDATE
.
但是你仍然可以使用ON CONFLICT DO NOTHING,从而避免可能的exclusion_violation异常.只检查是否有任何行实际更新,哪个更便宜:
INSERT ... ON CONFLICT ON CONSTRAINT bar_parent_timerange_excl DO NOTHING; IF NOT FOUND THEN -- handle conflict END IF;
此示例将检查限制为给定的排除约束. (我在上面的表定义中为此目的明确命名了约束.)未捕获其他可能的异常.