postgresql pg_trgm 对模糊查询的优化

前端之家收集整理的这篇文章主要介绍了postgresql pg_trgm 对模糊查询的优化前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。


pg_trgm是用来做相似度匹配的,在一些情况下也可以拿来代替全文检索做字符匹配。

从大量数据中通过字符串的匹配查找数据的关键是索引,对字符串的精确相等匹配,前缀匹配(like 'x%')和后缀匹配(like '%x')可以使用btree索引,对中缀匹配(like '%x%')和正则表达式匹配就可以用pg_trgm的索引了。


1.环境

Ubuntu 14.04

Postgresql 9.4.0


1000万随机产生的MD5数据的前8个字符。

postgres=#createtabletb(infotext);
CREATETABLE
postgres=#insertintotbselectsubstring(md5(random()::text),1,8)fromgenerate_series(1,10000000);
INSERT010000000
shop_5>SELECT*fromtbwhereinfo~'4c589'limit1;
+----------+
|info|
|----------|
|a64c5891|
+----------+
SELECT1
Time:0.003s
shop_5>SELECT*fromtbwhereinfo~'4c589'limit20;
+----------+
|info|
|----------|
|a64c5891|
|5d4c5899|
|b6b4c589|
|984c589a|
|省略数据|
|8d4c5894|
|c0e4c589|
|974c589b|
+----------+
SELECT20
Time:4.035s#直接模糊查询需要4s
shop_5>CREATEINDEXtb_info_gin_idxontbusinggin(infogin_trgm_ops);
CREATEINDEX
Time:54.878s
shop_5>SELECT*fromtbwhereinfo~'4c589'limit20;
+----------+
|info|
|----------|
|4c589ea6|
|a734c589|
|54c58961|
|4c589d5d|
|a64c5891|
|省略数据|
|a4c589b7|
|4c5894ea|
|b4c589f6|
|6d14c589|
|aa4c5897|
+----------+
SELECT20
Time:0.005s#加索引后,模糊查询只需要0.005s
shop_5>explainanalyzeSELECT*fromtbwhereinfo~'4c589'limit20;
+--------------------------------------------------------------------------------------
|QUERYPLAN
|--------------------------------------------------------------------------------------
|Limit(cost=51.75..123.23rows=20width=9)(actualtime=2.431..2.478rows=20loops=1
|->BitmapHeapScanontb(cost=51.75..3625.71rows=1000width=9)(actualtime=2.
|RecheckCond:((info)::text~'4c589'::text)
|HeapBlocks:exact=20
|->BitmapIndexScanontb_info_gin_idx(cost=0.00..51.50rows=1000width=0
|IndexCond:((info)::text~'4c589'::text)
|Planningtime:0.210ms
|Executiontime:2.503ms
+--------------------------------------------------------------------------------------
EXPLAIN
Time:0.005s
shop_5>\dtb;
+----------+-----------------------+-------------+
|Column|Type|Modifiers|
|----------+-----------------------+-------------|
|info|charactertext||
+----------+-----------------------+-------------+
Indexes:
"tb_info_gin_idx"gin(infogin_trgm_ops)
Time:0.004s
shop_5>SELECTpg_relation_size('tb_info_gin_idx');
+--------------------+
|pg_relation_size|
|--------------------|
|234635264|
+--------------------+
SELECT1
Time:0.001s
shop_5>
shop_5>SELECTpg_size_pretty(pg_table_size('tb'));
+------------------+
|pg_size_pretty|
|------------------|
|422MB|
+------------------+
SELECT1Time:0.001s
shop_5>

###############

这类场景中,gist 效率没有 gin 效率高。

除了like,gin_trgm索引还可以用在正则表达式匹配上。

参考:

http://blog.chinaunix.net/uid-20726500-id-4824895.html

https://yq.aliyun.com/articles/7444

原文链接:https://www.f2er.com/postgresql/194572.html

猜你在找的Postgre SQL相关文章