Example:
> SELECT to_tsvector('mortgag') @@ to_tsquery('simple','mortgage')
 ?column?
----------
 f
(1 row)

> SELECT to_tsvector('mortgag') @@ to_tsquery('english','mortgage')
 ?column?
----------
 t
(1 row)
I would expect both of these to return true, but evidently the first one does not. Why?
Solution
12.6. Dictionaries
Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme.
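So dictionaries are used to throw out things that are too common or meaningless to consider in a search (stop words) and to normalize everything else so that, for example, city and cities will match even though they are different words.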
Let's look at some output from ts_debug to see what the dictionaries are doing:
=> select * from ts_debug('english','mortgage');
   alias   |   description   |  token   |  dictionaries  |  dictionary  |  lexemes
-----------+-----------------+----------+----------------+--------------+-----------
 asciiword | Word, all ASCII | mortgage | {english_stem} | english_stem | {mortgag}

=> select * from ts_debug('simple','mortgage');
   alias   |   description   |  token   | dictionaries | dictionary |  lexemes
-----------+-----------------+----------+--------------+------------+------------
 asciiword | Word, all ASCII | mortgage | {simple}     | simple     | {mortgage}
Note that the simple configuration uses the simple dictionary, whereas the english configuration uses the english_stem dictionary.
The simple dictionary:
operates by converting the input token to lower case and checking it against a file of stop words. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the lower-cased form of the word is returned as the normalized lexeme.
The simple dictionary just throws out stop words and lower-cases everything else, and that's about it. We can see its simplicity for ourselves:
=> select to_tsquery('simple','Mortgage'),to_tsquery('simple','Mortgages');
 to_tsquery | to_tsquery
------------+-------------
 'mortgage' | 'mortgages'
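The simple dictionary is too simple to handle even simple plurals.
So what is this english_stem dictionary all about? The "stem" suffix is a giveaway: this dictionary applies a stemming algorithm to convert (for example) city and cities to the same thing. From the fine manual: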
12.6.6. Snowball Dictionary
The Snowball dictionary template is based on a project by Martin Porter, inventor of the popular Porter's stemming algorithm for the English language. […] Each algorithm understands how to reduce common variant forms of words to a base, or stem, spelling within its language.
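Just below that, the manual shows the definition of the english_stem dictionary, which is built on the snowball template with the english language and the english stop-word list.
So the english_stem dictionary stems words, and we can watch that happen: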
=> select to_tsquery('english','Mortgage'),to_tsquery('english','Mortgages');
 to_tsquery | to_tsquery
------------+------------
 'mortgag'  | 'mortgag'
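The two dictionaries can also be queried directly, without going through a whole configuration. A minimal sketch using ts_lexize, assuming a stock PostgreSQL install where the built-in simple and english_stem dictionaries have not been altered; the expected results in the comments follow from the to_tsquery output shown in this answer:

```sql
-- ts_lexize(dictionary, token) returns the lexemes one dictionary
-- produces for a single token.

-- simple: lower-cases only, so singular and plural stay distinct.
SELECT ts_lexize('simple', 'Mortgage');        -- {mortgage}
SELECT ts_lexize('simple', 'Mortgages');       -- {mortgages}

-- english_stem: the Snowball stemmer reduces both forms to one stem.
SELECT ts_lexize('english_stem', 'mortgage');  -- {mortgag}
SELECT ts_lexize('english_stem', 'mortgages'); -- {mortgag}
```

ts_lexize is meant for testing a single dictionary; it expects one token rather than a whole document, so it is not a substitute for to_tsvector or to_tsquery.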
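Executive summary: 'simple' implies simple-minded literal matching, while 'english' applies stemming to (hopefully) produce better matches. The stemming turns mortgage into mortgag, and that gives you your match.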
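Putting the two sides of the original comparison next to each other makes the mismatch visible. This is only a sketch: it assumes default_text_search_config is set to 'english', which the question's results imply but do not state, since the one-argument form of to_tsvector falls back to that setting.

```sql
-- Both sides of the question's @@ comparison, shown separately.
-- Assumption: default_text_search_config = 'english'.
SELECT to_tsvector('mortgag')            AS vector_side,   -- 'mortgag':1
       to_tsquery('simple', 'mortgage')  AS simple_query,  -- 'mortgage'
       to_tsquery('english', 'mortgage') AS english_query; -- 'mortgag'

-- Using the same configuration on both sides avoids the surprise.
-- Each of these returns t:
SELECT to_tsvector('simple', 'mortgage')   @@ to_tsquery('simple', 'mortgage');
SELECT to_tsvector('english', 'mortgages') @@ to_tsquery('english', 'mortgage');
```

Naming the configuration explicitly in both to_tsvector and to_tsquery keeps the result from depending on the session's default setting.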