Example
    > SELECT to_tsvector('mortgag') @@ to_tsquery('simple', 'mortgage');
     ?column?
    ----------
     f
    (1 row)

    > SELECT to_tsvector('mortgag') @@ to_tsquery('english', 'mortgage');
     ?column?
    ----------
     t
    (1 row)
I would have thought both of these should return true, but apparently the first one does not. Why?
Solution
12.6. Dictionaries
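Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme.
So dictionaries are used to throw out things that are too common or too meaningless to consider in a search (stop words) and to normalize everything else so that, for example, city and cities will match even though they are different words.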
Let's look at some output from ts_debug and see what is happening with the dictionaries:
    => select * from ts_debug('english', 'mortgage');
       alias   |   description   |  token   |  dictionaries  |  dictionary  |  lexemes
    -----------+-----------------+----------+----------------+--------------+-----------
     asciiword | Word, all ASCII | mortgage | {english_stem} | english_stem | {mortgag}

    => select * from ts_debug('simple', 'mortgage');
       alias   |   description   |  token   | dictionaries | dictionary |  lexemes
    -----------+-----------------+----------+--------------+------------+------------
     asciiword | Word, all ASCII | mortgage | {simple}     | simple     | {mortgage}
Notice that the simple configuration uses the simple dictionary, whereas english uses the english_stem dictionary.
The simple dictionary operates by converting the input token to lower case and checking it against a file of stop words. If it is found in the file then an empty array is returned, causing the token to be discarded. If not, the lower-cased form of the word is returned as the normalized lexeme.
The simple dictionary just throws out stop words, downcases everything else, and that's about it. We can see its simplicity for ourselves:
    => select to_tsquery('simple', 'Mortgage'), to_tsquery('simple', 'Mortgages');
     to_tsquery | to_tsquery
    ------------+-------------
     'mortgage' | 'mortgages'
The simple dictionary is too simple to handle even simple plurals.
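The stop-word half of that description can be tried out with ts_lexize, which feeds a single token to a single dictionary. This is only a sketch, assuming you are allowed to create objects in the public schema. The name public.simple_dict is made up for illustration, and the definition borrows the english stop-word list that ships with PostgreSQL.

```sql
-- Hypothetical dictionary name, built from the simple template plus the
-- english stop-word list that ships with PostgreSQL.
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

-- An ordinary word is only lower-cased; the plural survives untouched.
SELECT ts_lexize('public.simple_dict', 'Mortgages');  -- {mortgages}

-- A stop word yields an empty array, so the token is discarded.
SELECT ts_lexize('public.simple_dict', 'The');        -- {}

-- Remove the illustrative dictionary again.
DROP TEXT SEARCH DICTIONARY public.simple_dict;
```

Nothing here stems anything, which is why 'mortgage' and 'mortgages' stay distinct lexemes under simple.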
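So what is this english_stem dictionary all about? The "stem" suffix is a giveaway: this dictionary applies a stemming algorithm to words so that (for example) city and cities are converted to the same thing. From the fine manual: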
12.6.6. Snowball Dictionary
The Snowball dictionary template is based on a project by Martin Porter, inventor of the popular Porter's stemming algorithm for the English language. […] Each algorithm understands how to reduce common variant forms of words to a base, or stem, spelling within its language.
And just below that we see the english_stem dictionary:
06002
So the english_stem dictionary stems words, and we can watch that happen:
    => select to_tsquery('english', 'Mortgage'), to_tsquery('english', 'Mortgages');
     to_tsquery | to_tsquery
    ------------+------------
     'mortgag'  | 'mortgag'
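To look at the dictionary by itself, without the parser and the rest of the configuration, ts_lexize can be pointed straight at english_stem. A small sketch, assuming a stock installation where the built-in english_stem dictionary has not been altered:

```sql
-- Singular and plural are reduced to the same stem.
SELECT ts_lexize('english_stem', 'mortgage');   -- {mortgag}
SELECT ts_lexize('english_stem', 'mortgages');  -- {mortgag}

-- Stop words are discarded: an empty array comes back.
SELECT ts_lexize('english_stem', 'The');        -- {}
```

The stem does not have to be a real English word. It only has to be the same string for every variant form.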
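Executive summary: 'simple' implies simple-minded literal matching, while 'english' applies stemming to (hopefully) produce better matches. The stemming turns mortgage into mortgag, and that gives you your match.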
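In practice the safest habit is probably to name the configuration on both sides of @@, so the document and the query are normalized the same way. The one-argument to_tsvector falls back to default_text_search_config. The output in the question is consistent with that setting being english, but that is an assumption, since the question never shows it.

```sql
-- Which configuration does the one-argument to_tsvector use?
SHOW default_text_search_config;

-- Same configuration on both sides: the lexemes are comparable.
SELECT to_tsvector('english', 'mortgages') @@ to_tsquery('english', 'mortgage');  -- t
SELECT to_tsvector('simple', 'mortgage') @@ to_tsquery('simple', 'mortgage');     -- t

-- Mixed configurations: 'mortgag' on the left, 'mortgage' on the right.
SELECT to_tsvector('english', 'mortgage') @@ to_tsquery('simple', 'mortgage');    -- f
```

The last query reproduces the mismatch from the question: the stemmed lexeme stored in the tsvector never equals the unstemmed lexeme in the tsquery.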