Question
I have a search phrase:
"Yellow large widgets"
I split it into 3 words:
1 = "Yellow"; 2 = "Large"; 3 = "Widgets";
Then I search with:
SELECT * FROM widgets
WHERE (description LIKE '%yellow%' OR description LIKE '%large%' OR description LIKE '%widgets%')
   OR (title LIKE '%yellow%' OR title LIKE '%large%' OR title LIKE '%widgets%')
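How can I order the results according to these priorities?
> Title is king: if any of the terms appears in the title, that result should be treated as more important.
> Number of occurrences: results with a higher total number of occurrences should come first.
Ideal methodology
> Count the occurrences in the description.
> Each occurrence there is worth 1 point.
> Count the occurrences in the title.
> Each occurrence in the title is worth 5 points.
> Sort by points.
But I have no idea where to start doing this in SQL.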
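Before translating it to SQL, the scoring rule in the list above can be sketched in plain Python. This is a minimal sketch, not the asker's code: the widget rows are hypothetical sample data, and occurrences are counted as case-insensitive substrings (the same kind of match `LIKE '%term%'` tests for), so "large" would also be found inside "enlarged".

```python
def count_occurrences(text: str, term: str) -> int:
    # Case-insensitive, non-overlapping substring count
    return text.lower().count(term.lower())


def score(title: str, description: str, terms: list[str]) -> int:
    description_points = sum(count_occurrences(description, t) for t in terms)  # 1 point each
    title_points = 5 * sum(count_occurrences(title, t) for t in terms)           # 5 points each
    return description_points + title_points


# Hypothetical sample rows: (title, description)
widgets = [
    ("Small blue gadget", "A yellow widget, not large."),
    ("Yellow widgets", "Large yellow widgets for large jobs."),
    ("Red box", "Nothing relevant here."),
]
terms = ["yellow", "large", "widgets"]

# Highest score first, as the methodology asks
ranked = sorted(widgets, key=lambda w: score(w[0], w[1], terms), reverse=True)
for title, description in ranked:
    print(score(title, description, terms), title)
```

With this data, "Yellow widgets" scores 14 (2 title hits × 5, plus 4 description hits) and ranks first.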
Answer
OK, let's say we have your search terms in a temp table:
CREATE TABLE #SearchTerms (Term varchar(50) not null)
insert into #SearchTerms (Term)
select 'yellow' union all
select 'large' union all
select 'widgets'
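Now let's do something dumb: cross join every widget with every search term and count how often each term occurs in the description and in the title:
select widgets.ID,
    (LEN(description) - LEN(REPLACE(description, Term, ''))) / LEN(Term) as Descscore,
    (LEN(title) - LEN(REPLACE(title, Term, ''))) / LEN(Term) as Titlescore
from widgets, #SearchTerms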
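To see what this per-term cross join produces, here is a runnable sketch that uses SQLite through Python's sqlite3 module in place of SQL Server. The assumptions: `LENGTH` stands in for `LEN`, a `TEMP` table named `SearchTerms` replaces `#SearchTerms` (SQLite does not use the `#` prefix), and the widget rows are hypothetical. SQLite's `REPLACE` is case-sensitive, so the text is passed through `LOWER()` first. SQL Server's `REPLACE` follows the input's collation, which is often case-insensitive.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widgets (ID INTEGER PRIMARY KEY, title TEXT NOT NULL, description TEXT NOT NULL);
CREATE TEMP TABLE SearchTerms (Term TEXT NOT NULL);
INSERT INTO SearchTerms (Term) VALUES ('yellow'), ('large'), ('widgets');
INSERT INTO widgets (ID, title, description) VALUES
  (1, 'Small blue gadget', 'A yellow widget, not large.'),
  (2, 'Yellow widgets', 'Large yellow widgets for large jobs.'),
  (3, 'Red box', 'Nothing relevant here.');
""")

# One row per (widget, term) pair. Integer division is exact because the
# length removed is always a multiple of LENGTH(Term).
rows = conn.execute("""
SELECT widgets.ID, SearchTerms.Term,
       (LENGTH(LOWER(description)) - LENGTH(REPLACE(LOWER(description), Term, ''))) / LENGTH(Term) AS Descscore,
       (LENGTH(LOWER(title)) - LENGTH(REPLACE(LOWER(title), Term, ''))) / LENGTH(Term) AS Titlescore
FROM widgets, SearchTerms
ORDER BY widgets.ID, SearchTerms.Term
""").fetchall()

for row in rows:
    print(row)  # (ID, Term, Descscore, Titlescore)
```

Each widget appears once per search term (9 rows here), which is why the next step has to sum and group.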
So now we can sum these occurrences and weight them (title hits count 5 times):
select widgets.ID,
    SUM((LEN(description) - LEN(REPLACE(description, Term, ''))) / LEN(Term)
        + ((LEN(title) - LEN(REPLACE(title, Term, ''))) / LEN(Term) * 5)) as Combinedscore
from widgets, #SearchTerms
group by widgets.ID
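If you need to do more with this (for example, return the full widget rows in score order), I'd suggest putting the above in a subselect:
select w.*, t.Combinedscore
from widgets w
inner join (
    select widgets.ID,
        SUM((LEN(description) - LEN(REPLACE(description, Term, ''))) / LEN(Term)
            + ((LEN(title) - LEN(REPLACE(title, Term, ''))) / LEN(Term) * 5)) as Combinedscore
    from widgets, #SearchTerms
    group by widgets.ID
) t on w.ID = t.ID
where t.Combinedscore > 0
order by t.Combinedscore desc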
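The full ranked query can be tried end to end under the same assumptions as before: SQLite via Python's sqlite3 instead of SQL Server, `LENGTH` for `LEN`, a `TEMP` table `SearchTerms` for `#SearchTerms`, `LOWER()` to make SQLite's case-sensitive `REPLACE` match regardless of case, and hypothetical sample rows.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widgets (ID INTEGER PRIMARY KEY, title TEXT NOT NULL, description TEXT NOT NULL);
CREATE TEMP TABLE SearchTerms (Term TEXT NOT NULL);
INSERT INTO SearchTerms (Term) VALUES ('yellow'), ('large'), ('widgets');
INSERT INTO widgets (ID, title, description) VALUES
  (1, 'Small blue gadget', 'A yellow widget, not large.'),
  (2, 'Yellow widgets', 'Large yellow widgets for large jobs.'),
  (3, 'Red box', 'Nothing relevant here.');
""")

# Inner query: weighted score per widget (1 point per description hit, 5 per title hit).
# Outer query: join back to widgets, drop zero scores, highest score first.
ranked = conn.execute("""
SELECT w.*, t.Combinedscore
FROM widgets w
INNER JOIN (
    SELECT widgets.ID,
           SUM((LENGTH(LOWER(description)) - LENGTH(REPLACE(LOWER(description), Term, ''))) / LENGTH(Term)
               + ((LENGTH(LOWER(title)) - LENGTH(REPLACE(LOWER(title), Term, ''))) / LENGTH(Term) * 5)) AS Combinedscore
    FROM widgets, SearchTerms
    GROUP BY widgets.ID
) t ON w.ID = t.ID
WHERE t.Combinedscore > 0
ORDER BY t.Combinedscore DESC
""").fetchall()

for widget_id, title, description, combined in ranked:
    print(widget_id, title, combined)
```

Widget 3 matches nothing, so the `where t.Combinedscore > 0` filter drops it. This also replaces the question's `LIKE` filter, since only rows with at least one hit survive.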
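(Note that I've assumed an ID column in all of these examples, but this can be extended to however many columns are needed to define the PK of the widgets table.)
The real trick here is counting the occurrences of a word within a larger body of text, which can be done with: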
(LEN(text) - LEN(text with each occurrence of term removed)) / LEN(term)
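As a worked example of that formula: "yellow widgets and more yellow widgets" is 38 characters long. Removing both occurrences of "yellow" leaves 26 characters, and (38 - 26) / 6 = 2. Below is a small Python sketch of the same arithmetic. The function name is hypothetical, and Python's `str.replace` is case-sensitive. The last call shows that this counts substrings rather than whole words, just like `LIKE '%large%'`.

```python
def count_by_length_difference(text: str, term: str) -> int:
    """Count non-overlapping occurrences of term in text via
    (len(text) - len(text with term removed)) / len(term)."""
    if not term:
        raise ValueError("term must be non-empty")  # the formula divides by len(term)
    without_term = text.replace(term, "")  # removes every non-overlapping occurrence
    return (len(text) - len(without_term)) // len(term)


text = "yellow widgets and more yellow widgets"
print(count_by_length_difference(text, "yellow"))       # 2
print(count_by_length_difference("enlarged", "large"))  # 1: substring, not whole word
```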