I'm trying to highlight search terms in a block of HTML. The problem is that if the user searches for "color", this:
```html
<span style='color:white'>White</span>
```
becomes:
```html
<span style='<b>color</b>:white'><b>White</b></span>
```
Obviously, mangling my style attribute is not a good idea.
```csharp
Query parsedQuery = parser.Parse(luceneQuery);
StandardAnalyzer analyzer = new StandardAnalyzer();
// Wrap each matched term in <b class='search'>...</b>
SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='search'>", "</b>");
QueryScorer scorer = new QueryScorer(parsedQuery);
Highlighter highlighter = new Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(new SimpleFragmenter());
string highlighted = highlighter.GetBestFragment(analyzer, propertyName, invocation.ReturnValue.ToString());
```
My guess is that I need a different Fragmenter, but I'm not sure. Any help would be appreciated.
Solution
I think I figured it out...

I subclassed StandardAnalyzer and changed TokenStream to:
```csharp
public override Lucene.Net.Analysis.TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
{
    // Use the tag-skipping tokenizer instead of StandardTokenizer,
    // so nothing between '<' and '>' ever becomes a token.
    HtmlStripCharFilter filter = new HtmlStripCharFilter(reader);
    TokenStream result = new StandardFilter(filter);
    // Lowercase the tokens, then drop stop words.
    return new StopFilter(new LowerCaseFilter(result), this.stopSet);
}
```
and implemented HtmlStripCharFilter:
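```csharp
using System;
using System.IO;

// Despite its name, this is a tokenizer (a CharTokenizer subclass), not a
// CharFilter: it emits whitespace-separated tokens and skips everything
// between '<' and '>'.
public class HtmlStripCharFilter : Lucene.Net.Analysis.CharTokenizer
{
    // True while the scanner is between '<' and the matching '>'.
    private bool inTag = false;

    public HtmlStripCharFilter(TextReader input) : base(input) { }

    protected override bool IsTokenChar(char c)
    {
        if (c == '<' && !inTag)
        {
            inTag = true;   // entering markup
            return false;
        }
        if (c == '>' && inTag)
        {
            inTag = false;  // leaving markup
            return false;
        }
        // Outside a tag, any non-whitespace character belongs to a token.
        return !inTag && !char.IsWhiteSpace(c);
    }
}
```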
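To see what this tokenizer actually produces, here is a standalone sketch that replays the same `IsTokenChar` rules over a string and records each token with its start and end character offsets. The class name `TagSkippingTokenizer` is hypothetical, and the sketch does not use Lucene.Net; it ignores details of the real `CharTokenizer` such as buffering and maximum token length. The offsets are the interesting part for highlighting: they point into the original HTML, so a highlighter that wraps tokens by offset should only touch text between tags.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: mirrors HtmlStripCharFilter.IsTokenChar without Lucene.Net.
public static class TagSkippingTokenizer
{
    public static List<(string Text, int Start, int End)> Tokenize(string input)
    {
        var tokens = new List<(string Text, int Start, int End)>();
        bool inTag = false;
        int start = -1; // start offset of the token being built, or -1 if none

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool isTokenChar;
            if (c == '<' && !inTag) { inTag = true; isTokenChar = false; }
            else if (c == '>' && inTag) { inTag = false; isTokenChar = false; }
            else { isTokenChar = !inTag && !char.IsWhiteSpace(c); }

            if (isTokenChar)
            {
                if (start < 0) start = i; // a new token begins here
            }
            else if (start >= 0)
            {
                // A non-token character ends the current token.
                tokens.Add((input.Substring(start, i - start), start, i));
                start = -1;
            }
        }
        // Flush a token that runs to the end of the input.
        if (start >= 0)
            tokens.Add((input.Substring(start), start, input.Length));
        return tokens;
    }
}
```

For `<span style='color:white'>White</span>` this yields a single token, `White` at offsets 26 to 31; `color` never becomes a token, so there is nothing for the highlighter to match inside the `style` attribute. In this sketch, punctuation stays attached to words (e.g. `White.`), because only whitespace and tags end a token.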
It's heading in the right direction, but it still needs more work before it's finished. If anyone has a better solution (read: a "TESTED" solution), I'd love to hear it.
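If the goal is only to keep highlight markup out of tags, one alternative is to handle the markup step without Lucene. The sketch below, a hypothetical `TagAwareHighlighter` that is not part of Lucene.Net, splits the HTML into tags and text with `Regex.Split`, leaves the tags untouched, and wraps whole-word, case-insensitive matches of a single term in the text segments. It is a plain substitution, not Lucene's `Highlighter`: it does no analysis, stemming, scoring, or fragmenting, and it assumes simple markup with no `>` inside attribute values, comments, or `<script>` blocks.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: highlights a term only in text outside HTML tags.
public static class TagAwareHighlighter
{
    // Matches one tag such as <span style='color:white'> or </span>.
    // The capturing group makes Regex.Split keep the tags in its output.
    private static readonly Regex TagPattern = new Regex("(<[^>]*>)");

    public static string Highlight(string html, string term,
        string preTag = "<b class='search'>", string postTag = "</b>")
    {
        // Whole-word, case-insensitive match on the escaped search term.
        var termPattern = new Regex(@"\b" + Regex.Escape(term) + @"\b",
            RegexOptions.IgnoreCase);

        var result = new StringBuilder();
        // Split returns text segments interleaved with the captured tags.
        foreach (string piece in TagPattern.Split(html))
        {
            bool isTag = piece.Length > 0 && piece[0] == '<'
                         && piece[piece.Length - 1] == '>';
            result.Append(isTag
                ? piece // leave markup, including attributes, untouched
                : termPattern.Replace(piece, m => preTag + m.Value + postTag));
        }
        return result.ToString();
    }
}
```

Calling `Highlight("<span style='color:white'>White</span>", "color")` returns the input unchanged, while searching for `white` produces `<span style='color:white'><b class='search'>White</b></span>`.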