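With Lucene, what is the recommended approach for locating matches within search results?
More specifically, suppose the indexed documents have a field "fullText" that stores the plain-text content of some document. Furthermore, assume that for one of these documents the content is "The quick brown fox jumps over the lazy dog". Next, a search is performed for "fox dog". Obviously, the document would be a hit.
In this scenario, can Lucene be used to provide something like the matching regions of the found document? For this case I would like to produce something like: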
[{match: "fox",startIndex: 10,length: 3},{match: "dog",startIndex: 34,length: 3}]
I suspect this could be implemented with what is provided in the org.apache.lucene.search.highlight package. I am not sure about the overall approach, though...
Solution
I used TermFreqVector. Here is a working demo that prints the term positions as well as the start and end character offsets of each occurrence:
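import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Written against the Lucene 3.x API (TermFreqVector was removed in Lucene 4.0).
public class Search {

    public static void main(String[] args) throws IOException, ParseException {
        Search s = new Search();
        s.doSearch(args[0], args[1]);
    }

    Search() {
    }

    public void doSearch(String db, String querystr) throws IOException, ParseException {
        // 1. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used as was used for indexing.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        Directory index = FSDirectory.open(new File(db));

        // 2. Build the query.
        Query q = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse(querystr);

        // 3. Search.
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        IndexReader reader = IndexReader.open(index, true);
        searcher.setDefaultFieldSortScoring(true, false);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Display term positions and term offsets.
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
            // The cast only works if the "contents" field was indexed with
            // term vectors that store positions and offsets.
            TermPositionVector tpvector = (TermPositionVector) tfvector;

            // This part works only if there is one term in the query string;
            // otherwise you will have to iterate this section over the query terms.
            int termidx = tfvector.indexOf(querystr);
            int[] termposx = tpvector.getTermPositions(termidx);
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

            for (int j = 0; j < termposx.length; j++) {
                System.out.println("termpos : " + termposx[j]);
            }
            for (int j = 0; j < tvoffsetinfo.length; j++) {
                int offsetStart = tvoffsetinfo[j].getStartOffset();
                int offsetEnd = tvoffsetinfo[j].getEndOffset();
                System.out.println("offsets : " + offsetStart + " " + offsetEnd);
            }

            // Print some info about where the hit was found...
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("path"));
        }

        // The searcher and reader can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
        reader.close();
    }
}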
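
The demo prints raw start and end offsets for a single term. To get from there to the {match, startIndex, length} shape asked for in the question, the offsets collected per query term would still need to be converted and merged. The following is a minimal sketch of that conversion step only, in plain Java with no Lucene dependency. The names MatchRegions, Match and toMatches are hypothetical, the map of offsets stands in for whatever the per-term lookup returns, and the sample values are simply the illustrative ones from the question. It assumes each end offset is exclusive, as Lucene's term vector offsets are.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MatchRegions {

    /** One matched region, in the shape used in the question (hypothetical type). */
    public static final class Match {
        public final String match;
        public final int startIndex;
        public final int length;

        public Match(String match, int startIndex, int length) {
            this.match = match;
            this.startIndex = startIndex;
            this.length = length;
        }

        @Override
        public String toString() {
            return "{match: \"" + match + "\",startIndex: " + startIndex
                    + ",length: " + length + "}";
        }
    }

    /**
     * Converts per-term character offsets into Match records sorted by start index.
     * Each int[] is {startOffset, endOffset}, with the end offset exclusive.
     */
    public static List<Match> toMatches(Map<String, List<int[]>> offsetsByTerm) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, List<int[]>> entry : offsetsByTerm.entrySet()) {
            for (int[] offset : entry.getValue()) {
                int start = offset[0];
                int end = offset[1];
                matches.add(new Match(entry.getKey(), start, end - start));
            }
        }
        // Terms are looked up one at a time, so restore document order here.
        matches.sort(Comparator.comparingInt(m -> m.startIndex));
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative offsets taken from the question's example output;
        // real values would come from the index.
        Map<String, List<int[]>> offsets = new LinkedHashMap<>();
        offsets.put("dog", Collections.singletonList(new int[] {34, 37}));
        offsets.put("fox", Collections.singletonList(new int[] {10, 13}));
        System.out.println(toMatches(offsets));
    }
}
```

In the demo, the map would be filled by repeating the indexOf and getOffsets lookup once per query term and skipping any term for which indexOf returns -1.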