java – Finding the position of search hits from Lucene

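With Lucene, what is the recommended approach for locating the matches within search results?

More specifically, suppose the indexed documents have a field "fullText" that stores the plain-text content of some document. Suppose further that for one of these documents the content is "The quick brown fox jumps over the lazy dog". Next, a search is performed for "fox dog". Obviously, this document would be a hit.

In this scenario, can Lucene be used to provide something like the matching regions for the found document? So for this case I would like to produce something like: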

[{match: "fox",startIndex: 10,length: 3},{match: "dog",startIndex: 34,length: 3}]

I suspect it could be implemented with what is provided in the org.apache.lucene.search.highlight package. I'm not sure of the overall approach, though...
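
To make the requested output shape concrete, here is a minimal plain-Java sketch that does not use Lucene at all. `MatchOffsets`, `Match` and `find` are hypothetical names introduced only for illustration, and the search is a naive case-insensitive substring scan rather than analyzer-based token matching, so it would also match "dog" inside "dogma". The offsets it reports are computed for the exact English sentence passed in, so they need not equal the illustrative numbers in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, NOT part of Lucene: a naive substring scan that only
// illustrates the {match, startIndex, length} output the question asks for.
class MatchOffsets {

    /** One match region: the matched text, its start offset and its length. */
    static final class Match {
        final String match;
        final int startIndex;
        final int length;

        Match(String match, int startIndex, int length) {
            this.match = match;
            this.startIndex = startIndex;
            this.length = length;
        }

        @Override
        public String toString() {
            return "{match: \"" + match + "\", startIndex: " + startIndex
                    + ", length: " + length + "}";
        }
    }

    /**
     * Finds every case-insensitive occurrence of each whitespace-separated
     * query term. Assumes lowercasing does not change the text length,
     * which holds for ASCII input.
     */
    static List<Match> find(String text, String query) {
        List<Match> matches = new ArrayList<>();
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty query
            }
            int from = 0;
            int at;
            while ((at = haystack.indexOf(term, from)) >= 0) {
                matches.add(new Match(text.substring(at, at + term.length()), at, term.length()));
                from = at + term.length();
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(find("The quick brown fox jumps over the lazy dog", "fox dog"));
    }
}
```

Matches are grouped per query term in query order, not sorted by position; sort the list by `startIndex` if document order is needed.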

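Solution

I used TermFreqVector (Lucene 3.x API). Here is a working demo that prints the term positions as well as the start and end character offsets of each occurrence: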
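import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Search {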
    public static void main(String[] args) throws IOException,ParseException {
        Search s = new Search();  
        s.doSearch(args[0],args[1]);  
    }  

    Search() {
    }  

    public void doSearch(String db,String querystr) throws IOException,ParseException {
        // 1. Specify the analyzer for tokenizing text.  
        //    The same analyzer should be used as was used for indexing  
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);  

        Directory index = FSDirectory.open(new File(db));  

        // 2. query  
        Query q = new QueryParser(Version.LUCENE_CURRENT,"contents",analyzer).parse(querystr);  

        // 3. search  
        int hitsPerPage = 10;  
        IndexSearcher searcher = new IndexSearcher(index,true);  
        IndexReader reader = IndexReader.open(index,true);  
        searcher.setDefaultFieldSortScoring(true,false);  
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage,true);  
        searcher.search(q,collector);  
        ScoreDoc[] hits = collector.topDocs().scoreDocs;  

        // 4. display term positions,and term indexes   
        System.out.println("Found " + hits.length + " hits.");  
        for(int i=0;i<hits.length;++i) {  

            int docId = hits[i].doc;  
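            // Requires "contents" to be indexed with Field.TermVector.WITH_POSITIONS_OFFSETS;
            // otherwise the vector is null or the cast below fails.
            TermFreqVector tfvector = reader.getTermFreqVector(docId,"contents");  
            TermPositionVector tpvector = (TermPositionVector)tfvector;  
            // this part works only if there is one term in the query string,
            // otherwise you will have to iterate this section over the query terms.  
            int termidx = tfvector.indexOf(querystr);  
            if (termidx < 0) continue; // raw query string is not an indexed term of this document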
            int[] termposx = tpvector.getTermPositions(termidx);  
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);  

            for (int j=0;j<termposx.length;j++) {  
                System.out.println("termpos : "+termposx[j]);  
            }  
            for (int j=0;j<tvoffsetinfo.length;j++) {  
                int offsetStart = tvoffsetinfo[j].getStartOffset();  
                int offsetEnd = tvoffsetinfo[j].getEndOffset();  
                System.out.println("offsets : "+offsetStart+" "+offsetEnd);  
            }  

            // print some info about where the hit was found...  
            Document d = searcher.doc(docId);  
            System.out.println((i + 1) + ". " + d.get("path"));  
        }  

        // searcher can only be closed when there  
        // is no need to access the documents any more.   
        searcher.close();  
        reader.close();  
    }      
}
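
The comment in the demo says the lookup has to be repeated for each query term when the query contains more than one. The loop can be sketched as follows, using a hypothetical `TermVectorModel` class as a stand-in for the per-document term vector so the example runs without Lucene. Only the shape is borrowed from the demo: `indexOf` returns -1 for a term the document does not contain, and offsets are looked up by that index. The lowercasing and whitespace split are an assumed substitute for running the query through the same analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a per-document term vector; NOT a Lucene class.
class TermVectorModel {
    private final String[] terms;  // sorted, unique, already lowercased
    private final int[][] offsets; // offsets[i] = {start0, end0, start1, end1, ...} for terms[i]

    TermVectorModel(String[] sortedTerms, int[][] offsets) {
        this.terms = sortedTerms;
        this.offsets = offsets;
    }

    /** Index of the term, or -1 when the document does not contain it. */
    int indexOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? idx : -1;
    }

    /** Flattened (start, end) offset pairs for the term at the given index. */
    int[] getOffsets(int index) {
        return offsets[index];
    }

    /** Collects "term start end" lines for every query term present in the vector. */
    static List<String> offsetsFor(TermVectorModel vector, String query) {
        List<String> out = new ArrayList<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            int idx = vector.indexOf(term);
            if (idx < 0) {
                continue; // term absent from this document: skip, never index with -1
            }
            int[] pairs = vector.getOffsets(idx);
            for (int k = 0; k + 1 < pairs.length; k += 2) {
                out.add(term + " " + pairs[k] + " " + pairs[k + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Offsets of "fox" and "dog" in
        // "The quick brown fox jumps over the lazy dog" (end offset exclusive).
        TermVectorModel vector = new TermVectorModel(
                new String[] {"dog", "fox"},
                new int[][] {{40, 43}, {16, 19}});
        for (String line : offsetsFor(vector, "fox dog cat")) {
            System.out.println(line);
        }
    }
}
```

In the real demo the same structure would wrap the `indexOf`, `getTermPositions` and `getOffsets` calls, with the query terms taken from the parsed query rather than from a whitespace split.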
Original link: https://www.f2er.com/java/121356.html
