Learning Lucene: A Hands-On Experiment
Date: 2023-07-09

This is a quick experiment to see how Lucene is used in practice.

References: http://www.importnew.com/12715.html (a fairly simple example)

http://www.yiibai.com/lucene/lucene_first_application.html (a more involved example)

Another example is here: http://www.tuicool.com/articles/aqIZNnE

I am using a fairly recent version, 6.2.1. Its API documentation is at:

http://lucene.apache.org/core/6_2_1/core/index.html

First, create a Maven project named lucene-demo in IntelliJ IDEA (mostly following http://www.importnew.com/12715.html).

The pom.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.myapp</groupId>
    <artifactId>lucene-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>6.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>6.2.1</version>
        </dependency>
    </dependencies>
</project>

Create a package com.myapp.lucene containing a class LuceneDemo with the following content:

package com.myapp.lucene;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;

/**
* Created by baidu on 16/10/20.
*/
public class LuceneDemo {
// 0. Specify the analyzer for tokenizing text.
// The same analyzer should be used for indexing and searching
static StandardAnalyzer analyzer;
static Directory index;

static void prepareDoc() throws IOException{  
    // 0. init analyzer  
    analyzer = new StandardAnalyzer();

    // 1. create index  
    index = new RAMDirectory();  
    IndexWriterConfig config = new IndexWriterConfig(analyzer);

    IndexWriter w = new IndexWriter(index, config);

    addDoc(w, "lucence tutorial", "123456");  
    addDoc(w, "hi hi hi", "222");  
    addDoc(w, "ok LUCENCE", "123");  
    w.close();  
}

static void addDoc(IndexWriter w, String text, String more) throws IOException{  
    Document doc = new Document();  
    doc.add(new TextField("text", text, Field.Store.YES));  
    doc.add(new StringField("more", more, Field.Store.YES));  
    w.addDocument(doc);  
}

static void search(String str) throws ParseException, IOException {  
    // 2. query  
    Query q = new QueryParser("text", analyzer).parse(str);

    // 3. search  
    int listNum = 10;  
    IndexReader reader = DirectoryReader.open(index);  
    IndexSearcher searcher = new IndexSearcher(reader);  
    TopScoreDocCollector collector = TopScoreDocCollector.create(listNum);  
    searcher.search(q, collector);  
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display
    System.out.printf("Found %d docs.\n", hits.length);
    for (int i=0; i<hits.length; i++) {
        int docId = hits[i].doc;
        Document doc = searcher.doc(docId);
        System.out.printf("Doc %d: text: %s, more: %s\n", i+1, doc.get("text"), doc.get("more"));
    }  
    reader.close();

}

public static void main(String[] args) {
    try {  
        prepareDoc();  
        search("Lucence");  
    } catch (IOException e) {  
        e.printStackTrace();  
    } catch (ParseException e) {  
        e.printStackTrace();  
    }

}  

}

Running it succeeds:

Found 2 docs.
Doc 1: text: lucence tutorial, more: 123456
Doc 2: text: ok LUCENCE, more: 123

Process finished with exit code 0

Because the index lives in a RAMDirectory, it is held entirely in memory, so no directory or files should be created on disk.

A few points about the code and its logic are worth noting:

Use TextField for content that should be tokenized, and StringField for values such as an id that should not be tokenized.
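To make the TextField/StringField distinction concrete, here is a minimal sketch of the idea in plain Java. It is a toy model, not Lucene's implementation: the class ToyFieldIndex and its methods analyze, addAnalyzed, addExact and lookup are hypothetical names invented for illustration. The analyze method is only a rough stand-in for StandardAnalyzer, which among other things splits text into tokens and lowercases them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of analyzed vs. exact-match fields; NOT how Lucene is implemented. */
final class ToyFieldIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Hypothetical stand-in for an analyzer: split on non-alphanumerics, then lowercase. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Like a TextField: every analyzed token becomes a searchable term. */
    void addAnalyzed(int docId, String value) {
        for (String term : analyze(value)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Like a StringField: the whole value is stored verbatim as one term. */
    void addExact(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    /** Returns the ids of the documents indexed under exactly this term. */
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyFieldIndex text = new ToyFieldIndex();
        ToyFieldIndex more = new ToyFieldIndex();
        String[][] docs = {{"lucence tutorial", "123456"}, {"hi hi hi", "222"}, {"ok LUCENCE", "123"}};
        for (int id = 0; id < docs.length; id++) {
            text.addAnalyzed(id, docs[id][0]);
            more.addExact(id, docs[id][1]);
        }
        // The query is analyzed as well, so "Lucence" is looked up as "lucence".
        System.out.println(text.lookup(analyze("Lucence").get(0))); // docs 0 and 2
        System.out.println(more.lookup("123"));  // doc 2 only
        System.out.println(more.lookup("1234")); // no match: exact fields need the full value
    }
}
```

This also suggests why the search for "Lucence" in the demo finds both "lucence tutorial" and "ok LUCENCE": the indexed text and the query string pass through the same analyzer, so case differences disappear before terms are compared.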

While writing the code, the compiler complained several times about checked exceptions that had to be either wrapped or declared with throws.
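The two ways of satisfying the compiler can be sketched without Lucene on the classpath. In the example below, FakeResource is a hypothetical class that stands in for anything that implements Closeable and throws IOException, as IndexWriter and IndexReader do; writeOrThrow and writeOrWrap are likewise illustrative names. This is a sketch of the general Java pattern, not code from the demo.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CheckedExceptionDemo {
    /** Hypothetical resource standing in for something like an index writer or reader. */
    static final class FakeResource implements Closeable {
        boolean closed = false;

        void write(String value) throws IOException {
            if (value.isEmpty()) {
                throw new IOException("nothing to write");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Option 1: let the checked exception propagate via a throws clause. */
    static void writeOrThrow(FakeResource resource, String value) throws IOException {
        // try-with-resources closes the resource even when write() throws.
        try (FakeResource r = resource) {
            r.write(value);
        }
    }

    /** Option 2: catch the checked exception and wrap it in an unchecked one. */
    static void writeOrWrap(FakeResource resource, String value) {
        try {
            writeOrThrow(resource, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FakeResource resource = new FakeResource();
        try {
            writeOrWrap(resource, "");
        } catch (UncheckedIOException e) {
            System.out.println("wrapped: " + e.getCause().getMessage());
        }
        System.out.println("closed: " + resource.closed);
    }
}
```

The demo above takes the first route for prepareDoc, addDoc and search, and handles IOException and ParseException only in main. A try-with-resources block would additionally make sure the writer and reader are closed if an exception is thrown midway.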

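Some APIs have changed across versions, so their parameters differ from those in older tutorials. The code here was adjusted to match 6.2.1; in most cases the newer signatures are simpler.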
