Introduction to Lucene and How to Use It

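I. Lucene Core Overview

1. What Is Lucene

Apache Lucene (usually just "Lucene") is an open-source, high-performance full-text search library written in Java. Note that it is not a complete search application; it is the low-level core library on which search engines are built. It provides index creation, index storage, and efficient retrieval, and it is the foundation of many well-known search frameworks and applications (Elasticsearch and Solr, for example, are both built on Lucene).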

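2. Core Advantages

High performance: uses an inverted index (its core data structure), optimized tokenization, and caching to support fast retrieval over very large data sets;
Extensible: supports custom analyzers, custom sort rules, and custom storage formats to fit different business scenarios;
Feature-rich: supports Boolean queries, fuzzy queries, phrase queries, result highlighting, and other retrieval features;
Cross-platform: implemented in Java, so it runs on any platform that supports Java, and there are ports to other languages (such as Lucene.NET).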
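
The inverted index mentioned above can be illustrated with a minimal sketch in plain Java. It is not Lucene's implementation (Lucene stores compressed, segment-based postings on disk); the class name InvertedIndexSketch and its methods are hypothetical and exist only to show the idea of mapping each term to the IDs of the documents that contain it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index (hypothetical class, not a Lucene API). */
class InvertedIndexSketch {

    // term -> sorted IDs of the documents that contain the term (its "posting list")
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Lowercases the text, splits it on non-letter/non-digit characters, and records each term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the IDs of the documents containing the term, in ascending order. */
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Apache Lucene getting started");
        index.add(2, "Elasticsearch core principles");
        index.add(3, "Lucene inverted index explained");
        System.out.println(index.search("lucene")); // [1, 3]
        System.out.println(index.search("index"));  // [3]
        System.out.println(index.search("solr"));   // []
    }
}
```

A lookup is a single map access followed by reading a short list, which is why a term query does not need to scan every document.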

Official site (the link below is the download page for the 10.3.2 release)

https://www.apache.org/dyn/closer.lua/lucene/java/10.3.2/lucene-10.3.2.tgz

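What is it used for?

Lucene itself is a library, so developers build their own applications on top of it. Common use cases include:
Enterprise search: full-text search over internal document management systems and knowledge bases.
E-commerce platforms: searching product titles and descriptions, combined with filter conditions (price, category).
Log analysis: keyword search and filtering over large volumes of log text.
Content management systems (CMS): full-text search over articles and news.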
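
The e-commerce case above (a keyword match combined with filter conditions) comes down to intersecting document ID lists: one list per condition, with only the IDs present in all of them surviving. The following is a minimal sketch of that step in plain Java, assuming each condition has already been resolved to a sorted list of IDs. The class name PostingIntersectionSketch and the sample IDs are made up for illustration; in Lucene itself such a query would normally be expressed with BooleanQuery clauses rather than hand-written merging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Intersects sorted document ID lists (hypothetical helper, not a Lucene API). */
class PostingIntersectionSketch {

    /** Returns the IDs present in both sorted lists, using a linear two-pointer merge. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) {          // ID in both lists: keep it and advance both sides
                out.add(x);
                i++;
                j++;
            } else if (x < y) {    // advance whichever side is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative data: IDs of products matching each condition
        List<Integer> titleHasPhone = Arrays.asList(2, 5, 8, 13);   // title contains "phone"
        List<Integer> inElectronics = Arrays.asList(1, 2, 3, 8, 21); // category = electronics
        List<Integer> priceInRange = Arrays.asList(2, 3, 5, 21);     // price within the chosen range

        List<Integer> textAndCategory = intersect(titleHasPhone, inElectronics);
        System.out.println(textAndCategory);                          // [2, 8]
        System.out.println(intersect(textAndCategory, priceInRange)); // [2]
    }
}
```

Because every list is sorted, each intersection takes time proportional to the combined length of the two lists rather than to the size of the whole catalogue.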

Using it from Java (the dependencies and code below target Lucene 8.11.2)

        <!-- 1. Core indexing and search -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>8.11.2</version>
        </dependency>
        <!-- 2. Query parser (supports the full query syntax) -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>8.11.2</version>
        </dependency>
        <!-- 3. Chinese analyzer (word segmentation for Chinese text; StandardAnalyzer only splits
             Chinese into single characters). The demo below uses StandardAnalyzer, so this is
             needed only when indexing Chinese content. -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>8.11.2</version>
        </dependency>


import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

/**
 * Basic Lucene demo (index building + search)
 */
public class LuceneBasicDemo {

    // Directory where the index is stored
    private static final String INDEX_DIR = "./lucene_index";
    // Analyzer (the standard analyzer)
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    /**
     * Step 1: build the index (write the sample data into the index)
     */
    public static void createIndex() throws Exception {
        // 1. Configure the IndexWriter
        IndexWriterConfig config = new IndexWriterConfig(ANALYZER);
        // Open mode: CREATE discards any existing index and starts over; APPEND adds to an existing index
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        // 2. Open the on-disk directory and the IndexWriter; try-with-resources closes both
        try (Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            // 3. Build the sample data (3 documents to search over)
            Document doc1 = new Document();
            // StringField: exact match, not tokenized (IDs, codes, and similar values)
            doc1.add(new StringField("id", "1001", Field.Store.YES));
            // TextField: tokenized for full-text search; Field.Store.YES also stores the original text
            doc1.add(new TextField("title", "Apache Lucene Getting Started Tutorial", Field.Store.YES));
            doc1.add(new TextField("content", "Lucene is a high-performance open-source full-text search library written in Java.", Field.Store.YES));
            indexWriter.addDocument(doc1);

            Document doc2 = new Document();
            doc2.add(new StringField("id", "1002", Field.Store.YES));
            doc2.add(new TextField("title", "Elasticsearch Core Principles", Field.Store.YES));
            doc2.add(new TextField("content", "Elasticsearch is built on Lucene and adds distributed search and storage.", Field.Store.YES));
            indexWriter.addDocument(doc2);

            Document doc3 = new Document();
            doc3.add(new StringField("id", "1003", Field.Store.YES));
            doc3.add(new TextField("title", "Lucene Inverted Index Explained", Field.Store.YES));
            doc3.add(new TextField("content", "The inverted index is the core of Lucene's fast retrieval: it maps terms to document IDs for quick matching.", Field.Store.YES));
            indexWriter.addDocument(doc3);

            // 4. No explicit commit is needed: closing the IndexWriter (done by try-with-resources)
            //    commits pending changes. numRamDocs() reports the documents still buffered in RAM.
            System.out.println("Index build complete: " + indexWriter.numRamDocs() + " documents buffered for commit.");
        }
    }

    /**
     * Step 2: search the index (fetch results for a query)
     * @param queryStr the query string
     * @param field the field to search (for example title or content)
     */
    public static void searchIndex(String queryStr, String field) throws Exception {
        // 1. Open the directory holding the index and an IndexReader on it;
        //    try-with-resources closes both
        try (Directory directory = FSDirectory.open(Paths.get(INDEX_DIR));
             IndexReader indexReader = DirectoryReader.open(directory)) {
            // 2. Create the IndexSearcher
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            // 3. Build the Query by parsing the query string with QueryParser
            QueryParser queryParser = new QueryParser(field, ANALYZER);
            // The syntax supports fuzzy queries, Boolean queries, and more (for example "lucene~" is a fuzzy query)
            Query query = queryParser.parse(queryStr);

            // 4. Run the search and take the top 10 hits (TopDocs wraps the results)
            TopDocs topDocs = indexSearcher.search(query, 10);
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;

            System.out.println("\nSearch results: " + topDocs.totalHits.value + " matching documents");

            // 5. Iterate over the hits and print them
            for (ScoreDoc scoreDoc : scoreDocs) {
                // Load the stored Document for this internal doc ID
                Document document = indexSearcher.doc(scoreDoc.doc);
                System.out.println("------------------------");
                System.out.println("Internal doc ID: " + scoreDoc.doc);
                System.out.println("Unique ID: " + document.get("id"));
                System.out.println("Title: " + document.get("title"));
                System.out.println("Content: " + document.get("content"));
                System.out.println("Score: " + scoreDoc.score); // relevance score (higher means a better match)
            }
        }
    }

    // Entry point: build the index, then search it
    public static void main(String[] args) {
        try {
            // Step 1: build the index
            createIndex();

            // Step 2: run a search
            // Example 1: find documents whose title contains "Lucene"
            searchIndex("Lucene", "title");

            // Example 2: find documents whose content matches "based on Lucene"
            // searchIndex("based on Lucene", "content");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
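
The comment on queryParser.parse in the demo mentions fuzzy queries such as "lucene~". A fuzzy query matches indexed terms that lie within a small edit distance of the query term. The sketch below computes plain Levenshtein distance in standard Java to make that notion concrete. It is a simplified stand-in: Lucene does not compare terms one by one like this, and by default its fuzzy matching also treats a transposition of two adjacent characters as a single edit. The class and method names are hypothetical, and the maxEdits value of 2 used in main is an assumption about the parser's default.

```java
/** Plain Levenshtein edit distance (illustrative only, not Lucene's fuzzy matching code). */
class EditDistanceSketch {

    /** Returns the minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row of the DP table
        int[] curr = new int[b.length() + 1]; // distances for the row being filled
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the indexed term is within maxEdits edits of the query term. */
    static boolean fuzzyMatches(String queryTerm, String indexedTerm, int maxEdits) {
        return levenshtein(queryTerm, indexedTerm) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));      // 1
        System.out.println(fuzzyMatches("lucene", "lucen", 2));   // true
        System.out.println(fuzzyMatches("lucene", "solr", 2));    // false
    }
}
```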

Screenshot of the run

[image: image.png]

The output is basically consistent with what the AI said it would be.

[image: image.png]
