Lucene, a Lightweight Search Engine That Is Seriously Powerful! Solr and ES Are Both Built on It


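I. Fundamentals

1. What Is Lucene

Lucene is a full-text search library that runs locally, inside your own application. Solr and Elasticsearch are both built on top of Lucene.

Lucene is a good fit for lightweight full-text search. My server was short on resources, and running ES would have consumed too much of them, so I chose Lucene.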

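2. How an Inverted Index Works

Full-text search is built on an inverted index. So what is an inverted index?

First, a Chinese tokenizer extracts every keyword from the document. For example, 我愛中國 ("I love China") is split into 我, 愛, and 中國, and each of these terms points back to the document 我愛中國.
Next, the mapping between each keyword and the documents that contain it is saved.
Finally, the keywords themselves are sorted and indexed so they can be looked up quickly.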
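
The three steps above can be sketched in plain Java. This is only an illustration of the idea, not how Lucene stores its index: the class name InvertedIndexSketch is made up for this example, and the tokens are passed in already split, where a real analyzer would produce them.

```java
import java.util.*;

public class InvertedIndexSketch {
    // term -> sorted IDs of the documents containing that term;
    // the TreeMap keeps the terms themselves sorted (step 3)
    private final SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Records every token of the document under the given ID (step 2). */
    public void add(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the IDs of the documents containing the term (empty if none). */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Step 1 (tokenizing) is assumed to have happened already
        index.add(1, Arrays.asList("我", "愛", "中國"));
        index.add(2, Arrays.asList("中國", "歷史"));
        System.out.println(index.search("中國")); // prints [1, 2]
        System.out.println(index.search("愛"));   // prints [1]
    }
}
```

A search for a term is then a single map lookup instead of a scan over every document, which is the whole point of inverting the document-to-term relationship.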

3. Comparison with a Traditional Database

DB	Lucene
Table	Index
Row	Document
Column	Field

4. Data Types

Common field types

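StringField: a string field that is indexed as a single token without being analyzed. It suits exact matching; sorting on the value additionally requires a doc-values field such as SortedDocValuesField.
TextField: a string field that is analyzed into tokens. It suits full-text search and fuzzy matching.
IntPoint, LongPoint, FloatPoint, DoublePoint: the numeric field types in the Lucene 8.x used in this article, for indexing integers and floating-point numbers for exact and range queries. They do not store the value, so add a StoredField if it has to be returned. (The names IntField, LongField, FloatField, and DoubleField belong to other Lucene versions.)
Dates: Lucene core has no dedicated date field class. A date is usually indexed as a long timestamp (LongPoint) or as a formatted string, for example one produced by DateTools.
Binary data: there is no class named BinaryField. Raw bytes such as a small image or file can be kept in a StoredField created from a byte[].
StoredField: a stored-only field for original data that does not need to be indexed, such as the document content or other attached information.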

A Lucene analyzer breaks text into individual terms. Lucene ships with several analyzers; the common ones include:

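StandardAnalyzer: Lucene's default analyzer. It splits text on word boundaries following the Unicode Text Segmentation rules (UAX #29) and converts the tokens to lowercase.
CJKAnalyzer: designed for Chinese, Japanese, and Korean text. It does not perform dictionary-based word segmentation; it indexes CJK text as overlapping two-character tokens (bigrams).
KeywordAnalyzer: an analyzer that does not tokenize. It treats the whole input as a single token and is typically used for exact matching.
SimpleAnalyzer: a very simple analyzer that splits text at non-letter characters and lowercases the resulting tokens.
WhitespaceAnalyzer: splits text on whitespace only and does no other processing, so the tokens keep their original case.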
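
To make the differences concrete, here is a rough plain-Java approximation of three of these behaviors. It does not call Lucene at all: the class TokenizerSketch and its methods are invented for illustration, and the real analyzers handle many more cases (stop words, character width, mixed scripts).

```java
import java.util.*;

public class TokenizerSketch {
    /** Splits on whitespace only and keeps case, in the spirit of WhitespaceAnalyzer. */
    public static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Keeps runs of letters and lowercases them, in the spirit of SimpleAnalyzer. */
    public static List<String> letters(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                run.append(Character.toLowerCase(c));
            } else if (run.length() > 0) {
                out.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) out.add(run.toString());
        return out;
    }

    /** Emits overlapping two-character tokens, the bigram idea behind CJKAnalyzer. */
    public static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whitespace("Hello, Lucene World")); // prints [Hello,, Lucene, World]
        System.out.println(letters("Hello, Lucene World"));    // prints [hello, lucene, world]
        System.out.println(bigrams("我愛中國"));                 // prints [我愛, 愛中, 中國]
    }
}
```

The bigram output shows why a dictionary-based Chinese analyzer is attractive: 愛中 is not a word, yet it ends up in the index.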

For Chinese text, however, we usually use the third-party IKAnalyzer, which requires adding its Maven dependency to the POM file.

II. Best Practices

1. Importing the Dependencies

<lucene.version>8.1.1</lucene.version>
<IKAnalyzer-lucene.version>8.0.0</IKAnalyzer-lucene.version>

<!--============lucene start================-->
<!-- Lucene core library -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${lucene.version}</version>
</dependency>

<!-- Lucene query parser -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>

<!-- Lucene's built-in analyzers -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>${lucene.version}</version>
</dependency>

<!-- Lucene highlighter -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>${lucene.version}</version>
</dependency>

<!-- IK analyzer -->
<dependency>
    <groupId>com.jianggujin</groupId>
    <artifactId>IKAnalyzer-lucene</artifactId>
    <version>${IKAnalyzer-lucene.version}</version>
</dependency>
<!--============lucene end================-->

2. Creating an Index

First define the basic metadata of the index: the index name and its fields.

/**
 * @author: sunhhw
 * @date: 2023/12/25 17:39
 * @description: Defines the index name and document fields for articles
 */
public interface IArticleIndex {

    /**
     * Index name
     */
    String INDEX_NAME = "article";

    // --------------------- Document fields ---------------------
    String COLUMN_ID = "id";
    String COLUMN_ARTICLE_NAME = "articleName";
    String COLUMN_COVER = "cover";
    String COLUMN_SUMMARY = "summary";
    String COLUMN_CONTENT = "content";
    String COLUMN_CREATE_TIME = "createTime";
}

Create the index and add documents

/**
 * Create the index (if needed) and add documents to it
 *
 * @param indexName    index name, used as the sub-directory under the index root
 * @param documentList documents to add
 */
public void addDocument(String indexName, List<Document> documentList) {
    // Location of the index, for example: indexDir = /app/blog/index/article
    String indexDir = luceneProperties.getIndexDir() + File.separator + indexName;
    try {
        File file = new File(indexDir);
        // Create the directory if it does not exist
        if (!file.exists()) {
            FileUtils.forceMkdir(file);
        }
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        // Chinese analyzer
        Analyzer analyzer = new IKAnalyzer();
        // Configuration object for the index writer
        IndexWriterConfig conf = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // addDocuments returns a sequence number, not a count, so it is not used here
        indexWriter.addDocuments(documentList);
        log.info("[Batch add to index] total: {}", documentList.size());
        // Commit the changes
        indexWriter.commit();
        // Close the writer
        indexWriter.close();
    } catch (Exception e) {
        log.error("[Failed to create index] indexDir:{}", indexDir, e);
        throw new UtilsException("Failed to create index", e);
    }
}

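Watch out for one pitfall here: indexWriter.close() must be called. Otherwise the writer keeps holding the write.lock file lock, and later operations on the same index will fail.
indexWriter.addDocuments(documentList) adds documents in bulk; to add a single document, use indexWriter.addDocument().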
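
One caveat with the methods in this article: close() is called at the end of the try block, so an exception thrown before that line skips it and the lock stays held. Since IndexWriter implements Closeable, try-with-resources is one way to release it on every path. The sketch below shows only the pattern, in plain Java; LockingWriter is a hypothetical stand-in for IndexWriter, not a Lucene class.

```java
public class CloseSketch {
    /** Hypothetical stand-in for a writer that holds an exclusive lock until closed. */
    public static class LockingWriter implements AutoCloseable {
        public static boolean locked = false;

        public LockingWriter() {
            // Mimics a second writer failing while the lock is still held
            if (locked) throw new IllegalStateException("lock already held");
            locked = true;
        }

        public void write(String doc) {
            if (doc == null) throw new IllegalArgumentException("doc is null");
        }

        @Override
        public void close() {
            locked = false; // releasing the lock is what close() is for
        }
    }

    /** Writes one document; the writer is closed even if write() throws. */
    public static void safeWrite(String doc) {
        try (LockingWriter writer = new LockingWriter()) {
            writer.write(doc);
        }
    }

    public static void main(String[] args) {
        try {
            safeWrite(null); // fails inside the try block
        } catch (IllegalArgumentException expected) {
            // the lock was still released by try-with-resources
        }
        safeWrite("ok"); // succeeds because no stale lock is left behind
        System.out.println(LockingWriter.locked); // prints false
    }
}
```

Applied to the real code, the same shape would wrap the Directory and the IndexWriter in the try header and keep the commit() call inside the body.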

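Unit test

@Test
public void create_index_test() {
    List<Document> documentList = new ArrayList<>();
    // The loop bound is arbitrary; the sample text stays in Chinese to exercise IKAnalyzer
    for (int i = 0; i < 10; i++) {
        ArticlePO articlePO = new ArticlePO();
        articlePO.setArticleName("git的基本使用" + i);
        articlePO.setContent("這裡是git的基本使用的內容" + i);
        articlePO.setSummary("測試摘要" + i);
        articlePO.setId(String.valueOf(i));
        articlePO.setCreateTime(LocalDateTime.now());
        documentList.add(buildDocument(articlePO));
    }
    // addDocument expects a list, so the documents are collected first
    LuceneUtils.X.addDocument(IArticleIndex.INDEX_NAME, documentList);
}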

private Document buildDocument(ArticlePO articlePO) {
    Document document = new Document();
    LocalDateTime createTime = articlePO.getCreateTime();
    String format = LocalDateTimeUtil.format(createTime, DateTimeFormatter.ISO_LOCAL_DATE);

    // The ID must not be tokenized, so use a StringField
    document.add(new StringField(IArticleIndex.COLUMN_ID, articlePO.getId() == null ? "" : articlePO.getId(), Field.Store.YES));
    // The title (articleName) is searchable, so it is tokenized and stored
    document.add(new TextField(IArticleIndex.COLUMN_ARTICLE_NAME, articlePO.getArticleName() == null ? "" : articlePO.getArticleName(), Field.Store.YES));
    // The summary is searchable, so it is tokenized and stored
    document.add(new TextField(IArticleIndex.COLUMN_SUMMARY, articlePO.getSummary() == null ? "" : articlePO.getSummary(), Field.Store.YES));
    // The content is searchable, so it is tokenized and stored
    document.add(new TextField(IArticleIndex.COLUMN_CONTENT, articlePO.getContent() == null ? "" : articlePO.getContent(), Field.Store.YES));
    // The cover is not indexed at all; it is only stored so it can be returned with search results
    document.add(new StoredField(IArticleIndex.COLUMN_COVER, articlePO.getCover() == null ? "" : articlePO.getCover()));
    // The creation time is not tokenized; it is only needed for display
    document.add(new StringField(IArticleIndex.COLUMN_CREATE_TIME, format, Field.Store.YES));
    return document;
}
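
One detail of buildDocument is easy to miss: createTime is formatted with DateTimeFormatter.ISO_LOCAL_DATE, so only the date part reaches the index. The sketch below shows the resulting value using the JDK's own format method in place of the LocalDateTimeUtil helper, on the assumption that the helper produces the same string for this formatter. The class and method names are made up for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CreateTimeFormatSketch {
    /** Formats a timestamp the way the createTime field is stored: date only. */
    public static String toIndexValue(LocalDateTime createTime) {
        return createTime.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        LocalDateTime created = LocalDateTime.of(2023, 12, 25, 17, 39);
        // The time of day (17:39) is dropped by this formatter
        System.out.println(toIndexValue(created)); // prints 2023-12-25
    }
}
```

If the time of day matters for display, a formatter such as ISO_LOCAL_DATE_TIME would keep it.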

3. Updating a Document

The update method

/**
 * Update a document
 *
 * @param indexName index name
 * @param document  the new document
 * @param condition term that selects the document(s) to replace
 */
public void updateDocument(String indexName, Document document, Term condition) {
    String indexDir = luceneProperties.getIndexDir() + File.separator + indexName;
    try {
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        // Chinese analyzer
        Analyzer analyzer = new IKAnalyzer();
        // Configuration object for the index writer
        IndexWriterConfig conf = new IndexWriterConfig(analyzer);
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        indexWriter.updateDocument(condition, document);
        indexWriter.commit();
        indexWriter.close();
    } catch (Exception e) {
        log.error("[Failed to update document] indexDir:{},document:{},condition:{}", indexDir, document, condition, e);
        throw new ServiceException();
    }
}

Unit test

@Test
public void update_document_test() {
    ArticlePO articlePO = new ArticlePO();
    articlePO.setArticleName("git的基本使用=編輯");
    articlePO.setContent("這裡是git的基本使用的內容=編輯");
    articlePO.setSummary("測試摘要=編輯");
    articlePO.setId("2");
    articlePO.setCreateTime(LocalDateTime.now());
    Document document = buildDocument(articlePO);
    LuceneUtils.X.updateDocument(IArticleIndex.INDEX_NAME, document, new Term("id", "2"));
}

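On update, the matching record is replaced if it exists; if it does not exist, a new record is added.
new Term("id", "2") is the match condition, much like where id = 2 in a database.
IArticleIndex.INDEX_NAME = article is the index name.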
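
Under the hood, IndexWriter.updateDocument deletes every document containing the term and then adds the new one, which is why it doubles as an insert. The following plain-Java model mimics that behavior with an in-memory list; UpsertSketch and its methods are illustrative names, not part of Lucene.

```java
import java.util.*;

public class UpsertSketch {
    /** Each document is modelled as a map from field name to value. */
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Deletes every document whose field equals value, then appends the new one. */
    public void update(String field, String value, Map<String, String> newDoc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(newDoc);
    }

    public int size() {
        return docs.size();
    }

    /** Small helper that builds a two-field document. */
    public static Map<String, String> doc(String id, String name) {
        Map<String, String> m = new HashMap<>();
        m.put("id", id);
        m.put("articleName", name);
        return m;
    }

    public static void main(String[] args) {
        UpsertSketch index = new UpsertSketch();
        index.update("id", "2", doc("2", "first"));  // no match: behaves like an insert
        index.update("id", "2", doc("2", "edited")); // match: the old document is replaced
        System.out.println(index.size()); // prints 1
    }
}
```

Two consequences follow. If several documents match the term, all of them are removed and a single new one is added. And the match works on id here because that field is a StringField, so the stored term is exactly the value "2".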

4. Deleting a Document

The delete method

/**
 * Delete documents
 *
 * @param indexName index name
 * @param condition term that selects the document(s) to delete
 */
public void deleteDocument(String indexName, Term condition) {
    String indexDir = luceneProperties.getIndexDir() + File.separator + indexName;
    try {
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        // Configuration object for the index writer (no analyzer is needed for deletes)
        IndexWriterConfig conf = new IndexWriterConfig();
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, conf);

        indexWriter.deleteDocuments(condition);
        indexWriter.commit();
        indexWriter.close();
    } catch (Exception e) {
        log.error("[Failed to delete document] indexDir:{},condition:{}", indexDir, condition, e);
        throw new ServiceException();
    }
}

Unit test

@Test
public void delete_document_test() {
    LuceneUtils.X.deleteDocument(IArticleIndex.INDEX_NAME, new Term(IArticleIndex.COLUMN_ID, "1"));
}

Deleting a document works much like updating one.

5. Deleting an Index

This clears all the data in the given index.

/**
 * Delete an index
 *
 * @param indexName index name
 */
public void deleteIndex(String indexName) {
    String indexDir = luceneProperties.getIndexDir() + File.separator + indexName;
    try {
        // Open the index directory
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        // Configuration object for the index writer
        IndexWriterConfig conf = new IndexWriterConfig();
        // Create the index writer
        IndexWriter indexWriter = new IndexWriter(directory, conf);
        // Remove every document in the index
        indexWriter.deleteAll();
        indexWriter.commit();
        indexWriter.close();
    } catch (Exception e) {
        log.error("[Failed to delete index] indexDir:{}", indexDir, e);
        throw new ServiceException();
    }
}

6. Basic Queries

TermQuery