Python Toolbox Series (Part 32)

Elasticsearch

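Elasticsearch is a search engine based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful API. Elasticsearch is written in Java and was originally released as open source under the Apache License (since version 7.11 it has been distributed under the Elastic License and the SSPL rather than Apache 2.0). It is a very popular enterprise search engine. Officially supported client languages include Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and others. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also built on Lucene.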

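There are many ways to install Elasticsearch, and Elastic particularly encourages deployment on public clouds. This article takes the simplest route and installs it directly on a host under our own control (IP 172.29.30.155). One option is to install from Elastic's APT repository: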

# This installation can also be very slow.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update && sudo apt-get install -y elasticsearch

Another simple option is to download the installation package directly from the official site:

# In a terminal on the target Ubuntu Bionic (18.04) machine
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.3.2-amd64.deb
sudo dpkg -i elasticsearch-8.3.2-amd64.deb

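The advantage of this approach is that the .deb file can be copied to multiple machines, saving download time; the more target machines there are, the more worthwhile it becomes.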
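Before copying the .deb to many machines, it is worth confirming the download is intact. Elastic publishes a `.sha512` checksum file next to each package (for example `elasticsearch-8.3.2-amd64.deb.sha512`, fetched with wget the same way), and its own instructions check it with `shasum -a 512 -c`. Below is a minimal Python sketch of the same check, assuming the checksum file uses the usual `<hex digest>  <file name>` layout; `verify_sha512` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def verify_sha512(package_path, checksum_path):
    """Return True if the package's SHA-512 digest matches the checksum file."""
    # Assumed layout of the checksum file: "<hex digest>  <file name>"
    expected = Path(checksum_path).read_text().split()[0].lower()

    digest = hashlib.sha512()
    with open(package_path, "rb") as f:
        # Hash in 1 MiB chunks so a large package is not loaded into memory at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, run `verify_sha512("elasticsearch-8.3.2-amd64.deb", "elasticsearch-8.3.2-amd64.deb.sha512")` on each machine after copying and before `dpkg -i`; a `False` result means the file should be fetched again.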

On Ubuntu Bionic, Elasticsearch can be managed with systemd. The relevant commands are:

sudo /bin/systemctl daemon-reload
# Start automatically at boot
sudo /bin/systemctl enable elasticsearch
# Start
sudo systemctl start elasticsearch
# Check the status
sudo systemctl status elasticsearch
# If something goes wrong, check the logs
sudo journalctl -f
sudo journalctl -u elasticsearch

# Stop
sudo systemctl stop elasticsearch

# Reset the elastic user's password, entering it manually (Elasticsearch must be running)
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic -i

# Reset the elastic user's password, letting the tool generate one
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

# Test it (curl prompts for the elastic user's password)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://172.29.30.155:9200

The response looks similar to the following:

{
  "name" : "dbservers",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LFs6cpSHTSqLqbx6lRgkvw",
  "version" : {
    "number" : "8.3.2",
    "build_type" : "deb",
    "build_hash" : "8b0b1f23fbebecc3c88e4464319dea8989f374fd",
    "build_date" : "2022-07-06T15:15:15.901688194Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
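The same response can be checked from Python. It is generally recommended that the Python client's major version match the server's (an 8.x client for this 8.3.2 server). The sketch below reads the fields shown above; `check_server_info` is a hypothetical helper, and `client_major=8` is an assumption about the installed client. With a live connection, the 8.x client's `client.info()` returns a response that can be indexed like the `sample` dictionary here.

```python
import json


def check_server_info(info, client_major=8):
    """Summarize a GET / response and check that the server's major version matches the client's."""
    version = info["version"]["number"]          # e.g. "8.3.2"
    server_major = int(version.split(".")[0])
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": version,
        "lucene": info["version"]["lucene_version"],
        "compatible": server_major == client_major,
    }


# Abbreviated copy of the curl response shown above
sample = json.loads("""
{
  "name": "dbservers",
  "cluster_name": "elasticsearch",
  "version": {"number": "8.3.2", "lucene_version": "9.2.0"},
  "tagline": "You Know, for Search"
}
""")
print(check_server_info(sample))
```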

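Elasticsearch has a great many features and takes real effort to learn; this article only uses it from Python. The officially recommended client module is installed as follows: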

# Keep the client's major version in line with the 8.x server
pip install "elasticsearch>=8,<9"

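# For certificate verification, copy the server's CA certificate to the local machine
# (replace your_user with an account on the server that can read this file)
scp your_user@172.29.30.155:/etc/elasticsearch/certs/http_ca.crt .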
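As an alternative to passing the certificate file through `ca_certs`, the 8.x Python client also accepts `ssl_assert_fingerprint`, the SHA-256 fingerprint of the CA certificate (the value `openssl x509 -fingerprint -sha256 -noout -in http_ca.crt` prints). Its documentation notes that this option requires Python 3.10 or later. The sketch below computes that fingerprint with only the standard library; `ca_fingerprint` is a hypothetical name, and the copied file is assumed to hold a single PEM certificate.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text):
    """Return the SHA-256 fingerprint of a PEM certificate as colon-separated uppercase hex."""
    # PEM is base64-encoded DER; the fingerprint is the hash of the DER bytes
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    hex_digest = hashlib.sha256(der).hexdigest().upper()
    # Group into byte pairs, the same layout openssl prints
    return ":".join(hex_digest[i:i + 2] for i in range(0, len(hex_digest), 2))
```

The result could then be used as `Elasticsearch(f"https://{serverip}:9200", ssl_assert_fingerprint=fp, basic_auth=("elastic", ELASTIC_PASSWORD))`, where `fp = ca_fingerprint(open(cafile).read())`.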

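With that done, the following code gives a simple example of inserting records, fetching one by id, listing all records, and running a full-text search: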

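from datetime import datetime

from elasticsearch import Elasticsearch

serverip = "172.29.30.155"
cafile = r"d:\http_ca.crt"      # the http_ca.crt copied from the server
ELASTIC_PASSWORD = "88488848"   # replace with the password set for the elastic user
indexname = "poetry"
index = 0


def connect():
    # Connect over HTTPS, verifying the server with the copied CA certificate
    client = Elasticsearch(
        f"https://{serverip}:9200", ca_certs=cafile, basic_auth=("elastic", ELASTIC_PASSWORD))
    return client


def docgen(author, content):
    # Build a document; the client serializes datetime values automatically
    doc = {'author': author, 'text': content, 'timestamp': datetime.now()}
    return doc


def insert(con, doc_id, doc):
    # Create or overwrite the document with this id; returns "created" or "updated"
    resp = con.index(index=indexname, id=doc_id, document=doc)
    return resp['result']


def getbyindex(con, doc_id):
    # Fetch one document by id (real-time, no refresh needed)
    resp = con.get(index=indexname, id=doc_id)
    return resp['_source']


def listall(con):
    # match_all returns every document in the index
    resp = con.search(index=indexname, query={"match_all": {}})
    print("Got %d Hits:" % resp['hits']['total']['value'])
    for hit in resp['hits']['hits']:
        print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])


def search(con, keyword):
    # Full-text match query against the text field
    resp = con.search(index=indexname, query={"match": {"text": keyword}})
    print("Got %d Hits:" % resp['hits']['total']['value'])
    for hit in resp['hits']['hits']:
        print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])


# Connect
con = connect()

# Insert records
index += 1
doc = docgen("李白", "天生我才必有用")
print(insert(con, index, doc))

index += 1
doc = docgen("杜甫", "功蓋三分國，名成八陣圖，江流石不轉，遺恨失吞吳")
print(insert(con, index, doc))

# Fetch a record by its exact id
print(getbyindex(con, 1))

# New documents become searchable only after a refresh (about once per second
# by default), so force one before searching
con.indices.refresh(index=indexname)

# List all records
listall(con)

# Use full-text search to find matching records
search(con, "天生")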
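The code above relies on dynamic mapping: Elasticsearch infers field types from the first document it sees (a string such as `author` becomes a `text` field with a `keyword` sub-field). A common alternative is to create the index with an explicit mapping before the first insert, since an existing field's type cannot be changed afterwards. The sketch below assumes `author` should be an exact-match `keyword`, `text` analyzed `text`, and `timestamp` a `date`; it uses `indices.exists` and `indices.create` from the 8.x client, and `create_poetry_index` / `POETRY_MAPPINGS` are hypothetical names.

```python
POETRY_MAPPINGS = {
    "properties": {
        "author": {"type": "keyword"},   # exact values, suited to filters and aggregations
        "text": {"type": "text"},        # analyzed for full-text search
        "timestamp": {"type": "date"},
    }
}


def create_poetry_index(con, indexname="poetry"):
    """Create the index with explicit mappings unless it already exists.

    Returns True if the index was created, False if it was already there.
    """
    if con.indices.exists(index=indexname):
        return False
    con.indices.create(index=indexname, mappings=POETRY_MAPPINGS)
    return True
```

It would be called once, right after `con = connect()` and before any `insert`.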

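The code above inserts only two records. To really put a search engine's capabilities to work, you need to import large amounts of data and also build a cluster; for those topics, please consult the official documentation, as this article does not repeat them.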
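For importing larger volumes, one `index` request per document is slow. The Python client ships `elasticsearch.helpers.bulk`, which batches actions into `_bulk` requests. A minimal sketch, assuming records arrive as `(author, text)` pairs; `make_actions` and `bulk_load` are hypothetical names, and the `elasticsearch` import sits inside `bulk_load` so the action generator can be tried without a server.

```python
from datetime import datetime


def make_actions(indexname, records, start_id=1):
    """Yield one bulk index action per (author, text) record, with sequential string ids."""
    for offset, (author, text) in enumerate(records):
        yield {
            "_index": indexname,
            "_id": str(start_id + offset),
            "_source": {"author": author, "text": text, "timestamp": datetime.now()},
        }


def bulk_load(con, indexname, records, start_id=1):
    """Index all records in batches; returns the number of successfully indexed documents."""
    from elasticsearch import helpers  # imported here so make_actions works without the client

    success, _errors = helpers.bulk(con, make_actions(indexname, records, start_id))
    # Make the new documents visible to search right away
    con.indices.refresh(index=indexname)
    return success
```

For example, `bulk_load(con, "poetry", records, start_id=3)` with the `con` returned by `connect()`, continuing the ids after the two records inserted above.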
