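Elastic (Elasticsearch) is currently a leading choice for full-text search. At its core it is a non-relational (NoSQL) data store, and many well-known companies, GitHub among them, use it for full-text search. This article covers its basic operations. Later articles will cover more advanced topics, such as Chinese word segmentation and integrating it into a project.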
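The examples use Ubuntu 16.04 and Elasticsearch 6.5 and follow the official guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html (this "current" link now opens the docs for the latest release).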
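Install Java
Reference: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
The webupd8team PPA used below for Oracle Java 8 has since been discontinued. The OpenJDK 8 that default-jre installs on Ubuntu 16.04 is enough to run Elasticsearch 6.5.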
$ sudo apt-get update
$ sudo apt-get install -y default-jre
$ sudo add-apt-repository ppa:webupd8team/java && sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ export JAVA_HOME="/usr/lib/jvm/java-8-oracle" #applies to the current shell only
$ java -version #check Java
$ echo $JAVA_HOME #check JAVA_HOME
Elasticsearch
Install (6.5.4)
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.zip
$ unzip elasticsearch-6.5.4.zip
Start
$ cd elasticsearch-6.5.4/bin
$ ./elasticsearch
If startup fails with an error saying vm.max_map_count [65530] is too low, raise the limit:
$ sudo sysctl -w vm.max_map_count=262144
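sysctl -w only lasts until the next reboot. To make the setting permanent, add vm.max_map_count=262144 to /etc/sysctl.conf. The current value can be read from /proc/sys/vm/max_map_count. Below is a minimal Python sketch of that check. It assumes a Linux host, the helper names are hypothetical, and 262144 is the value from the command above.

```python
from pathlib import Path

# The minimum used in the sysctl command above.
REQUIRED_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel's current vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())


def map_count_ok(current: int, required: int = REQUIRED_MAP_COUNT) -> bool:
    """True when the limit is at least the required value."""
    return current >= required
```

For example, map_count_ok(read_max_map_count()) is False on a system still at 65530, the value reported in the error above.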
Test with curl. A response like the following means Elasticsearch is running and the installation works:
$ curl 127.0.0.1:9200
{
"name" : "c5skAub",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "bdkUuVtQSvWOiY_vXEFnvw",
"version" : {
"number" : "6.5.4",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "d2ef93d",
"build_date" : "2018-12-17T21:17:40.758843Z",
"build_snapshot" : false,
"lucene_version" : "7.5.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
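The same check can be scripted. Below is a minimal sketch using Python's standard library. It assumes the node listens on 127.0.0.1:9200 without authentication, as above; parse_root_info and fetch_root_info are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: default local node, no authentication.
ES_URL = "http://127.0.0.1:9200"


def parse_root_info(body: str) -> dict:
    """Extract the key fields from the JSON returned by GET /."""
    info = json.loads(body)
    return {
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene": info["version"]["lucene_version"],
    }


def fetch_root_info(url: str = ES_URL, timeout: float = 5.0) -> dict:
    """Python counterpart of `curl 127.0.0.1:9200`; raises (e.g. URLError) if the node is down."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_root_info(resp.read().decode("utf-8"))
```

Against the node above, fetch_root_info() would return {'cluster': 'elasticsearch', 'version': '6.5.4', 'lucene': '7.5.0'}.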
Basic concepts
Because Elasticsearch is a non-relational store, it helps to compare some of its concepts with MySQL's:
MySQL | Elastic |
---|---|
database | index |
table | type (deprecated in 7.x) |
row | document |
column | field |
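Basic operations
Elasticsearch is operated through its REST API. The examples below drop the curl boilerplate. A line such as $ PUT /customer?pretty, followed by an optional JSON request body, stands for
curl -X METHOD "http://localhost:9200/path" -H 'Content-Type: application/json' [-d 'request body']
To allow remote access, set network.host: 0.0.0.0 in /path-to-elastic/config/elasticsearch.yml and restart Elasticsearch. Binding to a non-loopback address also makes Elasticsearch enforce its bootstrap checks (such as the vm.max_map_count limit above) as startup errors.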
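The curl template above maps directly onto an HTTP client. Below is a minimal Python sketch using only the standard library. ES_URL, build_request and es_request are hypothetical names, and the sketch assumes a node without authentication.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumption: local node on the default port, no authentication.
ES_URL = "http://localhost:9200"


def build_request(method: str, path: str, body: Optional[Any] = None,
                  base_url: str = ES_URL) -> urllib.request.Request:
    """Equivalent of: curl -X METHOD "base_url + path" -H 'Content-Type: application/json' [-d body].

    A dict body is serialized to JSON; a str body (e.g. NDJSON for _bulk) is sent as-is.
    """
    data = None
    if body is not None:
        text = body if isinstance(body, str) else json.dumps(body)
        data = text.encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=data,
        method=method.upper(),
        headers={"Content-Type": "application/json"},
    )


def es_request(method: str, path: str, body: Optional[Any] = None,
               base_url: str = ES_URL) -> Any:
    """Send the request and decode the JSON reply.

    Only for endpoints that return JSON (add format=json to _cat URLs).
    Raises urllib.error.HTTPError on 4xx/5xx responses, e.g. 404 for a missing document.
    """
    req = build_request(method, path, body, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, es_request("GET", "/_cat/indices?format=json") would list the indices as JSON instead of a text table.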
Index operations
Create an index named customer (?pretty returns pretty-printed JSON):
$ PUT /customer?pretty
List all indices:
$ GET /_cat/indices?v
Delete an index:
$ DELETE /customer
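Document operations
Create a document with id 1. Because mapping types are being removed, an index created in 6.x can contain only one type, conventionally named _doc: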
$ PUT /customer/_doc/1?pretty
{
"name": "luke"
}
Sending the document with POST and no id makes Elasticsearch generate a random id:
$ POST /customer/_doc?pretty
{"name": "php"}
Response:
{
"_index": "customer",
"_type": "_doc",
"_id": "hIkkLGgBFVhvdLuiNNGD", //the generated id
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 3
}
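The choice between PUT and POST depends only on whether you supply an id. Below is a hypothetical Python helper that picks the method and path accordingly. It also reads the id back from a response shaped like the one above.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_call(index: str, doc: dict,
               doc_id: Optional[str] = None) -> Tuple[str, str, dict]:
    """Return (method, path, body) for indexing one document into `index`."""
    if doc_id is None:
        # No id: POST and let Elasticsearch generate one.
        return "POST", f"/{index}/_doc", doc
    # Explicit id: PUT creates the document, or replaces it if it already exists.
    return "PUT", f"/{index}/_doc/{quote(str(doc_id), safe='')}", doc


def written_id(response: dict) -> str:
    """Return the document id from an index response, checking that the write happened."""
    if response.get("result") not in ("created", "updated"):
        raise ValueError(f"unexpected result: {response.get('result')!r}")
    return response["_id"]
```

The id is percent-encoded so that unusual ids (for example ones containing a slash) stay in a single path segment.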
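To update a document, either PUT it again with the new data (which replaces the whole document) or use _update to change only the given fields: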
$ POST /customer/_doc/1/_update?pretty
{
"doc": { "name": "luke44", "age": 24 }
}
A simple script can also do the update. Here ctx._source refers to the source of the document being updated:
$ POST /customer/_doc/1/_update?pretty
{
"script" : "ctx._source.age += 5"
}
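A partial doc update is merged into the existing source: nested objects are merged recursively, while other values, including arrays, are replaced. The Python sketch below only mimics that rule locally for illustration; merge_partial is a hypothetical helper, not an Elasticsearch API. The second helper, script_update_body (also hypothetical), builds the object form of the script. That form passes the increment as a parameter instead of hard-coding it in the script string.

```python
import copy
from typing import Optional


def merge_partial(source: dict, partial: dict) -> dict:
    """Return a new dict: `partial` merged into `source` (objects recursively, other values replaced)."""
    result = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_partial(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def script_update_body(source: str, params: Optional[dict] = None) -> dict:
    """Body for POST /<index>/_doc/<id>/_update using a Painless script."""
    script = {"source": source, "lang": "painless"}
    if params:
        script["params"] = params
    return {"script": script}
```

For example, script_update_body("ctx._source.age += params.n", {"n": 5}) expresses the same update as the script above.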
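Retrieve the document with id 1 (the response below shows the document right after creation, before the updates above):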
$ GET /customer/_doc/1?pretty
{
"_index": "customer",
"_type": "_doc",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"name": "luke"
}
}
Delete a document:
$ DELETE /customer/_doc/2?pretty
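Bulk operations: index (create or replace) documents 1 and 2 in one request. The body is newline-delimited JSON and must end with a newline, so in Postman leave an empty line at the end of the body: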
$ POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "luke" }
{"index":{"_id":"2"}}
{"name": "php", "age": "20" }
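Building the newline-delimited body by hand is error-prone. Below is a minimal Python sketch; bulk_body is a hypothetical helper. It writes one action line per operation, adds a source line for everything except delete, and ends the body with the newline the bulk API requires. For update, the payload must already be wrapped as {"doc": ...}, as in the next example.

```python
import json
from typing import Iterable, Optional, Tuple

# (action, document id, payload); payload is None for "delete".
BulkOp = Tuple[str, str, Optional[dict]]


def bulk_body(ops: Iterable[BulkOp]) -> str:
    """Serialize operations into an NDJSON body for the _bulk endpoint."""
    lines = []
    for action, doc_id, payload in ops:
        # json.dumps without indent never emits raw newlines, so each object stays on one line.
        lines.append(json.dumps({action: {"_id": doc_id}}))
        if action == "delete":
            continue  # delete has no source line
        if payload is None:
            raise ValueError(f"{action} for id {doc_id} needs a payload")
        lines.append(json.dumps(payload))
    if not lines:
        raise ValueError("a bulk request needs at least one operation")
    # The body must end with a newline, or Elasticsearch rejects the request.
    return "\n".join(lines) + "\n"
```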
Update document 1, then delete document 2 (a delete action has no source line):
$ POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc":{"name":"php best"}}
{"delete":{"_id":"2"}}
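If one action in a bulk request fails, the remaining actions still run. The response reports the result of each action in the order the actions were sent.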
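Because failures do not abort the request, the response has to be checked item by item. The top-level errors flag is true if any action failed. Each entry in items carries its own status, plus an error object when that action failed. Below is a minimal Python sketch that lists the failures in request order; failed_items is a hypothetical helper.

```python
from typing import Any, List, Tuple


def failed_items(response: dict) -> List[Tuple[int, str, Any, dict]]:
    """Return (position, action, _id, error) for each failed action in a _bulk response."""
    if not response.get("errors"):
        return []  # errors is false when no action failed
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item is a one-key dict such as {"update": {...}} or {"delete": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"), result["error"]))
    return failures
```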
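Exploring the data
First, prepare a sample dataset of fictitious bank customer accounts, in which every document has the format below. Download the dataset (the official Getting Started guide links to it) and save it as accounts.json: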
{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "[email protected]",
"city": "Hobucken",
"state": "CO"
}
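Import the dataset. This step shows the full curl command, because --data-binary is needed to send the file with its newlines intact (-d strips them):
$ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"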
$ GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 3inMmuQzRqaTpMkzfh07_A 5 1 1000 0 95.9kb 95.9kb
yellow open customer gSRgPG9cScKHcuycJE2drw 5 1 2 0 7.7kb 7.7kb
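Both indices are yellow because this is a single node. Each index has one replica (rep 1), and a replica cannot be allocated on the same node as its primary. The ?v output is a whitespace-aligned text table. _cat endpoints also accept format=json, but if you already have the text, a small Python sketch can turn it into dicts. parse_cat_output is a hypothetical helper, and it assumes no cell contains spaces.

```python
from typing import Dict, List


def parse_cat_output(text: str) -> List[Dict[str, str]]:
    """Turn `_cat/...?v` output (header row + rows) into a list of dicts.

    Assumes no cell contains spaces, which holds for _cat/indices.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]
```

Applied to the output above, the first row's "index" is "bank" and its "docs.count" is "1000". All values stay strings.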
match_all query
Using URI search: q=* matches all documents, and sort=account_number:asc sorts them by account_number in ascending order:
$ GET /bank/_search?q=*&sort=account_number:asc&pretty
{
"took" : 63, //time taken, in milliseconds
"timed_out" : false, //whether the search timed out
"_shards" : { //shards searched
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : { //search hits
"total" : 1000,
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"[email protected]","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}
}, ...
]
}
}
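When the URI is built in code, the parameters must be URL-encoded (q=* becomes q=%2A). Below is a minimal Python sketch with hypothetical helper names. total_hits also handles 7.x, where hits.total changed from a number to an object with value and relation.

```python
from typing import List, Optional
from urllib.parse import urlencode


def uri_search_path(index: str, q: str = "*", sort: Optional[str] = None,
                    size: Optional[int] = None) -> str:
    """Build a URI-search path such as /bank/_search?q=%2A&sort=account_number%3Aasc."""
    params = {"q": q}
    if sort:
        params["sort"] = sort
    if size is not None:
        params["size"] = size
    return f"/{index}/_search?{urlencode(params)}"


def hit_sources(response: dict) -> List[dict]:
    """The _source of every hit on the returned page."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def total_hits(response: dict) -> int:
    """hits.total is a number in 6.x and {"value": n, "relation": ...} in 7.x."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total
```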
The same search, written as a JSON request body:
$ GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
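Use from and size to page through results, similar to OFFSET and LIMIT in MySQL (from defaults to 0, size to 10). Use _source to return only the listed fields: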
$ GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } },
"from": 10,
"size": 15, //default is 10
"_source": ["account_number", "balance"]
}
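For page-based navigation, from is the page index times the page size. Below is a hypothetical Python helper with zero-based pages that builds a body like the one above. By default from + size may not exceed 10000 (the index.max_result_window setting), so deep paging needs a different approach, such as search_after.

```python
from typing import Iterable, Optional


def page_query(page: int, size: int = 10, sort_field: str = "balance",
               order: str = "desc", fields: Optional[Iterable[str]] = None) -> dict:
    """match_all request body for zero-based page `page` of `size` hits."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size > 0")
    body = {
        "query": {"match_all": {}},
        "sort": {sort_field: {"order": order}},
        "from": page * size,  # hits to skip, like OFFSET
        "size": size,         # hits to return, like LIMIT
    }
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields
    return body
```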
match query
Find the account whose account_number is 20:
$ GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}
Find all accounts whose address contains the word mill:
$ GET /bank/_search
{
"query": { "match": { "address": "mill" } }
}
Find all accounts whose address contains the word mill or the word lane:
$ GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
match_phrase query, a variant of match: find all accounts whose address contains the phrase mill lane (both words, adjacent and in that order):
$ GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
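The difference can be illustrated with a toy Python model; the function names are hypothetical. It is only a rough approximation. The tokenizer just lowercases and splits on non-alphanumeric characters (the real standard analyzer is more elaborate), and scoring is ignored entirely. match with its default OR operator needs any one query term, while match_phrase needs all terms, adjacent and in order.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_match(field_value: str, query: str) -> bool:
    """match with the default OR operator: any query term is present."""
    return bool(set(tokens(query)) & set(tokens(field_value)))


def toy_match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: all query terms present, adjacent, in the same order."""
    q, f = tokens(query), tokens(field_value)
    if not q:
        return False
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))
```

For example, "880 Holmes Lane" (account 1 above) matches the match query through lane alone, but not the match_phrase query. To require every term without requiring adjacency, match also accepts {"query": "mill lane", "operator": "and"} in place of the plain string.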
bool query
Find all accounts whose address contains both mill and lane. The bool must clause lists queries that must all be true for a document to count as a match:
$ GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
//"should": [...] OR: at least one clause must match (when there is no must or filter)
//"must_not": [...] NOT: no clause may match
}
}
}
Combined clauses: find the accounts of customers who are 40 years old and do not live in the state ID (Idaho):
$ GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
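Writing the nested bool structure by hand gets verbose. Below is a hypothetical Python helper that builds it and leaves out empty clauses. It reproduces the age/state query above.

```python
from typing import List, Optional


def match(field: str, value) -> dict:
    """A single match clause."""
    return {"match": {field: value}}


def bool_query(must: Optional[List[dict]] = None,
               should: Optional[List[dict]] = None,
               must_not: Optional[List[dict]] = None,
               filter_: Optional[List[dict]] = None) -> dict:
    """Build {"query": {"bool": {...}}}, omitting clauses that are None or empty."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filter_}
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}
```

The trailing underscore in filter_ only avoids shadowing Python's built-in filter; the emitted key is still "filter".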
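bool filter
Find accounts with a balance between 20000 and 30000, inclusive. Clauses under filter only decide whether a document matches; they do not affect its relevance score: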
$ GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
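The range clause accepts gte/gt and lte/lt bounds, and either side may be omitted. Below is a hypothetical Python helper that builds the clause and wraps it in the filter form used above.

```python
def range_clause(field: str, gte=None, gt=None, lte=None, lt=None) -> dict:
    """{"range": {field: {...}}} containing only the bounds that were given."""
    bounds = {op: v for op, v in (("gte", gte), ("gt", gt), ("lte", lte), ("lt", lt))
              if v is not None}
    if not bounds:
        raise ValueError("range needs at least one bound")
    return {"range": {field: bounds}}


def filtered_match_all(filter_clause: dict) -> dict:
    """bool query whose filter restricts matches without affecting scores."""
    return {"query": {"bool": {"must": {"match_all": {}}, "filter": filter_clause}}}
```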