Elasticsearch Beginner Tutorial: Installation and Basic Usage

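Elastic is currently the leading choice for full-text search and is, at its core, a non-relational database. Many well-known companies, GitHub among them, use it for full-text search. This article introduces its basic operations; later articles will cover more advanced topics such as Chinese word segmentation and integrating it into a project.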

The examples use Ubuntu 16.04 with Elasticsearch 6.5 and follow the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

Installing Java

Reference article: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04

$ sudo apt-get update
$ sudo apt-get install -y default-jre
$ sudo add-apt-repository ppa:webupd8team/java && sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
$ java -version     # verify the Java installation
$ echo $JAVA_HOME   # verify that JAVA_HOME is set

Elasticsearch

Installation (6.5.4)

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.zip
$ unzip elasticsearch-6.5.4.zip

Starting

$ cd elasticsearch-6.5.4/bin
$ ./elasticsearch

If startup fails with the error vm.max_map_count [65530] is too low, run the following:

$ sudo sysctl -w vm.max_map_count=262144

Test with curl. Output like the following means Elasticsearch started successfully and the installation works.

$ curl 127.0.0.1:9200   
{
  "name" : "c5skAub",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "bdkUuVtQSvWOiY_vXEFnvw",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Basic Concepts

Elastic is essentially a non-relational database. The table below maps some of its concepts to their MySQL counterparts.

MySQL	Elastic
database	index
table	type (deprecated in 7.x)
row (record)	document
column (field)	field

Basic Operations

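Elastic is operated through its REST API. All examples below omit the surrounding curl -XMETHOD "http://localhost:9200" -H 'Content-Type: application/json' [-d 'request body'] and show only the method, the path, and the request body. To allow remote access, set network.host: 0.0.0.0 in /path-to-elastic/config/elasticsearch.yml and restart.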
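The omitted boilerplate can also be written in code. The following is a minimal Python sketch, assuming a node at http://localhost:9200 as in the curl template; the helper name es_request is made up for this illustration, and it only builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address, taken from the curl template


def es_request(method, path, body=None):
    """Build (but do not send) the HTTP request behind a shorthand such as `PUT /customer?pretty`."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Shorthand: PUT /customer/_doc/1?pretty with the body {"name": "luke"}
req = es_request("PUT", "/customer/_doc/1?pretty", {"name": "luke"})
print(req.get_method(), req.full_url)

# To send it against a running node:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Keeping the build step separate from the send step makes it possible to inspect a request before it reaches the cluster.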

Working with Indices

Create an index named customer; the ?pretty parameter returns pretty-printed JSON

$ PUT /customer?pretty

List all indices

$ GET /_cat/indices?v

Delete an index

$ DELETE /customer

Working with Documents

Create a document with id 1. Because types are being phased out, each index should hold only one type, conventionally named _doc

$ PUT /customer/_doc/1?pretty
{
    "name": "luke"
}

Using POST with the id left empty generates a random id

$ POST /customer/_doc?pretty {"name": "php"}
{
    "_index": "customer",
    "_type": "_doc",
    "_id": "hIkkLGgBFVhvdLuiNNGD",  ## the generated id
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 3
}

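Updating a document works the same way as creating one: send the changed data to the same id, which replaces the whole document. Alternatively, use _update to change only some fields: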

$ POST /customer/_doc/1/_update?pretty
{
    "doc": { "name": "luke44", "age": 24 }
}

Update with a simple script; here ctx._source refers to the document being modified

$ POST /customer/_doc/1/_update?pretty
{
  "script" : "ctx._source.age += 5"
}
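To make the effect of the two update requests concrete, here is a local Python simulation, not Elasticsearch's implementation. It assumes flat documents only, and the function names are hypothetical.

```python
def apply_partial_update(source, doc):
    """Merge the fields of `doc` into a copy of `source` (flat fields only in this sketch)."""
    updated = dict(source)
    updated.update(doc)
    return updated


def apply_age_script(source, years=5):
    """Local equivalent of the script ctx._source.age += 5."""
    updated = dict(source)
    updated["age"] += years  # in this sketch, raises KeyError if there is no age field yet
    return updated


stored = {"name": "luke"}                                              # document created earlier
stored = apply_partial_update(stored, {"name": "luke44", "age": 24})  # the "doc" update
stored = apply_age_script(stored)                                      # the script update
print(stored)
```

The script update relies on the age field that the preceding partial update added.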

Retrieve the document with id 1

$ GET /customer/_doc/1?pretty
{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "name": "luke"
    }
}

Delete a document

$ DELETE /customer/_doc/2?pretty

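Bulk operations: the request below indexes the documents with ids 1 and 2 in a single call. Note that the body must end with a newline, so in Postman leave an empty line at the end of the body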

$ POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "luke" }
{"index":{"_id":"2"}}
{"name": "php", "age": "20" }

First update the document with id 1, then delete the document with id 2

$ POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc":{"name":"php best"}}
{"delete":{"_id":"2"}}

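If one operation in a bulk request fails, the remaining operations are still executed; the response reports the status of each operation in the order it was executed.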
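A bulk body is newline-delimited JSON: an action line, an optional source line, and a newline after every line including the last. The following Python sketch assembles the update-then-delete body used above; the helper name bulk_body is hypothetical.

```python
import json


def bulk_body(operations):
    """Serialize (action, source) pairs into the newline-delimited body the _bulk endpoint expects.

    `source` is None for actions that carry no second line, such as delete.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "php best"}}),
    ({"delete": {"_id": "2"}}, None),
])
print(body, end="")
```

Generating the body this way avoids the missing trailing newline that is easy to overlook when typing it by hand.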

Exploring Data

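First prepare a fictitious data set of bank customer accounts in which each record looks like the one below. Download the data set (right-click, Save As) and save it as accounts.json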

{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}

Import the data set

$ POST /bank/_doc/_bulk?pretty&refresh --data-binary "@accounts.json"
$ GET /_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank     3inMmuQzRqaTpMkzfh07_A   5   1       1000            0     95.9kb         95.9kb
yellow open   customer gSRgPG9cScKHcuycJE2drw   5   1          2            0      7.7kb          7.7kb

match_all Query

Search through the request URI: q=* matches everything, and sort=account_number:asc sorts the results by account_number in ascending order

$ GET /bank/_search?q=*&sort=account_number:asc&pretty
{
  "took" : 63,  // time taken, in milliseconds
  "timed_out" : false,  // whether the search timed out
  "_shards" : {     // shard statistics
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {    // search results
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}
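The matching documents sit two levels deep, under hits.hits, each with its original content in _source. The following Python sketch extracts them from an abbreviated copy of the response; the helper name hit_sources is hypothetical.

```python
import json

# Abbreviated copy of the search response (only the fields used here).
response_text = """
{
  "took": 63,
  "timed_out": false,
  "hits": {
    "total": 1000,
    "hits": [
      {"_id": "0", "_source": {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"}},
      {"_id": "1", "_source": {"account_number": 1, "balance": 39225, "firstname": "Amber"}}
    ]
  }
}
"""


def hit_sources(response):
    """Return the original documents (_source) of every hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


response = json.loads(response_text)
print("total matches:", response["hits"]["total"])
for doc in hit_sources(response):
    print(doc["account_number"], doc["firstname"], doc["balance"])
```

Note that hits.total counts all matches, while hits.hits holds only the page of results that was returned.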

Search with a JSON request body to get the same result as above

$ GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

Use size and from to limit and page through the results, similar to LIMIT and OFFSET in MySQL; use _source to return only the specified fields

$ GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } },
  "from": 10,
  "size": 15,    // defaults to 10
  "_source": ["account_number", "balance"]
}
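Since from is the number of hits to skip and size is the number to return, a page number can be converted to a request body as in the Python sketch below. The helper name page_query and the 1-based page convention are assumptions made for this example.

```python
import json


def page_query(page, size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "query": {"match_all": {}},
        "sort": {"balance": {"order": "desc"}},
        "from": (page - 1) * size,  # number of hits to skip
        "size": size,               # number of hits to return
        "_source": ["account_number", "balance"],
    }


print(json.dumps(page_query(page=3, size=15), indent=2))
```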

match Query

Find all accounts whose account_number is 20

$ GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}

Find all accounts whose address contains the term mill

$ GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}

Find all accounts whose address contains the term mill or the term lane

$ GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

match_phrase query, a variant of match: find all accounts whose address contains the phrase mill lane

$ GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
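The difference between match and match_phrase can be illustrated locally. The Python sketch below is a deliberate simplification: it lowercases and splits on whitespace instead of running a real analyzer, it ignores scoring, and the first two sample addresses are made up for the illustration.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches(field_value, query):
    """match semantics with the default OR operator: at least one query term occurs."""
    field_terms = set(tokens(field_value))
    return any(term in field_terms for term in tokens(query))


def matches_phrase(field_value, query):
    """match_phrase semantics: the query terms occur next to each other, in order."""
    field_terms, query_terms = tokens(field_value), tokens(query)
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))


for address in ["990 Mill Road", "198 Mill Lane", "880 Holmes Lane"]:
    print(address, matches(address, "mill lane"), matches_phrase(address, "mill lane"))
```

All three addresses satisfy the match query because each contains at least one of the terms, but only the address containing both words side by side satisfies the phrase query.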

bool Query

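Find all accounts whose address contains both the term mill and the term lane. The bool must clause lists queries that must all be true for a document to count as a match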

$ GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
      //"should": [...]   at least one of these should match (OR)
      //"must_not": [...] none of these may match
    }
  }
}

Combined query: find the accounts of customers who are 40 years old and do not live in the state ID

$ GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
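When bool queries are generated by a program, the clauses can be assembled from lists. The following Python sketch rebuilds the request body above; the helper name bool_query is hypothetical, and empty clauses are simply left out.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Combine sub-queries into a bool query, omitting clauses that are empty."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: subs for name, subs in clauses.items() if subs}}}


body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(json.dumps(body, indent=2))
```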

bool Filter

Find the customer accounts with a balance between 20000 and 30000 (inclusive)

$ GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
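Both gte and lte include their bound, so the range above is a closed interval. The following Python sketch builds the same body and checks a few balances locally against the bounds; the function names are hypothetical, and the check only mirrors the comparison, not how Elasticsearch evaluates the filter.

```python
import json


def balance_range_query(low, high):
    """Build a bool query whose filter keeps balances in the closed interval [low, high]."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {"balance": {"gte": low, "lte": high}}},
            }
        }
    }


def in_range(balance, bounds):
    """Local check with the same meaning as gte/lte: both ends are included."""
    return bounds["gte"] <= balance <= bounds["lte"]


body = balance_range_query(20000, 30000)
bounds = body["query"]["bool"]["filter"]["range"]["balance"]
print(json.dumps(body))
print([b for b in (16623, 20000, 30000, 39225) if in_range(b, bounds)])
```

A filter clause only decides whether a document is included; it does not contribute to the relevance score.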