Elasticsearch Essentials: The Basics


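Business Development and Functional Technology Department, Experience Assurance R&D Group: 康睿, 姚再毅, 李振, 劉斌, 王北永

Note: everything below is based on Elasticsearch 8.1.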

I. Index Definition

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/indices.html

Indices at a glance

Elasticsearch	MySQL
Index	Table
Type (deprecated)	Table (deprecated)
Document	Row
Field	Column
Mapping	Schema
Everything is indexed	Index
Query DSL	SQL
GET http://...	select * from
POST http://...	update table set ...
Aggregations	group by / sum
cardinality	distinct (deduplication)
reindex	data migration

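What an index is

Definition: an index is a collection of documents that share the same document structure (mapping).
Each index is identified by a unique index name.
A cluster can contain many indices.
Different indices represent different kinds of business data.

Notes:
Index names must not contain uppercase letters.
Index names can be at most 255 bytes long, so multi-byte characters reach the limit sooner.
Field names may contain uppercase letters, but using lowercase throughout is recommended.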

Creating an index

index-settings parameters explained

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-modules.html

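Note: static settings cannot be changed once the index has been created; dynamic settings can.

Question 1: why can the number of primary shards not be changed after creation?
A document is routed to a particular shard in an index using the following formula:
shard_num = hash(_routing) % num_primary_shards
The default value used for _routing is the document's _id.
When writing data, Elasticsearch uses this formula to decide which shard stores the document, and it uses the same formula to locate the document on later reads. If the number of primary shards changed, existing documents could no longer be found. Put simply: hash the ID, divide by the number of primary shards, and take the remainder. If the divisor changes, the result changes too.

Question 2: what if the business genuinely needs more primary shards?
Use reindex to migrate the data into another index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html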
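The routing formula can be tried out with a small Python sketch. This is only an illustration: zlib.crc32 stands in for the hash function Elasticsearch really uses, and shard_for is a hypothetical helper, not an Elasticsearch API. It shows that the same document IDs land on different shard numbers once the primary shard count changes.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    Assumption: crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 101)]

# Placement with 5 primary shards versus 6 primary shards.
before = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Documents whose computed shard differs would no longer be found on read.
moved = [doc_id for doc_id in doc_ids if before[doc_id] != after[doc_id]]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on a different shard")
```

Because most documents change shard, the only safe way to change the shard count is to write every document again, which is what reindex does.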

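Basic index operations

II. Mapping Parameters: dynamic

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic.html

Core function

Automatically detects the type of a new field and adds the field to the mapping. In other words, even if you never defined the field in the mapping, Elasticsearch detects its type for you.

A first look at dynamic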

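// Delete the test01 index so that we start from a clean state
DELETE test01

// Insert a document directly, without defining a mapping
POST test01/_doc/1
{
  "name":"kangrui10"
}

// Now look at the mapping of test01 to see which type the name field was given.
// name is mapped as text at the top level, with a keyword sub-field. That is not what we want:
// in our business queries name is never searched as analyzed text, it is matched exactly (and name = xxx).
// With the mapping below, an exact match has to target name.keyword = xxx, which is more cumbersome.
GET test01/_mapping

// Response: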
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
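The detection rule behind this result can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch code: infer_field_mapping and infer_mapping are hypothetical helpers, and the sketch assumes the default rules for booleans, integers, floats, and strings only (no date detection, objects, or arrays).

```python
def infer_field_mapping(value):
    """Return a mapping fragment for one JSON value (simplified default rules)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become text with a keyword sub-field, as in the response above.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(doc):
    """Build the 'properties' section for a flat document."""
    return {"properties": {name: infer_field_mapping(v) for name, v in doc.items()}}


mapping = infer_mapping({"name": "kangrui10"})
print(mapping)
```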

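Possible values of dynamic

Value	Description	Explanation
true	New fields are added to the mapping (default).	If dynamic is not specified when the mapping is created, it defaults to true: any field without an explicitly declared type gets its type detected by Elasticsearch.
false	New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly.	With false, a field that is not defined in the mapping can still be written, but it cannot be searched and does not appear in the mapping. The field is kept in _source but is not indexed.
strict	If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping.	With strict, writing a document that contains a field not defined in the mapping fails with an error. Recommended for production because it is stricter: a new field must be added to the mapping by hand before it can be used.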
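The three behaviours can be compared with a toy model. The sketch below is an assumption-laden simplification: index_document is a hypothetical helper that tracks only field names, not types, and it models "searchable" as "present in the mapping".

```python
def index_document(mapped_fields, doc, dynamic="true"):
    """Return (updated_mapped_fields, searchable_fields) for one document.

    Toy model of dynamic = "true" | "false" | "strict"; not Elasticsearch code.
    """
    unknown = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict" and unknown:
        # strict: the whole document is rejected.
        raise ValueError(f"strict mapping: unmapped fields {unknown}")
    updated = set(mapped_fields)
    if dynamic == "true":
        # true: new fields are added to the mapping.
        updated.update(unknown)
    # false: new fields stay in _source only, so they are not searchable.
    searchable = [name for name in doc if name in updated]
    return updated, searchable


doc = {"name": "kangrui10", "age": 3}
print(index_document({"name"}, doc, dynamic="true"))
print(index_document({"name"}, doc, dynamic="false"))
```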

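Drawbacks of dynamic mapping

Field types are detected fairly accurately, but not necessarily as the user expects.
For example, a text field is given the default standard analyzer, whereas Chinese text usually needs the ik analyzer.
Extra storage is consumed.
A string is mapped as both text and keyword, which takes up more storage.
Mapping explosion.
If a request is mistyped, for example PUT is used where GET was intended, many unwanted fields can be created by mistake.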

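III. Mapping Parameters: doc_values

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html

Core function

doc_values are an ordered forward index (a list of document => field value mappings) that Lucene builds in addition to the inverted index at index time. They are essentially a serialized columnar store, a structure well suited to aggregations, sorting, and script access to field values. This layout also compresses well, especially for numeric types, which reduces disk usage and speeds up access. Almost all field types support doc_values; the exceptions are text and annotated_text fields.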

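What is a forward index

A forward index resembles a database table: values are associated with a document ID, and looking up a document ID returns the corresponding data.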
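The difference between the two structures can be sketched in Python. The data and variable names below are made up for illustration: the inverted index answers "which documents contain this value?", while the forward index (the role doc_values play) answers "what is the value for this document?", which is what sorting and aggregations need.

```python
from collections import Counter, defaultdict

# Hypothetical keyword field values, keyed by document ID.
speakers = {1: "HAMLET", 2: "OPHELIA", 3: "HAMLET"}

# Inverted index: value -> document IDs (used for searching).
inverted = defaultdict(list)
for doc_id, value in speakers.items():
    inverted[value].append(doc_id)

# Forward index: document ID -> value (used for sorting and aggregating).
forward = dict(speakers)

hamlet_docs = inverted["HAMLET"]

# A terms-style aggregation walks the matched documents and reads the forward index.
terms_agg = Counter(forward[doc_id] for doc_id in forward)

# Sorting also reads the forward index, one lookup per document.
sorted_ids = sorted(forward, key=lambda doc_id: (forward[doc_id], doc_id))

print(hamlet_docs, dict(terms_agg), sorted_ids)
```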

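Possible values of doc_values

true: the default; doc_values are enabled.
false: must be set explicitly. With false, sorting, aggregations, and script access to the field no longer work, but disk space is saved.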

Exam practice

// Create an index named test03 whose fields meet the following requirements
//     1. speaker: keyword
//     2. line_id: keyword and not aggregatable
//     3. speech_number: integer
PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{
        "type": "integer"
      }
    }
  }
}

IV. Analyzers

Installing the ik Chinese analyzer

https://github.com/medcl/elasticsearch-analysis-ik

What is an inverted index

How data is indexed

Types of analyzers

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-analyzers.html

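V. Custom Analyzers

The three stages of a custom analyzer

1. Character filters

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-charfilters.html
Zero or more character filters may be configured.

HTML Strip Character Filter: strips HTML elements such as <b> and decodes HTML entities such as &amp;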

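Mapping Character Filter: replaces specified characters

Pattern Replace Character Filter: replaces characters matched by a regular expression

2. Tokenizer: splitting text into tokens

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenizers.html#_word_oriented_tokenizers
Exactly one tokenizer may be configured. It splits the text into tokens.

3. Token filters: post-processing the tokens

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenfilters.html
Zero or more token filters may be configured. They process the tokens further, for example lowercasing them, removing stop words, or adding synonyms.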
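The three stages can be pictured as a simple pipeline. The Python below is a rough sketch, not how Elasticsearch implements analysis: all function names are hypothetical, the tokenizer is a plain whitespace split, and the character mapping "& => and" is chosen to match the exercise that follows.

```python
def char_filter(text, mappings):
    """Stage 1: rewrite characters before tokenization (like a mapping char filter)."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def whitespace_tokenizer(text):
    """Stage 2: split the filtered text into tokens."""
    return text.split()


def lowercase_filter(tokens):
    """Stage 3: post-process the tokens (here: lowercase them)."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the three stages in order: char filters, tokenizer, token filters."""
    filtered = char_filter(text, {"&": "and"})
    tokens = whitespace_tokenizer(filtered)
    return lowercase_filter(tokens)


print(analyze("Doc & Cat"))
print(analyze("doc and cat"))
```

Both inputs produce the same token sequence, which is why a phrase query written either way can match either document.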

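Exam practice

A document contains text such as "doc & cat". Index the document so that a match_phrase query for either "doc & cat" or "doc and cat" finds it.
Analysis:
1. What match_phrase does: it analyzes the search phrase into terms. Every term must appear in the analyzed field, in the same order, and by default the terms must be adjacent.
2. Making & and "and" equivalent is a custom requirement, so it needs a custom analyzer.
3. How to define a custom analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-custom-analyzer.html
4. Solution 1 relies on the Mapping Character Filter.
5. Solution 2 relies on the synonym token filter: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-synonym-tokenfilter.html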

Solution 1

# Create the index
# (test01 was used in the dynamic example above; run DELETE test01 first if it still exists)
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [ "my_mappings_char_filter" ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [ "& => and" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

// Notes
// Stage 1, character filters: a mapping char_filter replaces & with and
// Stage 2, tokenizer: the standard tokenizer
// Stage 3, token filters: none configured
// The content field uses the custom analyzer my_analyzer

# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test: searching for "doc & cat" or "doc and cat" returns both documents
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "content": "doc & cat" } }
      ]
    }
  }
}

Solution 2

# Approach: declare & and "and" as synonyms by using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "my_synonym" ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [ "& => and" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}

// Notes
// Stage 1, character filters: none configured
// Stage 2, tokenizer: the whitespace tokenizer. Why not the standard tokenizer? Because it drops & during tokenization, so the token filter would never see it
// Stage 3, token filters: a synonym token filter that maps & to and
// The content field uses the custom analyzer my_synonym_analyzer

# Load test data
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}

# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "content": "doc & cat" } }
      ]
    }
  }
}

VI. multi-fields

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/multi-fields.html

// One field indexed in several ways: for example, one field analyzed with two different analyzers
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "fieldText": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }
  }
}

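VII. runtime_field: Runtime Fields

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime.html

Background

Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What can be done? You could backfill the data and add a field holding the difference. But what if backfilling and adding a field is not allowed?

Solution

Use cases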

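Adding fields to existing documents without reindexing

Working with data whose structure is not known in advance

Overriding, at query time, the value returned from an indexed field

Defining fields for a specific purpose without modifying the underlying schema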

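Characteristics

Lucene is completely unaware of runtime fields: they are not indexed and have no doc_values

Scoring is not supported, because there is no inverted index

They break with the traditional define-before-use approach

They can prevent mapping explosion

They make the API more flexible

Note: they make searches slower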

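Using runtime fields

Defined at search time, in the search request (the field can be queried even though the mapping does not contain it)

Defined in a dynamic or static mapping, at the mapping stage (a runtime field is added to the mapping)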

Exam practice 1

# Given the following index and data
# (test03 was created in the doc_values exercise above; run DELETE test03 first if it still exists)
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": { "type": "integer" }
    }
  }
}

POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}

# Requirement: emotion > 5 returns emotion_falg = '1'
# Requirement: emotion < 5 returns emotion_falg = '-1'
# Requirement: emotion = 5 returns emotion_falg = '0'

Solution 1

Define the runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
The field does not physically exist, so the request must ask for it explicitly with "fields": ["*"].

GET test03/_search
{
  "fields": [ "*" ],
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {
        "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
        """
      }
    }
  }
}

Solution 2

Define the runtime field when creating the index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-mapping-fields.html
With this approach the runtime field can also be used in searches.

# Create the index and define the runtime field
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {
          "source": """
            if(doc['emotion'].value>5)emit('1');
            if(doc['emotion'].value<5)emit('-1');
            if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": { "type": "integer" }
    }
  }
}

# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}

# Test the query
GET test03_01/_search
{
  "fields": [ "*" ]
}

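Exam practice 2

# Given the following index and data
# (the index is named task04 here so that it matches the bulk request and the requirement below)
PUT task04
{
  "mappings": {
    "properties": {
      "A": { "type": "long" },
      "B": { "type": "long" }
    }
  }
}

PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}

# Requirement: in the task04 index, create a runtime field named A_B whose value is A-B; create a range aggregation with three buckets (below 0, 0 to 100, 100 and above); return the document count of each bucket

// Topics covered:
// 1. Defining a runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html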

Solution

# Test the result
GET task04/_search
{
  "fields": [ "*" ],
  "size": 0,
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['A'].value - doc['B'].value);
        """
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
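To check what the aggregation should return for the four sample documents, the same computation can be done by hand in Python. This is only a sanity-check sketch: the bucket labels are made up here, and it relies on the range aggregation treating "from" as inclusive and "to" as exclusive.

```python
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]

# The runtime field A_B is simply A minus B for each document.
a_b_values = [doc["A"] - doc["B"] for doc in docs]

# (label, from, to): None means the range is open on that side.
ranges = [("below 0", None, 0), ("0 to 100", 0, 100), ("100 and above", 100, None)]


def in_range(value, lower, upper):
    """'from' is inclusive and 'to' is exclusive, as in the range aggregation."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


buckets = {
    label: sum(1 for value in a_b_values if in_range(value, lower, upper))
    for label, lower, upper in ranges
}
print(a_b_values)
print(buckets)
```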

VIII. Search: highlighting

Highlighting syntax basics

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/highlighting.html

IX. Search: sorting

Sorting syntax basics

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/sort-search-results.html

// Note: text fields cannot be sorted or aggregated on by default; to do so, fielddata must be enabled on the field
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": { "customer_last_name": "wood" }
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": [ "<em>" ],
        "post_tags": [ "</em>" ]
      }
    }
  },
  "sort": [
    { "currency": { "order": "desc" } },
    { "_score": { "order": "asc" } }
  ]
}

X. Search: pagination

Pagination syntax basics

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/paginate-search-results.html

# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": { "customer_last_name": "wood" }
  }
}
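Because from is a zero-based offset rather than a page number, client code usually converts one into the other. A minimal Python sketch, in which page_to_from and fetch are hypothetical helpers and a plain list stands in for the sorted hits:

```python
def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the zero-based 'from' offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def fetch(hits, from_, size):
    """Emulate from/size on an already sorted list of hits."""
    return hits[from_:from_ + size]


all_hits = [f"hit-{i}" for i in range(100)]

# "from": 5, "size": 20 skips the first five hits and returns the next twenty.
window = fetch(all_hits, 5, 20)
print(window[0], window[-1], len(window))

# Page 3 with 20 hits per page starts at offset 40.
print(page_to_from(3, 20))
```

By default from + size may not exceed 10,000 (the index.max_result_window setting), so deep pagination needs a different mechanism such as search_after.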

Exam practice 1

# Task
# In the spoken lines of the play, highlight the word Hamlet (in the text_entry field),
# starting the highlight with "#aaa#" and ending it with "#bbb#";
# return all of the speech_number field lines in reverse order;
# '20' speech lines per page, starting from line '40'

# highlight: applies to the text_entry field; the keyword Hamlet is highlighted
# pagination: from: 40; size: 20
# speech_number: descending order
# (sorting on speech_number.keyword orders the values as strings; this presumably reflects how test09 was mapped)
POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "match": { "text_entry": "Hamlet" } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": [ "#aaa#" ],
        "post_tags": [ "#bbb#" ]
      }
    }
  },
  "sort": [
    { "speech_number.keyword": { "order": "desc" } }
  ]
}

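XI. Search: async search

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html

Release version

Available since 7.7.0

Use case

Lets users retrieve results while an asynchronous search is still running, instead of having to wait for the final response once the query has completed

Common commands

Submit an async search

POST /sales*/_async_search?size=0

Retrieve the results of an async search

GET /_async_search/<id>

Check the status of an async search

GET /_async_search/status/<id>

Delete or cancel an async search

DELETE /_async_search/<id>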

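Async search response fields

Field	Meaning
id	Unique identifier returned for the async search
is_partial	Once the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is running, is_partial is always true
is_running	Whether the search is still running
total	Number of shards the search will run on
successful	Number of shards on which the search has completed successfully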
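How a client might interpret these flags can be sketched as follows. The sample dictionaries are hand-written and only loosely modeled on the response (placing the shard counters under response._shards is an assumption here), and describe is a hypothetical helper, not part of any client library.

```python
def describe(result: dict) -> str:
    """Summarize an async search result from its is_running / is_partial flags."""
    shards = result["response"]["_shards"]
    progress = f"{shards['successful']}/{shards['total']} shards"
    if result["is_running"]:
        # While the search is running, is_partial is always true.
        return f"running ({progress})"
    if result["is_partial"]:
        # Finished, but not successfully on every shard.
        return f"finished with partial results ({progress})"
    return f"completed ({progress})"


# Hand-written sample results; the id values are placeholders.
still_running = {
    "id": "example-id-1",
    "is_partial": True,
    "is_running": True,
    "response": {"_shards": {"total": 10, "successful": 4}},
}
finished = {
    "id": "example-id-2",
    "is_partial": False,
    "is_running": False,
    "response": {"_shards": {"total": 10, "successful": 10}},
}

print(describe(still_running))
print(describe(finished))
```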

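XII. Aliases: Index Aliases

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/aliases.html

What aliases are for

In Elasticsearch, an index alias works like a shortcut or symbolic link that can point to one or more indices. Aliases provide a great deal of flexibility and can be used to:

Switch seamlessly from one index to another in a running cluster (no downtime)

Group several indices together; for example, with one index per month, an alias can represent the indices of the last three months

Expose part of the data in an index, similar to a database view

Without aliases, how are multiple indices searched?

Option 1: POST index_01,index_02,index_03/_search
Option 2: POST index*/_search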

Three ways to create an alias

Specify the alias when creating the index

# Give test05 the alias test05_aliases
PUT test05
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" }
    }
  },
  "aliases": {
    "test05_aliases": {}
  }
}

Specify the alias through an index template

PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": { "enabled": true },
      "properties": {
        "host_name": { "type": "keyword" },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"],
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}

Create an alias for an existing index

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Removing an alias

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Exam practice 1

# Define an index alias for 'accounts-row' called 'accounts-male':
# apply a filter to only show the male account owners
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              { "term": { "gender.keyword": "male" } }
            ]
          }
        }
      }
    }
  ]
}

XIII. Search Templates

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-template.html

Features

Templates accept parameters at run time. Search templates are stored on the server and can be modified without changing client code.

A first look at search templates

# Create a search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}

# Search using the template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}

Search template operations

Create a search template

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}

Validate a search template

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}
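What rendering does can be approximated in a few lines of Python. This is a deliberately naive stand-in for Mustache, assumed for illustration only: render is a hypothetical helper that handles plain {{name}} placeholders, does no escaping, and supports none of Mustache's sections or defaults.

```python
import json
import re


def render(template_source: str, params: dict) -> dict:
    """Replace {{name}} placeholders with parameter values, then parse the JSON."""

    def substitute(match):
        return str(params[match.group(1)])

    rendered = re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template_source)
    return json.loads(rendered)


# The same template body as above, serialized to a string.
source = json.dumps(
    {
        "query": {"match": {"message": "{{query_string}}"}},
        "from": "{{from}}",
        "size": "{{size}}",
    }
)

result = render(source, {"query_string": "hello world", "from": 20, "size": 10})
print(json.dumps(result, indent=2))
```

Since the placeholders for from and size sit inside quotes in the template, the rendered values are strings ("20" and "10") rather than numbers.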

Run a search template

GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}

List all search templates

GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts

Delete a search template

DELETE _scripts/my-search-template

XIV. Search DSL: Simple Queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl.html

Choosing a query type

Query categories

Custom scoring

How to customize scoring

1. indices_boost: adjusting relevance at the index level

// A batch of data carries different labels but shares one structure, and each label is stored in its own index (A, B, C). If the results must be displayed strictly grouped by label, which query works best?
// Requirement: show class A first, then class B, then class C

# Test data
PUT /index_a_123/_doc/1
{
  "title":"this is index_a..."
}
PUT /index_b_123/_doc/1
{
  "title":"this is index_b..."
}
PUT /index_c_123/_doc/1
{
  "title":"this is index_c..."
}

# A plain query without boosting: the three hits come back with identical scores
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "this" } }
      ]
    }
  }
}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html (indices_boost)

# Boost the score at the index level
POST index_*_123/_search
{
  "indices_boost": [
    { "index_a_123": 10 },
    { "index_b_123": 5 },
    { "index_c_123": 1 }
  ],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "this" } }
      ]
    }
  }
}

2. boost: adjusting document relevance

An index index_a has several fields. Implement the following query:
1) The title field must match 'ssas' or 'sasa'.
2) If the tags field (an array field) contains 'pingpang', raise the score.
Task: write the DSL.

# Test data
PUT index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}

# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "ssas" } },
              { "match": { "title": "sasa" } }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": { "query": "pingpang", "boost": 1 }
          }
        }
      ]
    }
  }
}

# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": { "query": "pingpang" }
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "ssas" } },
              { "match": { "title": "sasa" } }
            ]
          }
        }
      ]
    }
  }
}

3. negative_boost: lowering relevance

When some results are unwanted but should not be excluded outright with must_not, consider the negative_boost parameter of the boosting query, which lowers their score instead.

negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html

POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": { "tags": "football" }
      },
      "negative": {
        "term": { "tags": "pingpang" }
      },
      "negative_boost": 0.5
    }
  }
}

4. function_score: custom scoring

How can relevance be boosted by both sales and view count at the same time?
Problem: for products, compute a boosted relevance that takes both sales and view count into account, for example oldScore * (sales + views).

Product	Sales	Views
A	10	10
B	20	20
C	30	30

# Sample data
PUT goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Topic covered: script_score

POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": {
          "source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      }
    }
  }
}
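The expected ranking for the three sample products can be worked out by hand. The Python below is a sketch of the arithmetic only, assuming every document starts with a score of 1.0 (which is what match_all assigns); the variable names are made up.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]

# Assumption: match_all gives every document the same base score of 1.0.
old_score = 1.0

# oldScore * (sales + views), sorted from highest to lowest score.
scored = sorted(
    ((g["name"], old_score * (g["sales_count"] + g["view_count"])) for g in goods),
    key=lambda pair: pair[1],
    reverse=True,
)
print(scored)
```

With a query other than match_all the base scores differ per document, so the final order depends on both the text relevance and the two counters.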

XV. Search DSL: bool Compound Queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html

Basic syntax

Exam practice

Write a query that requires a keyword to appear in at least two of a document's four fields.
Topics covered: bool query, should / minimum_should_match
1. The bool query
2. The detail to watch: minimum_should_match
Note: if the bool query contains at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0.

POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "filed1": "kr" } },
        { "match": { "filed2": "kr" } },
        { "match": { "filed3": "kr" } },
        { "match": { "filed4": "kr" } }
      ],
      "minimum_should_match": 2
    }
  }
}
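The effect of minimum_should_match can be mimicked with a counter. In the Python sketch below, matches is a hypothetical helper and the "match" is reduced to a whole-word check on whitespace-separated text; the field names are reused from the query above.

```python
FIELDS = ["filed1", "filed2", "filed3", "filed4"]


def matches(doc: dict, keyword: str, minimum_should_match: int = 2) -> bool:
    """Return True when the keyword occurs in at least N of the four fields."""
    matched_clauses = sum(
        1 for field in FIELDS if keyword in str(doc.get(field, "")).split()
    )
    return matched_clauses >= minimum_should_match


two_fields = {"filed1": "kr", "filed2": "hello kr", "filed3": "x", "filed4": "y"}
one_field = {"filed1": "kr", "filed2": "x", "filed3": "y", "filed4": "z"}

print(matches(two_fields, "kr"))
print(matches(one_field, "kr"))
```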

XVI. Search: Aggregations

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html

Aggregation categories

Bucket aggregations

terms

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html

# Count documents per author (user)
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}

date_histogram

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html

# Count documents per month of up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}

Metric aggregations

Max

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html

# Get the maximum up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": { "field": "up_time" }
    }
  }
}

Top_hits

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html

# Aggregate by user and keep only one bucket; for that bucket return the top 3 matching documents, sorted by the given field
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [ "video_time", "title", "see", "user", "up_time" ]
            },
            "sort": [
              { "see": { "order": "desc" } }
            ],
            "size": 3
          }
        }
      }
    }
  }
}

// Response
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic搜索",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : { "value" : 25, "relation" : "eq" },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社區大會2021: 用加 Gatling 進行Elasticsearch的負載測試，寓教於樂。",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "92" ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "為Elasticsearch啟動htpps訪問",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "79" ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社區大會2021: Elasticsearch作為一個地理空間的資料庫",
                    "user" : "Elastic搜索"
                  },
                  "sort" : [ "71" ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Pipeline aggregations

Pipeline: aggregations computed over the output of other aggregations
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline.html

bucket_selector

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-bucket-selector-aggregation.html

# Group by order_date per month and keep only the months whose total is greater than 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": { "field": "total_unique_products" }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sum_aggs"
            },
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}

Exam practice

The earthquakes index contains earthquake data for the past 30 months. With a single query, obtain the following:
The average mag for each of the past 30 months
The month with the highest average mag in the past 30 months, together with that average
The search must not return any documents

max_bucket official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html

POST earthquakes/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": { "field": "mag" }
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {
        "buckets_path": "agg_time_his>avg_aggs"
      }
    }
  }
}
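The two-step logic of this query (a per-month average, then the maximum across those buckets) can be reproduced on a handful of made-up records. The sketch below uses invented sample data and plain Python; it only illustrates what date_histogram + avg and max_bucket compute, not how Elasticsearch executes them.

```python
from collections import defaultdict

# Invented sample records: (date, magnitude).
quakes = [
    ("2023-01-05", 4.0),
    ("2023-01-20", 5.0),
    ("2023-02-11", 6.0),
    ("2023-03-02", 3.0),
    ("2023-03-09", 4.0),
]

# Step 1: bucket by calendar month (date_histogram) and average mag per bucket (avg).
by_month = defaultdict(list)
for day, mag in quakes:
    by_month[day[:7]].append(mag)
monthly_avg = {month: sum(mags) / len(mags) for month, mags in by_month.items()}

# Step 2: pick the bucket with the highest average (max_bucket).
best_month = max(monthly_avg, key=monthly_avg.get)

print(monthly_avg)
print(best_month, monthly_avg[best_month])
```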
