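Business Development and Functional Technology Department, Experience Assurance R&D Group: 康睿, 姚再毅, 李振, 劉斌, 王北永
Note: everything below is based on Elasticsearch 8.1.
I. Index definition
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/indices.html
The big picture of indices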
Elasticsearch | MySQL |
---|---|
Index | Table |
Type (deprecated) | Table (deprecated) |
Document | Row |
Field | Column |
Mapping | Schema |
Everything is indexed | Index |
Query DSL | SQL |
GET http://... | select * from |
POST http://... | update table set ... |
Aggregations | group by, sum |
cardinality | distinct (deduplication) |
reindex | data migration |
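Index definition
Definition: an index is a collection of documents that share the same document structure (mapping). Each index is identified by a unique name, a cluster can hold many indices, and different indices usually hold data for different business types.
Notes:
- Index names cannot contain uppercase letters
- Index names can be at most 255 bytes long
- Field names may contain uppercase letters, but using lowercase throughout is recommended
Creating an index
index-settings parameters explained
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/index-modules.html
Note: static settings can only be set when the index is created (or on a closed index), while dynamic settings can be changed on a live index. The number of primary shards can only be set at creation time.
Question 1: why can the number of primary shards not be changed after creation?
A document is routed to a particular shard in an index using the following formula: shard_num = hash(_routing) % num_primary_shards. The default value used for _routing is the document's _id.
When Elasticsearch writes a document, it uses this formula to decide which shard stores it, and later reads use the same formula. If the number of shards changed, existing documents could no longer be found. Put simply: hash the ID, divide by the number of primary shards, and take the remainder. If the divisor changes, the result changes too.
Question 2: what if the data volume genuinely requires more primary shards?
Use reindex to migrate the data into another index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/docs-reindex.html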
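The routing formula can be tried out locally. The following is a minimal Python sketch, not Elasticsearch's implementation: `zlib.crc32` is only a stand-in for the real hash function, and the document IDs and the `shard_for` helper are made up for illustration. It shows that many documents would land on a different shard if the shard count went from 5 to 6.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards."""
    # Assumption: crc32 stands in for the hash Elasticsearch really uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs; _routing defaults to the document _id.
doc_ids = [f"doc-{i}" for i in range(1000)]

placement_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
placement_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Documents whose computed shard changes when the divisor changes.
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]
print(f"{len(moved)} of {len(doc_ids)} documents map to a different shard")
```

A lookup that uses the new shard count would search the wrong shard for every document in `moved`, which is why the data has to be reindexed instead.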
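Basic index operations
II. Mapping-Param: dynamic
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/dynamic.html
Core functionality
Elasticsearch detects the type of a new field and adds the field automatically. In other words, even if you never defined the field in the mapping, Elasticsearch works out its type for you.
A first look at dynamic
// Delete the test01 index so that we start from a clean state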
DELETE test01
// Without defining a mapping, insert one document directly and see what happens
POST test01/_doc/1
{
"name":"kangrui10"
}
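// Now look at the mapping of test01 to see which type the name field was given
// name is mapped as text with a keyword sub-field, which is not really what we want:
// our business queries never run analyzed (full-text) searches on name; they are normally exact matches (and name = xxx)
// With the mapping below, an exact match has to use name.keyword = xxx, which is more cumbersome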
GET test01/_mapping
{
"test01" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Possible values of dynamic
Value | Description | Explanation |
---|---|---|
true | New fields are added to the mapping (default). | If dynamic is not specified when the mapping is created, it defaults to true: a field with no explicitly declared type gets a type detected by Elasticsearch |
false | New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly. | With false, a field that is not defined in the mapping can still be written, but it cannot be searched and does not appear in the mapping; the written field is simply not indexed |
strict | If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping. | With strict, writing a document that contains a field not defined in the mapping fails with an error. Recommended for production because it is stricter; new fields must be added to the mapping by hand |
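To make the three values concrete, here is a small Python simulation. It is a toy model under stated assumptions, not how Elasticsearch is implemented: `index_document` is a hypothetical helper, the mapping is a plain dict, and the type detection is deliberately simplistic.

```python
def index_document(mapping: dict, doc: dict, dynamic: str = "true") -> dict:
    """Return the fields that become searchable; may extend `mapping` in place."""
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic value: {dynamic}")

    unmapped = [field for field in doc if field not in mapping]
    if unmapped and dynamic == "strict":
        # strict: the whole document is rejected.
        raise ValueError(f"strict mapping forbids new fields: {unmapped}")

    searchable = {}
    for field, value in doc.items():
        if field in mapping:
            searchable[field] = value
        elif dynamic == "true":
            # Toy type detection; the real detection rules are richer.
            mapping[field] = "text+keyword" if isinstance(value, str) else "long"
            searchable[field] = value
        # dynamic == "false": the field stays in _source only, not indexed.
    return searchable


mapping = {"name": "keyword"}
print(index_document(mapping, {"name": "kangrui10", "age": 30}, "false"))
print(mapping)  # unchanged: "age" was ignored
```

With `"true"` the same call would add `age` to the mapping, and with `"strict"` it would raise instead of indexing anything.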
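Drawbacks of dynamic mapping
- Type detection is fairly accurate, but not necessarily what the user expects
  - For example, a text field only gets the default standard analyzer, whereas we usually want the IK Chinese analyzer
- Extra storage is consumed
  - Strings are mapped as both text and keyword, which takes more disk space
- Mapping explosion
  - If a request is written incorrectly, for example PUT is used by mistake instead of GET, many unwanted fields can be created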
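III. Mapping-Param: doc_values
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/doc-values.html
Core functionality
Doc values: when Lucene builds the inverted index, it additionally builds an ordered forward index (a document => field value mapping). Doc values are essentially a serialized, column-oriented store, a structure well suited to aggregations, sorting, and script access to field values. This layout also compresses well, especially for numeric types, which saves disk space and speeds up access. Almost all field types support doc values, except text and annotated_text fields.
What is a forward index
A forward index works much like a database table: data is associated with a document ID, and you look up a document ID to get the corresponding data.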
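The difference between the two structures is easier to see side by side. The sketch below is plain Python with made-up documents, not Lucene's on-disk format: the inverted index answers "which documents contain this term", while the forward index (the role doc values play) answers "what is this document's value", which is what sorting and aggregating need.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document ID.
docs = {
    1: {"speaker": "hamlet", "speech_number": 3},
    2: {"speaker": "horatio", "speech_number": 1},
    3: {"speaker": "hamlet", "speech_number": 2},
}

# Inverted index: term -> list of document IDs (used to find matches).
inverted = defaultdict(list)
for doc_id, doc in docs.items():
    inverted[doc["speaker"]].append(doc_id)

# Forward index: document ID -> field value (used to sort and aggregate).
doc_values = {doc_id: doc["speech_number"] for doc_id, doc in docs.items()}

hits = inverted["hamlet"]                                          # search
sorted_hits = sorted(hits, key=lambda doc_id: doc_values[doc_id])  # sort
total = sum(doc_values[doc_id] for doc_id in hits)                 # aggregate
print(hits, sorted_hits, total)
```

Without the forward index, sorting the hits would require walking every term of the inverted index to recover each document's value.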
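Possible values of doc_values
- true: the default; doc values are enabled
- false: must be set explicitly. Once set to false, sorting, aggregations, and script access to the field no longer work, but disk space is saved
Exam practice
// Create an index named test03 whose fields meet the following requirements
// 1. speaker: keyword
// 2. line_id: keyword and not aggregatable
// 3. speech_number: integer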
PUT test03
{
"mappings": {
"properties": {
"speaker": {
"type": "keyword"
},
"line_id":{
"type": "keyword",
"doc_values": false
},
"speech_number":{
"type": "integer"
}
}
}
}
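IV. Analyzers
Installing the IK Chinese analyzer
What is an inverted index
How data gets indexed
Types of analyzers
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-analyzers.html
V. Custom analyzers
The three stages of a custom analyzer
1. Character filters
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-charfilters.html Zero or more can be configured.
HTML Strip Character Filter: strips HTML elements such as <b> and decodes HTML entities such as &amp;
Mapping Character Filter: replaces specified characters
Pattern Replace Character Filter: replaces characters that match a regular expression
2. Tokenizer: splits text into tokens
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenizers.html#_word_oriented_tokenizers Exactly one must be configured; it splits the text into tokens.
3. Token filters: post-process the tokens
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-tokenfilters.html Zero or more can be configured. They rework the tokens after tokenization, for example lowercasing, removing stop words, or adding synonyms.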
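The three stages can be mimicked with three small Python functions. This is only an illustration of the order in which the stages run: the regular expression is a rough stand-in for a real tokenizer, and the function names and the `&` to `and` mapping are assumptions chosen for the example.

```python
import re


def char_filter(text: str, mappings: dict) -> str:
    """Stage 1: rewrite raw characters before tokenization."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def tokenizer(text: str) -> list:
    """Stage 2: split text into tokens (rough stand-in: runs of word characters)."""
    return re.findall(r"\w+", text)


def token_filters(tokens: list) -> list:
    """Stage 3: post-process tokens; here only lowercasing."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list:
    # The stages always run in this order: char filters, tokenizer, token filters.
    return token_filters(tokenizer(char_filter(text, {"&": "and"})))


print(analyze("Doc & Cat"))
print(analyze("doc and cat"))
```

Both inputs produce the same token list, which is the property a custom analyzer needs when two spellings should be searchable interchangeably.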
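Exam practice
A document contains text such as doc & cat. Index it so that a match_phrase query for either doc & cat or doc and cat finds it.
Analysis:
1. What is match_phrase? It analyzes the search text into tokens. Every resulting token must appear in the analyzed field, in the same order, and by default contiguously.
2. Making & and and equivalent in searches is a custom requirement, so it needs a custom analyzer.
3. How to define a custom analyzer: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-custom-analyzer.html
4. Solution 1 relies on the Mapping Character Filter.
5. Solution 2 relies on the synonym token filter: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/analysis-synonym-tokenfilter.html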
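The matching rule in point 1 can be sketched in a few lines of Python. This is a simplified model that assumes the default behaviour (no slop) and works on token lists that have already been analyzed; `phrase_matches` is a hypothetical helper, not an Elasticsearch API.

```python
def phrase_matches(query_tokens: list, field_tokens: list) -> bool:
    """True if query_tokens occur in field_tokens contiguously and in order."""
    n = len(query_tokens)
    return any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )


field = ["my", "doc", "and", "cat"]
print(phrase_matches(["doc", "and", "cat"], field))  # contiguous, same order
print(phrase_matches(["doc", "cat"], field))         # a token sits in between
print(phrase_matches(["cat", "and", "doc"], field))  # wrong order
```

Because the comparison happens on tokens, the analyzer decides whether `&` and `and` end up as the same token, which is what both solutions below arrange.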
Solution 1
# Create the index (delete test01 first if it still exists from the earlier example)
PUT /test01
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"char_filter": [
"my_mappings_char_filter"
],
"tokenizer": "standard"
}
},
"char_filter": {
"my_mappings_char_filter": {
"type": "mapping",
"mappings": [
"& => and"
]
}
}
}
},
"mappings": {
"properties": {
"content":{
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
// Notes
// Stage 1, character filters: a char_filter replaces & with and
// Stage 2, tokenizer: the standard tokenizer
// Stage 3, token filters: none configured
// The content field uses the custom analyzer my_analyzer
# Load test data
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}
# Run the test: searching for either doc & cat or doc and cat returns both documents
POST test01/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"content": "doc & cat"
}
}
]
}
}
}
Solution 2
# Approach: treat & and and as synonyms, using a token filter
# Create the index
PUT /test02
{
"settings": {
"analysis": {
"analyzer": {
"my_synonym_analyzer": {
"tokenizer": "whitespace",
"filter": [
"my_synonym"
]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"lenient": true,
"synonyms": [
"& => and"
]
}
}
}
},
"mappings": {
"properties": {
"content": {
"type": "text",
"analyzer": "my_synonym_analyzer"
}
}
}
}
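// Notes
// Stage 1, character filters: none configured
// Stage 2, tokenizer: the whitespace tokenizer. Why not the standard tokenizer? Because it drops & during tokenization, leaving nothing for the token filter to work on
// Stage 3, token filters: a synonym token filter that makes & and and synonyms
// The content field uses the custom analyzer my_synonym_analyzer
# Load test data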
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}
# Run the test
POST test02/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"content": "doc & cat"
}
}
]
}
}
}
VI. multi-fields
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/multi-fields.html
// One field, multiple types: for example, indexing the same field with two different analyzers
PUT my-index-000001
{
"mappings": {
"properties": {
"city": {
"type": "text",
"analyzer":"standard",
"fields": {
"fieldText": {
"type": "text",
"analyzer":"ik_smart"
}
}
}
}
}
}
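VII. runtime_field: runtime fields
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime.html
Background
Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What then? You could backfill the data and add a field that stores the difference, but what if backfilling is not allowed?
Solution
Use cases
- Add fields to existing documents without reindexing
- Work with data without knowing its structure
- Override the value returned from an indexed field at query time
- Define fields for a specific use without modifying the underlying schema
Features
- Lucene is completely unaware of the field: it is not indexed and has no doc_values
- No scoring support, because there is no inverted index
- Breaks the traditional define-first, use-later approach
- Helps prevent mapping explosion
- Makes the API more flexible
- Note: it makes searches slower
Usage in practice
- Defined at search time, for use in the search request (even if the mapping has no such field, it can still be queried)
- Defined in a dynamic or static mapping, for use at the mapping stage (a runtime field is added to the mapping)
Exam practice 1
# Assume the following index and data (delete test03 first if it still exists from the doc_values example)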
PUT test03
{
"mappings": {
"properties": {
"emotion": {
"type": "integer"
}
}
}
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
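# Requirement: when emotion > 5, return emotion_falg = '1'
# Requirement: when emotion < 5, return emotion_falg = '-1'
# Requirement: when emotion = 5, return emotion_falg = '0'
Solution 1
Define the runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html The field does not actually exist in the index, so the request must include fields * for it to be returned.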
GET test03/_search
{
"fields": [
"*"
],
"runtime_mappings": {
"emotion_falg": {
"type": "keyword",
"script": {
"source": """
if(doc['emotion'].value>5)emit('1');
if(doc['emotion'].value<5)emit('-1');
if(doc['emotion'].value==5)emit('0');
"""
}
}
}
}
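Solution 2
Define the runtime field when creating the index: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-mapping-fields.html Defined this way, the runtime field can also be used in searches.
# Create the index and define the runtime field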
PUT test03_01
{
"mappings": {
"runtime": {
"emotion_falg": {
"type": "keyword",
"script": {
"source": """
if(doc['emotion'].value>5)emit('1');
if(doc['emotion'].value<5)emit('-1');
if(doc['emotion'].value==5)emit('0');
"""
}
}
},
"properties": {
"emotion": {
"type": "integer"
}
}
}
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Run the query
GET test03_01/_search
{
"fields": [
"*"
]
}
Exam practice 2
# Given the following index and data
PUT task04
{
"mappings": {
"properties": {
"A":{
"type": "long"
},
"B":{
"type": "long"
}
}
}
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}
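# Task: in the task04 index, create a runtime field named A_B whose value is A - B; then create a range aggregation with three buckets (below 0, 0 to 100, 100 and above) and return the document count of each
// Topics used:
// 1. Defining a runtime field in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html
Solution
# Run the query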
GET task04/_search
{
"fields": [
"*"
],
"size": 0,
"runtime_mappings": {
"A_B": {
"type": "long",
"script": {
"source": """
emit(doc['A'].value - doc['B'].value);
"""
}
}
},
"aggs": {
"price_ranges_A_B": {
"range": {
"field": "A_B",
"ranges": [
{ "to": 0 },
{ "from": 0, "to": 100 },
{ "from": 100 }
]
}
}
}
}
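As a sanity check on what this query should return, the same computation can be done by hand in Python on the four test documents. The sketch assumes the range aggregation's usual convention that from is inclusive and to is exclusive; `bucket_counts` is a hypothetical helper written for this check.

```python
# The four test documents loaded into task04.
docs = [
    {"A": 100, "B": 2},
    {"A": 120, "B": 2},
    {"A": 120, "B": 25},
    {"A": 21, "B": 25},
]

# (from, to) pairs; None means the side is open. from is inclusive, to exclusive.
ranges = [(None, 0), (0, 100), (100, None)]


def bucket_counts(values, ranges):
    """Count how many values fall into each [from, to) range."""
    counts = []
    for low, high in ranges:
        counts.append(sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        ))
    return counts


# Equivalent of the runtime field: emit(A - B) for every document.
a_b = [doc["A"] - doc["B"] for doc in docs]
print(a_b)
print(bucket_counts(a_b, ranges))
```

The differences are 98, 118, 95 and -4, so the expected doc_count values are 1 (below 0), 2 (0 to 100) and 1 (100 and above).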
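VIII. Search-highlighted
Introduction to highlight syntax
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/highlighting.html
IX. Search-Order
Introduction to sort syntax
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/sort-search-results.html
// Note: text fields cannot be sorted or aggregated on by default; if you really need to, you must enable fielddata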
GET /kibana_sample_data_ecommerce/_search
{
"query": {
"match": {
"customer_last_name": "wood"
}
},
"highlight": {
"number_of_fragments": 3,
"fragment_size": 150,
"fields": {
"customer_last_name": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
]
}
}
},
"sort": [
{
"currency": {
"order": "desc"
}
},
{
"_score": {
"order": "asc"
}
}
]
}
X. Search-Page
Introduction to pagination syntax
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/paginate-search-results.html
# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
"from": 5,
"size": 20,
"query": {
"match": {
"customer_last_name": "wood"
}
}
}
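Since from is a zero-based offset rather than a page number, application code usually converts between the two. Below is a small Python sketch of that conversion; the helper names are hypothetical, and the 10,000 limit reflects the default of the index.max_result_window setting (an index with a different setting would need a different constant).

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the zero-based `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return offset


def build_search_body(page: int, size: int, field: str, text: str) -> dict:
    """Build a paginated match query body as a plain dict."""
    return {
        "from": page_to_from(page, size),
        "size": size,
        "query": {"match": {field: text}},
    }


print(build_search_body(1, 20, "customer_last_name", "wood"))
print(page_to_from(3, 20))
```

Page 1 maps to from 0 and page 3 with 20 results per page maps to from 40.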
Exam practice 1
# Task
In the spoken lines of the play, highlight the word Hamlet (in the text_entry field), starting the highlight with "#aaa#" and ending it with "#bbb#".
Return all speech_number field lines in reverse order, '20' speech lines per page, starting from line '40'.
# highlight: the text_entry field, keyword Hamlet
# pagination: from: 40; size: 20
# speech_number: descending order
POST test09/_search
{
"from": 40,
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"text_entry": "Hamlet"
}
}
]
}
},
"highlight": {
"fields": {
"text_entry": {
"pre_tags": [
"#aaa#"
],
"post_tags": [
"#bbb#"
]
}
}
},
"sort": [
{
"speech_number.keyword": {
"order": "desc"
}
}
]
}
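XI. Search-AsyncSearch
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/async-search.html
Introduced in
7.7.0
When to use it
Async search lets users retrieve partial results while the search is still running, so they no longer have to wait for the query to finish before getting a response.
Common commands
- Submit an async search
  - POST /sales*/_async_search?size=0
- Get the async search results
  - GET /_async_search/<id>
- Get the async search status
  - GET /_async_search/status/<id>
- Delete or cancel an async search
  - DELETE /_async_search/<id>
Async search response fields
Field | Meaning |
---|---|
id | Unique identifier returned for the async search |
is_partial | Once the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is running, is_partial is always true |
is_running | Whether the search is still running |
total | Number of shards the search will run on |
successful | Number of shards that have completed the search successfully |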
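XII. Aliases: index aliases
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/aliases.html
What aliases are for
In Elasticsearch, an index alias works like a shortcut or symbolic link that can point to one or more indices. Aliases give us a great deal of flexibility; with them we can:
- Switch seamlessly from one index to another on a running cluster (no downtime)
- Group several indices, for example build a "last 3 months" view on top of monthly indices
- Expose part of an index's data through a filter, similar to a database view
Without aliases, how would you search several indices?
Option 1: POST index_01,index_02,index_03/_search
Option 2: POST index*/_search
Three ways to create an alias
- Specify the alias when creating the index
# Give test05 the alias test05_aliases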
PUT test05
{
"mappings": {
"properties": {
"name":{
"type": "keyword"
}
}
},
"aliases": {
"test05_aliases": {}
}
}
- Specify the alias through an index template
PUT _index_template/template_1
{
"index_patterns": ["te*", "bar*"],
"template": {
"settings": {
"number_of_shards": 1
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"host_name": {
"type": "keyword"
},
"created_at": {
"type": "date",
"format": "EEE MMM dd HH:mm:ss Z yyyy"
}
}
},
"aliases": {
"mydata": { }
}
},
"priority": 500,
"composed_of": ["component_template1", "runtime_component_template"],
"version": 3,
"_meta": {
"description": "my custom"
}
}
- Add an alias to an existing index
POST _aliases
{
"actions": [
{
"add": {
"index": "logs-nginx.access-prod",
"alias": "logs"
}
}
]
}
Removing an alias
POST _aliases
{
"actions": [
{
"remove": {
"index": "logs-nginx.access-prod",
"alias": "logs"
}
}
]
}
Exam practice 1
# Define an index alias for 'accounts-row' called 'accounts-male': apply a filter to only show the male account owners
POST _aliases
{
"actions": [
{
"add": {
"index": "accounts-row",
"alias": "accounts-male",
"filter": {
"bool": {
"filter": [
{
"term": {
"gender.keyword": "male"
}
}
]
}
}
}
}
]
}
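XIII. Search-template
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-template.html
Features
A template accepts parameters at run time. Search templates are stored on the server, so they can be modified without changing client code.
A first look at search-template
# Create a search template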
PUT _scripts/my-search-template
{
"script": {
"lang": "mustache",
"source": {
"query": {
"match": {
"{{query_key}}": "{{query_value}}"
}
},
"from": "{{from}}",
"size": "{{size}}"
}
}
}
# Search using the template
GET my-index/_search/template
{
"id": "my-search-template",
"params": {
"query_key": "your field",
"query_value": "your field value",
"from": 0,
"size": 10
}
}
Search template operations
Create a search template
PUT _scripts/my-search-template
{
"script": {
"lang": "mustache",
"source": {
"query": {
"match": {
"message": "{{query_string}}"
}
},
"from": "{{from}}",
"size": "{{size}}"
},
"params": {
"query_string": "My query string"
}
}
}
Validate the search template
POST _render/template
{
"id": "my-search-template",
"params": {
"query_string": "hello world",
"from": 20,
"size": 10
}
}
Run the search template
GET my-index/_search/template
{
"id": "my-search-template",
"params": {
"query_string": "hello world",
"from": 0,
"size": 10
}
}
List all search templates
GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts
Delete a search template
DELETE _scripts/my-search-template
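XIV. Search-dsl: simple search
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl.html
Choosing a search type
Types of search
Custom scoring
How to customize scoring
1. Index boost: adjusting relevance at the index level
// A batch of data carries different labels but shares the same structure, and each label is stored in its own index (A, B, C). If the results must be displayed strictly grouped by label, which query works best?
// Requirement: show class A first, then class B, then class C
# Test data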
put /index_a_123/_doc/1
{
"title":"this is index_a..."
}
put /index_b_123/_doc/1
{
"title":"this is index_b..."
}
put /index_c_123/_doc/1
{
"title":"this is index_c..."
}
# A plain query with no boosting: all three hits come back with the same score
POST index_*_123/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "this"
}
}
]
}
}
}
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
indices_boost
# That is, raising the weight at the index level
POST index_*_123/_search
{
"indices_boost": [
{
"index_a_123": 10
},
{
"index_b_123": 5
},
{
"index_c_123": 1
}
],
"query": {
"bool": {
"must": [
{
"match": {
"title": "this"
}
}
]
}
}
}
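2. boosting: adjusting document relevance
An index index_a has several fields. Implement the following query:
1) The title field matches 'ssas' or 'sasa'.
2) If the tags field (an array field) contains 'pingpang', raise the score.
Task: write the DSL that does this.
# Test data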
put index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}
# Solution 1
POST index_a/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"title": "ssas"
}
},
{
"match": {
"title": "sasa"
}
}
]
}
}
],
"should": [
{
"match": {
"tags": {
"query": "pingpang",
"boost": 1
}
}
}
]
}
}
}
# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {
"match": {
"tags": {
"query": "pingpang"
}
}
},
"boost": 1
}
}
],
"must": [
{
"bool": {
"should": [
{
"match": {
"title": "ssas"
}
},
{
"match": {
"title": "sasa"
}
}
]
}
}
]
}
}
}
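3. negative_boost: lowering relevance
If some results are unsatisfactory but you do not want to exclude them with must_not, consider the negative_boost parameter of the boosting query.
In other words: lower their score.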
negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html
POST index_a/_search
{
"query": {
"boosting": {
"positive": {
"term": {
"tags": "football"
}
},
"negative": {
"term": {
"tags": "pingpang"
}
},
"negative_boost": 0.5
}
}
}
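The effect of negative_boost on a single document can be written down in a couple of lines. This Python sketch models only the final arithmetic described in the parameter's definition (the score from the positive query multiplied by negative_boost when the negative query also matches); the function name and the sample scores are made up.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Final score of a document that matched the positive query."""
    if not 0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    if matches_negative:
        # Demoted, not excluded: the document stays in the result set.
        return positive_score * negative_boost
    return positive_score


# Hypothetical positive-query score of 2.0 for two documents.
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))
```

Unlike must_not, a document that matches the negative query is still returned, only further down the ranking.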
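4. function_score: custom scoring
How can relevance be boosted by sales and view counts at the same time?
Problem: for products, we want a relevance calculation that takes both the sales count and the number of views into account.
For example: oldScore * (sales + views)
Product | Sales | Views |
---|---|---|
A | 10 | 10 |
B | 20 | 20 |
C | 30 | 30 |
# Sample data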
put goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Topic: script_score
POST goods_index/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"script": {
"source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
}
}
}
}
}
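To see what ranking this should produce, the formula oldScore * (sales + views) can be evaluated by hand in Python. The sketch assumes every document starts with a score of 1.0, which is what match_all assigns; note that function_score combines the query score and the function result by multiplication by default, which makes no difference here because the query score is 1.0. The `rescored` helper is hypothetical.

```python
goods = [
    {"name": "A", "sales_count": 10, "view_count": 10},
    {"name": "B", "sales_count": 20, "view_count": 20},
    {"name": "C", "sales_count": 30, "view_count": 30},
]


def rescored(old_score: float, doc: dict) -> float:
    """oldScore * (sales + views), mirroring the script_score source."""
    return old_score * (doc["sales_count"] + doc["view_count"])


# Assumption: match_all gives every document an initial score of 1.0.
scores = {doc["name"]: rescored(1.0, doc) for doc in goods}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Product C should come first with 60, followed by B with 40 and A with 20.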
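XV. Search-dsl: bool compound search
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-bool-query.html
Basic syntax
Exam practice
Write a query that requires a keyword to appear in at least two of a document's four fields.
Features used: bool query, should / minimum_should_match
1. The bool query
2. The key detail: minimum_should_match
Note: if the bool query contains at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0.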
POST test_index/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"filed1": "kr"
}
},
{
"match": {
"filed2": "kr"
}
},
{
"match": {
"filed3": "kr"
}
},
{
"match": {
"filed4": "kr"
}
}
],
"minimum_should_match": 2
}
}
}
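The logic of "at least two of four should clauses" can be checked against sample documents in plain Python. This is a simplified model: each match clause is reduced to a whole-word test on a whitespace-split, lowercased field value, and the documents and the `bool_should_matches` helper are made up for illustration.

```python
FIELDS = ["filed1", "filed2", "filed3", "filed4"]


def bool_should_matches(doc: dict, keyword: str, fields: list,
                        minimum_should_match: int) -> bool:
    """True if the keyword appears in at least `minimum_should_match` fields."""
    matched = sum(
        1 for field in fields
        if keyword in str(doc.get(field, "")).lower().split()
    )
    return matched >= minimum_should_match


# Hypothetical documents.
doc_two_fields = {"filed1": "kr", "filed2": "hello kr", "filed3": "x", "filed4": "y"}
doc_one_field = {"filed1": "kr", "filed2": "a", "filed3": "b", "filed4": "c"}

print(bool_should_matches(doc_two_fields, "kr", FIELDS, 2))
print(bool_should_matches(doc_one_field, "kr", FIELDS, 2))
```

Only the first document qualifies: the keyword appears in two of its fields, while the second document has it in just one.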
XVI. Search-Aggregations
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations.html
Types of aggregations
Bucket aggregations
terms
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html
# Count documents per author (size: 1 returns only the top bucket)
POST bilili_elasticsearch/_search
{
"size": 0,
"aggs": {
"agg_user": {
"terms": {
"field": "user",
"size": 1
}
}
}
}
date_histogram
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html
# Count documents per month based on up_time
POST bilili_elasticsearch/_search
{
"size": 0,
"aggs": {
"agg_up_time": {
"date_histogram": {
"field": "up_time",
"calendar_interval": "month"
}
}
}
}
Metrics aggregations
Max
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html
# Get the maximum up_time
POST bilili_elasticsearch/_search
{
"size": 0,
"aggs": {
"agg_max_up_time": {
"max": {
"field": "up_time"
}
}
}
}
Top_hits
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html
# Aggregate by user and keep only one bucket, then fetch the top 3 matching documents in that bucket, sorted by a given field
POST bilili_elasticsearch/_search
{
"size": 0,
"aggs": {
"terms_agg_user": {
"terms": {
"field": "user",
"size": 1
},
"aggs": {
"top_user_hits": {
"top_hits": {
"_source": {
"includes": [
"video_time",
"title",
"see",
"user",
"up_time"
]
},
"sort": [
{
"see":{
"order": "desc"
}
}
],
"size": 3
}
}
}
}
}
}
// The response looks like this
{
"took" : 91,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms_agg_user" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 975,
"buckets" : [
{
"key" : "Elastic搜索",
"doc_count" : 25,
"top_user_hits" : {
"hits" : {
"total" : {
"value" : 25,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bilili_elasticsearch",
"_id" : "5ccCVoQBUyqsIDX6wIcm",
"_score" : null,
"_source" : {
"video_time" : "03:45",
"see" : "92",
"up_time" : "2021-03-19",
"title" : "Elastic 社區大會2021: 用加 Gatling 進行Elasticsearch的負載測試,寓教於樂。",
"user" : "Elastic搜索"
},
"sort" : [
"92"
]
},
{
"_index" : "bilili_elasticsearch",
"_id" : "8scCVoQBUyqsIDX6wIgn",
"_score" : null,
"_source" : {
"video_time" : "10:18",
"see" : "79",
"up_time" : "2020-10-20",
"title" : "為Elasticsearch啟動htpps訪問",
"user" : "Elastic搜索"
},
"sort" : [
"79"
]
},
{
"_index" : "bilili_elasticsearch",
"_id" : "7scCVoQBUyqsIDX6wIcm",
"_score" : null,
"_source" : {
"video_time" : "04:41",
"see" : "71",
"up_time" : "2021-03-19",
"title" : "Elastic 社區大會2021: Elasticsearch作為一個地理空間的資料庫",
"user" : "Elastic搜索"
},
"sort" : [
"71"
]
}
]
}
}
}
]
}
}
}
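Pipeline aggregations
Pipeline: aggregations that work on the output of other aggregations. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline.html
bucket_selector
# Group by month on order_date and keep only the months whose total is greater than 1000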
POST kibana_sample_data_ecommerce/_search
{
"size": 0,
"aggs": {
"date_his_aggs": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "month"
},
"aggs": {
"sum_aggs": {
"sum": {
"field": "total_unique_products"
}
},
"sales_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalSales": "sum_aggs"
},
"script": "params.totalSales > 1000"
}
}
}
}
}
}
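Exam practice
The earthquakes index contains earthquake data for the past 30 months. With a single query, obtain the following:
- The average mag of each month over the past 30 months
- The month with the highest average mag over the past 30 months, together with that average
- The search must not return any documents
max_bucket official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html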
POST earthquakes/_search
{
"size": 0,
"query": {
"range": {
"time": {
"gte": "now-30M/d",
"lte": "now"
}
}
},
"aggs": {
"agg_time_his": {
"date_histogram": {
"field": "time",
"calendar_interval": "month"
},
"aggs": {
"avg_aggs": {
"avg": {
"field": "mag"
}
}
}
},
"max_mag_sales": {
"max_bucket": {
"buckets_path": "agg_time_his>avg_aggs"
}
}
}
}
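The two aggregation levels in this query (a per-month average, then the maximum over those averages) can be reproduced on a handful of records in Python. The earthquake records below are invented for illustration, and the sketch only mirrors the arithmetic, not how Elasticsearch executes pipeline aggregations.

```python
from collections import defaultdict

# Hypothetical (date, mag) records standing in for the earthquakes index.
quakes = [
    ("2023-01-05", 4.0),
    ("2023-01-20", 6.0),
    ("2023-02-11", 5.5),
    ("2023-03-02", 3.0),
    ("2023-03-30", 4.0),
]

# date_histogram with a calendar month interval: group by "YYYY-MM".
by_month = defaultdict(list)
for day, mag in quakes:
    by_month[day[:7]].append(mag)

# avg sub-aggregation: one average per monthly bucket.
monthly_avg = {month: sum(mags) / len(mags) for month, mags in sorted(by_month.items())}

# max_bucket pipeline aggregation: the bucket with the highest average.
best_month = max(monthly_avg, key=monthly_avg.get)
print(monthly_avg)
print(best_month, monthly_avg[best_month])
```

In the real response, the monthly averages appear under agg_time_his and the winning bucket's key and value under max_mag_sales.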