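Preface
In the previous posts I set up the ElasticSearch service and migrated the data from MySQL to ElasticSearch.
ES now handles search on its own, which gets us past MySQL's search bottlenecks.
Use cases for ElasticSearch
- The table has too many columns, queries are slow, and the indexes cannot be tuned any further;
- A single count query can bring the whole table to a standstill;
- MySQL limit becomes painfully slow once you page hundreds of thousands or millions of pages deep;
- like queries are too slow: every like drives the server's CPU and memory up and drags down the whole online service;
- You want to offer full-text search over the data in the db, internally or externally;
- You want to offer search over (program runtime) logs;
This post uses ElasticSearch's inverted index in place of MySQL's indexes to query large data sets more efficiently.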
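I. Exact-match query (termQuery)
A term query does not analyze (tokenize) the search condition. It is similar to MySQL's select * from table where field='xx value';
The request below looks up hotels whose brand is exactly 萬豪 (Marriott):
GET hotel/_search
{
  "query": {
    "term": {
      "brand": {
        "value": "萬豪"
      }
    }
  },
  "from": 0,
  "size": 20
}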
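// HotelServiceImpl.java
// Imports this method relies on (assuming the Elasticsearch 7.x high-level REST client, Spring and fastjson)
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.util.StringUtils;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Exact-match query by brand
@Override
public Map<String, Object> brandTermQuery(int current, int size, Map<String, Object> searchParam) {
    // 1. Read the front-end parameter
    String brand = (String) searchParam.get("brand");
    // Map returned to the front end
    Map<String, Object> resultMap = new HashMap<>();
    // 2. Build the search request and its body
    SearchRequest hotelSearchRequest = new SearchRequest("hotel");
    SearchSourceBuilder hotelSearchSourceBuilder = new SearchSourceBuilder();
    // With no brand given, no query is set and all documents are returned
    if (StringUtils.hasText(brand)) {
        TermQueryBuilder hotelTermQueryBuilder = QueryBuilders.termQuery("brand", brand);
        hotelSearchSourceBuilder.query(hotelTermQueryBuilder);
    }
    // Paging part of the request body
    hotelSearchSourceBuilder.from((current - 1) * size);
    hotelSearchSourceBuilder.size(size);
    hotelSearchRequest.source(hotelSearchSourceBuilder);
    // 3. Run the search
    try {
        SearchResponse hotelSearchResponse = restHighLevelClient.search(hotelSearchRequest, RequestOptions.DEFAULT);
        // 4. Process the result set
        SearchHits hotelSearchResponseHits = hotelSearchResponse.getHits();
        // Total number of hits
        long totalHotelHits = hotelSearchResponseHits.getTotalHits().value;
        // The individual hits on this page
        SearchHit[] hotelHits = hotelSearchResponseHits.getHits();
        List<HotelEntity> hotelEntityList = new ArrayList<>();
        if (hotelHits != null && hotelHits.length > 0) {
            for (SearchHit hotelHit : hotelHits) {
                // Convert the JSON source string into a Java object
                String sourceAsString = hotelHit.getSourceAsString();
                hotelEntityList.add(JSON.parseObject(sourceAsString, HotelEntity.class));
            }
        }
        // Data shown by the front end
        resultMap.put("list", hotelEntityList);
        resultMap.put("totalResultSize", totalHotelHits);
        // Paging information
        resultMap.put("current", current);
        resultMap.put("totalPage", (totalHotelHits + size - 1) / size);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return resultMap;
}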
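Every service method in this post repeats the same two paging formulas: the ES from offset is (current - 1) * size, and the page count is the hit total divided by size, rounded up. A minimal sketch of that arithmetic as a standalone helper; the class Paging and its method names are hypothetical and not part of the original project:

```java
/** Paging arithmetic used by the service methods (hypothetical helper for illustration). */
final class Paging {
    private Paging() {}

    /** Zero-based offset for the ES "from" parameter, given a 1-based page number. */
    static int fromOffset(int current, int size) {
        if (current < 1 || size < 1) {
            throw new IllegalArgumentException("current and size must be >= 1");
        }
        return (current - 1) * size;
    }

    /** Number of pages needed to show totalHits results, rounded up. */
    static long totalPages(long totalHits, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        return (totalHits + size - 1) / size;
    }
}
```

For example, page 3 with 20 results per page starts at offset 40, and 41 hits need 3 pages. Keep in mind that ES rejects requests where from + size exceeds index.max_result_window (10000 by default), so very deep paging needs another mechanism than from/size.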
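II. Chinese analyzers
If a field's type is keyword, you can run term exact-match queries on it directly.
If a field's type is text,
a term query treats the whole search condition as one term and matches it against the terms in the inverted index.
If a matching term exists, the data is returned; if the inverted index does not contain that term, nothing is found.
So how do we run term queries against text fields?
We need a Chinese analyzer that segments the document content into Chinese words, so that the inverted index is rebuilt with each document split into a number of Chinese terms.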
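To make the term lookup concrete, here is a toy inverted index in plain Java. It is only a sketch of the idea, not how ES stores data: the class ToyInvertedIndex and the hand-written term lists are assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: term -> ids of the documents that contain it. */
final class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index a document under the terms an analyzer produced for it. */
    void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** term lookup: the condition is NOT analyzed, it is looked up as one single term. */
    Set<Integer> term(String condition) {
        return postings.getOrDefault(condition, Collections.emptySet());
    }
}
```

If document 1 is indexed with the terms 北京市, 萬豪 and 酒店, then term("萬豪") finds it, while term("北京市萬豪酒店") finds nothing, because the full sentence never appears as a term in the index.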
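1. ElasticSearch's built-in analyzers
ElasticSearch ships with several built-in analyzers, including:
- Simple Analyzer - splits on non-letters (symbols are dropped)
- Whitespace Analyzer - splits on whitespace, does not lowercase
- Keyword Analyzer - no tokenization, the input is emitted unchanged as one token
2. The default analyzer cannot segment Chinese
By default ES analyzes document content with the Standard Analyzer. Running it on a Chinese sentence shows that the text is broken into single characters rather than words:
GET _analyze
{
  "text": "北京市東城區萬豪酒店"
}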
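The effect of the request above can be imitated in a few lines of Java. This is only a simulation of what happens to pure Chinese text, under the assumption that every ideograph becomes its own token; the real Standard Analyzer also handles Latin words, lowercasing and punctuation, which this sketch ignores. The class name SingleCharSplit is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Imitates per-character tokenization of Chinese text (illustration only). */
final class SingleCharSplit {
    private SingleCharSplit() {}

    /** Emit every non-whitespace code point as its own token. */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }
}
```

Splitting 北京市東城區萬豪酒店 this way gives ten one-character tokens, so a word such as 萬豪 never becomes a term of its own and a term query for it cannot match.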
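3. IK analyzer
# The plugin directory was already mounted when the es container was started
# inside the container: /usr/share/elasticsearch/plugins
# on the host: /mydata/elasticsearch/plugins
# so the plugin files only need to be copied into /mydata/elasticsearch/plugins
docker restart elasticsearch
3.3. Test
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "北京市東城區萬豪酒店"
}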
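The IK analyzer has two segmentation modes: ik_max_word, which splits the text at the finest granularity and emits every word it can find, and ik_smart, which splits at the coarsest granularity.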
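The difference between a fine-grained and a coarse-grained mode can be illustrated with a toy dictionary segmenter. This is not IK's actual algorithm, and the dictionary is made up for the example, so real ik_max_word and ik_smart output may differ; it only shows why one mode yields overlapping words and the other does not. Indexing is by char, which is fine for the BMP characters used here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary-based segmenter (illustration only, not the IK implementation). */
final class ToyDictionarySegmenter {
    private final Set<String> dictionary;
    private final int maxLen;

    ToyDictionarySegmenter(Collection<String> words) {
        this.dictionary = new HashSet<>(words);
        int longest = 1;
        for (String w : words) {
            longest = Math.max(longest, w.length());
        }
        this.maxLen = longest;
    }

    /** Fine-grained: every dictionary word found at any position, overlaps included. */
    List<String> fine(String text) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = Math.min(text.length(), start + maxLen); end > start; end--) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    /** Coarse-grained: greedy longest match without overlaps; unknown characters are emitted singly. */
    List<String> coarse(String text) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxLen);
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            out.add(text.substring(start, end));
            start = end;
        }
        return out;
    }
}
```

With the assumed dictionary 北京, 北京市, 萬豪, 酒店 and the text 北京市萬豪酒店, fine returns 北京市, 北京, 萬豪, 酒店 while coarse returns 北京市, 萬豪, 酒店. This is also why the mapping later in this post indexes with ik_max_word (more terms to hit) and searches with ik_smart (fewer, more precise query terms).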
4.3. Edit the IK analyzer configuration file
vim IKAnalyzer.cfg.xml
# Edit the configuration file. Take care not to corrupt the file's encoding!
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <entry key="ext_dict">my.dic</entry>
    <!-- Configure your own extension stop-word dictionary here -->
    <entry key="ext_stopwords">extra_stopword.dic</entry>
    <!-- Configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://106.75.109.43:28888/remote.dic</entry>
    <!-- Configure a remote extension stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">http://ip-address:port/dictionary-file</entry> -->
</properties>
PUT hotel_2
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
      "address": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
      "brand": { "type": "keyword" },
      "type": { "type": "keyword" },
      "price": { "type": "integer" },
      "specs": { "type": "keyword" },
      "salesVolume": { "type": "integer" },
      "area": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
      "imageUrl": { "type": "text" },
      "synopsis": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
      "createTime": { "type": "date", "format": "yyyy-MM-dd" },
      "isAd": { "type": "integer" }
    }
  }
}
# Rebuild the index: run asynchronously and throttled so that the rebuild proceeds smoothly
POST _reindex?wait_for_completion=false&requests_per_second=2000
{
  "source": { "index": "source-index-name" },
  "dest": { "index": "target-index-name" }
}
# Check whether the task has finished
GET _tasks/task-id
# Re-point the alias
# Detach the alias from the old index
POST _aliases
{
  "actions": [
    { "remove": { "index": "hotel_1", "alias": "hotel" } }
  ]
}
# Delete the old index
DELETE hotel_1
# Attach the alias to hotel_2
# (the remove and add actions can also be sent together in a single _aliases request, which switches the alias atomically)
POST _aliases
{
  "actions": [
    { "add": { "index": "hotel_2", "alias": "hotel" } }
  ]
}
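4.5. Test whether the documents were tokenized
The documents were segmented by the Chinese analyzer when they were stored, so we can now use a termQuery exact-match query to check the segmentation result.
Because termQuery does not analyze the search condition, I query with one of the terms the analyzer produced; if segmentation worked, the query returns the documents whose text field contains that term.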
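III. Analyzed query (matchQuery)
The term query above only works if you search with exactly the terms the analyzer produced.
Users, however, do not know how your documents were tokenized, so the user's search condition has to be analyzed as well.
1. Match query in Kibana
GET hotel/_search
{
  "query": {
    "match": {
      "name": "北京市東城區瑞麟灣"
    }
  }
}
matchQuery analyzes the search condition, matches each resulting term against ES, and by default returns the union of the results (the terms are combined with OR).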
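The union behaviour described above can be sketched with plain sets. The class ToyMatch and the term lists below are assumptions for illustration; in ES the query terms come from the search analyzer and scoring is involved as well, which this sketch leaves out.

```java
import java.util.List;
import java.util.Set;

/** Toy model of how the analyzed query terms are combined (illustration only). */
final class ToyMatch {
    private ToyMatch() {}

    /** OR semantics (the default): the document matches if it contains at least one query term. */
    static boolean matchesAny(Set<String> docTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** AND semantics: the document must contain every query term. */
    static boolean matchesAll(Set<String> docTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && docTerms.containsAll(queryTerms);
    }
}
```

A document with the terms 北京市, 瑞麟灣, 酒店 matches the query terms 北京市, 東城區 under OR but not under AND. In a real match query the stricter behaviour is selected with the operator parameter set to and.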
2. Match query with the Java API
// HotelServiceImpl.java
// Match query on the hotel name
@Override
public Map<String, Object> nameMatchQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    Map<String, Object> map = new HashMap<>();
    // Read the name parameter
    String name = (String) searchParam.get("name");
    if (StringUtils.hasText(name)) {
        // Build the query object
        MatchQueryBuilder nameMatchQueryBuilder = QueryBuilders.matchQuery("name", name);
        searchSourceBuilder.query(nameMatchQueryBuilder);
    }
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    // Process the results
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return map;
}
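IV. Wildcard query (wildcard)
A wildcard query matches terms against a pattern in which * stands for any number of characters and ? for exactly one character, much like like in MySQL.
1. Wildcard query in Kibana
GET hotel/_search
{
  "query": {
    "wildcard": {
      "brand": {
        "value": "美*"
      }
    }
  }
}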
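How a pattern such as 美* is compared with a term can be sketched by translating the wildcard into a regular expression. This is an illustration only: ToyWildcard is a hypothetical class and ES does not evaluate wildcards this way internally. The sketch supports * (any number of characters) and ? (exactly one character).

```java
import java.util.regex.Pattern;

/** Translates a wildcard pattern into a regex and tests terms against it (illustration only). */
final class ToyWildcard {
    private ToyWildcard() {}

    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");            // any number of characters
            } else if (c == '?') {
                regex.append('.');             // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }
}
```

Because brand is a keyword field, the whole brand value is a single term, so 美* behaves like a prefix search over brand names. Patterns that start with * or ? are worth avoiding on large indexes, since they force ES to scan many terms.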
2. Wildcard query with the Java API
// HotelServiceImpl.java
// Wildcard (prefix) query on the hotel brand
@Override
public Map<String, Object> nameWildcardQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // 1. Read the front-end parameter (sent under the key "name", used as the brand prefix)
    String name = (String) searchParam.get("name");
    // 2. Build the query object
    if (StringUtils.hasText(name)) {
        WildcardQueryBuilder brandWildcardQuery = QueryBuilders.wildcardQuery("brand", name + "*");
        searchSourceBuilder.query(brandWildcardQuery);
    }
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    Map<String, Object> map = new HashMap<>();
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return map;
}
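V. Multi-field query (query_string)
query_string analyzes the search condition and searches several fields at once; the condition itself may contain operators such as OR and AND.
GET hotel/_search
{
  "query": {
    "query_string": {
      "fields": ["name", "brand", "address", "synopsis"],
      "query": "萬豪 OR 北京 OR 上海"
    }
  }
}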
// HotelServiceImpl.java
// Multi-field query over name, synopsis, area and address
@Override
public Map<String, Object> searchQueryStringQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    Map<String, Object> map = new HashMap<>();
    // The search-box text is matched against name, synopsis, area and address
    String condition = (String) searchParam.get("condition");
    // Build the query object
    if (StringUtils.hasText(condition)) {
        QueryStringQueryBuilder queryStringQueryBuilder = QueryBuilders.queryStringQuery(condition)
                .field("name")
                .field("address")
                .field("synopsis")
                .field("area")
                .defaultOperator(Operator.OR);
        searchSourceBuilder.query(queryStringQueryBuilder);
    }
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return map;
}
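VI. Sorted query (sort)
When users search, they often care about figures such as a product's sales volume or number of reviews, and want the results sorted by those fields, for example to find the best-selling or most-reviewed products.
Here we use match_all to fetch everything and sort by the price and salesVolume fields.
# Sorted query: sorting by several fields is supported
GET hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "price": { "order": "desc" } },
    { "salesVolume": { "order": "asc" } }
  ]
}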
// HotelServiceImpl.java
// Query sorted by price and sales volume
@Override
public Map<String, Object> salesSortQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Sort direction requested by the front end
    String sortWay = (String) searchParam.get("sortWay");
    // Build the sort part: price first, salesVolume as the tie-breaker
    if (StringUtils.hasText(sortWay)) {
        if ("asc".equals(sortWay)) {
            searchSourceBuilder.sort("price", SortOrder.ASC).sort("salesVolume", SortOrder.ASC);
        } else {
            searchSourceBuilder.sort("price", SortOrder.DESC).sort("salesVolume", SortOrder.DESC);
        }
    }
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    Map<String, Object> map = new HashMap<>();
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
        map.put("sortWay", searchParam.get("sortWay"));
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return map;
}
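A sort on several fields works like a chained comparator: the second field only decides the order of documents that tie on the first. The sketch below mirrors the Kibana request (price descending, then salesVolume ascending) on an in-memory list. ToySort and its nested Hotel class are hypothetical stand-ins, not the project's HotelEntity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory equivalent of sorting by price desc, then salesVolume asc (illustration only). */
final class ToySort {
    private ToySort() {}

    /** Minimal stand-in for a hotel document. */
    static final class Hotel {
        final String name;
        final int price;
        final int salesVolume;

        Hotel(String name, int price, int salesVolume) {
            this.name = name;
            this.price = price;
            this.salesVolume = salesVolume;
        }
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Hotel> byPriceDescThenSalesAsc(List<Hotel> hotels) {
        List<Hotel> sorted = new ArrayList<>(hotels);
        sorted.sort(Comparator.comparingInt((Hotel h) -> h.price).reversed()
                .thenComparingInt(h -> h.salesVolume));
        return sorted;
    }
}
```

Given A (price 500, sales 10), B (800, 5) and C (500, 3), the result is B, C, A: B has the highest price, and C precedes A because both cost 500 and C has the lower sales volume.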
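VII. Range query (range)
When users search for products, they sometimes restrict the search to a price range that fits their budget.
Range query: find the products priced between 100 and 500 (both bounds included).
GET hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 100,
        "lte": 500
      }
    }
  }
}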
// HotelServiceImpl.java
// Convert a front-end parameter to Long; null or empty means "no bound"
public Long transferToLong(Object param) {
    if (param == null || "".equals(param)) {
        return null;
    } else {
        return Long.parseLong(param.toString());
    }
}

// Query by price range
@Override
public Map<String, Object> priceRangeQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // 1. Read the parameters
    Long maxPrice = transferToLong(searchParam.get("maxPrice"));
    Long minPrice = transferToLong(searchParam.get("minPrice"));
    // 2. Build the search condition when at least one bound is given
    if (maxPrice != null || minPrice != null) {
        RangeQueryBuilder priceRangeQueryBuilder = QueryBuilders.rangeQuery("price");
        if (maxPrice != null) {
            // lte: less than or equal
            priceRangeQueryBuilder.lte(maxPrice);
        }
        if (minPrice != null) {
            // gte: greater than or equal
            priceRangeQueryBuilder.gte(minPrice);
        }
        searchSourceBuilder.query(priceRangeQueryBuilder);
    }
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    // Process the results
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        Map<String, Object> map = new HashMap<>();
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
        map.put("minPrice", searchParam.get("minPrice"));
        map.put("maxPrice", searchParam.get("maxPrice"));
        return map;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
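The range logic is easy to get wrong when either bound may be missing, so here is a small standalone sketch of the intended semantics: a missing bound leaves that side open, and both gte and lte are inclusive. ToyRange and its method names are hypothetical and only model the behaviour; they are not part of the service class.

```java
/** Models optional, inclusive range bounds (illustration only). */
final class ToyRange {
    private ToyRange() {}

    /** Null or blank input means "no bound"; anything else must be a whole number. */
    static Long toLongOrNull(Object param) {
        if (param == null) {
            return null;
        }
        String text = param.toString().trim();
        return text.isEmpty() ? null : Long.valueOf(text);
    }

    /** gte/lte semantics: both bounds are inclusive, a null bound is ignored. */
    static boolean inRange(long price, Long min, Long max) {
        return (min == null || price >= min) && (max == null || price <= max);
    }
}
```

With min 100 and max 500, prices 100 and 500 are both inside the range and 501 is not; with only a maximum of 500, a price of 50 still matches.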
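VIII. Boolean query (bool)
A bool query combines several conditions in one query, much like chaining conditions in MySQL: select * from table where field1='condition1' or field2='condition2';
The conditions can be connected with
- must (and): the condition must hold
- must_not (not): the condition must not hold
- should (or): it is enough for one of the conditions to hold; next to must or filter clauses, should is optional and only raises the score
- filter: the condition must hold, but it does not contribute to the relevance score
Find 萬豪 (Marriott) hotels in 北京市 (Beijing) priced between 500 and 2000, preferably 五星級 (five-star):
GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "brand": {
              "value": "萬豪"
            }
          }
        },
        {
          "term": {
            "area": {
              "value": "北京市"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "specs": {
              "value": "五星級"
            }
          }
        }
      ],
      "filter": [
        {
          "range": {
            "price": {
              "gte": 500,
              "lte": 2000
            }
          }
        }
      ]
    }
  }
}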
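How the four clause types interact can be sketched with predicates. This is a deliberately simplified model: ToyBool is hypothetical, and the "score" is just a count of satisfied must and should clauses, whereas ES computes real relevance scores. What it does capture is that must, filter and must_not decide whether a document matches, that filter adds nothing to the score, and that should is optional once a must or filter clause is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of bool query clauses (illustration only, not ES scoring). */
final class ToyBool<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> filter = new ArrayList<>();

    ToyBool<T> must(Predicate<T> p) { must.add(p); return this; }
    ToyBool<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }
    ToyBool<T> should(Predicate<T> p) { should.add(p); return this; }
    ToyBool<T> filter(Predicate<T> p) { filter.add(p); return this; }

    /** Returns -1 when the document does not match, otherwise a toy score. */
    int score(T doc) {
        for (Predicate<T> p : must) {
            if (!p.test(doc)) return -1;
        }
        for (Predicate<T> p : filter) {
            if (!p.test(doc)) return -1;
        }
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) return -1;
        }
        int matchedShould = 0;
        for (Predicate<T> p : should) {
            if (p.test(doc)) matchedShould++;
        }
        // Without must or filter clauses, at least one should clause has to match
        boolean onlyShould = must.isEmpty() && filter.isEmpty();
        if (onlyShould && !should.isEmpty() && matchedShould == 0) {
            return -1;
        }
        // filter clauses never add to the score
        return must.size() + matchedShould;
    }
}
```

Applied to the example above, a five-star 萬豪 hotel at 800 scores higher than a four-star one at the same price, and a 萬豪 hotel at 3000 is excluded by the price filter regardless of its star rating.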
// HotelServiceImpl.java
// Multi-condition query
// Search box over several fields, exact brand, exact city, exact star rating, price range, sort by sales volume
@Override
public Map<String, Object> searchBoolQuery(Integer current, Integer size, Map<String, Object> searchParam) {
    // Set up the search
    SearchRequest searchRequest = new SearchRequest("hotel");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // All conditions are collected in one bool query
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // Read the string parameters once
    String condition = (String) searchParam.get("condition");
    String brand = (String) searchParam.get("brand");
    String area = (String) searchParam.get("area");
    String specs = (String) searchParam.get("specs");
    String sortWay = (String) searchParam.get("sortWay");
    // Search box: multi-field query
    if (StringUtils.hasText(condition)) {
        QueryBuilder queryBuilder = QueryBuilders.queryStringQuery(condition)
                .field("name")
                .field("synopsis")
                .field("area")
                .field("address")
                .defaultOperator(Operator.OR);
        boolQueryBuilder.must(queryBuilder);
    }
    // Exact brand
    if (StringUtils.hasText(brand)) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("brand", brand));
    }
    // Exact city
    if (StringUtils.hasText(area)) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("area", area));
    }
    // Exact star rating
    if (StringUtils.hasText(specs)) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("specs", specs));
    }
    // Price range
    // 1. Read the parameters
    Long maxPrice = transferToLong(searchParam.get("maxPrice"));
    Long minPrice = transferToLong(searchParam.get("minPrice"));
    // 2. Build the condition depending on which bounds are present
    if (maxPrice != null || minPrice != null) {
        RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("price");
        // gte: greater than or equal
        // lte: less than or equal
        if (maxPrice != null) {
            rangeQueryBuilder.lte(maxPrice);
        }
        if (minPrice != null) {
            rangeQueryBuilder.gte(minPrice);
        }
        boolQueryBuilder.must(rangeQueryBuilder);
    }
    // Sort by sales volume
    if (StringUtils.hasText(sortWay)) {
        if ("desc".equalsIgnoreCase(sortWay)) {
            searchSourceBuilder.sort("salesVolume", SortOrder.DESC);
        } else {
            searchSourceBuilder.sort("salesVolume", SortOrder.ASC);
        }
    }
    searchSourceBuilder.query(boolQueryBuilder);
    // Paging
    searchSourceBuilder.from((current - 1) * size);
    searchSourceBuilder.size(size);
    searchRequest.source(searchSourceBuilder);
    // Process the results
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        SearchHits hits = searchResponse.getHits();
        long totalHits = hits.getTotalHits().value;
        SearchHit[] searchHits = hits.getHits();
        List<HotelEntity> list = new ArrayList<>();
        for (SearchHit searchHit : searchHits) {
            String sourceAsString = searchHit.getSourceAsString();
            list.add(JSON.parseObject(sourceAsString, HotelEntity.class));
        }
        Map<String, Object> map = new HashMap<>();
        map.put("list", list);
        map.put("totalResultSize", totalHits);
        map.put("current", current);
        // Total number of pages
        map.put("totalPage", (totalHits + size - 1) / size);
        map.put("brand", searchParam.get("brand"));
        map.put("area", searchParam.get("area"));
        map.put("specs", searchParam.get("specs"