Elasticsearch: Using Scroll (the Equivalent of a Database Cursor) to Retrieve All the Data
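By default an Elasticsearch query returns only 10 hits. Setting from and size gives paging (see reference 3 in the appendix), but only while from + size <= 10,000: index.max_result_window defaults to 10,000, and from + size must not exceed index.max_result_window. To fetch every result beyond that limit, use Scroll, which retrieves the results one batch at a time over multiple requests.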
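The limit is easy to check by hand. The following is a minimal sketch in plain Java, with no Elasticsearch dependency; the class and method names (ResultWindowSketch, fitsInWindow, firstRejectedPage) are made up for illustration, and the page size of 1,000 is assumed to match the SEARCH_HITS_SIZE used later in this post.

```java
class ResultWindowSketch {

    /** Default value of index.max_result_window, as stated in the text. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** True when a from/size page stays inside the result window. */
    static boolean fitsInWindow(int from, int size, int maxResultWindow) {
        // addExact throws instead of silently overflowing for very deep pages
        return Math.addExact(from, size) <= maxResultWindow;
    }

    /** Zero-based index of the first page that from/size paging can no longer fetch. */
    static int firstRejectedPage(int pageSize, int maxResultWindow) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = 0;
        while (fitsInWindow(Math.multiplyExact(page, pageSize), pageSize, maxResultWindow)) {
            page++;
        }
        return page;
    }

    public static void main(String[] args) {
        // from = 9,000 and size = 1,000 ends exactly at the limit, so it still fits
        System.out.println(fitsInWindow(9_000, 1_000, DEFAULT_MAX_RESULT_WINDOW));  // true
        // one page deeper ends at 11,000 and falls outside the window
        System.out.println(fitsInWindow(10_000, 1_000, DEFAULT_MAX_RESULT_WINDOW)); // false
        System.out.println(firstRejectedPage(1_000, DEFAULT_MAX_RESULT_WINDOW));    // 10
    }
}
```

With a page size of 1,000, pages 0 through 9 are reachable and page 10 is the first one that fails, so from + size paging can return at most the first 10,000 hits under the default setting.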
- Scroll is the equivalent of a cursor in a traditional database. The code snippet is as follows:
    // Initial search: opens the scroll context and returns the first batch
    SearchResponse scrollResp = client.prepareSearch(availableIndices)
            .setTypes(type)
            .setScroll(new TimeValue(60000))
            .setQuery(boolQueryBuilder)
            .setSize(SEARCH_HITS_SIZE) // max of SEARCH_HITS_SIZE hits will be returned for each scroll
            .get();
    // Scroll until no hits are returned
    do {
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
        }
        jsonList.addAll(tmpJsonList);
        tmpJsonList.clear();
        // Fetch the next batch, passing the scroll ID from the previous response
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(60000))
                .execute().actionGet();
    } while (scrollResp.getHits().getHits().length != 0);
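The time passed to setScroll() is how long the scroll context is kept alive between requests. It has to cover the processing of one batch of setSize() hits, that is, the longest time allowed for handling a single page, and each scroll request renews it. The code above sets it to 1 minute (search for "scroll context" for details). Every response carries a scroll ID, read with scrollResp.getScrollId(), which must be passed to the next scroll request.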
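The control flow of the scroll loop (handle the current batch, ask for the next one, stop on the first empty batch) can be tried without a cluster. The sketch below is an in-memory stand-in, not the Elasticsearch client: BatchCursor and drain are hypothetical names, and the cursor simply hands out at most size items per call, the way each scroll request returns at most one page.

```java
import java.util.ArrayList;
import java.util.List;

class ScrollLoopSketch {

    /** Hypothetical stand-in for a scroll context: returns at most `size` items per call. */
    static final class BatchCursor {
        private final List<String> source;
        private final int size;
        private int position = 0;

        BatchCursor(List<String> source, int size) {
            this.source = source;
            this.size = size;
        }

        /** Next page; an empty list means the cursor is exhausted. */
        List<String> next() {
            int end = Math.min(position + size, source.size());
            List<String> page = new ArrayList<>(source.subList(position, end));
            position = end;
            return page;
        }
    }

    /** Same shape as the scroll loop: collect the current page, fetch the next, stop on an empty page. */
    static List<String> drain(BatchCursor cursor) {
        List<String> all = new ArrayList<>();
        List<String> page = cursor.next();   // plays the role of the initial search
        while (!page.isEmpty()) {
            all.addAll(page);
            page = cursor.next();            // plays the role of the follow-up scroll request
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        // Pages of 1000, 1000 and 500 items, then an empty page that ends the loop
        List<String> result = drain(new BatchCursor(docs, 1000));
        System.out.println(result.size()); // 2500
    }
}
```

Unlike from + size paging, the loop never computes an offset, so the number of items it can walk through is not bounded by a result window.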
- The code snippet for reading in a loop with from + size is as follows:
    int index = 0;
    do {
        tmpJsonList.clear();
        // Offset of the current page; multiplyExact throws on int overflow.
        // The request is rejected once from + size exceeds index.max_result_window.
        srb.setFrom(Math.multiplyExact(index, SEARCH_HITS_SIZE));
        index++;
        MultiSearchResponse.Item[] items = sr.get().getResponses();
        for (MultiSearchResponse.Item item : items) {
            SearchResponse response = item.getResponse();
            SearchHit[] hits = response.getHits().getHits();
            if (hits.length != 0) {
                for (SearchHit hit : hits) {
                    tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
                }
            }
        }
        jsonList.addAll(tmpJsonList);
    } while (tmpJsonList.size() > 0); // stop once a page comes back empty
Here SEARCH_HITS_SIZE = 1000 and srb is a query that combines multiple conditions. The setup code that precedes the loop is:
    // Every condition must match (logical AND)
    queryBuilders.forEach(query -> {
        boolQueryBuilder.must(query);
    });
    MultiSearchRequestBuilder sr = client.prepareMultiSearch();
    SearchRequestBuilder srb = client.prepareSearch()
            .setTypes(type)
            .setIndices(availableIndices)
            .setQuery(boolQueryBuilder)
            .setSize(SEARCH_HITS_SIZE);
    sr.add(srb);
The query conditions are built as follows (use QueryBuilders to pick term, range, match and so on as needed):
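    // Add the term condition only when a value was supplied
    if (!StringUtil.isEmpty(l7p)) {
        queryBuilders.add(QueryBuilders.termQuery(Event.FIELD_L7P, l7p));
    }
    // Only a start time was given: open-ended range from startTime onward
    if (!StringUtil.isEmpty(startTime) && StringUtil.isEmpty(endTime)) {
        queryBuilders.add(QueryBuilders.rangeQuery(Event.FIELD_START_TIME).from(startTime));
    }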
Appendix:
1) Using scroll in Java: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
2) Scroll: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/search-request-scroll.html
3) From and size: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size