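Now we can turn to the core of ES: search. A search runs in one of two modes, filter or query. Filter mode is pure selection: the records that satisfy the conditions are returned as the result. Query mode takes two steps: it first selects the matching records, then computes a relevance score for each of them. In other words, it adds a scoring pass. If the first goal is to reproduce the query capabilities of a traditional database, filter mode is enough. Filters can still take advantage of the search engine's text analysis (tokenization) to produce high-quality results, and because filters can be cached they usually execute more efficiently. A traditional database management system does not readily offer this combination. In ES, filter mode is implemented inside the bool query, as follows: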
GET /_search
{
"query": {
"bool": {
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
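The difference between the two modes can be sketched in plain Scala. This is a toy model, not Elasticsearch code: the `Doc` and `Hit` types, the sample documents and the word-count "score" are all assumptions made for illustration, and real ES relevance scoring is far more involved.

```scala
// Toy model of filter mode vs. query mode (not Elasticsearch code).
object FilterVsQuery {
  final case class Doc(id: Int, status: String, title: String)
  final case class Hit(doc: Doc, score: Double)

  // Filter mode: a yes/no test per document; every hit keeps score 0.0.
  def filter(docs: Seq[Doc])(p: Doc => Boolean): Seq[Hit] =
    docs.filter(p).map(d => Hit(d, 0.0))

  // Hypothetical scoring rule: how many times the word occurs in the title.
  private def occurrences(d: Doc, word: String): Int =
    d.title.toLowerCase.split("\\s+").count(_ == word.toLowerCase)

  // Query mode: select the matching documents, then score and rank them.
  def query(docs: Seq[Doc], word: String): Seq[Hit] =
    docs
      .filter(d => occurrences(d, word) > 0)
      .map(d => Hit(d, occurrences(d, word).toDouble))
      .sortBy(h => -h.score)

  // Made-up sample documents.
  val docs: Seq[Doc] = Seq(
    Doc(1, "published", "search basics"),
    Doc(2, "draft", "search search search"),
    Doc(3, "published", "indexing")
  )
}
```

Calling `FilterVsQuery.filter(FilterVsQuery.docs)(_.status == "published")` only selects, while `FilterVsQuery.query(FilterVsQuery.docs, "search")` also orders the hits by their score.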
Here is the simplest possible example:
val filterTerm = search("bank")
.query(
boolQuery().filter(termQuery("city.keyword","Brogan")))
It produces the following request JSON:
POST /bank/_search
{
"query":{
"bool":{
"filter":[
{
"term":{"city.keyword":{"value":"Brogan"}}
}
]
}
}
}
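A note on this request: it is a term query (termQuery), which requires an exact match, including case. It therefore cannot be used against a field that has been processed by an analyzer, which is why it targets city.keyword.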
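Why the analyzed field fails can be illustrated with a small model. The sketch below assumes the default dynamic mapping (a `text` field handled by the standard analyzer, plus a `keyword` sub-field), and `analyze` is only a rough stand-in for that analyzer; every name in it is hypothetical.

```scala
// Toy model of term lookup against a text field and a keyword field.
object TermVsAnalyzed {
  // Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase.
  def analyze(text: String): Set[String] =
    text.split("[^\\p{Alnum}]+").filter(_.nonEmpty).map(_.toLowerCase).toSet

  // A text field indexes the analyzed tokens; a keyword field indexes the value as-is.
  def indexedAsText(value: String): Set[String] = analyze(value)
  def indexedAsKeyword(value: String): Set[String] = Set(value)

  // A term query looks the search term up verbatim, without analyzing it.
  def termMatches(indexedTerms: Set[String], term: String): Boolean =
    indexedTerms.contains(term)
}
```

In this model the term "Brogan" is found in the keyword form but not in the analyzed form, where only the lowercased token "brogan" exists.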
The response JSON:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "[email protected]",
"city" : "Brogan",
"state" : "IL"
}
}
]
}
}
Let's see how elastic4s represents the JSON result above. The return type is Response[SearchResponse]. Response is defined as follows:
sealed trait Response[+U] {
def status: Int // the http status code of the response
def body: Option[String] // the http response body if the response included one
def headers: Map[String, String] // any http headers included in the response
def result: U // returns the marshalled response U or throws an exception
def error: ElasticError // returns the error or throw an exception
def isError: Boolean // returns true if this is an error response
final def isSuccess: Boolean = !isError // returns true if this is a success
def map[V](f: U => V): Response[V]
def flatMap[V](f: U => Response[V]): Response[V]
final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
final def fold[V](onError: RequestFailure => V, onSuccess: U => V): V = this match {
case failure: RequestFailure => onError(failure)
case RequestSuccess(_, _, _, result) => onSuccess(result)
}
final def foreach[V](f: U => V): Unit = if (!isError) f(result)
final def toOption: Option[U] = if (isError) None else Some(result)
}
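Response[+U] is a generic type. With U replaced by SearchResponse, the result value is obtained through def result: SearchResponse. status is the standard HTTP status code, isError and isSuccess report how the request went, and error carries the details of the failure. The response headers are in headers. Now let's look at the definition of SearchResponse: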
case class SearchResponse(took: Long,
@JsonProperty("timed_out") isTimedOut: Boolean,
@JsonProperty("terminated_early") isTerminatedEarly: Boolean,
private val suggest: Map[String, Seq[SuggestionResult]],
@JsonProperty("_shards") private val _shards: Shards,
@JsonProperty("_scroll_id") scrollId: Option[String],
@JsonProperty("aggregations") private val _aggregationsAsMap: Map[String, Any],
hits: SearchHits) {...}
case class SearchHits(total: Total,
@JsonProperty("max_score") maxScore: Double,
hits: Array[SearchHit]) {
def size: Long = hits.length
def isEmpty: Boolean = hits.isEmpty
def nonEmpty: Boolean = hits.nonEmpty
}
case class SearchHit(@JsonProperty("_id") id: String,
@JsonProperty("_index") index: String,
@JsonProperty("_type") `type`: String,
@JsonProperty("_version") version: Long,
@JsonProperty("_seq_no") seqNo: Long,
@JsonProperty("_primary_term") primaryTerm: Long,
@JsonProperty("_score") score: Float,
@JsonProperty("_parent") parent: Option[String],
@JsonProperty("_shard") shard: Option[String],
@JsonProperty("_node") node: Option[String],
@JsonProperty("_routing") routing: Option[String],
@JsonProperty("_explanation") explanation: Option[Explanation],
@JsonProperty("sort") sort: Option[Seq[AnyRef]],
private val _source: Map[String, AnyRef],
fields: Map[String, AnyRef],
@JsonProperty("highlight") private val _highlight: Option[Map[String, Seq[String]]],
private val inner_hits: Map[String, Map[String, Any]],
@JsonProperty("matched_queries") matchedQueries: Option[Set[String]])
extends Hit {...}
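The fold and toOption members of Response listed earlier let a caller handle both outcomes without checking isError by hand. The sketch below uses a deliberately simplified stand-in (`Resp`, `Success`, `Failure` are hypothetical names, not the elastic4s classes) so that it runs without a cluster; the real Response carries more fields.

```scala
// Simplified stand-in for elastic4s's Response / RequestSuccess / RequestFailure.
object ResponseSketch {
  sealed trait Resp[+U] {
    def isError: Boolean
    // Run exactly one of the two handlers, depending on the outcome.
    def fold[V](onError: String => V, onSuccess: U => V): V = this match {
      case Failure(_, reason) => onError(reason)
      case Success(_, result) => onSuccess(result)
    }
    def toOption: Option[U] = fold[Option[U]](_ => None, u => Some(u))
  }
  final case class Success[U](status: Int, result: U) extends Resp[U] {
    val isError = false
  }
  final case class Failure(status: Int, reason: String) extends Resp[Nothing] {
    val isError = true
  }

  // Example handler: turn either outcome into a printable line.
  def describe(r: Resp[Seq[String]]): String =
    r.fold(reason => s"Error: $reason", hits => s"${hits.size} hit(s)")
}
```

With the real library the same shape would apply to a Response[SearchResponse], with the error handler receiving a RequestFailure instead of a plain string.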
The important parts of the result, such as _score, _source and fields, are all in SearchHit. A complete example of handling the result:
val filterTerm = client.execute(search("bank")
.query(
boolQuery().filter(termQuery("city.keyword","Brogan")))).await
if (filterTerm.isSuccess) {
if (filterTerm.result.nonEmpty)
filterTerm.result.hits.hits.foreach {hit => println(hit.sourceAsMap)}
} else println(s"Error: ${filterTerm.error.reason}")
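Printing sourceAsMap is fine for a demonstration, but application code usually wants typed values. A minimal sketch of that step, assuming the source arrives as a Map[String, AnyRef] with boxed numbers; the `Account` case class and `decode` helper are hypothetical and cover only a few of the bank fields.

```scala
// Sketch: turn a hit's source map into a typed value.
object HitDecoding {
  final case class Account(accountNumber: Int, firstname: String, city: String, balance: Int)

  // `source` plays the role of hit.sourceAsMap; a missing or mistyped field yields None.
  def decode(source: Map[String, AnyRef]): Option[Account] =
    for {
      number <- source.get("account_number").collect { case n: java.lang.Number => n.intValue }
      first  <- source.get("firstname").collect { case s: String => s }
      city   <- source.get("city").collect { case s: String => s }
      bal    <- source.get("balance").collect { case n: java.lang.Number => n.intValue }
    } yield Account(number, first, city, bal)
}
```

Applied to the hits above, this would be along the lines of `hits.flatMap(hit => HitDecoding.decode(hit.sourceAsMap))`, which silently drops any hit that lacks the expected fields.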
Among traditional query styles, prefix queries are used quite often:
POST /bank/_search
{
"query":{
"bool":{
"filter":[
{
"prefix":{"city.keyword":{"value":"Bro"}}
}
]
}
}
}
val filterPrefix = client.execute(search("bank")
.query(
boolQuery().filter(prefixQuery("city.keyword","Bro")))
.sourceInclude("address","city","state")
).await
if (filterPrefix.isSuccess) {
if (filterPrefix.result.nonEmpty)
filterPrefix.result.hits.hits.foreach {hit => println(hit.sourceAsMap)}
} else println(s"Error: ${filterPrefix.error.reason}")
....
Map(address -> 880 Holmes Lane, city -> Brogan, state -> IL)
Map(address -> 810 Nostrand Avenue, city -> Brooktrails, state -> GA)
Map(address -> 295 Whitty Lane, city -> Broadlands, state -> VT)
Map(address -> 511 Heath Place, city -> Brookfield, state -> OK)
Map(address -> 918 Bridge Street, city -> Brownlee, state -> HI)
Map(address -> 806 Pierrepont Place, city -> Brownsville, state -> MI)
Regular expression queries are also available:
POST /bank/_search
{
"query":{
"bool":{
"filter":[
{
"regexp":{"address.keyword":{"value":".*bridge.*"}}
}
]
}
}
}
val filterRegex = client.execute(search("bank")
.query(
boolQuery().filter(regexQuery("address.keyword",".*bridge.*")))
.sourceInclude("address","city","state")
).await
if (filterRegex.isSuccess) {
if (filterRegex.result.nonEmpty)
filterRegex.result.hits.hits.foreach {hit => println(hit.sourceAsMap)}
} else println(s"Error: ${filterRegex.error.reason}")
....
Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS)
Map(address -> 721 Cambridge Place, city -> Efland, state -> ID)
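Two details explain this result: the pattern has to match the whole keyword value, hence the leading and trailing `.*`, and matching is case-sensitive, which is why "918 Bridge Street" from the prefix example is absent. The sketch below imitates that behaviour with Java's `String.matches`, which is also a whole-string match. It is only an approximation, since Lucene's regular expression syntax is not identical to `java.util.regex`, and the `RegexpSemantics` object is a hypothetical helper.

```scala
// Sketch of whole-value, case-sensitive regexp matching on a keyword field.
object RegexpSemantics {
  // Addresses taken from the outputs shown in this article.
  val addresses: Seq[String] = Seq(
    "384 Bainbridge Street",
    "721 Cambridge Place",
    "918 Bridge Street",
    "880 Holmes Lane"
  )

  // String.matches succeeds only if the pattern covers the entire value.
  def regexpMatches(keywordValue: String, pattern: String): Boolean =
    keywordValue.matches(pattern)

  def search(pattern: String): Seq[String] =
    addresses.filter(regexpMatches(_, pattern))
}
```

Here `RegexpSemantics.search(".*bridge.*")` keeps the Bainbridge and Cambridge addresses, while the bare pattern `"bridge"` matches nothing because it does not cover the full value.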
ES uses the bool query to build compound queries, and a bool query can itself be placed inside the filter clause, as follows:
POST /bank/_search
{
"query":{
"bool":{
"filter":[
{
"regexp":{"address.keyword":{"value":".*bridge.*"}}
},
{
"bool": {
"must": [
{ "match" : {"lastname" : "lane"}}
]
}
}
]
}
}
}
The elastic4s query DSL statement and the returned result are as follows:
val filterBool = client.execute(search("bank")
.query(
boolQuery().filter(regexQuery("address.keyword",".*bridge.*"),
boolQuery().must(matchQuery("lastname","lane"))))
.sourceInclude("lastname","address","city","state")
).await
if (filterBool.isSuccess) {
if (filterBool.result.nonEmpty)
filterBool.result.hits.hits.foreach {hit => println(s"score: ${hit.score}, ${hit.sourceAsMap}")}
} else println(s"Error: ${filterBool.error.reason}")
...
score: 0.0, Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS, lastname -> Lane)
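The score of 0.0 shows that filter mode does not compute scores, and that holds for the must clause nested inside the filter as well. Skipping the scoring pass probably improves execution efficiency somewhat.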
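The compound filter can be thought of as a conjunction of yes/no predicates. The following is a toy model in plain Scala under that assumption: the `Clause` type, the helper names and the lowercase-token stand-in for match are all hypothetical, and nothing here calls Elasticsearch.

```scala
// Toy model: a bool filter as an AND of predicates, with no scoring involved.
object BoolFilterSketch {
  final case class Account(lastname: String, address: String)
  type Clause = Account => Boolean

  // Every clause listed under "filter" has to match.
  def boolFilter(clauses: Clause*): Clause = a => clauses.forall(_(a))
  // "must" clauses are ANDed too; nested in a filter they act as plain predicates.
  def boolMust(clauses: Clause*): Clause = a => clauses.forall(_(a))

  // Whole-value, case-sensitive regexp on the raw address.
  val addressRegexp: Clause = _.address.matches(".*bridge.*")

  // Stand-in for match on an analyzed field: compare lowercased tokens.
  def matchLastname(text: String): Clause =
    a => a.lastname.toLowerCase.split("\\s+").contains(text.toLowerCase)

  val combined: Clause = boolFilter(addressRegexp, boolMust(matchLastname("lane")))
}
```

In this model `combined` accepts an account only when both the address pattern and the last-name match hold, which mirrors the single hit returned above.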