ElasticSearch Nested Mappings, Filters, and Queries

Source: http://www.cnblogs.com/zhc-hnust/archive/2016/04/28/5441179.html



ElasticSearch - Nested Mappings and Filters

Because nested objects are indexed as separate hidden documents, we can’t query them directly. Instead, we have to use the nested query to access them:

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }}, 
        {
          "nested": {
            "path": "comments", 
            "query": {
              "bool": {
                "must": [ 
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}


① The title clause operates on the root document.
② The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document.
③ The comments.name and comments.age clauses operate on the same nested document.

A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries. The nesting hierarchy is applied as you would expect.

Of course, a nested query could match several nested documents. Each matching nested document would have its own relevance score, but these multiple scores need to be reduced to a single score that can be applied to the root document.

By default, the nested query averages the scores of the matching nested documents. This can be controlled by setting the score_mode parameter to avg, max, sum, or even none (in which case the root document gets a constant score of 1.0).

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "eggs" }},
        {
          "nested": {
            "path":       "comments",
            "score_mode": "max", 
            "query": {
              "bool": {
                "must": [
                  { "match": { "comments.name": "john" }},
                  { "match": { "comments.age":  28     }}
                ]
        }}}}
      ]
}}}
① Setting score_mode to max gives the root document the _score from the best-matching nested document.
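
The reduction that score_mode performs can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Elasticsearch code: the function name reduce_nested_scores and the sample scores are assumptions made for the example.

```python
from statistics import mean


def reduce_nested_scores(scores, score_mode="avg"):
    """Collapse the scores of matching nested documents into one root score."""
    if not scores:
        raise ValueError("at least one nested document must match")
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    if score_mode == "none":
        return 1.0  # constant score: the nested scores are ignored
    raise ValueError(f"unsupported score_mode: {score_mode}")


# Hypothetical scores for three matching comments on one blog post.
comment_scores = [0.5, 1.5, 1.0]
for mode in ("avg", "max", "sum", "none"):
    print(mode, reduce_nested_scores(comment_scores, mode))
```

With max, a post with one excellent comment outranks a post with several mediocre ones; with avg, the mediocre comments pull the root score down.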

If placed inside the filter clause of a Boolean query, a nested query behaves much like a nested query elsewhere, except that it doesn’t accept the score_mode parameter. Because it is being used as a non-scoring query (it includes or excludes, but doesn’t score), a score_mode doesn’t make sense since there is nothing to score.

To see why nested mappings are needed in the first place, consider a movie document whose cast field is an array of objects, indexed without any explicit mapping:

 

curl -XPOST "http://localhost:9200/index-1/movie/" -d'
{
   "title": "The Matrix",
   "cast": [
      {
         "firstName": "Keanu",
         "lastName": "Reeves"
      },
      {
         "firstName": "Laurence",
         "lastName": "Fishburne"
      }
   ]
}'

Given many such movies in our index we can find all movies with an actor named "Keanu" using a search request such as:

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name "Keanu" and last name "Reeves":

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "reeves"
                     }
                  }
               ]
            }
         }
      }
   }
}'

Or at least so it seems. However, let's see what happens if we search for movies with an actor with "Keanu" as first name and "Fishburne" as last name.

curl -XPOST "http://localhost:9200/index-1/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "cast.firstName": "keanu"
                     }
                  },
                  {
                     "term": {
                        "cast.lastName": "fishburne"
                     }
                  }
               ]
            }
         }
      }
   }
}'

At first glance, this should clearly not match The Matrix, as there's no such actor amongst its cast. However, ElasticSearch will return The Matrix for the above query. After all, the movie does contain an actor with "Keanu" as first name and (albeit a different) actor with "Fishburne" as last name. Based on the above query it has no way of knowing that we want the two term filters to match the same unique object in the list of actors. And even if it did, the way the data is indexed it wouldn't be able to handle that requirement.
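
The effect can be reproduced outside ElasticSearch with a rough Python model. It is a sketch under assumptions, not the actual indexing code: the helper names are made up, and lowercasing stands in for the analysis that the default mapping applies. Each sub-field's values are pooled across all cast members, so the link between a first name and its last name is lost.

```python
movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}


def flattened_terms(doc, array_field):
    """Pool the lowercased values of each sub-field across all array elements."""
    terms = {}
    for element in doc[array_field]:
        for key, value in element.items():
            terms.setdefault(f"{array_field}.{key}", set()).add(value.lower())
    return terms


def matches_all_terms(doc, array_field, wanted):
    """Emulate a bool/must of term filters evaluated against the pooled values."""
    terms = flattened_terms(doc, array_field)
    return all(value in terms.get(field, set()) for field, value in wanted.items())


keanu_reeves = matches_all_terms(
    movie, "cast", {"cast.firstName": "keanu", "cast.lastName": "reeves"})
keanu_fishburne = matches_all_terms(
    movie, "cast", {"cast.firstName": "keanu", "cast.lastName": "fishburne"})
print(keanu_reeves, keanu_fishburne)
```

Both checks succeed, because each term is looked up independently in the pooled set for its field.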

Nested mapping and filter to the rescue

Luckily ElasticSearch provides a way for us to filter on multiple fields within the same objects in arrays: mapping such fields as nested. To try this out, let's create ourselves a new index with the "cast" field mapped as nested.

curl -XPUT "http://localhost:9200/index-2" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested"
            }
         }
      }
   }
}'

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here's how we would search for movies starring an actor named "Keanu Fishburne":

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "bool": {
                     "must": [
                        {
                           "term": {
                              "cast.firstName": "keanu"
                           }
                        },
                        {
                           "term": {
                              "cast.lastName": "fishburne"
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   }
}'

As you can see we've wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.
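
In contrast to the pooled lookup used for ordinary object arrays, a nested filter evaluates all of its conditions against one array element at a time. The following Python sketch models that behaviour; the function name matches_nested is an assumption, field names are given relative to the element, and lowercasing again stands in for analysis.

```python
movie = {
    "title": "The Matrix",
    "cast": [
        {"firstName": "Keanu", "lastName": "Reeves"},
        {"firstName": "Laurence", "lastName": "Fishburne"},
    ],
}


def matches_nested(doc, path, wanted):
    """True only if a single element under `path` satisfies every wanted term."""
    for element in doc.get(path, []):
        if all(str(element.get(field, "")).lower() == value
               for field, value in wanted.items()):
            return True
    return False


keanu_fishburne = matches_nested(
    movie, "cast", {"firstName": "keanu", "lastName": "fishburne"})
keanu_reeves = matches_nested(
    movie, "cast", {"firstName": "keanu", "lastName": "reeves"})
print(keanu_fishburne, keanu_reeves)
```

Here "Keanu Fishburne" no longer matches, while "Keanu Reeves" still does, because both terms must hold for the same cast member.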

As intended, running the above query doesn't return The Matrix, while modifying it to instead match "Reeves" as last name will make it match The Matrix. However, there's one caveat.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors' first names without using a nested filter, like the request below, we won't get any hits.

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "cast.firstName": "keanu"
            }
         }
      }
   }
}'

This happens because movie documents no longer have cast.firstName fields. Instead each element in the cast array is, internally in ElasticSearch, indexed as a separate document.

Obviously we can still search for movies based only on first names amongst the cast, by using nested filters though. Like this:

curl -XPOST "http://localhost:9200/index-2/movie/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "nested": {
               "path": "cast",
               "filter": {
                  "term": {
                     "cast.firstName": "keanu"
                  }
               }
            }
         }
      }
   }
}'
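
Since the nested wrapper has the same shape every time, the request body can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of any ElasticSearch client. It assumes the "filtered" query syntax used throughout this post and writes field names with their full path (for example cast.firstName).

```python
import json


def nested_term_filter(path, terms):
    """Build a 'filtered' query body whose filter is a nested filter on `path`.

    `terms` maps sub-field names (without the path prefix) to term values.
    """
    if not terms:
        raise ValueError("at least one term is required")
    clauses = [{"term": {f"{path}.{field}": value}}
               for field, value in terms.items()]
    # A single term needs no bool wrapper; several terms are combined with must.
    inner = clauses[0] if len(clauses) == 1 else {"bool": {"must": clauses}}
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"nested": {"path": path, "filter": inner}},
            }
        }
    }


body = nested_term_filter("cast", {"firstName": "keanu"})
print(json.dumps(body, indent=3))
```

The printed JSON can be sent as the body of the _search request shown above; passing two terms produces the bool/must variant instead.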

The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property is sometimes a bit tedious. To be able to utilize the power of nested filters for complex criteria while still being able to filter on values in arrays the same way as if we hadn't mapped such properties as nested, we can modify our mappings so that the nested values will also be included in the parent document. This is done using the include_in_parent property, like this:

curl -XPUT "http://localhost:9200/index-3" -d'
{
   "mappings": {
      "movie": {
         "properties": {
            "cast": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'

In an index such as the one created with the above request we can filter on combinations of values within the same complex objects in the cast array using nested filters, while still being able to filter on single fields without using nested filters. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries, as a query for "Keanu Fishburne" will match The Matrix using a regular bool filter while it won't when wrapped in a nested filter. In other words, when using include_in_parent we may get unexpected results, with queries matching documents that they shouldn't, if we forget to use nested filters.


Array Type

Read the doc on elasticsearch.org

As its name suggests, it can be an array of native types (string, int, …) but also an array of objects (the basis used for “objects” and “nested”).

Here are some valid indexing examples :

{
    "Article" : [
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Francois",
                "surname": "francoisg",
                "id" : 18
            },
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [
            {
                "firstname" : "Gregory",
                "surname" : "gregquat",
                "id" : "2"
            }
        ]
      }
    ]
}

 

This example contains different kinds of arrays :

  • categories : array of integers
  • tag : array of strings
  • author : array of objects (inner objects or nested)

We explicitly mention these “simple” array types because it can be easier and more maintainable to store a flattened value than the complete object.
Using a non-relational structure should make you think about a specific model for your search engine :

  • To filter : If you just want to filter/search/aggregate on the textual value of an object, then flatten the value in the parent object.
  • To get the list of objects that are linked to a parent (and if you do not need to filter or index these objects), just store the list of ids and hydrate them with Doctrine and Symfony.

Inner objects

Inner objects are simply JSON objects embedded in a parent document, such as the “author” field in the above example. The mapping for this example could be :

 

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : object
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

You can Filter or Query on these “inner objects”. For example :

query: author.firstname=Francois will return the post with the id 12 (and not the one with the id 13).

You can read more on the Elasticsearch website

Inner objects are easy to configure. As Elasticsearch documents are “schema less”, you can index them without specifying any mapping.

The limitation of this method lies in the way ElasticSearch stores your data. Reusing the above example, here is the internal representation of our objects :

 

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Francois","Gregory"],
        "author.surname" : ["francoisg","gregquat"],
        "author.id" : [18,2]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author.firstname" : ["Gregory"],
        "author.surname" : ["gregquat"],
        "author.id" : [2]
      }
]

 

The consequence is that the query :

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "author.firstname": "francois" }},
            { "term": { "author.surname": "gregquat" }}
          ]
        }
      }
    }
  }
}


(author.firstname=francois AND author.surname=gregquat) will return the document “12”, even though no single author has both values. In the case of an inner object, this query can be translated as “Which documents have at least one author.surname = gregquat and at least one author.firstname = francois?”.
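
The internal representation shown above can be derived mechanically from the original document. The Python sketch below does so for article 12. It is a simplified model with an assumed function name, it handles only one level of object arrays, and it ignores text analysis.

```python
def flatten_inner_objects(doc):
    """Return a copy of `doc` with arrays of objects collapsed into dotted fields."""
    flat = {}
    for field, value in doc.items():
        is_object_array = (isinstance(value, list) and value
                           and all(isinstance(item, dict) for item in value))
        if is_object_array:
            for element in value:
                for key, sub_value in element.items():
                    flat.setdefault(f"{field}.{key}", []).append(sub_value)
        else:
            flat[field] = value
    return flat


article = {
    "id": 12,
    "title": "An article title",
    "categories": [1, 3, 5, 7],
    "tag": ["elasticsearch", "symfony", "Obtao"],
    "author": [
        {"firstname": "Francois", "surname": "francoisg", "id": 18},
        {"firstname": "Gregory", "surname": "gregquat", "id": 2},
    ],
}

flat = flatten_inner_objects(article)
print(flat["author.firstname"])
print(flat["author.surname"])
```

After flattening, "Francois" and "gregquat" sit in two independent lists, which is why a filter on both values matches document 12.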

 

To fix this problem, you must use the nested type.

Nested

First important difference : nested must be specified in your mapping.

The mapping looks like an object one, only the type changes :

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                article :
                    mappings:
                        title : ~
                        categories : ~
                        tag : ~
                        author : 
                            type : nested
                            properties : 
                                firstname : ~
                                surname : ~
                                id : 
                                    type : integer

 

This time, the internal representation will be :

 

[
      {
        "id" : 12,
        "title" : "An article title",
        "categories" : [1,3,5,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Francois",
            "surname" : "francoisg",
            "id" : 18
        },
        {
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      },
      {
        "id" : 13,
        "title" : "A second article title",
        "categories" : [1,7],
        "tag" : ["elasticsearch", "symfony", "Obtao"],
        "author" : [{
            "firstname" : "Gregory",
            "surname" : "gregquat",
            "id" : 2
        }]
      }
]

 

This time, we keep the object structure.

Nested fields have their own filters, which allow filtering on individual nested objects. Continuing our example (the one that exposed the limitation of inner objects), we can write this query :

 

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested" : {
          "path" : "author",
          "filter": {
            "bool": {
              "must": [
                {
                  "term" : {
                    "author.firstname": "francois"
                  }
                },
                {
                  "term" : {
                    "author.surname": "gregquat"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}


We can translate it as “Who has an author object whose surname is equal to ‘gregquat’ and whose firstname is ‘francois’”. This query will return no result.

 

One problem remains, and it is penalizing when working with big objects : when you want to change a single value of a nested object, you have to reindex the whole parent document (including all of its nested objects).
If the objects are heavy and often updated, the performance impact can be significant.

To fix this problem, you can use the parent/child associations.

Parent/Child

Parent/child associations are very similar to OneToMany relationships (one parent, several children).
The relationship remains hierarchical : an object type is only associated to one parent, and it’s impossible to create a ManyToMany relationship.

We are going to link our article to a category :

 

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        blog :
            types:
                category : 
                    mappings : 
                        id : ~
                        name : ~
                        description : ~
                article :
                    mappings:
                        title : ~
                        tag : ~
                        author : ~
                    _routing:
                        required: true
                        path: category
                    _parent:
                        type : "category"
                        identifier: "id" #optional as id is the default value
                        property : "category" #optional as the default value is the type value

 

When indexing an article, a reference to the Category will also be indexed (category.id).
So, we can index categories and articles separately while keeping the references between them.

As with nested, there are Filters and Queries that allow searching on parents or children :

  • Has Parent Filter / Has Parent Query : Filter/query on parent fields, returns children objects. In our case, we could filter articles whose parent category contains “symfony” in its description.
  • Has Child Filter / Has Child Query : Filter/query on child fields, returns the parent object. In our case, we could filter Categories for which “francoisg” has written an article.

 

{
  "query": {
    "has_child": {
      "type": "article",
      "query" : {
        "filtered": {
          "query": { "match_all": {}},
          "filter" : {
              "term": {"tag": "symfony"}
          }
        }
      }
    }
  }
}


This query will return the Categories that have at least one article tagged with “symfony”.
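
The logic of has_child can be illustrated with plain Python lists. This is only a model of the semantics, not of how ElasticSearch executes the query, and the category and article records below are made-up sample data.

```python
# Hypothetical sample data: each article stores the id of its parent category.
categories = [
    {"id": 1, "name": "php"},
    {"id": 2, "name": "search"},
]
articles = [
    {"id": 12, "category": 1, "tag": ["elasticsearch", "symfony", "obtao"]},
    {"id": 13, "category": 2, "tag": ["elasticsearch"]},
]


def has_child(parents, children, parent_key, predicate):
    """Return the parents that have at least one child satisfying `predicate`."""
    matching_parent_ids = {child[parent_key]
                           for child in children if predicate(child)}
    return [parent for parent in parents if parent["id"] in matching_parent_ids]


result = has_child(categories, articles, "category",
                   lambda article: "symfony" in article["tag"])
print([category["name"] for category in result])
```

Only the category whose article carries the “symfony” tag is returned. A has_parent lookup works the other way round: it selects parents first and returns their children.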

 

The queries here are written in JSON, but they can easily be translated into PHP with the Elastica library.

 

