
This article is part of a series of study notes on the Geek Time course Elasticsearch Core Technology and Practice.

II. Modeling Suggestions

2.1 Modeling Suggestion (1): How to handle relationships

  1. Object: prefer Denormalization
  2. Nested: the data contains multiple objects (e.g., a movie with multiple actors) that need to be queried together
  3. Child/Parent: the associated documents are updated frequently (e.g., blogs and comments); see the join-field sketch after this list
  • Kibana currently does not support Nested and Parent/Child types, although it may support Nested in the future
  • If you need to analyze the data in Kibana, you still have to weigh this trade-off when choosing between Nested and Parent/Child during data modeling
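For reference, a minimal sketch of a Parent/Child mapping using the join field type; the index name blog_index and the relation name blog_comments_relation are illustrative assumptions, not from the course.

PUT blog_index
{
  "mappings": {
    "properties": {
      "blog_comments_relation": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "title": {
        "type": "text"
      }
    }
  }
}

A parent document is indexed normally with the join field set to "blog"; a child document sets it to an object naming "comment" and the parent id, and must be routed to the parent's shard.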

2.2 Modeling Suggestion (2): Avoid too many fields

It is best to avoid a large number of fields in a document

  • An excessive number of fields is hard to maintain
  • The Mapping information is stored in the Cluster State; a large amount of it can hurt cluster performance, because the Cluster State has to be synchronized to every node
  • Deleting or modifying a field requires a Reindex

The default maximum number of fields per index is 1,000. You can change this limit with the index.mapping.total_fields.limit setting.
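As a minimal sketch (the index name my_index is an assumption), the limit is a dynamic index setting and can be raised on an existing index:

# Raise the maximum number of fields for one index
PUT my_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}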

Dynamic vs. Strict

Dynamic (in production, try not to leave Dynamic turned on):

  • true – unknown fields are automatically added to the Mapping
  • false – new fields are not indexed, but are still saved in _source
  • strict – new fields are not indexed, and the document write fails

The strict setting can be controlled down to the field level.
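A minimal sketch of this field-level control (the index and field names are illustrative assumptions): the index rejects unknown top-level fields, while unknown keys under labels are still added automatically.

PUT dynamic_demo
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "labels": {
        "dynamic": true,
        "properties": {}
      }
    }
  }
}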

An example: Cookie Service data

The data from the Cookie Service has the following characteristics:

  • Cookies contain many different key-value pairs
  • When Dynamic is set to true, every unknown key becomes a new field
  • A flat design therefore inevitably leads to an explosion in the number of fields

As data is written, new fields are continuously added to the Dynamic Mapping.

Solution: Nested Object & Key Value

DELETE cookie_service
PUT cookie_service
{
  "mappings": {
    "properties": {
      "cookies": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "dateValue": {
            "type": "date"
          },
          "keywordValue": {
            "type": "keyword"
          },
          "intValue": {
            "type": "integer"
          }
        }
      },
      "url": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Different types of values are written into different value fields:

## Write data using a key and a value field of the appropriate type
PUT cookie_service/_doc/1
{
  "url": "www.google.com",
  "cookies": [
    {
      "name": "username",
      "keywordValue": "tom"
    },
    {
      "name": "age",
      "intValue": 32
    }
  ]
}

PUT cookie_service/_doc/2
{
  "url": "www.amazon.com",
  "cookies": [
    {
      "name": "login",
      "dateValue": "2019-01-01"
    },
    {
      "name": "email",
      "intValue": 32
    }
  ]
}

# Nested query combined with a bool filter
POST cookie_service/_search
{
  "query": {
    "nested": {
      "path": "cookies",
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "cookies.name": "age"
              }
            },
            {
              "range": {
                "cookies.intValue": {
                  "gte": 30
                }
              }
            }
          ]
        }
      }
    }
  }
}

Storing keys/values through Nested objects has some drawbacks.

It reduces the number of fields and solves the problem of too much meta information in the Cluster State, but:

  • Query statements become more complex
  • Nested objects are not well suited to visual analysis in Kibana

2.3 Modeling Suggestion (3): Avoid regular expression queries

Problem:

  • Regexp, wildcard, and prefix queries are term-level queries, but their performance is poor
  • In particular, a leading wildcard is a performance disaster

Case study:

  • A field in the document contains the ES version information, for example, version: “7.1.0”
  • How do you search for all bug-fix releases? Or for all documents associated with a given major version number?

Searching with a wildcard for all versions whose minor number is 1 is inefficient.
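For contrast, a sketch of the inefficient approach (the index name softwares_raw and its keyword-mapped version field are illustrative assumptions): the wildcard pattern has to be expanded against every distinct version term.

POST softwares_raw/_search
{
  "query": {
    "wildcard": {
      "version": {
        "value": "*.1.*"
      }
    }
  }
}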

# Optimization: model the version as an inner object
PUT softwares
{
  "mappings": {
    "_meta": {
      "software_version_mapping": "1.1"
    },
    "properties": {
      "version": {
        "properties": {
          "display_name": {
            "type": "keyword"
          },
          "hot_fix": {
            "type": "byte"
          },
          "marjor": {
            "type": "byte"
          },
          "minor": {
            "type": "byte"
          }
        }
      }
    }
  }
}


# Write multiple documents using the inner object
PUT softwares/_doc/1
{
  "version": {
    "display_name": "7.1.0",
    "major": 7,
    "minor": 1,
    "hot_fix": 0
  }
}

PUT softwares/_doc/2
{
  "version": {
    "display_name": "7.2.0",
    "major": 7,
    "minor": 2,
    "hot_fix": 0
  }
}

PUT softwares/_doc/3
{
  "version": {
    "display_name": "7.2.1",
    "major": 7,
    "minor": 2,
    "hot_fix": 1
  }
}

# Query with exact matches inside a bool filter
POST softwares/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "version.major": 7
          }
        },
        {
          "match": {
            "version.minor": 2
          }
        }
      ]
    }
  }
}

Because these are exact matches executed in a filter context, Elasticsearch can cache the results, so the query is efficient.

2.4 Modeling Suggestion (4): Avoid inaccurate aggregations caused by null values

Insert two documents: one with rating 5 and one with rating null (which should count as 1).

The average rating then comes out wrong, because the document with the null value is simply ignored.

Set null_value in the mapping so that null is treated as 1.0.

PUT ratings
{
  "mappings": {
    "properties": {
      "rating": {
        "type": "float",
        "null_value": 1.0
      }
    }
  }
}

After deleting the index and recreating it with this mapping, write the two documents again; the aggregation result is now correct.
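A minimal sketch of the rewrites and the check, assuming the two ratings are 5 and null as in the example above:

PUT ratings/_doc/1
{
  "rating": 5
}

PUT ratings/_doc/2
{
  "rating": null
}

# With null_value set, the null counts as 1.0, so the average is (5 + 1) / 2 = 3.0
POST ratings/_search
{
  "size": 0,
  "aggs": {
    "avg_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}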

2.5 Modeling Suggestion (5): Add Meta information to the index Mapping

The Mappings setting is very important and needs to be considered from two dimensions

  • Features: index, aggregate, sort
  • Performance: storage overhead, memory overhead, search performance

Setting up Mappings is an iterative process:

  • Adding new fields is easy (run update_by_query if necessary)
  • Updating or deleting a field is not allowed (a Reindex is required to rebuild the data)
  • Add Meta information to the Mappings for better version management (see the sketch after this list)
  • You can also keep the Mapping files in Git for management
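A minimal sketch of Meta information in a mapping (the index name products and the key mapping_version are illustrative assumptions; the softwares index above does the same with software_version_mapping):

PUT products
{
  "mappings": {
    "_meta": {
      "mapping_version": "1.0"
    },
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  }
}

# _meta is returned together with the mapping and can be updated without a reindex
GET products/_mapping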


The general guideline: first meet the functional requirements (the basic business needs), then optimize for storage and performance.

In other words, understand the characteristics of the specific database and make trade-offs accordingly.