Preface

A well-designed set of field types lets you take full advantage of the search and analysis features of Elasticsearch.

Mapping

To use Elasticsearch well, you need to know what mapping is. In short: mapping is the process of defining how a document and the fields it contains are stored and indexed.

What does Mapping do

In Elasticsearch, a mapping is similar to a table structure definition in a traditional relational database. It does the following:

  • Defines field names and field types.
  • Defines settings related to the inverted index, such as whether a field is indexed and whether it is analyzed (split into terms).

There are two kinds of mapping: dynamic mapping and explicit mapping.

Dynamic mapping

With dynamic mapping, you do not define the mapping in advance. It is created automatically when data is inserted into an index: after a document is indexed, Elasticsearch infers the data type of each field and creates the mapping.

Insert a document into index index_001:

PUT index_001/_doc/1
{
  "name": "lonely wolf",
  "age": 18,
  "create_date": "2021-05-19 20:45:11",
  "update_date": "2021-05-23"
}

After inserting the data, execute GET index_001 to query the index information:

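Elasticsearch maps the age field as long and update_date as date; the other two fields are mapped as text. Note that create_date is not detected as a date, most likely because its value does not match the default dynamic date formats.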

You can use the dynamic parameter to control how Elasticsearch handles new fields. It accepts four values:

dynamic=true

The default value. When set to true, a new field is added to the mapping whenever a document containing it is indexed.

Index a new document into the index above, this time with a new address field:

PUT index_001/_doc/2
{
  "name": "lonely wolf2",
  "age": 20,
  "create_date": "2021-05-23 11:37:11",
  "update_date": "2021-05-23",
  "address": "Shenzhen, Guangdong"
}

Now look at the mapping again: it contains a new address field. Because the field was added to the mapping, it is also indexed:

dynamic=runtime

This value is similar to true, with one important difference: new fields are still added to the mapping, but as runtime fields, so they are not indexed. They can still be queried, but queries on them cost more. This value is therefore not recommended for fields that are frequently used as query conditions; it is better suited to indexes such as logs, whose data structure is not known in advance.

Change the dynamic type:

PUT index_001/_mapping
{
  "dynamic":  "runtime"
}

Index a new document that contains a new field:

PUT index_001/_doc/3
{
  "email":"[email protected]"
}

Finally, query the mapping: the new field appears as a runtime field of type keyword:

The following table shows which field type Elasticsearch assigns to each kind of inserted value when the mapping is created automatically:

| Inserted data type | dynamic=true | dynamic=runtime |
|---|---|---|
| null | No field is added | No field is added |
| true or false | boolean | boolean |
| double | float | double |
| integer | long | long |
| object | object | No field is added |
| String (passes date detection) | date | date |
| String (passes numeric detection) | float or long | double or long |
| String (passes neither date nor numeric detection) | text, with a keyword sub-field | keyword |
| array | Depends on the first non-null value in the array | Depends on the first non-null value in the array |

Note: a keyword field is not analyzed, so its value is not split into terms.
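The scalar and array rows of the table above can be sketched as a small Python function. This is only an illustration of the rules, not how Elasticsearch implements them: the function name infer_type, the ISO_DATE regex and the numeric_detection argument (named after the mapping option of the same name, which is off by default) are all assumptions made for this example, and objects are left out.

```python
import re

# Simplified stand-in for date detection: accepts values such as
# "2021-05-23" or "2021-05-23T11:37:11". This regex is an assumption made
# for illustration; Elasticsearch's real detection is driven by date formats.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def infer_type(value, dynamic="true", numeric_detection=False):
    """Return the field type from the table, or None if no field is added."""
    runtime = dynamic == "runtime"
    if value is None:
        return None
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double" if runtime else "float"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        first = next((v for v in value if v is not None), None)
        return infer_type(first, dynamic, numeric_detection)
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return "date"
        if numeric_detection:
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "double" if runtime else "float"
            except ValueError:
                pass
        return "keyword" if runtime else "text"
    raise TypeError("objects and other types are outside this sketch")


print(infer_type(18))                                    # long
print(infer_type("2021-05-23"))                          # date
print(infer_type("2021-05-19 20:45:11"))                 # text
print(infer_type("lonely wolf", dynamic="runtime"))      # keyword
```

The third call mirrors the create_date value used earlier: it contains a space instead of the T separator, so this sketch treats it as plain text.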

dynamic=false

If dynamic is set to false, new fields are ignored: they are not added to the mapping and are not indexed, so they cannot be searched (unlike with runtime). They still appear in _source, however. In other words, such a field cannot be used as a query condition, but it is still returned with the document in search results.

Change dynamic to false and index a document with a new field to verify this: the new field appears in _source, but cannot be used as a query condition:

dynamic=strict

If a document contains a field that is not defined in the mapping, the document is rejected and an error is returned:

Can data types in a mapping be changed?

In Elasticsearch, once a field is defined in the mapping, its type cannot be changed, because the data already indexed under the old definition would no longer be valid (adding new fields is still allowed). If you need to change a field, you have to create a new index with the correct mapping and migrate the data using reindex.
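The rebuild-and-migrate procedure can be sketched as the two requests it needs. The Python below only builds and prints the request bodies; it does not talk to a cluster. The target index name index_001_v2, the new type for age and the helper plan_type_change are hypothetical names chosen for this example.

```python
import json


def plan_type_change(old_index, new_index, new_mappings):
    """Return the requests for moving data into an index with a new mapping."""
    return [
        # 1. Create the new index with the corrected mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
    ]


steps = plan_type_change(
    "index_001",
    "index_001_v2",  # hypothetical name for the rebuilt index
    {"properties": {"age": {"type": "integer"}}},
)
for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Once the data has been copied and checked, applications can be pointed at the new index and the old one removed.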

Disabling dynamic mapping

You can disable dynamic mapping using the following two settings. Both default to true; to disable dynamic mapping, set them to false. Note that index.mapper.dynamic is a legacy setting that may not be available in recent Elasticsearch versions, where you can set dynamic to false or strict in the mapping instead:

action.auto_create_index: true
index.mapper.dynamic: true

Explicit mapping

With explicit mapping, we define the field types ourselves instead of letting Elasticsearch infer them.

Elasticsearch supports a wide variety of field types. Here are some of the most common ones:

The text type

This is the most common type. It stores strings that are analyzed for full-text search. By default, a field defined as text cannot be used for aggregation, sorting, and similar operations:

As you can see, sorting on a text field returns an error. If you want to allow these operations, you can set fielddata to true as follows:

PUT my-index-011/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}

Fielddata is held in heap memory, and building it is expensive, so setting fielddata to true is generally not recommended. Instead, add a keyword sub-field (which is what dynamic mapping creates by default for strings):

PUT index_111
{
  "mappings": {
    "properties": {
      "my_field": { 
        "type": "text",
        "fields": {
          "keyword": { 
            "type": "keyword"
          }
        }
      }
    }
  }
}

This way the field is indexed as both text and keyword. For operations such as aggregation or sorting, use my_field.keyword as the field name:

Keyword type

This type is also very common. The value stored in a keyword field is treated as a single unit and is not split into terms. It is therefore not meant for full-text search over long article text, but for structured strings such as IDs, email addresses, and tags.
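The difference between the two types can be sketched in a few lines of Python. The analyze_text function below is a crude stand-in for an analyzer (lowercase, then split on anything that is not an ASCII letter or digit); it is an assumption made for illustration and far simpler than what Elasticsearch actually does. All function names are hypothetical.

```python
import re


def analyze_text(value):
    """Crude analyzer: lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def index_terms(value, field_type):
    """Terms stored for a value: many for text, exactly one for keyword."""
    return analyze_text(value) if field_type == "text" else [value]


def term_match(query, value, field_type):
    """True if the query equals one of the indexed terms."""
    return query in index_terms(value, field_type)


print(index_terms("Lonely Wolf", "text"))              # ['lonely', 'wolf']
print(index_terms("Lonely Wolf", "keyword"))           # ['Lonely Wolf']
print(term_match("wolf", "Lonely Wolf", "text"))       # True
print(term_match("wolf", "Lonely Wolf", "keyword"))    # False
```

A single word finds the document through the text field, while the keyword field only matches the complete, unmodified value.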

The keyword type is typically used for aggregation, sorting, and exact-match filtering. There are also two related types: constant_keyword and wildcard.

  • constant_keyword: for a field that holds the same value in every document of an index.
  • wildcard: for fields that are searched with wildcard or regular expression patterns.

Here is an example of a wildcard query (pattern matching similar to the LIKE operator of a relational database):

GET index_112/_search
{
  "query": {
    "wildcard": {
      "my_wildcard": {
        "value": "*quite*lengthy"
      }
    }
  }
}
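To make the pattern semantics concrete, here is a minimal Python sketch of wildcard matching, assuming the usual convention that * matches any sequence of characters (including none) and ? matches exactly one character. It only illustrates what the pattern means; the helper names are hypothetical and this is not how Elasticsearch evaluates the query.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any sequence of characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, value):
    """True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


print(wildcard_match("*quite*lengthy", "this text is quite lengthy"))  # True
print(wildcard_match("*quite*lengthy", "quite lengthy text"))          # False
```

The second value fails because the pattern has no trailing *, so the value must end with "lengthy".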

The date type

When defining a date field, use format to specify the accepted date formats. Multiple formats are separated by ||:

PUT index_113
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
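The effect of listing several formats can be sketched as "try each one in turn until one parses". The Python below mirrors the three formats from the mapping above using strptime patterns; the function name parse_date and the choice to interpret values as UTC are assumptions of this sketch, not Elasticsearch code.

```python
from datetime import datetime, timezone

# strptime equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_date(value):
    """Try each pattern in order, then fall back to epoch milliseconds."""
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(str(value), pattern)
            return parsed.replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    try:
        # epoch_millis: milliseconds since 1970-01-01T00:00:00Z
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    except (TypeError, ValueError):
        raise ValueError(f"no configured format matches {value!r}") from None


print(parse_date("2021-05-19 20:45:11"))  # 2021-05-19 20:45:11+00:00
print(parse_date("2021-05-23"))           # 2021-05-23 00:00:00+00:00
print(parse_date(86400000))               # 1970-01-02 00:00:00+00:00
```

A value that matches none of the formats raises an error, which corresponds to Elasticsearch rejecting a document whose date cannot be parsed.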

Numeric types

Elasticsearch provides several numeric types of different sizes and precisions:

| Numeric type | Description |
|---|---|
| long | 64-bit signed integer. Range: -2^63 to 2^63 - 1 |
| integer | 32-bit signed integer. Range: -2^31 to 2^31 - 1 |
| short | 16-bit signed integer. Range: -32768 to 32767 |
| byte | 8-bit signed integer. Range: -128 to 127 |
| double | 64-bit double-precision floating-point number |
| float | 32-bit single-precision floating-point number |
| half_float | 16-bit half-precision floating-point number |
| scaled_float | Floating-point number stored with a scaling factor, generally useful for data such as monetary amounts. For example, 18.88 yuan with a scaling factor of 100 is indexed as 1888 (original value × scaling factor). |
| unsigned_long | 64-bit unsigned integer. Range: 0 to 2^64 - 1 |

The definition is as follows:

PUT index_002
{
  "mappings": {
    "properties": {
      "number_of_bytes": {
        "type": "integer"
      },
      "time_in_seconds": {
        "type": "float"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}
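The arithmetic behind scaled_float is easy to check by hand. The sketch below multiplies by the scaling factor and rounds to a whole number, then divides to get the value back; the function names are hypothetical, and the exact rounding rule Elasticsearch applies to ties may differ from Python's round.

```python
def to_scaled(value, scaling_factor):
    """Value as stored: original * scaling factor, rounded to an integer."""
    return round(value * scaling_factor)


def from_scaled(stored, scaling_factor):
    """Recover the (possibly rounded) original value."""
    return stored / scaling_factor


print(to_scaled(18.88, 100))     # 1888
print(from_scaled(1888, 100))    # 18.88
print(to_scaled(18.886, 100))    # 1889
print(from_scaled(1889, 100))    # 18.89
```

The last two lines show the trade-off: with a scaling factor of 100, anything beyond two decimal places is rounded away, so the factor should match the precision the data needs.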

Boolean type

The boolean type is simple: it accepts only true and false:

PUT index_001
{
  "mappings": {
    "properties": {
      "is_published": {
        "type": "boolean"
      }
    }
  }
}

Other types

In addition to the common data types described above, Elasticsearch also has several more advanced ones, such as the nested type, geographic data types, and the IP type.

Conclusion

Elasticsearch supports dynamic mapping and explicit mapping. A practical approach is to index a sample document into a temporary index, let Elasticsearch generate the mapping automatically, and then adjust that mapping to your needs. Pay particular attention to the choice between text and keyword: if you need full-text search on analyzed content, use the text type; use keyword fields for exact-match search, aggregation, and sorting.