Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For example, use mappings to define:
- Which string fields should be treated as full-text fields.
- Which fields contain numbers, dates, or geographic locations.
- The format of date values.
- Custom rules to control mapping of dynamically added fields.
A mapping definition has:
Meta-fields
Meta-fields are used to customize how the metadata associated with a document is handled. Examples of meta-fields include the document's _index, _id, and _source fields.
Fields or properties
A mapping contains a list of fields or properties pertinent to the document.
Prior to 7.0.0, mapping definitions used to contain type names.
Field data types
Each field has a data type, which can be:
- A simple type such as text, keyword, date, long, double, boolean, or ip.
- A type that supports the hierarchical nature of JSON, such as object or nested.
- Or a specialized type such as geo_point, geo_shape, or completion.
It is often useful to index the same field in different ways for different purposes. For example, a string field can be indexed as a text field for full-text search and as a keyword field for sorting or aggregations. Alternatively, you can index a string field with the standard, english, and french analyzers.
This is the purpose of multi-fields. Most data types support multi-fields through the fields parameter.
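As a rough illustration of the multi-field idea, the following Python sketch builds a mapping body in which one string field is indexed as text, with a keyword sub-field and one extra text sub-field per analyzer. The helper name multi_field_mapping, the field name city, and the sub-field name raw are hypothetical and chosen only for this example.

```python
import json


def multi_field_mapping(field_name, analyzers=()):
    """Build a mapping in which one string field is indexed several ways.

    The main field is a full-text `text` field; a `raw` sub-field keeps
    the exact value as a `keyword`; each name in `analyzers` adds a
    `text` sub-field that uses that analyzer.
    """
    sub_fields = {"raw": {"type": "keyword"}}
    for analyzer in analyzers:
        sub_fields[analyzer] = {"type": "text", "analyzer": analyzer}
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "text", "fields": sub_fields}
            }
        }
    }


# Hypothetical field name; the result could be sent as a create index body.
body = multi_field_mapping("city", analyzers=["english", "french"])
print(json.dumps(body, indent=2))
```

Under these assumptions, a search could target city for full-text matching and city.raw for sorting or aggregations.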
Settings to prevent mapping explosions
Defining too many fields in an index can lead to a mapping explosion, which can cause out-of-memory errors and situations that are difficult to recover from. This problem may be more common than expected. For example, consider a situation in which every new document inserted introduces new fields. This is quite common with dynamic mappings. Every time a document contains new fields, those fields end up in the index's mappings. This is not a concern for a small amount of data, but it can become a problem as the mapping grows. The following settings allow you to limit the number of field mappings that can be created manually or dynamically, in order to prevent bad documents from causing a mapping explosion:
index.mapping.total_fields.limit
The maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default value is 1000.
The limit is in place to prevent mappings and searches from becoming too large. Higher values can lead to performance degradation and memory issues, especially in clusters with a high load or few resources. If you increase this setting, we recommend that you also increase the indices.query.bool.max_clause_count setting, which limits the maximum number of boolean clauses in a query.
index.mapping.depth.limit
The maximum depth for a field, measured as the number of inner objects. For example, if all fields are defined at the root object level, the depth is 1. If there is one object mapping, the depth is 2, and so on. The default value is 20.
index.mapping.nested_fields.limit
The maximum number of distinct nested mappings in an index. The default is 50.
index.mapping.nested_objects.limit
The maximum number of nested JSON objects that a single document can contain across all nested types. The default is 10000.
index.mapping.field_name_length.limit
The maximum length of a field name. The default value is Long.MAX_VALUE (no limit). This setting does not really address mapping explosion, but it may still be useful if you want to limit the field name length. It is usually not needed; the default is fine unless users start adding a huge number of fields with very long names.
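To make the field-count and depth limits more concrete, here is a small Python sketch that estimates both numbers for a mapping held as a dict. It is a local approximation only: the counting rule (one per field, object, alias, and multi-field) is an assumption based on the description above, and the function names and sample mapping are hypothetical.

```python
def count_mapped_fields(properties):
    """Count field, object, and alias mappings under a `properties` dict.

    Every entry counts once (objects and aliases included), and so does
    every multi-field declared under `fields`.
    """
    total = 0
    for definition in properties.values():
        total += 1  # the field, object, or alias itself
        total += len(definition.get("fields", {}))  # multi-fields
        total += count_mapped_fields(definition.get("properties", {}))
    return total


def mapping_depth(properties):
    """Return 1 for root-level fields only, plus 1 per level of inner objects."""
    deepest_child = 0
    for definition in properties.values():
        inner = definition.get("properties")
        if inner is not None:
            deepest_child = max(deepest_child, mapping_depth(inner))
    return 1 + deepest_child


# Hypothetical mapping used only for illustration.
mapping = {
    "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
    "address": {
        "properties": {
            "city": {"type": "keyword"},
            "zip": {"type": "keyword"},
        }
    },
    "full_name": {"type": "alias", "path": "name"},
}

used = count_mapped_fields(mapping)
depth = mapping_depth(mapping)
print(f"fields: {used} (default limit 1000), depth: {depth} (default limit 20)")
```

A check like this could run before sending a mapping update, to notice early when a mapping is drifting towards the limits.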
Dynamic mapping
Fields and mapping types do not need to be defined before they are used. With dynamic mapping, new field names are added automatically, just by indexing a document. New fields can be added both to the top-level mapping type and to inner object and nested fields.
The dynamic mapping rules can be configured to customize the mapping that is used for new fields.
Explicit mapping
You know more about your data than Elasticsearch can guess, so while dynamic mapping is useful for getting started, at some point you will want to specify your own explicit mappings.
You can create field mappings when you create an index and when you add fields to an existing index.
- Create an index with an explicit mapping
You can use the create index API to create a new index with an explicit mapping.
curl -X PUT "localhost:9200/my-index?pretty" -H 'Content-Type: application/json' -d' { "mappings": { "properties": { "age": { "type": "integer" }, "email": { "type": "keyword" }, "name": { "type": "text" } } } } '
- Add a field to an existing mapping
You can use the put mapping API to add one or more new fields to an existing index. The following example adds employee-id, a keyword field with an index mapping parameter value of false. This means that values for the employee-id field are stored but not indexed or available for search.
curl -X PUT "localhost:9200/my-index/_mapping?pretty" -H 'Content-Type: application/json' -d' { "properties": { "employee-id": { "type": "keyword", "index": false } } } '
Update the mapping of a field
Except for supported mapping parameters, you cannot change the mapping or field type of an existing field. Changing an existing field could invalidate data that is already indexed.
If you need to change the mapping of a field, create a new index with the correct mapping, and then reindex the data into that index.
Renaming a field would invalidate data already indexed under the old field name. Instead, add an alias field to create an alternate field name.
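The rule that new fields can be added but existing field types cannot be changed can be modeled locally. The Python sketch below is not how Elasticsearch implements the check; add_fields is a hypothetical helper that merges new properties into a mapping dict and rejects an attempt to change the type of a field that already exists.

```python
import copy


def add_fields(mapping, new_properties):
    """Return a copy of `mapping` with `new_properties` added.

    Raises ValueError if a field already exists with a different type,
    mirroring the rule that an existing field's type cannot be changed.
    """
    merged = copy.deepcopy(mapping)
    existing = merged.setdefault("properties", {})
    for name, definition in new_properties.items():
        if name in existing and existing[name].get("type") != definition.get("type"):
            raise ValueError(
                f"cannot change type of existing field [{name}]; "
                "create a new index and reindex instead"
            )
        existing.setdefault(name, definition)
    return merged


mapping = {
    "properties": {
        "age": {"type": "integer"},
        "email": {"type": "keyword"},
        "name": {"type": "text"},
    }
}

# Adding a new field succeeds and leaves the original mapping untouched.
updated = add_fields(mapping, {"employee-id": {"type": "keyword", "index": False}})
print(sorted(updated["properties"]))

# Changing the type of an existing field is rejected.
try:
    add_fields(updated, {"age": {"type": "text"}})
except ValueError as err:
    print(err)
```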
View the index mapping
You can use the get mapping API to view the mapping of an existing index.
curl -X GET "localhost:9200/my-index/_mapping?pretty"
The API returns the following response:
{
"my-index" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"employee-id" : {
"type" : "keyword",
"index" : false
},
"name" : {
"type" : "text"
}
}
}
}
}
View mappings for specific fields
If you only want to view the mapping of one or more specific fields, you can use the get field mapping API.
This is useful if you do not need the complete mapping of an index or if the index contains a large number of fields.
The following request retrieves the mapping for the employee-id field.
curl -X GET "localhost:9200/my-index/_mapping/field/employee-id?pretty"
The API returns the following response:
{
"my-index" : {
"mappings" : {
"employee-id" : {
"full_name" : "employee-id",
"mapping" : {
"employee-id" : {
"type" : "keyword",
"index" : false
}
}
}
}
}
}
Removal of mapping types
Indexes created in Elasticsearch 7.0.0 or later no longer accept a _default_ mapping. Indexes created in 6.x continue to function as before in Elasticsearch 6.x. Types are deprecated in the 7.0 APIs, with breaking changes to the index creation, put mapping, get mapping, put template, get template, and get field mapping APIs.
What are mapping types?
Since the first release of Elasticsearch, each document has been stored in a single index and assigned a single mapping type. A mapping type was used to represent the type of document or entity being indexed; for example, a twitter index might have a user type and a tweet type.
Each mapping type could have its own fields, so the user type might have a full_name field, a user_name field, and an email field, while the tweet type could have a content field, a tweeted_at field and, like the user type, a user_name field.
Each document had a _type meta-field containing the type name, and searches could be limited to one or more types by specifying the type names in the URL:
GET twitter/user,tweet/_search
{
"query": {
"match": {
"user_name": "kimchy"
}
}
}
The _type field was combined with the document's _id to generate the _uid field, so documents of different types with the same _id could exist in a single index.
Mapping types were also used to establish a parent-child relationship between documents, so documents of type question could be parents to documents of type answer.
Why are mapping types being removed?
Initially, we spoke about an "index" being similar to a "database" in an SQL database, and a "type" being equivalent to a "table".
This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.
In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and both user_name fields must have the same mapping (definition) in both types.
This can lead to frustration when, for example, you want a field named deleted to be a date field in one type and a boolean field in another type in the same index.
On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene's ability to compress documents efficiently.
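The shared-field constraint can be illustrated with a short Python sketch that scans the mapping types of one index and reports fields whose definitions disagree. This is only a local model of the rule; find_type_conflicts and the deleted field (a date in one type, a boolean in the other) are hypothetical.

```python
def find_type_conflicts(typed_mappings):
    """Return {field: {type_name: field_type}} for fields whose
    definitions differ between mapping types of the same index."""
    seen = {}
    for type_name, mapping in typed_mappings.items():
        for field, definition in mapping["properties"].items():
            seen.setdefault(field, {})[type_name] = definition["type"]
    return {
        field: by_type
        for field, by_type in seen.items()
        if len(set(by_type.values())) > 1
    }


# Hypothetical multi-type index: user_name agrees, deleted does not.
twitter = {
    "user": {"properties": {
        "user_name": {"type": "keyword"},
        "deleted": {"type": "date"},
    }},
    "tweet": {"properties": {
        "user_name": {"type": "keyword"},
        "deleted": {"type": "boolean"},
    }},
}
print(find_type_conflicts(twitter))
```

Because both types share one underlying field per name, a conflict like the one reported for deleted could not be mapped in a single index.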
For these reasons, we decided to remove the concept of mapping types from Elasticsearch.
Alternatives to mapping types
The first alternative is to have an index per document type. Instead of storing tweets and users in a single twitter index, you could store tweets in the tweets index and users in the user index. Indexes are completely independent of each other, so there are no conflicts of field types between indexes.
This approach has two benefits:
- Data is more likely to be dense and therefore can benefit from the compression techniques used in Lucene.
- The term statistics used for scoring in full-text search are more likely to be accurate because all documents in the same index represent a single entity.
Each index can be sized appropriately for the number of documents it will contain: you can use a smaller number of primary shards for users and a larger number of primary shards for tweets.
Of course, there is a limit to how many primary shards can exist in a cluster, so you may not want to waste an entire shard on a collection of only a few thousand documents. In this case, you can implement your own custom type field, which works in a similar way to the old _type.
Let's take the user/tweet example above. Originally, the workflow would have looked something like this:
PUT twitter
{
"mappings": {
"user": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
},
"tweet": {
"properties": {
"content": { "type": "text" },
"user_name": { "type": "keyword" },
"tweeted_at": { "type": "date" }
}
}
}
}
PUT twitter/user/kimchy
{
"name": "Shay Banon",
"user_name": "kimchy",
"email": "[email protected]"
}
PUT twitter/tweet/1
{
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/tweet/_search
{
"query": {
"match": {
"user_name": "kimchy"
}
}
}
You can achieve the same thing by adding a custom type field, as follows:
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}
}
PUT twitter/_doc/user-kimchy
{
"type": "user",
"name": "Shay Banon",
"user_name": "kimchy",
"email": "[email protected]"
}
PUT twitter/_doc/tweet-1
{
"type": "tweet",
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"user_name": "kimchy"
}
},
"filter": {
"match": {
"type": "tweet"
}
}
}
}
}
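The custom type field approach can also be sketched in Python without a cluster. The hypothetical helpers below convert a typed document into its single-type form (the id is prefixed with the old type and a type field is added to the source) and then select documents by that field, roughly as the bool query with a filter does. Exact equality stands in for the match and filter clauses, which is a simplification.

```python
def to_single_type(doc_type, doc_id, source):
    """Convert a typed document into its single-type form.

    Returns the new id (prefixed with the old type so that ids stay
    unique) and a copy of the source with a custom `type` field added.
    """
    return f"{doc_type}-{doc_id}", {**source, "type": doc_type}


def search_by_type(docs, doc_type, user_name):
    """Keep sources whose `type` and `user_name` both match exactly."""
    return [
        source for source in docs.values()
        if source.get("type") == doc_type and source.get("user_name") == user_name
    ]


docs = dict([
    to_single_type("user", "kimchy", {"name": "Shay Banon", "user_name": "kimchy"}),
    to_single_type("tweet", "1", {"user_name": "kimchy", "content": "Types are going away"}),
])
print(sorted(docs))
print(search_by_type(docs, "tweet", "kimchy"))
```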
Parent/child without mapping types
Previously, a parent-child relationship was represented by making one mapping type the parent and one or more other mapping types the children. Without types, this syntax can no longer be used. The parent-child feature continues to work as before, except that the relationship between documents is now expressed with the new join field.
Migrating multi-type indexes to single-type
The reindex API can be used to convert multi-type indexes to single-type indexes. The following examples can be used in Elasticsearch 5.6 or Elasticsearch 6.x. In 6.x, there is no need to specify index.mapping.single_type, as that is the default.
The first example splits our twitter index into a tweets index and a users index:
PUT users
{
"settings": {
"index.mapping.single_type": true
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
},
"user_name": {
"type": "keyword"
},
"email": {
"type": "keyword"
}
}
}
}
}
PUT tweets
{
"settings": {
"index.mapping.single_type": true
},
"mappings": {
"_doc": {
"properties": {
"content": {
"type": "text"
},
"user_name": {
"type": "keyword"
},
"tweeted_at": {
"type": "date"
}
}
}
}
}
POST _reindex
{
"source": {
"index": "twitter",
"type": "user"
},
"dest": {
"index": "users",
"type": "_doc"
}
}
POST _reindex
{
"source": {
"index": "twitter",
"type": "tweet"
},
"dest": {
"index": "tweets",
"type": "_doc"
}
}
The next example adds a custom type field and sets it to the value of the original _type. It also adds the type to the _id in case there are any documents of different types which have conflicting ids:
PUT new_twitter
{
"mappings": {
"_doc": {
"properties": {
"type": {
"type": "keyword"
},
"name": {
"type": "text"
},
"user_name": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"content": {
"type": "text"
},
"tweeted_at": {
"type": "date"
}
}
}
}
}
POST _reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
},
"script": {
"source": """
ctx._source.type = ctx._type;
ctx._id = ctx._type + '-' + ctx._id;
ctx._type = '_doc';
"""
}
}
Typeless APIs in 7.0
In Elasticsearch 7.0, each API supports typeless requests, and specifying a type produces a deprecation warning.
The index creation, index template, and mapping APIs support a new include_type_name URL parameter, which specifies whether mapping definitions in requests and responses should contain the type name. The parameter defaults to true in version 6.8 to match the pre-7.0 behavior of using type names in mappings. It defaults to false in version 7.0 and will be removed in version 8.0.
It should be set explicitly in 6.8 to prepare for the upgrade to 7.0. To avoid deprecation warnings in 6.8, the parameter can be set to either true or false. In 7.0, setting include_type_name at all results in a deprecation warning.
The following examples show interactions with Elasticsearch with this option set to false:
curl -X PUT "localhost:9200/index?include_type_name=false&pretty" -H 'Content-Type: application/json' -d' { "mappings": { "properties": { "foo": { "type": "keyword" } } } } '
curl -X PUT "localhost:9200/index/_mappings?include_type_name=false&pretty" -H 'Content-Type: application/json' -d' { "properties": { "bar": { "type": "text" } } } '
curl -X GET "localhost:9200/index/_mappings?include_type_name=false&pretty"
{
"index": {
"mappings": {
"properties": {
"foo": {
"type": "keyword"
},
"bar": {
"type": "text"
}
}
}
}
}
In 7.0, the index APIs must be called with the {index}/_doc path for automatic generation of the _id, and with {index}/_doc/{id} for an explicit id.
curl -X PUT "localhost:9200/index/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "foo": "baz" } '
{
"_index": "index",
"_id": "1",
"_type": "_doc",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Similarly, the get and delete APIs use the path {index}/_doc/{id}:
curl -X GET "localhost:9200/index/_doc/1?pretty"
For API paths that contain both a type and an endpoint name (such as _update), in 7.0 the endpoint immediately follows the index name:
curl -X POST "localhost:9200/index/_update/1?pretty" -H 'Content-Type: application/json' -d' { "doc" : { "foo" : "qux" } } '
curl -X GET "localhost:9200/index/_source/1?pretty"
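The path conventions described above can be summarized in a small Python sketch. The helper names doc_path and endpoint_path are hypothetical; they only assemble the URL paths and do not talk to a cluster.

```python
def doc_path(index, doc_id=None):
    """Path for the index, get, and delete APIs in 7.0.

    Without an id the path ends in _doc, so the _id is generated
    automatically; with an id the document is addressed explicitly.
    """
    return f"/{index}/_doc" if doc_id is None else f"/{index}/_doc/{doc_id}"


def endpoint_path(index, endpoint, doc_id):
    """Path for endpoints such as _update or _source: the endpoint name
    directly follows the index name, where the type used to be."""
    return f"/{index}/{endpoint}/{doc_id}"


print(doc_path("index"))
print(doc_path("index", 1))
print(endpoint_path("index", "_update", 1))
print(endpoint_path("index", "_source", 1))
```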
Types should also no longer appear in the body of requests. The following bulk indexing example omits the type both in the URL and in the individual bulk commands:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "index", "_id" : "3" } }
{ "foo" : "baz" }
{ "index" : { "_index" : "index", "_id" : "4" } }
{ "foo" : "qux" }
'
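A typeless bulk body like the one above can be assembled programmatically. The following Python sketch is one possible way to do it; bulk_body is a hypothetical helper, and the only things it relies on are the newline-delimited JSON layout shown in the example and the trailing newline that the bulk API expects.

```python
import json


def bulk_body(index, docs):
    """Build a typeless _bulk body: one action line and one source line
    per document, each terminated by a newline. No _type is emitted."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("index", {"3": {"foo": "baz"}, "4": {"foo": "qux"}})
print(body, end="")
```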
When calling a search API such as _search, _msearch, or _explain, types should not be included in the URL. In addition, the _type field should not be used in queries, aggregations, or scripts.
The document and search APIs continue to return a _type key in responses to avoid breaking response parsing. However, the key is considered deprecated and should no longer be referenced. Types will be completely removed from responses in 8.0.
Note that when a deprecated typed API is used, the index's mapping type is returned as normal, but the typeless APIs return the dummy type _doc in the response. For example, the following typeless get call always returns _doc as the type, even if the mapping has a custom type name like my_type:
curl -X PUT "localhost:9200/index/my_type/1?pretty" -H 'Content-Type: application/json' -d' { "foo": "baz" } '
curl -X GET "localhost:9200/index/_doc/1?pretty"
{
"_index" : "index",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found": true,
"_source" : {
"foo" : "baz"
}
}
It is recommended to make index templates typeless by re-adding them with include_type_name set to false. Under the hood, typeless templates use the dummy type _doc when creating indexes.
If a typeless template is used in a typed index creation call, or a typed template is used in a typeless index creation call, the template is still applied, but the index creation call decides whether there should be a type or not. For instance, in the example below, index-1-01 will have a type in spite of the fact that it matches a template that is typeless, and index-2-01 will be typeless in spite of the fact that it matches a template that defines a type. Both index-1-01 and index-2-01 will inherit the foo field from the template that they match.
curl -X PUT "localhost:9200/_template/template1?pretty" -H 'Content-Type: application/json' -d' { "index_patterns":[ "index-1-*" ], "mappings": { "properties": { "foo": { "type": "keyword" } } } } '
curl -X PUT "localhost:9200/_template/template2?include_type_name=true&pretty" -H 'Content-Type: application/json' -d' { "index_patterns":[ "index-2-*" ], "mappings": { "type": { "properties": { "foo": { "type": "keyword" } } } } } '
curl -X PUT "localhost:9200/index-1-01?include_type_name=true&pretty" -H 'Content-Type: application/json' -d' { "mappings": { "type": { "properties": { "bar": { "type": "long" } } } } } '
curl -X PUT "localhost:9200/index-2-01?pretty" -H 'Content-Type: application/json' -d' { "mappings": { "properties": { "bar": { "type": "long" } } } } '
In the case of implicit index creation, where documents are indexed into an index that does not exist yet, the template is always honored. This is usually not a problem, because typeless index calls work on typed indexes.
Mixed-version clusters
In a cluster composed of both 6.8 and 7.0 nodes, the include_type_name parameter should be specified in index APIs such as index creation. This is because the parameter has a different default in 6.8 and 7.0, so the same mapping definition will not be valid for both node versions.
Typeless document APIs such as bulk and update are only available as of 7.0, and will not work with 6.8 nodes. This also holds true for the typeless versions of queries that perform document lookups, such as terms.
Field data types
Elasticsearch supports a number of different data types for the fields in a document:
- Core data types:
  - String: text and keyword
  - Numeric: long, integer, short, byte, double, float, half_float, scaled_float
  - Date: date
  - Date nanoseconds: date_nanos
  - Boolean: boolean
  - Binary: binary
  - Range: integer_range, float_range, long_range, double_range, date_range
- Complex data types:
  - Object: object for single JSON objects
  - Nested: nested for arrays of JSON objects
- Geo data types:
  - Geo-point: geo_point for latitude/longitude points
  - Geo-shape: geo_shape for complex shapes such as polygons
- Specialized data types:
  - IP: ip for IPv4 and IPv6 addresses
  - Completion: completion to provide auto-complete suggestions
  - Token count: token_count to count the number of tokens in a string
  - Mapper-murmur3: murmur3 to compute hashes of values at index time and store them in the index
  - Mapper-annotated-text: annotated-text to index text containing special markup (typically used for identifying named entities)
  - Percolator: Accepts queries from the query DSL
  - Join: Defines a parent/child relation for documents within the same index
  - Rank feature: Records a numeric feature to boost hits at query time
  - Rank features: Records numeric features to boost hits at query time
  - Dense vector: Records dense vectors of float values
  - Sparse vector: Records sparse vectors of float values
  - Search-as-you-type: A text-like field optimized for queries that implement as-you-type completion
  - Alias: Defines an alias to an existing field
  - Flattened: Allows an entire JSON object to be indexed as a single field
  - Shape: shape for arbitrary Cartesian geometries
  - Histogram: histogram for pre-aggregated numerical values for percentiles aggregations
- Arrays: In Elasticsearch, arrays do not require a dedicated field data type. By default, any field can contain zero or more values; however, all values in the array must be of the same data type.
- Multi-fields: It is often useful to index the same field in different ways for different purposes. For example, a string field can be mapped as a text field for full-text search and as a keyword field for sorting or aggregations. Alternatively, you can index a text field with the standard, english, and french analyzers. This is the purpose of multi-fields. Most data types support multi-fields through the fields parameter.
The following details each field type:
Alias data type
An alias mapping defines an alternate name for a field in the index. The alias can be used in place of the target field in search requests and in selected other APIs, such as field capabilities.
PUT trips
{
"mappings": {
"properties": {
"distance": {
"type": "long"
},
"route_length_miles": {
"type": "alias",
"path": "distance"
},
"transit_mode": {
"type": "keyword"
}
}
}
}
GET _search
{
"query": {
"range" : {
"route_length_miles" : {
"gte": 39
}
}
}
}
The API returns the following response:
{
"took": 37,
"timed_out" : false,
"_shards" : {
"total": 15,
"successful": 15,
"skipped": 0,
"failed": 0
},
"hits" : {
"total" : {
"value": 0,
"relation" : "eq"
},
"max_score" : null,
"hits": []
}
}
Almost all components of the search request accept field aliases. In particular, aliases can be used in queries, aggregations, and sort fields, as well as when requesting docvalue_fields, stored_fields, suggestions, and highlights. Scripts also support aliases when accessing field values.
In some parts of the search request, and when requesting field capabilities, field wildcard patterns can be provided. In these cases, the wildcard pattern matches field aliases in addition to concrete fields:
curl -X GET "localhost:9200/trips/_field_caps?fields=route_*,transit_mode&pretty"
There are some restrictions on the target of aliases:
- The target must be a concrete field, and not an object or another field alias.
- The target field must exist at the time the alias is created.
- If nested objects are defined, a field alias must have the same nested scope as its target.
In addition, a field alias can have only one target. This means that a field alias cannot be used to query multiple target fields in a single clause. An alias can be changed to refer to a new target through a mappings update. A known limitation is that if any stored percolator queries contain the field alias, they will still refer to its original target.
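The first two restrictions on alias targets can be checked with a few lines of Python. This is only a sketch under simplifying assumptions: validate_alias is a hypothetical helper, it looks at top-level properties only, and the nested-scope rule is not modeled.

```python
def validate_alias(properties, alias_name):
    """Check a top-level field alias against the target restrictions.

    Only top-level `properties` are handled, so dotted paths and the
    nested-scope rule are outside the scope of this sketch.
    """
    path = properties[alias_name]["path"]
    target = properties.get(path)
    if target is None:
        raise ValueError(f"target [{path}] must exist when the alias is created")
    if target.get("type") == "alias":
        raise ValueError(f"target [{path}] is itself an alias")
    if "properties" in target or target.get("type") in ("object", "nested"):
        raise ValueError(f"target [{path}] is an object, not a concrete field")
    return path


trips = {
    "distance": {"type": "long"},
    "route_length_miles": {"type": "alias", "path": "distance"},
    "transit_mode": {"type": "keyword"},
}
print(validate_alias(trips, "route_length_miles"))

# An alias that points at another alias is rejected.
chained = dict(trips, route_alias={"type": "alias", "path": "route_length_miles"})
try:
    validate_alias(chained, "route_alias")
except ValueError as err:
    print(err)
```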
Unsupported APIs
Writes to field aliases are not supported: attempting to use an alias in an index or update request results in a failure. Likewise, aliases cannot be used as the target of copy_to or in multi-fields.
Because alias names are not present in the document source, aliases cannot be used when performing source filtering. For example, the following request returns an empty result for _source:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d' { "query" : { "match_all": {} }, "_source": "route_length_miles" } '
Currently, only the search and field capabilities APIs accept and resolve field aliases. Other APIs that accept field names, such as term vectors, cannot be used with field aliases. Finally, some queries, such as terms, geo_shape, and more_like_this, allow query information to be fetched from an indexed document. Because field aliases are not supported when fetching documents, the part of the query that specifies the lookup path cannot refer to a field by its alias.
Arrays
In Elasticsearch, there is no dedicated array data type. By default, any field can contain zero or more values; however, all values in the array must be of the same data type. For example:
- An array of strings: [ "one", "two" ]
- An array of integers: [ 1, 2 ]
- An array of arrays: [ 1, [ 2, 3 ]], which is the equivalent of [ 1, 2, 3 ]
- An array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]
Arrays of objects do not work as you might expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this, use the nested data type instead of the object data type.
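One way to picture why objects in an array cannot be queried independently is to flatten the array into multi-valued fields. The Python sketch below is a simplified local model, assuming the object data type keeps one list of values per key; flatten_object_array and the field name people are hypothetical.

```python
def flatten_object_array(field, objects):
    """Flatten an array of objects into multi-valued `field.key` entries,
    which loses the link between values of the same object."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


people = [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]
flat = flatten_object_array("people", people)
print(flat)

# No single object has name "Mary" and age 10, yet both values are present
# in the flattened fields, so a query on the two conditions would match.
matches = "Mary" in flat["people.name"] and 10 in flat["people.age"]
print(matches)
```

Under this model, the nested data type avoids the problem by keeping each object separate instead of merging its values into shared lists.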
When a field is added dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type, or it must at least be possible to coerce subsequent values to the same data type.
Arrays with a mixture of data types are not supported: [ 10, "some string" ]
An array may contain null values, which are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.
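The array rules above (nested arrays are flattened, nulls are replaced or skipped, an empty array yields no values) can be modeled with a short recursive function. This is an illustrative Python sketch; index_values is a hypothetical name and the function does not reproduce type coercion.

```python
def index_values(value, null_value=None):
    """Return the flat list of values a field would end up with.

    Nested arrays are flattened; a null is replaced by `null_value` when
    one is configured and skipped otherwise; [] produces no values.
    """
    if isinstance(value, list):
        result = []
        for item in value:
            result.extend(index_values(item, null_value))
        return result
    if value is None:
        return [] if null_value is None else [null_value]
    return [value]


print(index_values([1, [2, 3]]))
print(index_values([1, None, 2]))
print(index_values([1, None], null_value=0))
print(index_values([]))
```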
Arrays can be used in documents without any prior configuration; they are supported out of the box:
curl -X PUT "localhost:9200/my_index/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "message": "some arrays in this document...", "tags": [ "elasticsearch", "wow" ], "lists": [ { "name": "prog_list", "description": "programming list" }, { "name": "cool_list", "description": "cool stuff list" } ] } '
curl -X PUT "localhost:9200/my_index/_doc/2?pretty" -H 'Content-Type: application/json' -d' { "message": "no arrays in this document...", "tags": "elasticsearch", "lists": { "name": "prog_list", "description": "programming list" } } '
curl -X GET "localhost:9200/my_index/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match": { "tags": "elasticsearch" } } } '