

Index modules

Index modules are modules created per index that control every aspect of that index's configuration. Each index-level setting is one of two kinds:

  • Static: can only be set when the index is created or while the index is closed
  • Dynamic: can be changed on a live index

Index settings fall into the following groups:

Static index settings

Sharding and primary shards

index.number_of_routing_shards is an integer used together with index.number_of_shards to route documents to a primary shard.

  • Elasticsearch uses this value when splitting an index. It must be an integer multiple of index.number_of_shards;
  • The default value depends on the number of primary shards in the index. The default is chosen so that the index can be split by factors of 2 up to a maximum of 1024 shards.
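
As a rough illustration of how the routing-shard count constrains splitting, the sketch below lists the primary-shard counts an index could be split into. It assumes the rule that the target count must be a multiple of the current shard count and a factor of index.number_of_routing_shards; the function name valid_split_targets is hypothetical and not part of Elasticsearch.

```python
from typing import List


def valid_split_targets(number_of_shards: int, number_of_routing_shards: int) -> List[int]:
    """Return the primary-shard counts an index could be split into.

    Assumption: a target is valid when it is a multiple of the current
    shard count and a factor of the routing-shard count.
    """
    if number_of_routing_shards % number_of_shards != 0:
        raise ValueError("number_of_routing_shards must be a multiple of number_of_shards")
    return [
        n
        for n in range(number_of_shards + 1, number_of_routing_shards + 1)
        if n % number_of_shards == 0 and number_of_routing_shards % n == 0
    ]


# 5 primary shards with 30 routing shards (5 x 2 x 3)
print(valid_split_targets(5, 30))  # [10, 15, 30]
```

So a 5-shard index with 30 routing shards could go 5 → 10 → 30, 5 → 15 → 30, or directly 5 → 30.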

index.number_of_shards sets the number of primary shards for the index.

  • The default value is 1.
  • This setting can only be set at index creation time. It cannot be changed on a closed index;
  • By default, an index is limited to 1024 shards. This is a safety limit that keeps excessive shard counts from destabilizing the cluster. The limit can be changed by setting a system property on every node in the cluster, for example export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128".

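Routing formula:

routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(_routing) % num_routing_shards) / routing_factor

For example, with 30 routing shards and 10 primary shards, routing_factor = 30 / 10 = 3, and shard_num = (0..29) / 3 gives a shard number from 0 to 9.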

Where:

  • num_primary_shards is the value of index.number_of_shards
  • num_routing_shards is the value of index.number_of_routing_shards
  • _routing defaults to the document's _id; a custom routing value can also be specified per document

This routing formula is what allows the number of primary shards to be increased by splitting the index.
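
The formula can be sketched in a few lines of Python to show why splitting works. This is only an illustration: CRC32 stands in for the hash Elasticsearch really uses (an assumption made here for simplicity), and the shard counts are made-up values.

```python
import zlib


def stand_in_hash(value: str) -> int:
    # Assumption: CRC32 is a stand-in for Elasticsearch's real routing hash.
    return zlib.crc32(value.encode("utf-8"))


def shard_num(routing: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Apply the routing formula with integer division."""
    routing_factor = num_routing_shards // num_primary_shards
    return (stand_in_hash(routing) % num_routing_shards) // routing_factor


NUM_ROUTING_SHARDS = 30
for doc_id in ("1", "2", "3", "user1"):
    before = shard_num(doc_id, 5, NUM_ROUTING_SHARDS)   # 5 primary shards
    after = shard_num(doc_id, 10, NUM_ROUTING_SHARDS)   # after splitting to 10
    # Each new shard only holds documents from one old shard (new // 2 == old).
    assert after // 2 == before
    print(doc_id, before, after)
```

Because num_routing_shards stays fixed, a split only changes routing_factor, so every document moves from its old shard to one of that shard's children and never to an unrelated shard.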

Custom routing

Indexing and fetching a document with a custom routing value:

PUT my-index-000001/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document"
}

GET my-index-000001/_doc/1?routing=user1

Routing result:

With custom routing, the uniqueness of _id is not guaranteed across all shards of an index. Documents with the same _id can end up on different shards if they are indexed with different _routing values. It is up to the user to make sure that IDs are unique across the index.
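
A toy model makes the ID problem concrete. The sketch below is not Elasticsearch code: it assumes CRC32 as a stand-in hash, a hypothetical 5-shard index, and a plain dictionary per shard.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 5  # hypothetical index with 5 primary shards


def shard_for(routing: str) -> int:
    # Assumption: CRC32 stands in for the real routing hash.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


shards = defaultdict(dict)  # shard number -> {_id: document}


def index_doc(doc_id: str, doc: dict, routing: str) -> None:
    # An _id is only unique within the shard selected by the routing value.
    shards[shard_for(routing)][doc_id] = doc


# Pick two routing values that land on different shards.
first = "user1"
second = next(f"user{i}" for i in range(2, 100) if shard_for(f"user{i}") != shard_for(first))

index_doc("1", {"title": "This is a document"}, routing=first)
index_doc("1", {"title": "This is a document"}, routing=second)

copies = sum("1" in docs for docs in shards.values())
print(copies)  # 2
```

The same _id now exists twice in the index, once per shard, which is exactly the situation the user has to prevent.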

Making the routing value required:

PUT my-index-000002
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

PUT my-index-000002/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document2"
}

GET my-index-000002/_doc/1?routing=user1

Here _routing.required is set to true in the index mapping. A request that does not supply a routing value then fails with routing_missing_exception.

index.routing_partition_size is the number of shards a custom routing value can go to. The default value is 1, and it can only be set at index creation time. The value must be less than index.number_of_shards, unless index.number_of_shards is also 1.

When this parameter is set, the formula becomes:

routing_value = hash(_routing) + hash(_id) % routing_partition_size
shard_num = (routing_value % num_routing_shards) / routing_factor
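
To see what the partitioned formula does, the following sketch routes many documents that share one routing value and counts the shards they reach. CRC32 is again an assumed stand-in for the real hash, and the shard counts are illustrative only.

```python
import zlib


def stand_in_hash(value: str) -> int:
    # Assumption: CRC32 is a stand-in for Elasticsearch's real routing hash.
    return zlib.crc32(value.encode("utf-8"))


def shard_num(routing: str, doc_id: str, num_primary_shards: int,
              num_routing_shards: int, routing_partition_size: int) -> int:
    routing_factor = num_routing_shards // num_primary_shards
    # % binds tighter than +, matching the formula as written.
    routing_value = stand_in_hash(routing) + stand_in_hash(doc_id) % routing_partition_size
    return (routing_value % num_routing_shards) // routing_factor


# 1000 documents for the same user, 8 primary shards, partition size 3
used = {shard_num("user1", str(i), 8, 8, 3) for i in range(1000)}
print(sorted(used))  # at most 3 distinct shard numbers
```

All documents for user1 stay within a small subset of shards instead of a single shard, which spreads large routing groups while keeping searches narrow.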

Note: in Elasticsearch 7.0.0 and later, index.number_of_routing_shards affects how documents are distributed across shards. When reindexing an older index that uses custom routing, you must explicitly set index.number_of_routing_shards to keep the same document distribution.

Other static settings

  1. Compression

index.codec sets the compression used for stored data. The default value compresses stored data with LZ4. Setting it to best_compression uses DEFLATE instead, which gives a higher compression ratio at the cost of slower stored-field performance. If the compression type is updated, the new type is applied only as segments are merged; a force merge can be used to trigger segment merging.

For background, DEFLATE is a lossless data compression format that combines LZSS and Huffman coding. It was designed by Phil Katz for version 2 of his PKZIP compression tool.
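
The lossless, ratio-versus-speed nature of DEFLATE can be tried out with Python's standard zlib module, which implements DEFLATE inside a zlib wrapper. This is only an illustration of the format, not of how Elasticsearch or Lucene compress stored fields; the sample data is made up.

```python
import zlib

# Repetitive sample data, so it compresses well.
data = b'{"title": "This is a document"}' * 200

fast = zlib.compress(data, 1)   # level 1: fastest, lower compression
small = zlib.compress(data, 9)  # level 9: slowest, highest compression

# Lossless: decompressing returns the original bytes exactly.
assert zlib.decompress(fast) == data
assert zlib.decompress(small) == data

print(len(data), len(fast), len(small))
```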

  2. Soft deletes

index.soft_deletes.enabled indicates whether soft deletes are enabled on the index. It can only be configured at index creation time. The setting is deprecated as of 7.6.0: creating an index with soft deletes disabled is deprecated and will be removed in a future version of Elasticsearch.

index.soft_deletes.retention_lease.period is the maximum time a shard history retention lease is kept before it is considered expired. Shard history retention leases ensure that soft deletes are retained during merges on the Lucene index. The default is 12h.

  3. Other

index.load_fixed_bitset_filters_eagerly indicates whether cached filters are preloaded for nested queries. Possible values are true (the default) and false.

Elasticsearch automatically checks the integrity of shard contents at various points in the shard life cycle. The index.shard.check_on_startup setting determines whether Elasticsearch performs additional integrity checks when opening a shard. If these checks detect corruption, they prevent the shard from being opened. The accepted values are:

Value     Description
false     Do not perform additional corruption checks when opening a shard. This is the default and recommended behavior.
checksum  Verify that the checksum of each file in the shard matches its contents.
true      Check for both physical and logical corruption. This is an expensive operation in terms of CPU and memory.
fix       Deprecated, and removed in 7.0.

Dynamic index settings

Dynamic index settings can be changed on a live index using the update-index-settings API.
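
As a minimal sketch of such an update, the code below builds (without sending) a PUT request to the _settings endpoint using only the Python standard library. The cluster address, the index name, and the helper name build_update_settings_request are assumptions for illustration.

```python
import json
import urllib.request


def build_update_settings_request(base_url: str, index: str, settings: dict) -> urllib.request.Request:
    """Build, but do not send, a PUT /<index>/_settings request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps({"index": settings}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumption: an unsecured cluster listening on localhost:9200.
req = build_update_settings_request(
    "http://localhost:9200",
    "my-index-000001",
    {"refresh_interval": "30s", "number_of_replicas": 2},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it to a running cluster: urllib.request.urlopen(req)
```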

Setting                               Description
index.number_of_replicas              Number of replicas for each primary shard. Defaults to 1.
index.auto_expand_replicas            Automatically expands the number of replicas based on the number of data nodes in the cluster. Defaults to false (disabled).
index.search.idle.after               How long a shard can go without receiving a search or get request before it is considered search idle. Defaults to 30s.
index.refresh_interval                How often a refresh is performed. Defaults to 1s. Set to -1 to disable refresh.
index.max_result_window               Maximum value of from + size for searches on this index, where from is the offset of the first hit and size is the number of hits. Defaults to 10000. Requests above the limit fail with "Result window is too large".
index.max_inner_result_window         Maximum value of from + size for inner hits definitions and top hits aggregations on this index. Defaults to 100.
index.max_rescore_window              Maximum value of window_size for rescore requests in searches on this index. Defaults to index.max_result_window (10000).
index.max_docvalue_fields_search      Maximum number of docvalue_fields allowed in a query. Defaults to 100.
index.max_script_fields               Maximum number of script_fields allowed in a query. Defaults to 32.
index.max_ngram_diff                  Maximum allowed difference between min_gram and max_gram for the NGram tokenizer and token filter. Defaults to 1.
index.max_shingle_diff                Maximum allowed difference between max_shingle_size and min_shingle_size for the shingle token filter. Defaults to 3.
index.max_refresh_listeners           Maximum number of refresh listeners available on each shard of the index.
index.analyze.max_token_count         Maximum number of tokens that can be produced using the _analyze API. Defaults to 10000.
index.highlight.max_analyzed_offset   Maximum number of characters analyzed for a highlight request. Defaults to 1000000.
index.max_terms_count                 Maximum number of terms that can be used in a terms query. Defaults to 65536.
index.max_regex_length                Maximum length of a regex that can be used in a regexp query. Defaults to 1000.
index.query.default_field             Wildcard pattern matching one or more fields, used as the default fields for several query types.
index.routing.allocation.enable       Controls shard allocation for this index: all (all shards), primaries (primary shards only), new_primaries (newly created primary shards only), none (no allocation allowed).
index.routing.rebalance.enable        Enables shard rebalancing for this index: all, primaries, replicas (replica shards only), none.
index.gc_deletes                      How long a deleted document's version number remains available for further versioned operations. Defaults to 60s.
index.default_pipeline                Default ingest pipeline for the index.
index.final_pipeline                  Final ingest pipeline for the index, which always runs last.
index.mapping.dimension_fields.limit  Maximum number of time series dimensions for the index (for internal use by Elastic only).
index.hidden                          Whether the index is hidden by default. Defaults to false.
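
The from + size limit is the setting people hit most often, so here is a small sketch of the check it implies. The function check_result_window is hypothetical and only mimics the rule; the error text is modeled on the "Result window is too large" message rather than copied from Elasticsearch.

```python
def check_result_window(from_: int, size: int, max_result_window: int = 10000) -> None:
    """Raise if from + size exceeds index.max_result_window."""
    if from_ + size > max_result_window:
        raise ValueError(
            "Result window is too large, from + size must be less than or equal to: "
            f"[{max_result_window}] but was [{from_ + size}]"
        )


check_result_window(9990, 10)      # 10000 <= 10000: allowed
try:
    check_result_window(9991, 10)  # 10001 > 10000: rejected
except ValueError as err:
    print(err)
```

Deep pagination past the window is rejected rather than served, so raising index.max_result_window trades memory on the data nodes for larger pages.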

Other index settings

Other index settings are available in these index modules:

  • Analysis
    • Settings that define analyzers, tokenizers, token filters, and character filters.
  • Index shard allocation
    • Controls where, when, and how shards are allocated to nodes.
  • Mapping
    • Enables or disables dynamic mapping for an index.
  • Merging
    • Controls how shards are merged by the background merge process.
  • Similarities
    • Configures custom similarity settings to customize how search results are scored.
  • Slow log
    • Controls how slow queries and fetch requests are logged.
  • Store
    • Configures the type of file system used to access shard data.
  • Translog
    • Controls the transaction log and background flush operations.
  • History retention
    • Controls the retention of operation history in the index.
  • Indexing pressure
    • Configures indexing back-pressure limits.

References

index-modules