[This is my 16th day of The November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021]

The index module

Index modules are modules created per index; they control all aspects related to that index. Index-level settings are set per index and come in two kinds:

Static: can only be set at index creation time or on a closed index
Dynamic: can be changed on a live index
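As a rough illustration of the two levels, the sketch below only builds the two requests (it does not send them). The `Request` tuple and both helper functions are hypothetical names introduced for this example; the endpoints are the create-index and update-index-settings APIs, and the index name and values are arbitrary.

```python
import json
from typing import NamedTuple


class Request(NamedTuple):
    method: str
    path: str
    body: dict


def create_index(index: str, primary_shards: int, replicas: int) -> Request:
    """Static settings such as number_of_shards go into the create-index body."""
    settings = {"index": {"number_of_shards": primary_shards,
                          "number_of_replicas": replicas}}
    return Request("PUT", f"/{index}", {"settings": settings})


def update_dynamic_settings(index: str, **settings) -> Request:
    """Dynamic settings can be changed later through the _settings endpoint."""
    return Request("PUT", f"/{index}/_settings", {"index": settings})


if __name__ == "__main__":
    requests_to_send = (
        create_index("my-index-000001", primary_shards=3, replicas=1),
        update_dynamic_settings("my-index-000001", refresh_interval="30s"),
    )
    for req in requests_to_send:
        print(req.method, req.path, json.dumps(req.body))
```

Each printed line can be pasted into a console or sent with any HTTP client; the point is only that the static value travels with index creation while the dynamic one is applied to the live index.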

Index module settings fall into the following groups:

Static index settings

Primary shards

index.number_of_routing_shards is an integer used together with index.number_of_shards to route documents to a primary shard.

Elasticsearch uses this value when splitting an index. It must be an integer multiple of index.number_of_shards.
The default depends on the number of primary shards in the index; it is chosen so that the index can be split by factors of 2 up to a maximum of 1024 shards.
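To make the "factors of 2 up to 1024" rule concrete, here is a minimal sketch of how such a default could be computed. It reflects my reading of the behavior described above and is not taken from this article; the function names are hypothetical.

```python
from typing import List

MAX_SHARDS = 1024  # default upper bound on shards per index


def default_routing_shards(num_primary_shards: int) -> int:
    """Pick num_primary_shards * 2**k, with k as large as possible while the
    result stays within MAX_SHARDS, but always allowing at least one split."""
    if not 1 <= num_primary_shards <= MAX_SHARDS:
        raise ValueError("num_primary_shards must be between 1 and 1024")
    log2_max = MAX_SHARDS.bit_length() - 1                # 10 for 1024
    log2_shards = (num_primary_shards - 1).bit_length()   # ceil(log2(n))
    num_splits = max(1, log2_max - log2_shards)
    return num_primary_shards << num_splits


def split_targets(num_primary_shards: int) -> List[int]:
    """Shard counts reachable by repeated doubling under the default."""
    limit = default_routing_shards(num_primary_shards)
    targets, n = [], num_primary_shards * 2
    while n <= limit:
        targets.append(n)
        n *= 2
    return targets


if __name__ == "__main__":
    for primaries in (1, 3, 5):
        print(primaries, default_routing_shards(primaries), split_targets(primaries))
```

Under this sketch a 5-shard index gets 640 routing shards and can be split to 10, 20, 40, 80, 160, 320 or 640 shards.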

index.number_of_shards sets the number of primary shards for the index.

The default value is 1.
This setting can only be set at index creation time; it cannot be changed on a closed index.
The number of shards is limited to 1024 per index by default, a safety limit that prevents indices with so many shards that they destabilize the cluster. The limit can be changed by setting a system property on every node of the cluster, for example export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128".

Routing formula:

routing_factor = num_routing_shards / num_primary_shards              # e.g. 3 = 30 / 10
shard_num = (hash(_routing) % num_routing_shards) / routing_factor    # e.g. (0~9) = (0~29) / 3

Where:

num_primary_shards is the value of index.number_of_shards
num_routing_shards is the value of index.number_of_routing_shards
_routing defaults to the document's _id; a custom value can be supplied per document through the routing parameter

This routing formula is what allows the number of primary shards to be increased later by splitting the index.
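The formula can be tried out with a few lines of Python. This is only a sketch: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, so the concrete shard numbers will differ from a real cluster. The example uses 30 routing shards and 10 primary shards, which gives a routing_factor of 3 and shard numbers from 0 to 9.

```python
import zlib


def stand_in_hash(value: str) -> int:
    # Assumption: CRC32 stands in for Elasticsearch's real hash function.
    return zlib.crc32(value.encode("utf-8"))


def shard_num(routing: str, num_routing_shards: int, num_primary_shards: int) -> int:
    """Apply the routing formula with integer division."""
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    return (stand_in_hash(routing) % num_routing_shards) // routing_factor


if __name__ == "__main__":
    for doc_id in ("1", "2", "3"):
        before = shard_num(doc_id, 30, 10)  # 10 primaries, routing_factor 3
        after = shard_num(doc_id, 30, 30)   # same index split to 30 primaries
        print(f"_id={doc_id}: shard {before} before the split, shard {after} after")
```

Because hash(_routing) % num_routing_shards does not depend on the number of primary shards, a document on shard s before the split always ends up on one of the shards 3s, 3s+1 or 3s+2 afterwards. Documents only move from a parent shard to its own children, which is why the split does not require rehashing the whole index.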

Custom routing

Indexing and fetching a document with a custom routing value:

PUT my-index-000001/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document"
}

GET my-index-000001/_doc/1?routing=user1

A consequence of custom routing:

The _id is no longer guaranteed to be unique across all shards of an index. Documents with the same _id can end up on different shards if they are indexed with different _routing values; it is up to the user to make sure IDs are unique across the index.
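The uniqueness problem can be reproduced with a toy model. Everything here is hypothetical: three in-memory "shards", CRC32 as a stand-in hash, and a routing_factor of 1. It only shows why the same _id can exist more than once when the routing value changes.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical index with 3 primary shards

# Each shard keeps its own _id -> document map, so _id is only unique per shard.
shards = defaultdict(dict)


def shard_for(routing: str) -> int:
    # Assumption: CRC32 stands in for Elasticsearch's real hash function.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


def index_doc(doc_id: str, routing: str, source: dict) -> int:
    """Store the document on the shard chosen by its routing value."""
    shard = shard_for(routing)
    shards[shard][doc_id] = source
    return shard


def copies_of(doc_id: str) -> int:
    """Count how many shards hold a document with this _id."""
    return sum(1 for docs in shards.values() if doc_id in docs)


if __name__ == "__main__":
    for user in ("user1", "user2", "user3", "user4"):
        shard = index_doc("1", user, {"title": "This is a document"})
        print(f"routing={user} -> shard {shard}")
    print("copies of _id 1:", copies_of("1"))
```

Indexing twice with the same routing value overwrites the document on one shard, while a different routing value may create a second copy elsewhere.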

Making the routing value required:

PUT my-index-000002
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

PUT my-index-000002/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document2"
}

GET my-index-000002/_doc/1?routing=user1

Setting _routing.required to true in the mapping makes the routing value mandatory. A request that omits the routing parameter then fails with a routing_missing_exception.

index.routing_partition_size is the number of shards a custom routing value can go to. The default value is 1. It can only be set at index creation time, and it must be less than index.number_of_shards unless index.number_of_shards is 1.

When this parameter is set, the calculation formula changes:

routing_value = hash(_routing) + hash(_id) % routing_partition_size
shard_num = (routing_value % num_routing_shards) / routing_factor
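A short sketch of the partitioned formula, again with CRC32 as an assumed stand-in for the real hash and with hypothetical index sizes (8 routing shards, 8 primary shards). It shows that one routing value is spread over at most routing_partition_size shards instead of a single one.

```python
import zlib


def stand_in_hash(value: str) -> int:
    # Assumption: CRC32 stands in for Elasticsearch's real hash function.
    return zlib.crc32(value.encode("utf-8"))


def partitioned_shard(routing: str, doc_id: str, num_routing_shards: int,
                      num_primary_shards: int, routing_partition_size: int) -> int:
    """Apply the routing formula that is used when a partition size is set."""
    routing_factor = num_routing_shards // num_primary_shards
    # % binds tighter than +, matching the formula as written above.
    routing_value = stand_in_hash(routing) + stand_in_hash(doc_id) % routing_partition_size
    return (routing_value % num_routing_shards) // routing_factor


if __name__ == "__main__":
    for partition_size in (1, 3):
        reached = {partitioned_shard("user1", str(i), 8, 8, partition_size)
                   for i in range(1000)}
        print(f"routing_partition_size={partition_size}: shards {sorted(reached)}")
```

With a partition size of 1 every document for user1 lands on the same shard; a larger value trades some search fan-out for a more even distribution of a heavy routing key.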

In Elasticsearch 7.0.0 and later, index.number_of_routing_shards affects how documents are distributed across shards. When reindexing an older index that uses custom routing, you must explicitly set index.number_of_routing_shards to keep the same document distribution.

Other

Compression

index.codec selects the compression used for stored data. The default value compresses stored data with LZ4. Setting it to best_compression uses DEFLATE instead, which gives a higher compression ratio at the cost of slower stored-field performance. If the codec is changed, the new one is only applied as segments are merged; a force merge can be used to trigger segment merging.

In computing, Deflate is a lossless data compression format that combines LZSS and Huffman coding. It was designed by Phil Katz for version 2 of his PKZIP compression tool.
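Since index.codec is a static setting, changing it on an existing index takes a few requests. The sketch below lists one plausible sequence without sending anything; `switch_codec_steps` is a hypothetical helper, the index name is arbitrary, and `max_num_segments=1` is just an illustrative choice for the force merge.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def switch_codec_steps(index: str, codec: str = "best_compression") -> List[Step]:
    """Requests to change index.codec on an existing index, then merge segments."""
    return [
        # A static setting cannot be changed on a live index, so close it first.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index": {"codec": codec}}),
        ("POST", f"/{index}/_open", None),
        # Segments written from now on use the new codec; a force merge
        # triggers merging so that existing segments can be rewritten too.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
    ]


if __name__ == "__main__":
    for method, path, body in switch_codec_steps("my-index-000001"):
        print(method, path, body if body is not None else "")
```

Force merging is expensive, so on a busy index it may be preferable to let normal background merges pick up the new codec over time.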

Soft deletes

index.soft_deletes.enabled indicates whether soft deletes are enabled on the index. It is deprecated in 7.6.0: creating an index with soft deletes disabled is deprecated and will be removed in a future version of Elasticsearch. Soft deletes can only be configured at index creation time.

index.soft_deletes.retention_lease.period sets the maximum time a shard history retention lease is kept before it is considered expired. These leases ensure that soft deletes are retained during merges of the Lucene index. The default is 12h.

Other

index.load_fixed_bitset_filters_eagerly defines whether cached filters are preloaded for nested queries. Possible values are true (the default) and false.

Elasticsearch automatically checks the integrity of shard contents at various points in the shard life cycle. The index.shard.check_on_startup setting determines whether Elasticsearch performs additional integrity checks when opening a shard. If these checks detect corruption, they prevent the shard from being opened. It accepts the following values:

Value	Description
false	Perform no additional corruption checks when opening a shard. This is the default and recommended behavior
checksum	Verify that the checksum of each file in the shard matches its contents
true	Check for both physical and logical corruption. This is an expensive operation in terms of CPU and memory
~~fix~~	This option was deprecated and has been removed since 7.0

Dynamic index settings

Dynamic index settings can be changed on a live index using the update-index-settings API.

Setting	Description
index.number_of_replicas	Number of replicas each primary shard has. Default is `1`
index.auto_expand_replicas	Automatically expand the number of replicas based on the number of data nodes in the cluster. Set to a dash-delimited lower and upper bound (e.g. `0-5`) or use `all` as the upper bound (e.g. `0-all`). Default is `false` (disabled)
index.search.idle.after	How long a shard can go without receiving a `search` or `get` request before it is considered search idle. Default is `30s`
index.refresh_interval	How often a refresh operation is performed. Default is `1s`; set to `-1` to disable refresh
index.max_result_window	Maximum value of `from + size` for searches against this index, where `from` is the offset of the first hit and `size` is the number of hits to return. Default is `10000`; requests that exceed it fail with `Result window is too large`
index.max_inner_result_window	Maximum value of `from + size` for inner hits definitions and top hits aggregations on this index. Default is `100`
index.max_rescore_window	Maximum value of `window_size` for `rescore` requests in searches of this index. Defaults to `index.max_result_window`
index.max_docvalue_fields_search	Maximum number of `docvalue_fields` allowed in a query. Default is `100`
index.max_script_fields	Maximum number of `script_fields` allowed in a query. Default is `32`
index.max_ngram_diff	Maximum allowed difference between `min_gram` and `max_gram` for the NGram tokenizer and token filter. Default is `1`
index.max_shingle_diff	Maximum allowed difference between `max_shingle_size` and `min_shingle_size` for the shingle token filter. Default is `3`
index.max_refresh_listeners	Maximum number of refresh listeners available on each shard of the index
index.analyze.max_token_count	Maximum number of tokens that can be produced using the `_analyze` API. Default is `10000`
index.highlight.max_analyzed_offset	Maximum number of characters analyzed for a highlight request. Default is `1000000`
index.max_terms_count	Maximum number of terms that can be used in a terms query. Default is `65536`
index.max_regex_length	Maximum length of a regex that can be used in a regexp query. Default is `1000`
index.query.default_field	One or more wildcard patterns matching the fields that queries search by default. Default is `*`
index.routing.allocation.enable	Controls shard allocation for this index: `all` (all shards, the default), `primaries` (primary shards only), `new_primaries` (newly created primary shards only), `none` (no shard allocation)
index.routing.rebalance.enable	Enables shard rebalancing for this index: `all` (the default), `primaries`, `replicas`, `none`
index.gc_deletes	How long the version number of a deleted document remains available for further versioned operations. Default is `60s`
index.default_pipeline	The default ingest pipeline for the index
index.final_pipeline	The final ingest pipeline for the index; it always runs after the request pipeline (if any) and the default pipeline
index.mapping.dimension_fields.limit	Maximum number of time series dimensions for the index (for Elastic internal use only)
index.hidden	Indicates whether the index should be hidden by default. Default is `false`
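Several of these limits are simple arithmetic checks. As an example, the sketch below reproduces the index.max_result_window rule on the client side so that a deep-pagination request can be rejected before it is sent. The function name and error text are hypothetical, and only the default of 10000 comes from the table above.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default of index.max_result_window


def check_result_window(from_: int, size: int,
                        max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> None:
    """Raise ValueError if from + size exceeds the configured window."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    if from_ + size > max_result_window:
        raise ValueError(
            f"Result window is too large: from + size = {from_ + size} "
            f"exceeds index.max_result_window = {max_result_window}"
        )


if __name__ == "__main__":
    check_result_window(9_990, 10)  # exactly 10000: still allowed
    try:
        check_result_window(9_990, 20)
    except ValueError as err:
        print(err)
```

Raising the limit is possible because the setting is dynamic, but deep pages cost memory and time on every shard, so a different pagination approach is usually the better fix.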

Other index settings

Other index settings available in index modules:

Analysis
- Settings to define analyzers, tokenizers, token filters, and character filters.
Index shard allocation
- Controls where, when, and how shards are allocated to nodes.
Mapping
- Enables or disables dynamic mapping for an index.
Merging
- Controls how shards are merged by the background merge process.
Similarities
- Configures custom similarity settings to customize how search results are scored.
Slowlog
- Controls how slow queries and fetch requests are logged.
Store
- Configures the type of filesystem used to access shard data.
Translog
- Controls the transaction log and background flush operations.
History retention
- Controls the retention of a history of operations in the index.
Indexing pressure
- Configures indexing back pressure limits.

References

Elasticsearch Reference: index-modules
