In Elasticsearch, an index can be thought of as an optimized collection of documents, and each document is a collection of fields, which are key-value pairs containing your data.

That is: index → document → field → data.

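An Elasticsearch index is really just a logical grouping of one or more physical shards, where each shard is itself a self-contained index. By distributing the documents of an index across multiple shards, and those shards across multiple nodes, Elasticsearch provides redundancy. This protects against hardware failures and increases query capacity as nodes are added to the cluster.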

That is: node → shard → document (index).

Shards come in two kinds: primary shards and replica shards. Each document belongs to exactly one primary shard, and a replica shard is a copy of a primary shard.

Core concept analogy:

| Concept | Relational database analogy |
|---|---|
| Index (indices) | Database |
| Type | Table |
| Document | Row |
| Field | Column |

Starting with 6.0.0, an index can contain only one mapping type. Types were deprecated in 7.0.0 and are not supported at all from 8.0.0.

Index settings

Index settings fall into two categories:

  • Static index settings: set when the index is created, or changed only after the index has been closed.
  • Dynamic index settings: changed directly on a live index using the update index settings API.
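The following Python sketch makes the static/dynamic distinction concrete. Given a dict of settings you want to change, it reports how the change could be applied. It rests on two assumptions. First, STATIC_SETTINGS is copied from the static rows of the table below and is not an exhaustive list. Second, plan_settings_update is a hypothetical helper, not part of any Elasticsearch client.

```python
# Static settings taken from the table in this article (assumption: not exhaustive).
STATIC_SETTINGS = {
    "number_of_shards",
    "number_of_routing_shards",
    "shard.check_on_startup",
    "codec",
    "routing_partition_size",
    "soft_deletes.enabled",
    "soft_deletes.retention_lease.period",
    "load_fixed_bitset_filters_eagerly",
}

# Static settings the table says can only be set at index creation time.
CREATE_ONLY = {"number_of_shards", "routing_partition_size"}


def plan_settings_update(changes: dict) -> str:
    """Hypothetical helper: decide how a settings change could be applied."""
    # Accept both "codec" and "index.codec" spellings.
    names = {name.removeprefix("index.") for name in changes}
    if names & CREATE_ONLY:
        return "recreate"  # only settable when the index is created
    if names & STATIC_SETTINGS:
        return "close-update-open"  # static: close, update, then reopen
    return "update"  # dynamic: update the live index directly
```

A dynamic change is then a single PUT /{index}/_settings request. A static one is wrapped between POST /{index}/_close and POST /{index}/_open.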

The related settings are listed in the following table. In requests they can also be written with the index. prefix, for example index.number_of_shards.

| Category | Setting | Description |
|---|---|---|
| Static | number_of_shards | Number of primary shards of the index. Default 1, maximum 1024. Can only be set when the index is created. |
| Static | number_of_routing_shards | Number of routing shards, used when splitting the index (_split). |
| Static | shard.check_on_startup | Whether shards are checked for corruption before they are opened. If corruption is detected, the shard is not opened. Default false; values: false, checksum, true. |
| Static | codec | Compression used for stored data. LZ4 is used by default; best_compression gives a higher compression ratio at the cost of slower stored-field performance. |
| Static | routing_partition_size | Number of shards a custom routing value can map to. Default 1. Must be smaller than number_of_shards (unless number_of_shards is also 1). Can only be set when the index is created. |
| Static | soft_deletes.enabled | Whether soft deletes are enabled. Default true. Configurable only in versions [6.5.0, 7.6.0): earlier versions do not have this setting, and using it in later versions is not recommended. |
| Static | soft_deletes.retention_lease.period | How long shard history retention leases are kept before they expire. Default 12h. |
| Static | load_fixed_bitset_filters_eagerly | Whether cached filters for nested queries are preloaded. Default true; values: true, false. |
| Dynamic | hidden | Whether the index is hidden. Hidden indices are only matched by wildcard expressions when the request sets expand_wildcards (e.g. expand_wildcards=all). Default false; values: true, false. |
| Dynamic | number_of_replicas | Number of replicas of each primary shard. Default 1. |
| Dynamic | auto_expand_replicas | Automatically adjusts the number of replicas based on the number of data nodes in the cluster. Set a dash-delimited lower and upper bound (e.g. 0-5), or use all as the upper bound (e.g. 0-all). Default false (disabled). |
| Dynamic | search.idle.after | How long a shard can go without receiving a search or get request before it is considered search-idle. Default 30s. |
| Dynamic | refresh_interval | How often a refresh runs, making the most recent changes to the index visible to search. Default 1s; set to -1 to disable refresh. |
| Dynamic | max_result_window | Maximum value of from + size for searches on the index. Default 10000. |
| Dynamic | max_inner_result_window | Maximum value of from + size for inner hits and the top_hits aggregation. Default 100. |
| Dynamic | max_rescore_window | Maximum window_size for rescore requests. Defaults to the value of max_result_window (10000). |
| Dynamic | max_docvalue_fields_search | Maximum number of docvalue_fields allowed in a query. Default 100. |
| Dynamic | max_script_fields | Maximum number of script_fields allowed in a query. Default 32. |
| Dynamic | max_ngram_diff | Maximum allowed difference between min_gram and max_gram for NGramTokenizer and NGramTokenFilter. Default 1. |
| Dynamic | max_shingle_diff | Maximum allowed difference between max_shingle_size and min_shingle_size for the shingle token filter. Default 3. |
| Dynamic | max_refresh_listeners | Maximum number of refresh listeners available on each shard of the index. |
| Dynamic | analyze.max_token_count | Maximum number of tokens the _analyze API can produce. Default 10000. |
| Dynamic | highlight.max_analyzed_offset | Maximum number of characters analyzed for a highlight request. Applies only to text highlighted on fields indexed without offsets or term vectors. Default 1000000. |
| Dynamic | max_terms_count | Maximum number of terms allowed in a terms query. Default 65536. |
| Dynamic | max_regex_length | Maximum length of a regular expression in a regexp query. Default 1000. |
| Dynamic | routing.allocation.enable | Controls shard allocation for the index. Default all; values: all, primaries, new_primaries, none. |
| Dynamic | routing.rebalance.enable | Controls shard rebalancing for the index. Default all; values: all, primaries, replicas, none. |
| Dynamic | gc_deletes | How long the version number of a deleted document remains available for further versioned operations. Default 60s. |
| Dynamic | default_pipeline | Default ingest pipeline for the index. Can be overridden with the pipeline request parameter. The special value _none means no ingest pipeline runs. |
| Dynamic | final_pipeline | Final ingest pipeline for the index, run after the request or default pipeline. The special value _none means no ingest pipeline runs. |
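auto_expand_replicas is easier to reason about with a concrete model. The Python sketch below assumes a simplified rule: the replica count is the number of data nodes minus one, clamped to the configured lower and upper bounds, where all means "data nodes minus one". Elasticsearch also takes allocation rules into account, so treat this as an approximation. expected_replicas is a hypothetical name.

```python
def expected_replicas(setting: str, data_nodes: int) -> int:
    """Approximate the replica count chosen by auto_expand_replicas.

    Simplified model (an assumption): replicas = data_nodes - 1,
    clamped to [lower, upper]; "all" as upper bound means data_nodes - 1.
    """
    if setting == "false":
        raise ValueError("auto_expand_replicas is disabled")
    lower_text, upper_text = setting.split("-", 1)
    lower = int(lower_text)
    upper = data_nodes - 1 if upper_text == "all" else int(upper_text)
    wanted = data_nodes - 1  # one copy of each shard per data node
    return max(lower, min(upper, wanted))
```

For example, with 0-5 a 10-node cluster gets 5 replicas, while a 3-node cluster gets 2.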

Most of these settings rarely need to be changed. The two used most often are number_of_shards (the number of primary shards) and number_of_replicas (the number of replicas of each primary shard).

Creation examples

1. Create an index with number_of_shards set to 3 (three primary shards) and number_of_replicas set to 2 (two replicas of each primary shard). The index name goes in the request path, and the JSON that follows is the request body.

Index names must follow these conventions:

  • Lowercase only

  • Cannot contain the characters \, /, *, ?, ", <, >, |, space, comma (,), or #

  • Indices created before 7.0 could contain a colon (:), but this is deprecated and not supported from 7.0 onward

  • Cannot start with -, _, or +

  • Cannot be . or ..

  • Cannot be longer than 255 bytes (note: bytes, so multi-byte characters reach the 255 limit faster)

  • Names starting with . are deprecated, except for hidden indices and internal indices managed by plugins
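The naming rules above can be checked locally before a request is sent. The following Python sketch is a hedged implementation of those rules. index_name_problems is a hypothetical helper, and Elasticsearch remains the final authority on what it accepts.

```python
# Characters from the list above: \ / * ? " < > | space , #
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")
MAX_NAME_BYTES = 255


def index_name_problems(name: str) -> list[str]:
    """Return the naming rules that `name` breaks (empty list means valid)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad_chars = FORBIDDEN_CHARS & set(name)
    if bad_chars:
        problems.append("forbidden characters: " + " ".join(sorted(bad_chars)))
    if ":" in name:
        problems.append("colon is not supported from 7.0")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    elif name.startswith("."):
        problems.append("leading . is deprecated except for hidden/internal indices")
    # The limit is in bytes, so encode before measuring.
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append("longer than 255 bytes")
    return problems
```

Because the length is measured in UTF-8 bytes, a name made of 128 "é" characters (256 bytes) is already too long.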

PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}

You can also omit the index object and put the settings directly under settings:

PUT /my-index-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
2. Create an index with both settings and a mapping:
PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}
3. Create an index with aliases:
PUT /test
{
  "aliases": {
    "alias_1": {},
    "alias_2": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "routing": "shard-1"
    }
  }
}

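The console requests above map directly onto plain HTTP. The following standard-library Python sketch builds, but does not send, the PUT request for example 2. The base URL http://localhost:9200 is an assumption (a local, unsecured cluster; real clusters usually also need authentication). build_create_index_request is a hypothetical helper.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, settings=None, mappings=None, aliases=None):
    """Build (but do not send) a PUT /{index} create-index request."""
    body = {}
    # Only include the sections that were supplied, as in the console examples.
    if settings is not None:
        body["settings"] = settings
    if mappings is not None:
        body["mappings"] = mappings
    if aliases is not None:
        body["aliases"] = aliases
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example 2 above, targeting an assumed local cluster.
req = build_create_index_request(
    "http://localhost:9200",
    "test",
    settings={"number_of_shards": 1},
    mappings={"properties": {"field1": {"type": "text"}}},
)
# urllib.request.urlopen(req) would actually send it.
```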
Sample response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test"
}

acknowledged indicates whether the index was successfully created in the cluster. shards_acknowledged indicates whether the required number of shard copies was started for each shard before the timeout.

Note that both values can be false even though the index is eventually created successfully. They only reflect the state of the operation before the timeout.
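The combinations of the two flags can be spelled out in code. The following Python sketch takes the parsed JSON response of a create-index request and returns a description following the rules just described. describe_create_response is a hypothetical helper, and the wording of its messages is illustrative.

```python
def describe_create_response(resp: dict) -> str:
    """Interpret acknowledged / shards_acknowledged from a create-index response."""
    if resp.get("acknowledged") and resp.get("shards_acknowledged"):
        return "created and required shard copies started"
    if resp.get("acknowledged"):
        # Index exists, but not enough shard copies started before the timeout.
        return "created, shard copies not started before timeout"
    # Neither flag set: the timeout hit first; the index may still be created later.
    return "not acknowledged before timeout, may still be created"
```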

Index Management API

The index management APIs are summarized below:

| API | HTTP method | URL | Path parameters | Query parameters | Request body |
|---|---|---|---|---|---|
| Create index | PUT | /{index} | {index}: index name | include_type_name, wait_for_active_shards, master_timeout, timeout | aliases, mappings, settings |
| Delete index | DELETE | /{index} | {index}: index name | allow_no_indices, expand_wildcards, ignore_unavailable, master_timeout, timeout | None |
| Get index | GET | /{index} | {index}: index name | allow_no_indices, expand_wildcards, flat_settings, include_defaults, include_type_name, ignore_unavailable, local, master_timeout | None |
| Index exists | HEAD | /{index} | {index}: index names, comma-separated for multiple | allow_no_indices, expand_wildcards, flat_settings, include_defaults, include_type_name, ignore_unavailable, local | None |
| Close index | POST | /{index}/_close | {index}: index names, comma-separated for multiple | allow_no_indices, expand_wildcards, ignore_unavailable, wait_for_active_shards, master_timeout, timeout | None |
| Open index | POST | /{index}/_open | {index}: index names, comma-separated for multiple; _all or * means all indices | allow_no_indices, expand_wildcards, ignore_unavailable, wait_for_active_shards, master_timeout, timeout | None |
| Shrink index | POST, PUT | /{index}/_shrink/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings, max_primary_shard_size |
| Split index | POST, PUT | /{index}/_split/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings |
| Clone index | POST, PUT | /{index}/_clone/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings |
| Rollover index | POST | /{rollover-target}/_rollover or /{rollover-target}/_rollover/{target-index} | {rollover-target}: alias or data stream to roll over; {target-index}: optional name of the new index | dry_run, include_type_name, wait_for_active_shards, master_timeout, timeout | aliases, conditions, mappings, settings |
| Freeze index | POST | /{index}/_freeze | {index}: index name | None | None |
| Unfreeze index | POST | /{index}/_unfreeze | {index}: index name | None | None |
| Resolve index | GET | /_resolve/index/{name} | {name}: index names, comma-separated for multiple | expand_wildcards | None |

The table can be summarized as follows.

1. CRUD operations on an index:
  • Create: PUT /{index}
  • Delete: DELETE /{index}
  • Read: GET /{index}
  • Update: no API modifies an index as a whole. There are APIs that modify index properties such as settings, and whether a given setting can be changed depends on whether it is static or dynamic.
2. Checking whether an index exists:
  • Exists: HEAD /{index}

Elasticsearch responds with 200 if the index exists and 404 if it does not.

3. Closing and opening an index:
  • Close: POST /{index}/_close
  • Open: POST /{index}/_open

A closed index cannot be read from or written to. You can check whether an index is open or closed with GET /_cat/indices/customer?v, where the status column shows open or close.
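For scripts, the cat indices API also accepts format=json, which is easier to process than the ?v text table. The following Python sketch assumes each returned row carries index and status fields, matching the column names in the ?v output. The helper names are hypothetical.

```python
import json


def cat_indices_url(base_url: str, index: str) -> str:
    """URL for the cat indices API with JSON output instead of the ?v table."""
    return f"{base_url.rstrip('/')}/_cat/indices/{index}?format=json"


def index_statuses(cat_json: str) -> dict[str, str]:
    """Map each index name to its status ("open" or "close")."""
    rows = json.loads(cat_json)
    return {row["index"]: row["status"] for row in rows}


def closed_indices(cat_json: str) -> list[str]:
    """Names of the indices whose status is "close", sorted."""
    statuses = index_statuses(cat_json)
    return sorted(name for name, status in statuses.items() if status == "close")
```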

4. Freezing and unfreezing an index:
  • Freeze: POST /{index}/_freeze
  • Unfreeze: POST /{index}/_unfreeze

A frozen index cannot be written to. Its frozen state is shown by GET /{index} under {index} → settings → index → frozen, with a value of true or false.

5. Shrinking and splitting an index:
  • Shrink: POST /{index}/_shrink/{target_index}
  • Split: POST /{index}/_split/{target_index}

Both create a new index with fewer (shrink) or more (split) primary shards than the source index.
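Shrink and split cannot produce arbitrary shard counts. The Elasticsearch documentation states two rules. For a shrink, the target's primary shard count must be a factor of the source's. For a split, it must be a multiple of the source's. The following Python sketch encodes only those two rules. Other preconditions are not modelled, such as a read-only source index or, for split, the limit imposed by number_of_routing_shards. The function names are hypothetical.

```python
def valid_shrink_targets(source_shards: int) -> list[int]:
    """Primary shard counts a source index could shrink to (proper factors)."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def is_valid_split(source_shards: int, target_shards: int) -> bool:
    """A split must increase the shard count to a multiple of the source's."""
    return target_shards > source_shards and target_shards % source_shards == 0
```

For example, an 8-shard index can shrink to 4, 2, or 1 shards, and a 3-shard index can split into 6 but not 8.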

6. Other operations:
  • Clone: POST /{index}/_clone/{target_index}

Copies an existing index into a new index.

  • Rollover: POST /{rollover-target}/_rollover or POST /{rollover-target}/_rollover/{target-index}

When the specified conditions are met, a new index is created and the alias's writes are pointed at it, which gives an effect similar to archiving data.

  • Resolve: GET /_resolve/index/{name}

Returns the indices, aliases, and data streams that match the specified name.
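The rollover conditions can be modelled in a few lines. The following Python sketch mirrors the documented behaviour that rollover happens when any one of the supplied conditions is met. max_age, max_docs and max_size are condition names from the rollover API. IndexStats and should_roll_over are hypothetical, and sizes are simplified to raw byte counts.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class IndexStats:
    """Hypothetical snapshot of the current write index."""
    age: timedelta
    docs: int
    size_bytes: int


def should_roll_over(stats, max_age=None, max_docs=None, max_size_bytes=None) -> bool:
    """True if any supplied condition is met (unset conditions are ignored)."""
    checks = []
    if max_age is not None:
        checks.append(stats.age >= max_age)
    if max_docs is not None:
        checks.append(stats.docs >= max_docs)
    if max_size_bytes is not None:
        checks.append(stats.size_bytes >= max_size_bytes)
    return any(checks)  # no conditions at all means no rollover
```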

As the summary shows, the index management APIs operate on the index as a whole and do not deal with finer-grained details. Note also that Elasticsearch action keywords in endpoints start with an underscore, such as _clone.

In the next article we will learn about index mapping management.

Visit my blog: The Border city of A-woo

Follow my WeChat official account: A woo programming