In Elasticsearch, an index can be thought of as an optimized collection of documents, and each document is a collection of fields, which are key-value pairs containing your data.
That is: index → document → field → data.
An Elasticsearch index is really just a logical grouping of one or more physical shards, where each shard is itself a self-contained index. By distributing the documents of an index across multiple shards, and those shards across multiple nodes, Elasticsearch provides redundancy, which protects against hardware failures and increases query capacity as nodes are added.
That is: node → shard → document (index).
Shards are divided into primary shards and replica shards: each document belongs to exactly one primary shard, and a replica shard is a copy of a primary shard.
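
To make the document-to-shard relationship more concrete, here is a minimal Python sketch of how a routing value could be mapped to a primary shard. In its simplest documented form the rule is shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. Elasticsearch itself uses a Murmur3 hash, so zlib.crc32 below is only a stand-in, and pick_shard and distribute are hypothetical helper names.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard number."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Group document IDs by the primary shard they would be routed to."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards
```

Because the result depends on the number of primary shards, changing that number would route existing documents to different shards, which is one reason the primary shard count is fixed when the index is created.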
Core concepts compared with a relational database:
Concept | Relational database analogy |
---|---|
Index (indices) | Database |
Type | Table |
Document | Row |
Field | Column |
Since 6.0.0, a single index can have only one type. Types are deprecated as of 7.0.0 and are no longer supported at all as of 8.0.0.
Index settings
Index settings fall into two groups:
- Static index settings: can only be set when the index is created, or modified while the index is closed.
- Dynamic index settings: can be changed on a live index with the update index settings API.
The related parameters are shown in the following table:
Classification | Parameter | Description |
---|---|---|
Static | number_of_shards | The number of primary shards of the index. Default is 1, upper limit is 1024. Can only be set when the index is created. |
Static | number_of_routing_shards | The number of routing shards used when splitting the index. |
Static | shard.check_on_startup | Whether shards are checked for corruption before being opened. If corruption is detected, the shard is not opened. Default is false; possible values: true, checksum, false. |
Static | codec | Stored data is compressed with LZ4 by default. Can be set to best_compression for a higher compression ratio at the cost of slower stored-field performance. |
Static | routing_partition_size | The number of shards a custom routing value can go to. Default is 1. Must be less than number_of_shards (unless number_of_shards is also 1). Can only be set when the index is created. |
Static | soft_deletes.enabled | Whether soft deletes are enabled on the index. Default is true. Available for indices created on or after 6.5.0; earlier versions do not have this setting. Deprecated since 7.6.0, after which changing it is not recommended. |
Static | soft_deletes.retention_lease.period | The maximum period to retain a shard history retention lease before it expires. Default is 12h. |
Static | load_fixed_bitset_filters_eagerly | Whether cached filters are preloaded for nested queries. Default is true; possible values: true, false. |
Dynamic | hidden | Whether the index is hidden by default. A hidden index is only matched by wildcard expressions when the request sets the expand_wildcards parameter accordingly. Default is false; possible values: true, false. |
Dynamic | number_of_replicas | The number of replicas of each primary shard. Default is 1. |
Dynamic | auto_expand_replicas | Automatically expands the number of replicas based on the number of data nodes in the cluster. Set to a dash-delimited lower and upper bound (e.g., 0-5), or use all as the upper bound (e.g., 0-all). Default is false (disabled). |
Dynamic | search.idle.after | How long a shard can go without receiving a search or get request before it is considered search idle. Default is 30s. |
Dynamic | refresh_interval | How often a refresh is performed, which makes recent changes to the index searchable. Default is 1s; set to -1 to disable refresh. |
Dynamic | max_result_window | The maximum value of from + size for searches against the index. Default is 10000. |
Dynamic | max_inner_result_window | The maximum value of from + size for inner hits definitions and top hits aggregations on the index. Default is 100. |
Dynamic | max_rescore_window | The maximum value of window_size for rescore requests in searches of the index. Defaults to the value of max_result_window, which itself defaults to 10000. |
Dynamic | max_docvalue_fields_search | The maximum number of docvalue_fields allowed in a query. Default is 100. |
Dynamic | max_script_fields | The maximum number of script_fields allowed in a query. Default is 32. |
Dynamic | max_ngram_diff | The maximum allowed difference between min_gram and max_gram for NGramTokenizer and NGramTokenFilter. Default is 1. |
Dynamic | max_shingle_diff | The maximum allowed difference between max_shingle_size and min_shingle_size for the shingle token filter. Default is 3. |
Dynamic | max_refresh_listeners | The maximum number of refresh listeners available on each shard of the index. |
Dynamic | analyze.max_token_count | The maximum number of tokens that can be produced by the _analyze API. Default is 10000. |
Dynamic | highlight.max_analyzed_offset | The maximum number of characters analyzed for a highlight request. Only applies to highlighting requests on text indexed without offsets or term vectors. Default is 1000000. |
Dynamic | max_terms_count | The maximum number of terms that can be used in a terms query. Default is 65536. |
Dynamic | max_regex_length | The maximum length of a regular expression that can be used in a regexp query. Default is 1000. |
Dynamic | routing.allocation.enable | Controls shard allocation for the index. Default is all; possible values: all, primaries, new_primaries, none. |
Dynamic | routing.rebalance.enable | Enables shard rebalancing for the index. Default is all; possible values: all, primaries, replicas, none. |
Dynamic | gc_deletes | The length of time a deleted document's version number remains available for further versioned operations. Default is 60s. |
Dynamic | default_pipeline | The default ingest node pipeline for the index. Can be overridden with the pipeline request parameter. The special name _none indicates that no ingest pipeline will run. |
Dynamic | final_pipeline | The final ingest node pipeline for the index, which always runs after the request pipeline (if any) and the default pipeline. The special name _none indicates that no ingest pipeline will run. |
With so many settings available, most are usually left at their defaults. The ones most commonly configured are number_of_shards (the number of primary shards) and number_of_replicas (the number of replicas of each primary shard).
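
The static/dynamic distinction decides how a setting change has to be applied. The following Python sketch plans the requests for an update: a dynamic setting needs only PUT /{index}/_settings, while a static one needs the index closed first and reopened afterwards. The three sets are small illustrative subsets chosen for this example, and plan_settings_update is a hypothetical helper, not part of any client library.

```python
# Illustrative subsets of the settings discussed above; not a complete list.
CREATE_ONLY = {"number_of_shards", "routing_partition_size"}
STATIC = {"codec", "shard.check_on_startup"}
DYNAMIC = {"number_of_replicas", "refresh_interval", "max_result_window"}


def plan_settings_update(index, changes):
    """Return the (method, path, body) requests needed to apply `changes`."""
    unknown = set(changes) - CREATE_ONLY - STATIC - DYNAMIC
    if unknown:
        raise ValueError(f"settings not covered by this sketch: {sorted(unknown)}")
    fixed = set(changes) & CREATE_ONLY
    if fixed:
        raise ValueError(f"can only be set at index creation: {sorted(fixed)}")

    update = ("PUT", f"/{index}/_settings", {"index": dict(changes)})
    if set(changes) & STATIC:
        # Static settings require the index to be closed while they change.
        return [
            ("POST", f"/{index}/_close", None),
            update,
            ("POST", f"/{index}/_open", None),
        ]
    return [update]
```

For example, plan_settings_update("my-index-000001", {"number_of_replicas": 2}) yields a single PUT, whereas changing codec yields a close, update, open sequence.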
Creation examples
1. Create an index and configure number_of_shards and number_of_replicas. Here the number of primary shards is set to 3, and each primary shard gets 2 replicas. The index name appears in the path, and the JSON that follows the request line is the request body.
The index name must comply with the following rules:
- Lowercase characters only
- Cannot contain the characters \, /, *, ?, ", <, >, |, space, comma, or #
- Index names could contain a colon (:) prior to 7.0, but this is deprecated and not supported from 7.0 onwards
- Cannot start with -, _, or +
- Cannot be . or ..
- Cannot be longer than 255 bytes (note that this is bytes, so multi-byte characters reach the 255 limit faster)
- Names starting with . are deprecated, except for hidden indices and internal indices managed by plugins
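
The naming rules above can be checked on the client before sending a request. Below is a minimal Python sketch; index_name_problems is a hypothetical helper, the colon is treated as an error (as on 7.0 and later), and the comma is included in the forbidden set because it separates multiple index names in a path. The rule about names starting with a dot is left out, since hidden and plugin-managed indices legitimately use it.

```python
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> list:
    """Return a list of rule violations; an empty list means the name looks valid."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if ":" in name:
        problems.append("contains a colon, which is not supported from 7.0 onwards")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("must not start with -, _ or +")
    if name in (".", ".."):
        problems.append("must not be . or ..")
    # The limit is in bytes, so encode before measuring.
    if len(name.encode("utf-8")) > 255:
        problems.append("must not be longer than 255 bytes")
    return problems
```

Returning every violation at once, rather than stopping at the first, makes it easier to report all naming mistakes to the user in a single pass.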
PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}
The index section does not have to be specified explicitly; the settings can be given directly:
PUT /my-index-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
- With a mapping configuration added:
PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}
- With index aliases configured:
PUT /test
{
  "aliases": {
    "alias_1": {},
    "alias_2": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "routing": "shard-1"
    }
  }
}
Sample response:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test"
}
acknowledged indicates whether the index was successfully created in the cluster, and shards_acknowledged indicates whether the required number of shard copies was started for each shard before the request timed out.
Note that both values can be false even though the index ends up being created successfully; they only describe the state of the operation at the moment the timeout expired.
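
The request and response shapes above can be put together in a short Python sketch. It only builds the request and interprets a response dictionary; actually sending it (for example with an HTTP client) is left out. Both function names and the wording of the returned messages are assumptions made for this illustration.

```python
import json


def build_create_index_request(index, number_of_shards=1, number_of_replicas=1,
                               mappings=None, aliases=None):
    """Return (method, path, json_body) for a create index request."""
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    # mappings and aliases are optional sections of the request body.
    if mappings:
        body["mappings"] = mappings
    if aliases:
        body["aliases"] = aliases
    return "PUT", f"/{index}", json.dumps(body)


def describe_create_response(response):
    """Summarize the acknowledged / shards_acknowledged flags of a response."""
    acknowledged = response.get("acknowledged", False)
    shards_acknowledged = response.get("shards_acknowledged", False)
    if acknowledged and shards_acknowledged:
        return "created, required shard copies started"
    if acknowledged:
        return "created, shard copies not started before the timeout"
    # False here does not prove failure: the index may still get created.
    return "not confirmed before the timeout"
```

The last branch deliberately avoids the word "failed", since a false flag only says that confirmation did not arrive in time.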
Index Management API
The APIs related to index management are as follows:
API | HTTP method | URL | Path parameters | Query parameters | Request body |
---|---|---|---|---|---|
Create index | PUT | /{index} | {index}: index name | include_type_name, wait_for_active_shards, master_timeout, timeout | aliases, mappings, settings |
Delete index | DELETE | /{index} | {index}: index name | allow_no_indices, expand_wildcards, ignore_unavailable, master_timeout, timeout | None |
Get index | GET | /{index} | {index}: index name | allow_no_indices, expand_wildcards, flat_settings, include_defaults, include_type_name, ignore_unavailable, local, master_timeout | None |
Index exists | HEAD | /{index} | {index}: index names, multiple names separated by commas | allow_no_indices, expand_wildcards, flat_settings, include_defaults, include_type_name, ignore_unavailable, local | None |
Close index | POST | /{index}/_close | {index}: index names, multiple names separated by commas | allow_no_indices, expand_wildcards, ignore_unavailable, wait_for_active_shards, master_timeout, timeout | None |
Open index | POST | /{index}/_open | {index}: index names, multiple names separated by commas; _all or * can be used to target all indices | allow_no_indices, expand_wildcards, ignore_unavailable, wait_for_active_shards, master_timeout, timeout | None |
Shrink index | POST or PUT | /{index}/_shrink/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings, max_primary_shard_size |
Split index | POST or PUT | /{index}/_split/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings |
Clone index | POST or PUT | /{index}/_clone/{target_index} | {index}: source index name; {target_index}: target index name | wait_for_active_shards, master_timeout, timeout | aliases, settings |
Rollover index | POST | /{rollover-target}/_rollover or /{rollover-target}/_rollover/{target-index} | {rollover-target}: name of the index alias or data stream to roll over; {target-index}: optional target index name | dry_run, include_type_name, wait_for_active_shards, master_timeout, timeout | aliases, conditions, mappings, settings |
Freeze index | POST | /{index}/_freeze | {index}: index name | None | None |
Unfreeze index | POST | /{index}/_unfreeze | {index}: index name | None | None |
Resolve index | GET | /_resolve/index/{name} | {name}: index names, multiple names separated by commas | expand_wildcards | None |
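
The method and URL columns of the table can also be kept as a small lookup table in code. The Python sketch below is one possible way to do that; INDEX_APIS, request_line, and index_exists are hypothetical names, and the rollover entry uses the placeholder rollover_target with an underscore so that it works as a Python keyword argument.

```python
# Operation name -> (HTTP method, URL template), taken from the table above.
INDEX_APIS = {
    "create": ("PUT", "/{index}"),
    "delete": ("DELETE", "/{index}"),
    "get": ("GET", "/{index}"),
    "exists": ("HEAD", "/{index}"),
    "close": ("POST", "/{index}/_close"),
    "open": ("POST", "/{index}/_open"),
    "shrink": ("POST", "/{index}/_shrink/{target_index}"),
    "split": ("POST", "/{index}/_split/{target_index}"),
    "clone": ("POST", "/{index}/_clone/{target_index}"),
    "rollover": ("POST", "/{rollover_target}/_rollover"),
    "freeze": ("POST", "/{index}/_freeze"),
    "unfreeze": ("POST", "/{index}/_unfreeze"),
    "resolve": ("GET", "/_resolve/index/{name}"),
}


def request_line(operation, **path_params):
    """Fill in the path parameters and return e.g. 'POST /src/_shrink/dst'."""
    method, template = INDEX_APIS[operation]
    return f"{method} {template.format(**path_params)}"


def index_exists(status_code):
    """Interpret the status code of a HEAD /{index} request."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise ValueError(f"unexpected status code: {status_code}")
```

For instance, request_line("shrink", index="src", target_index="dst") produces the request line for shrinking src into dst.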
Based on the table above, the APIs can be summarized as follows.
- Index CRUD operations:
  - Create: create an index
    PUT /{index}
  - Delete: delete an index
    DELETE /{index}
  - Read: get an index
    GET /{index}
  - Update: there is no API that modifies an index directly. There are APIs for modifying index settings, and whether a change is possible depends on whether the setting is static or dynamic.
- Index existence check:
  - Exists
    HEAD /{index}
  Elasticsearch responds with 200 if the index exists and 404 if it does not.
- Index open and close operations:
  - Close
    POST /{index}/_close
  - Open
    POST /{index}/_open
  A closed index cannot be read from or written to. The open/closed state can be checked with GET /_cat/indices/customer?v (where customer is the index name); the status value is open or close.
- Index freeze and unfreeze operations:
  - Freeze
    POST /{index}/_freeze
  - Unfreeze
    POST /{index}/_unfreeze
  A frozen index cannot be written to. The frozen state can be seen in the response of GET /{index}, under {index} → settings → index → frozen, with a value of true or false.
- Index shrink and split operations:
  - Shrink
    POST /{index}/_shrink/{target_index}
  - Split
    POST /{index}/_split/{target_index}
  Both generate a new index with a smaller (shrink) or larger (split) number of primary shards.
- Other operations:
  - Clone
    POST /{index}/_clone/{target_index}
    Clones the source into a new index.
  - Rollover
    POST /{rollover-target}/_rollover or POST /{rollover-target}/_rollover/{target-index}
    When the specified conditions are met, writes to the index alias are pointed at a new index, which gives an effect similar to data archiving.
  - Resolve
    GET /_resolve/index/{name}
    Returns the indices, aliases, and data streams that match the given name.
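
Shrink and split cannot produce an arbitrary number of shards. As far as I understand the documented constraints, a shrink target must be a factor of the source shard count, while a split target must be a multiple of the source shard count and a factor of number_of_routing_shards. The Python sketch below encodes just these arithmetic rules; can_shrink and can_split are hypothetical helpers, and other preconditions (such as the source index being read-only) are not checked.

```python
def can_shrink(source_shards: int, target_shards: int) -> bool:
    """A shrink target must be smaller than, and a factor of, the source count."""
    return 0 < target_shards < source_shards and source_shards % target_shards == 0


def can_split(source_shards: int, target_shards: int,
              number_of_routing_shards: int) -> bool:
    """A split target must be a larger multiple of the source count
    and a factor of number_of_routing_shards."""
    return (
        source_shards > 0
        and target_shards > source_shards
        and target_shards % source_shards == 0
        and number_of_routing_shards % target_shards == 0
    )
```

With 5 primary shards and number_of_routing_shards set to 30, a split to 10, 15, or 30 shards passes the check, while a split to 20 does not, because 20 is not a factor of 30.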
As the summary shows, the index management APIs operate only on the index itself and do not deal with finer-grained details. Note once more that action keywords in Elasticsearch paths start with an underscore (_), such as _clone.
In the next article we will learn about index mapping management.