This is the sixth day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021
Video course: Geek Time — Elasticsearch — GitHub
Series of articles:
- Chapter 1: Overview of Elasticsearch
- Chapter 2: Getting started
Basic concepts: indexes, documents, and REST APIs
Documents
- Elasticsearch is document-oriented; a document is the smallest unit of searchable data. Examples:
  - A log entry in a log file
  - The details of a movie or of a music record
  - A song in an MP3 player, or the contents of a PDF file
- Documents are serialized as JSON and stored in Elasticsearch
  - A JSON object consists of fields
  - Each field has a field type (string, numeric, boolean, date, binary, range)
- Each document has a unique ID
  - You can specify the ID yourself
  - Or let Elasticsearch generate it automatically
JSON document
- A document contains a series of fields, similar to a record in a database table
- JSON documents have a flexible format; the structure does not need to be defined in advance
  - A field's type can be specified explicitly or inferred automatically by Elasticsearch
  - Arrays and nested objects are supported
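To make the idea of automatically inferred field types more concrete, here is a rough Python sketch. It only approximates the dynamic mapping behaviour: the function name `infer_field_type` is made up for this example, and `datetime.fromisoformat` is a loose stand-in for Elasticsearch's real date detection, so treat it as an illustration rather than the actual algorithm.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess an Elasticsearch-style field type from a parsed JSON value (hypothetical helper)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
        return None  # empty or all-null array: nothing to infer yet
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # assumption: rough stand-in for date detection
            return "date"
        except ValueError:
            return "text"  # dynamic mapping also adds a "keyword" sub-field
    return None  # JSON null: no field is added


doc = {
    "currency": "EUR",
    "customer_id": 38,
    "taxful_total_price": 36.98,
    "order_date": "2021-11-22T09:28:48+00:00",
    "manufacturer": ["Elitelligence", "Oceanavigations"],
    "geoip": {"country_iso_code": "EG"},
}
mapping = {name: infer_field_type(value) for name, value in doc.items()}
```

An explicit mapping always wins over inference, which is why a field such as `currency` can be a `keyword` in a hand-written mapping even though a plain string would be inferred as `text`.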
Metadata for the document
- Metadata annotates a document with information about the document itself
  - _index: the name of the index the document belongs to
  - _type: the name of the type the document belongs to
  - _id: the unique ID of the document
  - _source: the original JSON data of the document
  - _all: combines the contents of all fields into one field (deprecated in 6.0, removed in 7.0)
  - _version: the version of the document
  - _score: the relevance score
Indexes
- An index is a container for documents: a collection of documents of a similar kind
  - An index represents a logical space. Each index has its own mapping, which defines the field names and field types of the documents it contains
  - A shard represents a physical space; the data in an index is spread across its shards
- Index mapping and settings
  - The mapping defines the types of the document fields
  - The settings define how the data is distributed
Different meanings of "index"
- Noun: many different indexes can be created in a single Elasticsearch cluster
- Verb: saving a document to Elasticsearch is also called indexing
  - In Elasticsearch, this is the process of building an inverted index
- Noun: a data structure, such as a B-tree index or an inverted index
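Since indexing a document means building an inverted index, a tiny sketch may help. The code below is a deliberately simplified illustration in Python, not how Lucene implements it: the tokenizer just lowercases and splits on non-alphanumeric characters, and the function names are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "Basic T-shirt - dark blue/white",
    2: "Sweatshirt - grey multicolor",
    3: "Dark grey T-shirt",
}
inverted = build_inverted_index(docs)
print(inverted["dark"])  # [1, 3]
print(inverted["grey"])  # [2, 3]
```

A search for a term then becomes a dictionary lookup instead of a scan over every document, which is the point of the structure.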
Types
- Prior to 7.0, multiple types could be set for an index
- Since 6.0, types have been deprecated. Since 7.0, an index can contain only one type, _doc, which is the default type
- Field types reference: www.cnblogs.com/candlia/p/1…
Abstraction and analogy
- The difference between traditional relational databases and Elasticsearch
  - Elasticsearch: schemaless / relevance / high-performance full-text search
  - RDBMS: transactions / joins

RDBMS | Elasticsearch |
---|---|
Table | Index (Type) |
Row | Document |
Column | Field |
Schema | Mapping |
SQL | DSL |
REST API: easily invoked from a variety of languages
Some basic APIs
- Indices
  - Create an index
    - PUT movies
  - View all indexes
    - GET _cat/indices
Kibana index management and operation
http://localhost:5601/app/management/data/index_management/indices
View index information
- GET kibana_sample_data_ecommerce
{
  "kibana_sample_data_ecommerce" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword" }
          }
        },
        "currency" : {
          "type" : "keyword"
        },
        "customer_birth_date" : {
          "type" : "date"
        },
        "customer_first_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        ... (remaining fields omitted) ...
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : { "_tier_preference" : "data_content" }
          }
        },
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-1",
        "provided_name" : "kibana_sample_data_ecommerce",
        "creation_date" : "1636645300804",
        "number_of_replicas" : "0",
        "uuid" : "3krdWclZQe66NVj1TmobYQ",
        "version" : {
          "created" : "7150099"
        }
      }
    }
  }
}
View index document count
- GET kibana_sample_data_ecommerce/_count
{
  "count" : 4675,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
- Check out the top 10 documents to get a sense of document format
- POST kibana_sample_data_ecommerce/_search
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "mhapD30B2_nndi5vbKef",
        "_score" : 1.0,
        "_source" : {
          "category" : [ "Men's Clothing" ],
          "currency" : "EUR",
          "customer_first_name" : "Eddie",
          "customer_full_name" : "Eddie Underwood",
          "customer_gender" : "MALE",
          "customer_id" : 38,
          "customer_last_name" : "Underwood",
          "customer_phone" : "",
          "day_of_week" : "Monday",
          "day_of_week_i" : 0,
          "email" : "[email protected]",
          "manufacturer" : [ "Elitelligence", "Oceanavigations" ],
          "order_date" : "2021-11-22T09:28:48+00:00",
          "order_id" : 584677,
          "products" : [
            {
              "base_price" : 11.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Elitelligence",
              "tax_amount" : 0,
              "product_id" : 6283,
              "category" : "Men's Clothing",
              "sku" : "ZO0549605496",
              "taxless_price" : 11.99,
              "unit_discount_amount" : 0,
              "min_price" : 6.35,
              "_id" : "sold_product_584677_6283",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Basic T-shirt - dark blue/white",
              "price" : 11.99,
              "taxful_price" : 11.99,
              "base_unit_price" : 11.99
            },
            {
              "base_price" : 24.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Oceanavigations",
              "tax_amount" : 0,
              "product_id" : 19400,
              "category" : "Men's Clothing",
              "sku" : "ZO0299602996",
              "taxless_price" : 24.99,
              "unit_discount_amount" : 0,
              "min_price" : 11.75,
              "_id" : "sold_product_584677_19400",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Sweatshirt - grey multicolor",
              "price" : 24.99,
              "taxful_price" : 24.99,
              "base_unit_price" : 24.99
            }
          ],
          "sku" : [ "ZO0549605496", "ZO0299602996" ],
          "taxful_total_price" : 36.98,
          "taxless_total_price" : 36.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "eddie",
          "geoip" : {
            "country_iso_code" : "EG",
            "location" : {
              "lon" : 31.3,
              "lat" : 30.1
            },
            "region_name" : "Cairo Governorate",
            "continent_name" : "Africa",
            "city_name" : "Cairo"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        }
      },
      ... (nine more hits omitted) ...
    ]
  }
}
Check the indices
- GET /_cat/indices/kibana*?v&s=index
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_flights   Pdqg7ifitu-ombgwnxfoug   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
View the indexes whose health is green
- GET /_cat/indices?v&health=green
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                EQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_7.15.0_001              oUpr8MWaSSSz03le9LY7Pw   1   0        303           49      3.4mb          3.4mb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutzy39ACZblmpvq   1   0          2            0     11.9kb         11.9kb
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   …
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10135      1.7mb          1.7mb
green  open   kibana_sample_data_flights      PDQG7Ifitu-OMBGWNxFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb
Sort by number of documents
- GET /_cat/indices?v&s=docs.count:desc
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   movies                          P54hyh_sr-6l1of05s7ihg   1   1      62424            0      7.3mb          7.3mb
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   kibana_sample_data_flights      PDQg7Ifitu-OMbGWNXFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana_7.15.0_001              OUpr8MWaSSSz03le9LY7Pw   1   0        303           52      3.4mb          3.4mb
green  open   .geoip_databases                eQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10188      1.6mb          1.6mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutZy39ACZBLMPvq   1   0          2            0     11.9kb         11.9kb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   0          0            0    279.3kb        279.3kb
View the memory used by each index
- GET /_cat/indices?v&h=i,tm&s=tm:desc
i                                   tm
kibana_sample_data_ecommerce      72kb
kibana_sample_data_logs           50kb
movies                            39kb
kibana_sample_data_flights      35.4kb
.kibana_task_manager_7.15.0_001   29kb
.kibana_7.15.0_001              28.3kb
.geoip_databases                 8.5kb
.kibana-event-log-7.15.0-000001  4.7kb
.tasks                           4.4kb
.apm-custom-link                    0b
.apm-agent-configuration            0b
.async-search                       0b
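# View the number of documents in the index
GET kibana_sample_data_ecommerce/_count
# View the first 10 documents to get a sense of the document format
POST kibana_sample_data_ecommerce/_search
# _cat indices API
# View the indices
GET /_cat/indices/kibana*?v&s=index
# View the indexes whose health is green
GET /_cat/indices?v&health=green
# Sort by number of documents
GET /_cat/indices?v&s=docs.count:desc
# View specific columns
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt
# View the memory used by each index
GET /_cat/indices?v&h=i,tm&s=tm:desc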
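If you want to post-process the plain-text output of `_cat/indices` outside Kibana, a few lines of Python are enough. This is only a sketch: `parse_cat_table` is a name invented for this example, and it assumes every column of every row is filled in, which may not hold for closed indexes. The cat APIs also accept `format=json`, which is usually the more robust choice in scripts.

```python
def parse_cat_table(text):
    """Turn whitespace-aligned _cat output (with the ?v header row) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
"""

rows = parse_cat_table(sample)
# All values are strings, so convert before comparing numerically.
largest = max(rows, key=lambda row: int(row["docs.count"]))
print(largest["index"])  # kibana_sample_data_logs
```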
Basic concepts: nodes, clusters, shards and replicas
Distributed features
- Benefits of Elasticsearch's distributed architecture
  - Storage can be scaled horizontally
  - Higher availability: if some nodes stop, the cluster as a whole keeps serving requests
- Elasticsearch's distributed architecture
  - Clusters are distinguished by name; the default name is "elasticsearch"
  - The cluster name can be changed in the configuration file or on the command line with -E cluster.name=tyron
  - A cluster can have one or more nodes
Nodes
- A node is an instance of Elasticsearch
  - Essentially a Java process
  - Several Elasticsearch processes can run on one machine, but in production it is generally recommended to run only one instance per machine
- Each node has a name, set in the configuration file or at startup with -E node.name=node1
- After a node starts, it is assigned a UID, which is stored in the data directory
Master-eligible Nodes and the Master Node
- Every node is master-eligible by default when it starts
  - Set node.master: false to disable this
- Master-eligible nodes can take part in the master election and become the master node
- When the first node starts, it elects itself as the master node
- The cluster state is stored on every node, but only the master node can modify it
  - The cluster state maintains the necessary information about a cluster
    - Information about all nodes
    - All indexes and their mapping and setting information
    - Shard routing information
  - If any node could modify this information, the data would become inconsistent
Data Node & Coordinating Node
- Data Node
  - A node that can store data is called a data node. It holds the shard data and plays a crucial role in scaling out data
- Coordinating Node
  - Receives client requests, distributes them to the appropriate nodes, and finally gathers the results together
  - Every node acts as a coordinating node by default
Other node types
- Hot & Warm Node
  - Data nodes with different hardware configurations, used to implement a Hot & Warm architecture and reduce the cost of deploying a cluster
- Machine Learning Node
  - Runs machine learning jobs, used for anomaly detection
- Tribe Node
  - A tribe node connects to different Elasticsearch clusters and allows them to be treated as a single cluster
Configuring the Node Type
- In a development environment, a node can take on multiple roles
- In a production environment, nodes should have a single dedicated role

Node type | Configuration parameter | Default value |
---|---|---|
master eligible | node.master | true |
data | node.data | true |
ingest | node.ingest | true |
coordinating only | None | Every node is a coordinating node by default; to make a node coordinating-only, set all other types to false |
machine learning | node.ml | true (requires X-Pack to be enabled) |
Primary Shard & Replica Shard
- Primary shards solve the problem of horizontal data scaling. With primary shards, data can be distributed across all nodes in the cluster
  - A shard is a running instance of Lucene
  - The number of primary shards is specified when the index is created and cannot be changed afterwards, except by reindexing
- Replica shards solve the problem of high data availability. A replica shard is a copy of a primary shard
  - The number of replicas can be adjusted dynamically
  - Increasing the number of replicas can also improve the availability of the service to some extent (read throughput)
- The shard distribution of the blogs index in a three-node cluster
  - Consider: how does adding a node, or increasing the number of primary shards, affect the system?
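One way to see why the number of primary shards is fixed at index creation is to look at how a document is routed to a shard. In simplified form, the shard is a hash of the routing value (by default the document ID) modulo the number of primary shards. The Python sketch below illustrates that idea only: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, and `route_to_shard` is a hypothetical name.

```python
import zlib


def route_to_shard(doc_id, num_primary_shards):
    """Pick a shard as hash(routing value) modulo the number of primary shards."""
    # Assumption: crc32 is only a stand-in hash for this illustration.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(100)]
before = {doc_id: route_to_shard(doc_id, 3) for doc_id in doc_ids}
after = {doc_id: route_to_shard(doc_id, 4) for doc_id in doc_ids}

# Documents whose shard number changes when the shard count goes from 3 to 4.
moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would be looked up on a different shard")
```

Because the same formula is used to find a document again on reads, changing the divisor would make existing documents unreachable, which is why the data has to be reindexed instead. Adding a node, by contrast, only moves whole shards around and leaves the formula untouched.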
Sharding setup
- Shard settings in a production environment require capacity planning in advance
  - If the number of shards is too small
    - Nodes cannot be added later for horizontal scaling
    - Each shard holds too much data, so redistributing data takes a long time
  - If the number of shards is too large (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem)
    - The relevance scoring of search results and the accuracy of statistical results are affected
    - Too many shards on a single node waste resources and hurt performance
Check the cluster health status
Cluster operations are illustrated after the Docker environment is installed.
Basic CRUD of documents with batch operations
CRUD | Operation |
---|---|
Index | PUT my_index/_doc/1 {"user":"mike","comment":"You know, for search"} |
Create | PUT my_index/_create/1 {"user":"mike","comment":"You know, for search"} or POST my_index/_doc {"user":"mike","comment":"You know, for search"} |
Read | GET my_index/_doc/1 |
Update | POST my_index/_update/1 {"doc":{"user":"mike","comment":"You know, for search"}} |
Delete | DELETE my_index/_doc/1 |
- The type name is _doc by convention
- Create: fails if the ID already exists
- Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted first and a new one is indexed, and the version is incremented
- Update: the document must already exist; only the fields given in the request are changed
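The differences between Index, Create, Update and Delete are easier to remember with a toy model. The Python class below is an in-memory imitation written for this article, not the Elasticsearch client: `TinyIndex` and the two exception classes are made-up names, and details such as no-op updates (which do not bump the version in Elasticsearch) are left out.

```python
class ConflictError(Exception):
    """Stands in for an HTTP 409 version conflict."""


class NotFoundError(Exception):
    """Stands in for an HTTP 404."""


class TinyIndex:
    def __init__(self):
        self._sources = {}
        self._versions = {}  # kept after a delete, so versions keep increasing

    def _bump(self, doc_id):
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return self._versions[doc_id]

    def index(self, doc_id, source):
        """Create the document, or replace it completely if it already exists."""
        result = "updated" if doc_id in self._sources else "created"
        self._sources[doc_id] = dict(source)
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": result}

    def create(self, doc_id, source):
        """Like index(), but refuses to overwrite an existing document."""
        if doc_id in self._sources:
            raise ConflictError(f"[{doc_id}]: document already exists")
        return self.index(doc_id, source)

    def update(self, doc_id, partial):
        """Merge the given fields into an existing document."""
        if doc_id not in self._sources:
            raise NotFoundError(doc_id)
        self._sources[doc_id].update(partial)
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "updated"}

    def get(self, doc_id):
        if doc_id not in self._sources:
            raise NotFoundError(doc_id)
        return {"_id": doc_id, "_version": self._versions[doc_id],
                "_source": dict(self._sources[doc_id])}

    def delete(self, doc_id):
        if doc_id not in self._sources:
            raise NotFoundError(doc_id)
        del self._sources[doc_id]
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "deleted"}


users = TinyIndex()
users.create("1", {"user": "Milk", "message": "trying out kibana"})  # version 1
users.index("1", {"user": "Tyron"})                                  # version 2, "message" is gone
users.update("1", {"message": "users/_update/1"})                    # version 3, fields merged
print(users.get("1"))
```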
Create a document
- The document ID can be generated automatically or specified by you
- Calling POST users/_doc lets the system generate the document ID automatically
- When creating with PUT users/_create/1, _create appears explicitly in the URI. If a document with that ID already exists, the operation fails
Get a document
- If the document is found, HTTP 200 is returned
  - Document metadata
    - _version: for documents with the same ID, the version number keeps increasing even after the document has been deleted
    - _source: contains all of the document's original information by default
- If the document cannot be found, HTTP 404 is returned
Index a document
- Index differs from Create: if the document does not exist, the new document is indexed; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1
Update a document
- Update does not delete the original document; it modifies the existing data in place
- It uses the POST method, and the payload must be wrapped in "doc"
ES operation
# Create a document with an auto-generated ID
POST users/_doc
{
  "user" : "Tyron",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "zTFsH30BWlRHpTG9YJXm",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

# Create a document with a specified ID; if the ID already exists, an error is returned
PUT users/_doc/1?op_type=create
{
  "user" : "Milk",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}
# Response to the first execution
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
# Response to the second execution
{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[1]: version conflict, document already exists (current version [1])",
        "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
        "shard" : "0",
        "index" : "users"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[1]: version conflict, document already exists (current version [1])",
    "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
    "shard" : "0",
    "index" : "users"
  },
  "status" : 409
}

# Get a document by ID
GET users/_doc/1
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Milk",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "trying out kibana"
  }
}

# Index with PUT: deletes the original document and adds the new one; the version is incremented by 1
PUT users/_doc/1
{
  "user" : "Tyron"
}
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
# Get the document again: only the new field remains
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Tyron"
  }
}

# Update: add fields to the original document
POST users/_update/1
{
  "doc" : {
    "user" : "POST",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "users/_update/1"
  }
}
# Response
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}
# Get the document again: the fields were added to the original document
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "POST",
    "message" : "users/_update/1",
    "poset_date" : "2021-10-10T 13:10:10"
  }
}
Bulk API
- Supports operations on different indexes in a single API call
- Supports four types of operations
  - Index
  - Create
  - Update
  - Delete
- The index can be specified in the URI or in the request payload
- The failure of a single operation does not affect the other operations
- The response includes the result of each operation
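The Bulk API expects a newline-delimited JSON body: one action line per operation, followed by a source line for index, create and update (delete has none), and a final newline at the end. The Python sketch below builds such a body. The helper name `build_bulk_body` and the tuple layout of its input are assumptions made for this example, and sending the body to `POST _bulk` is left out.

```python
import json

ACTIONS = ("index", "create", "update", "delete")


def build_bulk_body(operations):
    """Build an NDJSON body from (action, metadata, payload) tuples."""
    lines = []
    for action, meta, payload in operations:
        if action not in ACTIONS:
            raise ValueError(f"unsupported bulk action: {action}")
        lines.append(json.dumps({action: meta}))      # action and metadata line
        if action == "update":
            lines.append(json.dumps({"doc": payload}))  # partial document
        elif action != "delete":
            lines.append(json.dumps(payload))           # full document source
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("index", {"_index": "users", "_id": "1"}, {"user": "Tyron"}),
    ("delete", {"_index": "users", "_id": "2"}, None),
    ("update", {"_index": "users", "_id": "1"}, {"message": "trying out kibana"}),
])
print(body, end="")
```

Each item in the response then reports its own status, so a client should check every entry rather than only the HTTP status of the whole request.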
Batch read – mget
- Batch operations reduce the cost of network connections and improve performance
Batch query – msearch
Common error return
Problem | Cause |
---|---|
Unable to connect | The network is faulty or the cluster is down |
Connection cannot be closed | The network is faulty or a node is faulty |
429 | The cluster is too busy |
4xx | Invalid request (for example, a malformed or oversized request body) |
500 | Internal cluster error |
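A 429 usually means the request can simply be tried again a little later. As a hedged illustration of that idea, the Python sketch below retries with exponential backoff. `send_with_retry` and its parameters are invented for this example, and `send` is any callable that returns a `(status, body)` pair, so no real HTTP client is assumed.

```python
import time


def send_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns something other than 429 or the attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # wait 0.5s, 1s, 2s, ... between attempts


# Demo with a fake cluster that is busy twice and then answers.
responses = iter([(429, "too busy"), (429, "too busy"), (200, "ok")])
delays = []
result = send_with_retry(lambda: next(responses), sleep=delays.append)
print(result)  # (200, 'ok')
print(delays)  # [0.5, 1.0]
```

Other 4xx responses are not retried here because resending the same invalid request would fail in the same way.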