Chapter 3: Getting Started (1)

This is day 6 of my participation in the November writing challenge. Event details: The Last Writing Challenge of 2021

Video course: Geek Time — Elasticsearch — GitHub

Series of articles:

- Chapter 1: Overview of Elasticsearch
- Chapter 2: Getting started

Basic concepts: indexes, documents, and REST APIs

The document

Elasticsearch is document-oriented: a document is the smallest unit of searchable data, for example:
- An entry in a log file
- The details of a movie, or of a music album
- A song in an MP3 player, or the contents of a PDF document
Documents are serialized as JSON and stored in Elasticsearch
- A JSON object consists of fields
- Each field has a field type (string, numeric, boolean, date, binary, range)
Each document has a unique ID
- You can specify the ID yourself
- Or let Elasticsearch generate it automatically

JSON document

A document contains a series of fields, similar to a row in a database table
JSON documents have a flexible format; there is no need to define a schema up front
- Field types can be specified explicitly or inferred automatically by Elasticsearch
- Arrays and nested objects are supported

Metadata for the document

Metadata annotates information about a document:
- _index – the name of the index the document belongs to
- _type – the name of the type the document belongs to
- _id – the unique ID of the document
- _source – the original JSON data of the document
- _all (deprecated) – merges the contents of all fields into one field; no longer supported as of 7.0
- _version – the version of the document
- _score – the relevance score

The index

Index – an index is a container for documents, a collection of documents of the same kind
- An index embodies the concept of logical space: each index has its own mapping, which defines the names and types of the fields in its documents
- A shard embodies the concept of physical space: the data in an index is distributed across its shards
Index mappings and settings
- The mapping defines the types of the document fields
- The settings define how the data is distributed (for example, the number of shards and replicas)
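To make the mapping/settings split concrete, here is a minimal Python sketch that builds the JSON body you could send with PUT <index>. The helper name create_index_body and the example fields (title, year) are hypothetical; settings.number_of_shards, settings.number_of_replicas and mappings.properties are the standard index-creation keys.

```python
import json


def create_index_body(shards: int, replicas: int, fields: dict) -> str:
    """Build a JSON body for PUT <index> (hypothetical helper).

    fields maps a field name to an Elasticsearch field type, e.g. "text".
    """
    body = {
        "settings": {
            "number_of_shards": shards,      # primary shards (physical layout)
            "number_of_replicas": replicas,  # copies of each primary shard
        },
        "mappings": {
            # one entry per field: name -> {"type": ...}
            "properties": {name: {"type": ftype} for name, ftype in fields.items()}
        },
    }
    return json.dumps(body, indent=2)


print(create_index_body(1, 0, {"title": "text", "year": "integer"}))
```

The settings part can be changed later only partly (replicas yes, primary shards no), while the mapping describes the fields themselves.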

Different semantics of indexes

Noun: a single Elasticsearch cluster can hold many different indexes
Verb: saving a document to Elasticsearch is also called indexing
- In Elasticsearch, this is the process of building the inverted index
Noun (as a data structure): a B-tree index, an inverted index
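An inverted index maps each term to the documents that contain it. The toy Python version below assumes naive lowercase-and-split tokenization and plain in-memory sets; Lucene's real inverted index (analyzers, postings lists, term frequencies) is far more involved, so treat this only as an illustration of the idea.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # naive "analysis": lowercase, then split on whitespace
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


docs = {
    "1": "Elasticsearch is a search engine",
    "2": "Kibana visualizes Elasticsearch data",
}
inverted = build_inverted_index(docs)
print(sorted(inverted["elasticsearch"]))  # ['1', '2']
print(sorted(inverted["kibana"]))         # ['2']
```

A search for a term then becomes a dictionary lookup instead of a scan over every document.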

Type

Before 7.0, an index could have multiple types (indexes created in 6.x were already limited to one)
Types have been deprecated since 6.0; since 7.0, an index has only one type, _doc, which is the default
For field types, see: www.cnblogs.com/candlia/p/1…

Abstraction and analogy

Differences between traditional relational databases and Elasticsearch
- Elasticsearch: schemaless, relevance scoring, high-performance full-text search
- RDBMS: transactions, joins

RDBMS	Elasticsearch
Table	Index (Type)
Row	Document
Column	Field
Schema	Mapping
SQL	DSL

REST API – easy to call from a wide range of programming languages

Some basic APIs

Indices
- Create an index (index names must be lowercase)
  - PUT movies
- View all indexes
  - GET _cat/indices

Kibana index management and operation

http://localhost:5601/app/management/data/index_management/indices

View index information

GET kibana_sample_data_ecommerce

{
  "kibana_sample_data_ecommerce" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            }
          }
        },
        "currency" : {
          "type" : "keyword"
        },
        "customer_birth_date" : {
          "type" : "date"
        },
        "customer_first_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        ... (some fields omitted) ...
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-1",
        "provided_name" : "kibana_sample_data_ecommerce",
        "creation_date" : "1636645300804",
        "number_of_replicas" : "0",
        "uuid" : "3krdWclZQe66NVj1TmobYQ",
        "version" : {
          "created" : "7150099"
        }
      }
    }
  }
}

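Mapping output like the above nests multi-fields (such as category.keyword) under each field. A small sketch, assuming the response has already been parsed with json, that lists every field path with its type; list_fields is a hypothetical helper, and the sample is a trimmed excerpt of the mapping above.

```python
import json


def list_fields(properties, prefix=""):
    """Return (field path, type) pairs from a mapping's "properties" object.

    Multi-fields (the "fields" key, e.g. category.keyword) and object
    sub-properties are reported with dotted paths.
    """
    result = []
    for name, spec in properties.items():
        path = prefix + name
        if "type" in spec:
            result.append((path, spec["type"]))
        for sub_name, sub_spec in spec.get("fields", {}).items():
            result.append((path + "." + sub_name, sub_spec["type"]))
        if "properties" in spec:  # object or nested field: recurse
            result.extend(list_fields(spec["properties"], path + "."))
    return result


# Trimmed excerpt of the GET kibana_sample_data_ecommerce response
response = json.loads("""
{"kibana_sample_data_ecommerce": {"mappings": {"properties": {
  "category": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
  "currency": {"type": "keyword"},
  "customer_birth_date": {"type": "date"}
}}}}
""")
props = response["kibana_sample_data_ecommerce"]["mappings"]["properties"]
for path, ftype in list_fields(props):
    print(path, ftype)
```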

View index document count

GET kibana_sample_data_ecommerce/_count

{
  "count" : 4675,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

Look at the first 10 documents (the default number of hits returned by _search) to get a sense of the document format
- POST kibana_sample_data_ecommerce/_search

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "mhapD30B2_nndi5vbKef",
        "_score" : 1.0,
        "_source" : {
          "category" : [
            "Men's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Eddie",
          "customer_full_name" : "Eddie Underwood",
          "customer_gender" : "MALE",
          "customer_id" : 38,
          "customer_last_name" : "Underwood",
          "customer_phone" : "",
          "day_of_week" : "Monday",
          "day_of_week_i" : 0,
          "email" : "[email protected]",
          "manufacturer" : [
            "Elitelligence",
            "Oceanavigations"
          ],
          "order_date" : "2021-11-22T09:28:48+00:00",
          "order_id" : 584677,
          "products" : [
            {
              "base_price" : 11.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Elitelligence",
              "tax_amount" : 0,
              "product_id" : 6283,
              "category" : "Men's Clothing",
              "sku" : "ZO0549605496",
              "taxless_price" : 11.99,
              "unit_discount_amount" : 0,
              "min_price" : 6.35,
              "_id" : "sold_product_584677_6283",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Basic T-shirt - dark blue/white",
              "price" : 11.99,
              "taxful_price" : 11.99,
              "base_unit_price" : 11.99
            },
            {
              "base_price" : 24.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Oceanavigations",
              "tax_amount" : 0,
              "product_id" : 19400,
              "category" : "Men's Clothing",
              "sku" : "ZO0299602996",
              "taxless_price" : 24.99,
              "unit_discount_amount" : 0,
              "min_price" : 11.75,
              "_id" : "sold_product_584677_19400",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Sweatshirt - grey multicolor",
              "price" : 24.99,
              "taxful_price" : 24.99,
              "base_unit_price" : 24.99
            }
          ],
          "sku" : [
            "ZO0549605496",
            "ZO0299602996"
          ],
          "taxful_total_price" : 36.98,
          "taxless_total_price" : 36.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "eddie",
          "geoip" : {
            "country_iso_code" : "EG",
            "location" : {
              "lon" : 31.3,
              "lat" : 30.1
            },
            "region_name" : "Cairo Governorate",
            "continent_name" : "Africa",
            "city_name" : "Cairo"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        }
      },
      ... (the remaining nine hits are omitted) ...
    ]
  }
}

Check the indices

GET /_cat/indices/kibana*?v&s=index

health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_flights   Pdqg7ifitu-ombgwnxfoug   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb

View the indexes whose health is green

GET /_cat/indices?v&health=green

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                EQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_7.15.0_001              oUpr8MWaSSSz03le9LY7Pw   1   0        303           49      3.4mb          3.4mb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutzy39ACZblmpvq   1   0          2            0     11.9kb         11.9kb
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   ...
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10135      1.7mb          1.7mb
green  open   kibana_sample_data_flights      PDQG7Ifitu-OMBGWNxFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb

Sort by number of documents

GET /_cat/indices?v&s=docs.count:desc

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   movies                          P54hyh_sr-6l1of05s7ihg   1   1      62424            0      7.3mb          7.3mb
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   kibana_sample_data_flights      PDQg7Ifitu-OMbGWNXFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana_7.15.0_001              OUpr8MWaSSSz03le9LY7Pw   1   0        303           52      3.4mb          3.4mb
green  open   .geoip_databases                eQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10188      1.6mb          1.6mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutZy39ACZBLMPvq   1   0          2            0     11.9kb         11.9kb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   0          0            0    279.3kb        279.3kb

View the memory occupied by the index

GET /_cat/indices?v&h=i,tm&s=tm:desc
(i = index name, tm = memory.total)

i                                   tm
kibana_sample_data_ecommerce      72kb
kibana_sample_data_logs           50kb
movies                            39kb
kibana_sample_data_flights      35.4kb
.kibana_task_manager_7.15.0_001   29kb
.kibana_7.15.0_001              28.3kb
.geoip_databases                 8.5kb
.kibana-event-log-7.15.0-000001  4.7kb
.tasks                           4.4kb
.apm-custom-link                    0b
.apm-agent-configuration            0b
.async-search                       0b

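# View the number of documents in the index
GET kibana_sample_data_ecommerce/_count

# Look at the first 10 documents to understand the document format
POST kibana_sample_data_ecommerce/_search

# _cat indices API: list the kibana* indexes, sorted by name
GET /_cat/indices/kibana*?v&s=index

# Indexes whose health is green
GET /_cat/indices?v&health=green

# Sort by document count
GET /_cat/indices?v&s=docs.count:desc

# Show selected columns, primary shards only
GET /_cat/indices?pri&v&h=health,index,pri,rep,docs.count,mt

# Memory used by each index
GET /_cat/indices?v&h=i,tm&s=tm:desc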
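The _cat output is a whitespace-aligned text table meant for humans. As a rough sketch, it can be parsed in Python by zipping each row with the header; this assumes no cell is empty (which may not hold, e.g. for closed indexes), so for scripts the format=json parameter of the _cat APIs is the more robust choice. The sample rows are copied from the output above; parse_cat_indices is a hypothetical helper.

```python
def parse_cat_indices(text):
    """Parse the whitespace-aligned table printed by GET _cat/indices?v.

    Assumes every cell is non-empty; otherwise the columns would shift.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
"""

rows = parse_cat_indices(sample)
# Same ordering as ?s=docs.count:desc, but done on the client side
by_docs = sorted(rows, key=lambda row: int(row["docs.count"]), reverse=True)
print([row["index"] for row in by_docs])
```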

Basic concepts: nodes, clusters, shards and replicas

Distributed feature

Benefits of Elasticsearch's distributed architecture
- Storage capacity can be scaled out horizontally
- Higher availability: if some nodes stop serving, the cluster as a whole keeps serving requests
Elasticsearch's distributed architecture
- Clusters are distinguished by name; the default cluster name is “elasticsearch”
- You can change the cluster name in the configuration file, or on the command line with -E cluster.name=tyron
- A cluster can have one or more nodes

Node

A node is an Elasticsearch instance
- Essentially, it is a Java process
- Multiple Elasticsearch processes can run on one machine, but in production it is generally recommended to run only one Elasticsearch instance per machine
Each node has a name, set in the configuration file or at startup with -E node.name=node1
When a node starts, it is assigned a UID, which is stored in the data directory

Master-Eligible Nodes and Master Node

By default, every node starts as a master-eligible node
- You can set node.master: false to make a node ineligible
Master-eligible nodes can take part in the election process and become the master node
When the first node starts, it elects itself as the master node
The cluster state is stored on every node, but only the master node can modify it
- The cluster state holds the information a cluster needs:
  - Information about all nodes
  - All indexes, with their mappings and settings
  - Shard routing information
- If any node could modify this information, the data would become inconsistent

Data Node & Coordinating Node

Data Node
- A node that can store data is called a data node. It holds shard data and plays a crucial role in scaling out data
Coordinating Node
- Receives client requests, distributes them to the appropriate nodes, and finally aggregates the results
- Every node acts as a coordinating node by default

Other node types

Hot & Warm Node
- Data nodes with different hardware configurations are used to implement the Hot & Warm architecture and reduce cluster deployment costs
Machine Learning Node
- Runs machine learning jobs for anomaly detection
Tribe Node
- A tribe node connects to different Elasticsearch clusters and lets you treat them as a single cluster (deprecated in favor of cross-cluster search and removed in 7.0)

Configuring the Node Type

In a development environment, one node can take on multiple roles
In production, nodes should be given dedicated roles

Node type	Configuration parameter	Default value
master eligible	node.master	true
data	node.data	true
ingest	node.ingest	true
coordinating only	None	Every node is a coordinating node by default; set all the other types to false
machine learning	node.ml	true (requires X-Pack)

Primary Shard & Replica Shard

Primary shards solve horizontal data scaling: with primary shards, data can be distributed across all the nodes in the cluster
- A shard is a running instance of Lucene
- The number of primary shards is set when the index is created and cannot be changed afterwards, except by reindexing
Replica shards solve data high availability: a replica is a copy of a primary shard
- The number of replicas can be adjusted dynamically
- Adding replicas can also improve service capacity to some extent (read throughput)
Example: the shard distribution of a blogs index in a three-node cluster
- Consider: how does adding a node, or increasing the number of primary shards, affect the system?
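One way to think about the question above: Elasticsearch picks a document's shard roughly as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id (this is the simplified form; newer versions also factor in index.number_of_routing_shards). The Python sketch below uses zlib.crc32 as a stand-in for Elasticsearch's Murmur3 hash, so the shard numbers will not match a real cluster, but it shows why changing the primary count would send existing documents to different shards. The blogs layout of 3 primaries with 1 replica is a hypothetical example.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Simplified routing: hash(_routing) % number_of_primary_shards.

    zlib.crc32 stands in for Elasticsearch's Murmur3 hash, so the numbers
    will not match a real cluster; only the modulo behaviour matters here.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def total_shards(primaries, replicas):
    """Each primary shard has `replicas` copies."""
    return primaries * (1 + replicas)


doc_ids = [str(i) for i in range(1000)]
moved = sum(1 for d in doc_ids if shard_for(d, 3) != shard_for(d, 4))
print(f"going from 3 to 4 primaries would move {moved} of {len(doc_ids)} documents")

# Hypothetical blogs index: 3 primaries, 1 replica -> 6 shards spread over 3 nodes
print(total_shards(3, 1))  # 6
```

Adding a node, by contrast, only relocates whole shards between nodes; the document-to-shard mapping stays the same.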

Sharding setup

Plan shard capacity in advance for production environments
- If too few shards are configured:
  - You cannot scale out horizontally by adding nodes later
  - Each shard holds too much data, so redistributing data takes a long time
- If too many shards are configured (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem):
  - The relevance scoring of search results and the accuracy of aggregation results suffer
  - Too many shards on a single node waste resources and hurt performance
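Capacity planning usually starts with simple arithmetic: expected data volume divided by a target size per shard. The target (40 GB) and data volume (300 GB) in this Python sketch are assumptions for illustration only; pick values that fit your own hardware and workload. estimate_primary_shards is a hypothetical helper.

```python
import math


def estimate_primary_shards(expected_data_gb, target_shard_size_gb):
    """Smallest shard count that keeps each primary shard at or below the target size."""
    return max(1, math.ceil(expected_data_gb / target_shard_size_gb))


# Assumed figures for illustration: 300 GB of primary data, shards of at most 40 GB
print(estimate_primary_shards(300, 40))  # 8 (300 / 40 = 7.5, rounded up)
```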

Check the cluster health status

Cluster operations will be demonstrated once the Docker environment has been set up.

Basic document CRUD and batch operations

Operation	Request
Index	PUT my_index/_doc/1 {"user":"mike","comment":"You know, for search"}
Create	PUT my_index/_create/1 {"user":"mike","comment":"You know, for search"} or POST my_index/_doc {"user":"mike","comment":"You know, for search"}
Read	GET my_index/_doc/1
Update	POST my_index/_update/1 {"doc":{"user":"mike","comment":"You know, for search"}}
Delete	DELETE my_index/_doc/1

The type name is _doc by convention
Create fails if the ID already exists
Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted first, the new document is indexed, and the version is incremented
Update: the document must already exist, and the update only changes the specified fields
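To make the differences between Index, Create, Update and Delete concrete, here is a small in-memory model in Python. ToyIndex is a hypothetical class that only mimics the rules listed above, including versions that keep increasing after a delete; it is not how Elasticsearch is implemented, and real no-op updates (which leave the version unchanged) are not modelled.

```python
class VersionConflict(Exception):
    """Stands in for HTTP 409 version_conflict_engine_exception."""


class NotFound(Exception):
    """Stands in for HTTP 404."""


class ToyIndex:
    """In-memory model of Index / Create / Update / Delete semantics."""

    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> last _version; kept after a delete

    def _bump(self, doc_id):
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def index(self, doc_id, source):
        # Create or fully replace the document; the version always increases
        self.docs[doc_id] = dict(source)
        return self._bump(doc_id)

    def create(self, doc_id, source):
        if doc_id in self.docs:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = dict(source)
        return self._bump(doc_id)

    def update(self, doc_id, partial):
        if doc_id not in self.docs:
            raise NotFound(doc_id)
        self.docs[doc_id].update(partial)  # merge only the given fields
        return self._bump(doc_id)

    def delete(self, doc_id):
        if doc_id not in self.docs:
            raise NotFound(doc_id)
        del self.docs[doc_id]
        return self._bump(doc_id)


idx = ToyIndex()
print(idx.create("1", {"user": "Milk", "message": "trying out kibana"}))  # 1
print(idx.index("1", {"user": "Tyron"}))                 # 2, "message" is gone
print(idx.update("1", {"message": "users/_update/1"}))  # 3, fields merged
print(idx.docs["1"])  # {'user': 'Tyron', 'message': 'users/_update/1'}
```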

Create a document

You can let the system generate the document ID automatically or specify it yourself
Calling POST users/_doc lets Elasticsearch generate the document ID automatically
When PUT users/_create/1 is used, the URI explicitly contains _create; if a document with the specified ID already exists, the operation fails
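For reference outside Kibana, here is a sketch of the two creation styles as raw HTTP requests using Python's standard urllib. It assumes a local node at http://localhost:9200 (the default HTTP port) with security disabled, as in a typical 7.x development setup; the helper names are hypothetical. The requests are only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local 7.x node, security disabled


def _json_request(method, path, doc):
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_auto_id(index, doc):
    # POST <index>/_doc: Elasticsearch generates the _id
    return _json_request("POST", f"/{index}/_doc", doc)


def create_with_id(index, doc_id, doc):
    # PUT <index>/_create/<id>: fails with HTTP 409 if the _id already exists
    return _json_request("PUT", f"/{index}/_create/{doc_id}", doc)


req = create_with_id("users", "1", {"user": "Milk"})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/users/_create/1
# To send it against a running node: urllib.request.urlopen(req)
```

When sent, a 409 conflict surfaces in urllib as urllib.error.HTTPError, which the caller would need to catch.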

Get a document

If the document is found, HTTP 200 is returned
- Document metadata
  - Version information: the version number of a document with the same ID keeps increasing, even after the document is deleted
  - By default, _source contains all the original data of the document
If the document cannot be found, HTTP 404 is returned
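A client can tell the two cases apart from the status code and the found flag in the body. This Python sketch uses a hypothetical helper; the found body is the GET users/_doc/1 response from the examples in this article, and the not-found body is an illustrative example of the shape Elasticsearch 7.x returns for a missing _id.

```python
import json


def read_document(status, body):
    """Return (_version, _source) if the document was found, else None."""
    payload = json.loads(body)
    # 404 covers a missing _id ("found": false) and a missing index (an "error" object)
    if status == 404 or not payload.get("found", False):
        return None
    return payload["_version"], payload["_source"]


found_body = '''{"_index": "users", "_type": "_doc", "_id": "1", "_version": 1,
  "_seq_no": 0, "_primary_term": 1, "found": true,
  "_source": {"user": "Milk", "poset_date": "2021-10-10T 13:10:10", "message": "trying out kibana"}}'''
missing_body = '{"_index": "users", "_type": "_doc", "_id": "2", "found": false}'

print(read_document(200, found_body))
print(read_document(404, missing_body))  # None
```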

Index a document

Index differs from Create: if the document does not exist, the new document is indexed; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1

Update a document

Update does not delete the original document; it modifies the existing data
Use POST, and wrap the payload in a doc field

ES operation

# Create a document with an auto-generated _id
POST users/_doc
{
  "user" : "Tyron",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "zTFsH30BWlRHpTG9YJXm",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 1,
  "_primary_term" : 1
}

# Create a document with a given ID; if the ID already exists, an error is returned
PUT users/_doc/1?op_type=create
{
  "user" : "Milk",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}

# First execution
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 0,
  "_primary_term" : 1
}

# Second execution
{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[1]: version conflict, document already exists (current version [1])",
        "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
        "shard" : "0",
        "index" : "users"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[1]: version conflict, document already exists (current version [1])",
    "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
    "shard" : "0",
    "index" : "users"
  },
  "status" : 409
}

# Get a document by ID
GET users/_doc/1

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Milk",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "trying out kibana"
  }
}

# PUT (index) deletes the original document and indexes the new one; _version is incremented
PUT users/_doc/1
{
  "user" : "Tyron"
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 2,
  "_primary_term" : 1
}

# GET users/_doc/1 again: only the new content remains
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Tyron"
  }
}

# Update: add fields to the original document
POST users/_update/1/
{
  "doc" : {
    "user" : "POST",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "users/_update/1"
  }
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 3,
  "_primary_term" : 1
}

# GET users/_doc/1 again: the fields have been merged into the original document
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "POST",
    "message" : "users/_update/1",
    "poset_date" : "2021-10-10T 13:10:10"
  }
}

Bulk API

Supports operations on different indexes in a single API call
Four types of operations are supported
- Index
- Create
- Update
- Delete
The index can be specified in the URI or in the request payload
The failure of a single operation does not affect the other operations
The response includes the result of each individual operation
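The _bulk body is newline-delimited JSON (NDJSON): one action line such as {"index": {...}}, followed by a source line for index and create, a {"doc": ...} line for update, and nothing for delete; the body must end with a newline. A minimal Python sketch that builds such a body and then picks the failed items out of a response (a failed item carries an "error" object). The helper names, index names, IDs and the sample response are illustrative.

```python
import json


def bulk_body(actions):
    """Serialize (operation, metadata, payload) triples as an NDJSON _bulk body."""
    lines = []
    for op, meta, payload in actions:
        lines.append(json.dumps({op: meta}))  # action line, e.g. {"index": {...}}
        if op != "delete":                    # delete has no second line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"            # the body must end with a newline


def failed_items(response):
    """Return (operation, _id, error type) for each item that carries an error."""
    failures = []
    for item in response.get("items", []):
        op, result = next(iter(item.items()))  # each item has exactly one key
        if "error" in result:
            failures.append((op, result.get("_id"), result["error"]["type"]))
    return failures


body = bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)  # send as POST _bulk with Content-Type: application/x-ndjson

# Abbreviated, illustrative response: one update failed because the document is missing
response = {
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_id": "1", "status": 201}},
        {"update": {"_index": "test", "_id": "5", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}
print(failed_items(response))  # [('update', '5', 'document_missing_exception')]
```

Because one failure does not abort the others, checking the top-level errors flag and then the individual items is the usual pattern.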

Batch read – mget

Batch operations reduce the overhead of network round trips and improve performance
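A sketch of an _mget request body in Python: each entry in docs names its own _index and _id, so one call can read from several indexes. When the index is given in the URI (GET users/_mget), a shorter {"ids": [...]} body also works. The helper name is hypothetical; the IDs come from the examples in this article.

```python
import json


def mget_body(pairs):
    """Body for GET _mget: each entry names its own _index and _id."""
    return json.dumps({"docs": [{"_index": index, "_id": doc_id} for index, doc_id in pairs]})


body = mget_body([
    ("users", "1"),
    ("kibana_sample_data_ecommerce", "mhapD30B2_nndi5vbKef"),
])
print(body)
```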

Batch query – msearch
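_msearch uses NDJSON like _bulk: for each search, a header line naming the index, then a line with the search body. A minimal Python sketch; the helper name is hypothetical and the match_all queries are just placeholders.

```python
import json


def msearch_body(searches):
    """NDJSON body for POST _msearch: a header line, then a search body line, per search."""
    lines = []
    for index, query in searches:
        lines.append(json.dumps({"index": index}))  # header: which index to search
        lines.append(json.dumps({"query": query}))  # body: the search request itself
    return "\n".join(lines) + "\n"                  # must end with a newline


body = msearch_body([
    ("kibana_sample_data_ecommerce", {"match_all": {}}),
    ("kibana_sample_data_flights", {"match_all": {}}),
])
print(body)
```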

Common error responses

Problem	Cause
Unable to connect	The network is faulty or the cluster is down
Connection cannot be closed	The network is faulty or the node is faulty
429	The cluster is too busy
4xx	The request body is malformed
500	Internal cluster error
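Of these, 429 is the one a client can usually handle by waiting and retrying; 4xx errors mean the request itself needs fixing. The Python sketch below retries only on 429 with exponential backoff. The send callable, retry count and delays are assumptions for illustration; sleep is injectable so the logic can be tested without actually waiting.

```python
import time


def send_with_retry(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body), retrying only on HTTP 429 with exponential backoff.

    send is any callable that performs one request; other status codes
    (200, 4xx, 500) are returned to the caller immediately.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429 or attempt == max_retries:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...


# Usage with a fake sender: busy twice, then success
replies = iter([(429, "busy"), (429, "busy"), (200, "ok")])
waits = []
print(send_with_retry(lambda: next(replies), sleep=waits.append))  # (200, 'ok')
print(waits)  # [0.5, 1.0]
```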

