This is the sixth day of my participation in the November writing challenge. Event details: The Last Writing Challenge of 2021

Video course: Geek Time — Elasticsearch — GitHub

Series of articles:

  • Chapter 1: Overview of Elasticsearch
  • Chapter 2: Getting started

Basic concepts: indexes, documents, and REST APIs

Documents

  • Elasticsearch is document-oriented: a document is the smallest unit of searchable data, for example
    • An entry in a log file
    • The details of a movie / the details of a record
    • A song in an MP3 player / a PDF document
  • Documents are serialized as JSON and stored in Elasticsearch
    • A JSON object consists of fields
    • Each field has a type (string/numeric/Boolean/date/binary/range type)
  • Each document has a unique ID
    • You can specify the ID yourself
    • Or let Elasticsearch generate it automatically

JSON document

  • A document contains a set of fields, similar to a row in a database table
  • JSON documents have a flexible format; no schema has to be defined in advance
    • Field types can be specified explicitly or inferred automatically by Elasticsearch
    • Arrays and nested objects are supported
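As a rough illustration of how Elasticsearch can work out field types on its own, here is a Python sketch of simplified dynamic mapping. It assumes the default rules for JSON values: integers map to `long`, floating-point numbers to `float`, booleans to `boolean`, strings to `text` with a `keyword` sub-field, objects to nested `properties`, and arrays take the type of their first non-null element. It ignores date detection, numeric detection and dynamic templates. The movie document and the function names are invented for the example.

```python
import json

def infer_type(value):
    """Roughly mimic Elasticsearch dynamic mapping for one JSON value (simplified sketch)."""
    if value is None:
        return None                      # null adds no field to the mapping
    if isinstance(value, bool):          # test bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # date and numeric detection on strings are ignored in this sketch
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return infer_mapping(value)      # an inner object gets its own "properties"
    if isinstance(value, list):
        # arrays have no type of their own: the first non-null element decides
        return next((infer_type(v) for v in value if v is not None), None)
    return None

def infer_mapping(doc):
    """Build a {"properties": ...} mapping, skipping fields whose type cannot be inferred."""
    props = {}
    for field, value in doc.items():
        field_type = infer_type(value)
        if field_type is not None:
            props[field] = field_type
    return {"properties": props}

movie = {  # hypothetical example document
    "title": "Toy Story", "year": 1995, "rating": 8.3, "released": True,
    "genres": ["Animation", "Comedy"], "director": {"name": "John Lasseter"},
    "sequel_of": None,
}
print(json.dumps(infer_mapping(movie), indent=2))
```

In practice you would define the mapping explicitly for any field where the inferred type is not what you want.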

Document metadata

  • Metadata annotates a document with information about it
    • _index – the name of the index the document belongs to
    • _type – the name of the type the document belongs to
    • _id – the unique ID of the document
    • _source – the original JSON data of the document
    • _all – combined the contents of all fields into a single field; deprecated, and no longer available from 7.0
    • _version – the version of the document
    • _score – the relevance score
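To make the split between metadata and content concrete, the Python sketch below separates the underscore-prefixed metadata fields of a search hit from its `_source`. The hit is a trimmed version of the `kibana_sample_data_ecommerce` search result shown later. `split_hit` is a hypothetical helper, not part of any Elasticsearch client.

```python
def split_hit(hit):
    """Separate the metadata fields (leading underscore) of a search hit from its _source."""
    meta = {k: v for k, v in hit.items() if k.startswith("_") and k != "_source"}
    return meta, hit.get("_source", {})

# A hit shaped like the kibana_sample_data_ecommerce results (trimmed)
hit = {"_index": "kibana_sample_data_ecommerce", "_type": "_doc",
       "_id": "mhapD30B2_nndi5vbKef", "_score": 1.0,
       "_source": {"customer_full_name": "Eddie Underwood", "currency": "EUR"}}
meta, source = split_hit(hit)
print(meta["_index"], meta["_id"], source["currency"])
```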

Indexes

  • Index – an index is a container for documents, a collection of documents of the same kind
    • An index represents a logical namespace. Each index has its own mapping, which defines the field names and field types of the documents it contains
    • A shard represents physical storage: the data in an index is distributed across its shards
  • Index mappings and settings
    • Mapping defines the types of the document fields
    • Settings define how the data is distributed (for example, the number of shards and replicas)
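The mapping/settings split shows up directly in the body of a create-index request. Here is a minimal Python sketch, assuming the standard `settings` and `mappings` keys accepted by `PUT <index>`. The helper name `create_index_body` and the `title`/`year` fields are made up for illustration.

```python
import json

def create_index_body(shards, replicas, properties):
    """Build a PUT <index> body: settings control distribution, mappings control field types."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {"properties": properties},
    }

body = create_index_body(1, 1, {"title": {"type": "text"}, "year": {"type": "integer"}})
print(json.dumps(body, indent=2))
```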

The different meanings of "index"

  • Noun: a single Elasticsearch cluster can hold many different indexes
  • Verb: saving a document into Elasticsearch is also called indexing
    • In Elasticsearch, indexing is the process of building the inverted index
  • Noun (database sense): a B-tree index, an inverted index
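To give a feel for what "building an inverted index" means, here is a toy Python version in which each term points to the IDs of the documents that contain it. It is only a sketch. Tokenization here is a plain lowercase whitespace split, whereas Elasticsearch runs text through an analyzer, and Lucene stores much richer postings (positions, frequencies). The three sample texts are invented.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it.

    Tokenization is a naive lowercase whitespace split, much simpler than an analyzer.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "You know for search", 2: "trying out kibana", 3: "search with kibana"}
inv = build_inverted_index(docs)
print(inv["kibana"])  # → [2, 3]
print(inv["search"])  # → [1, 3]
```

A query for a term then becomes a dictionary lookup instead of a scan over every document.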

Type

  • Before 7.0, an index could have multiple types
  • Types have been deprecated since 6.0. Since 7.0, an index can have only one type: _doc, the default type
  • Field types: www.cnblogs.com/candlia/p/1…

Abstraction and analogy

  • The difference between traditional relational databases and Elasticsearch
    • Elasticsearch – schemaless / relevance scoring / high-performance full-text search
    • RDBMS – transactions / joins

RDBMS     Elasticsearch
Table     Index (Type)
Row       Document
Column    Field
Schema    Mapping
SQL       DSL

REST API – easy to call from a wide variety of programming languages

Some basic APIs

  • Indices
    • Create an index (index names must be lowercase)
      • PUT movies
  • List all indexes
    • GET _cat/indices
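Because these are plain HTTP calls, any HTTP client can issue them, which is the point of the REST API. Here is a Python sketch using only the standard library, assuming a node at `http://localhost:9200` (the default HTTP port). `es_request` is a hypothetical helper that builds the requests without sending them.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumed address of a local node

def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES + path, data=data, method=method, headers=headers)

create = es_request("PUT", "/movies")             # index names must be lowercase
list_all = es_request("GET", "/_cat/indices?v")
print(create.get_method(), create.full_url)
# To actually send one: urllib.request.urlopen(create)
```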

Kibana index management and operation

http://localhost:5601/app/management/data/index_management/indices

View index information

  • GET kibana_sample_data_ecommerce
{
  "kibana_sample_data_ecommerce" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            }
          }
        },
        "currency" : {
          "type" : "keyword"
        },
        "customer_birth_date" : {
          "type" : "date"
        },
        "customer_first_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "customer_full_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        ... (some fields omitted) ...
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "auto_expand_replicas" : "0-1",
        "provided_name" : "kibana_sample_data_ecommerce",
        "creation_date" : "1636645300804",
        "number_of_replicas" : "0",
        "uuid" : "3krdWclZQe66NVj1TmobYQ",
        "version" : {
          "created" : "7150099"
        }
      }
    }
  }
}


View index document count

  • GET kibana_sample_data_ecommerce/_count
{
  "count" : 4675,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
  • Look at the first 10 documents to get a sense of the document format
    • POST kibana_sample_data_ecommerce/_search
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "mhapD30B2_nndi5vbKef",
        "_score" : 1.0,
        "_source" : {
          "category" : [
            "Men's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Eddie",
          "customer_full_name" : "Eddie Underwood",
          "customer_gender" : "MALE",
          "customer_id" : 38,
          "customer_last_name" : "Underwood",
          "customer_phone" : "",
          "day_of_week" : "Monday",
          "day_of_week_i" : 0,
          "email" : "[email protected]",
          "manufacturer" : [
            "Elitelligence",
            "Oceanavigations"
          ],
          "order_date" : "2021-11-22T09:28:48+00:00",
          "order_id" : 584677,
          "products" : [
            {
              "base_price" : 11.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Elitelligence",
              "tax_amount" : 0,
              "product_id" : 6283,
              "category" : "Men's Clothing",
              "sku" : "ZO0549605496",
              "taxless_price" : 11.99,
              "unit_discount_amount" : 0,
              "min_price" : 6.35,
              "_id" : "sold_product_584677_6283",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Basic T-shirt - dark blue/white",
              "price" : 11.99,
              "taxful_price" : 11.99,
              "base_unit_price" : 11.99
            },
            {
              "base_price" : 24.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Oceanavigations",
              "tax_amount" : 0,
              "product_id" : 19400,
              "category" : "Men's Clothing",
              "sku" : "ZO0299602996",
              "taxless_price" : 24.99,
              "unit_discount_amount" : 0,
              "min_price" : 11.75,
              "_id" : "sold_product_584677_19400",
              "discount_amount" : 0,
              "created_on" : "2016-12-26T09:28:48+00:00",
              "product_name" : "Sweatshirt - grey multicolor",
              "price" : 24.99,
              "taxful_price" : 24.99,
              "base_unit_price" : 24.99
            }
          ],
          "sku" : [
            "ZO0549605496",
            "ZO0299602996"
          ],
          "taxful_total_price" : 36.98,
          "taxless_total_price" : 36.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "eddie",
          "geoip" : {
            "country_iso_code" : "EG",
            "location" : {
              "lon" : 31.3,
              "lat" : 30.1
            },
            "region_name" : "Cairo Governorate",
            "continent_name" : "Africa",
            "city_name" : "Cairo"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        }
      },
      ... (the remaining nine hits are omitted) ...
    ]
  }
}

Check the indices

  • GET /_cat/indices/kibana*?v&s=index
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_flights   Pdqg7ifitu-ombgwnxfoug   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb

View the index whose status is green

  • GET /_cat/indices?v&health=green
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                EQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_7.15.0_001              oUpr8MWaSSSz03le9LY7Pw   1   0        303           49      3.4mb          3.4mb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutzy39ACZblmpvq   1   0          2            0     11.9kb         11.9kb
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   ...
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10135      1.7mb          1.7mb
green  open   kibana_sample_data_flights      PDQG7Ifitu-OMBGWNxFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb

Sort by number of documents

  • GET /_cat/indices?v&s=docs.count:desc
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   movies                          P54hyh_sr-6l1of05s7ihg   1   1      62424            0      7.3mb          7.3mb
green  open   kibana_sample_data_logs         wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
green  open   kibana_sample_data_flights      PDQg7Ifitu-OMbGWNXFOUg   1   0      13059            0      5.4mb          5.4mb
green  open   kibana_sample_data_ecommerce    3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   .kibana_7.15.0_001              OUpr8MWaSSSz03le9LY7Pw   1   0        303           52      3.4mb          3.4mb
green  open   .geoip_databases                eQG671gRQ_iZVqTsQjCoXg   1   0         42           39     40.6mb         40.6mb
green  open   .kibana_task_manager_7.15.0_001 jtovZQEQSey9HJD4xnaUqA   1   0         15        10188      1.6mb          1.6mb
green  open   .kibana-event-log-7.15.0-000001 Y-1AdSuutZy39ACZBLMPvq   1   0          2            0     11.9kb         11.9kb
green  open   .tasks                          VK0TmD2HTdSCjQsxmy_vUA   1   0          2            0     13.7kb         13.7kb
green  open   .apm-custom-link                FPjNyXGNTVqCoIRznmUH8Q   1   0          0            0       208b           208b
green  open   .apm-agent-configuration        ExWGfjcESt6TR5zLtg0LJw   1   0          0            0       208b           208b
green  open   .async-search                   IXKJJTPCSRmg3LcrI3cpiQ   1   0          0            0    279.3kb        279.3kb

View the memory occupied by the index

  • GET /_cat/indices?v&h=i,tm&s=tm:desc
i                                    tm
kibana_sample_data_ecommerce       72kb
kibana_sample_data_logs            50kb
movies                             39kb
kibana_sample_data_flights       35.4kb
.kibana_task_manager_7.15.0_001    29kb
.kibana_7.15.0_001               28.3kb
.geoip_databases                  8.5kb
.kibana-event-log-7.15.0-000001   4.7kb
.tasks                            4.4kb
.apm-custom-link                     0b
.apm-agent-configuration             0b
.async-search                        0b

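# View index information
GET kibana_sample_data_ecommerce
# View the document count of the index
GET kibana_sample_data_ecommerce/_count
# Look at the first 10 documents to understand the document format
POST kibana_sample_data_ecommerce/_search
# _cat indices API: list indexes
GET /_cat/indices/kibana*?v&s=index
# View the indexes whose status is green
GET /_cat/indices?v&health=green
# Sort by number of documents
GET /_cat/indices?v&s=docs.count:desc
# View specific columns
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt
# View the memory used by each index
GET /_cat/indices?v&h=i,tm&s=tm:desc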
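The `s=` and `health=` parameters do their sorting and filtering on the server, but the same can be done client-side. The Python sketch below parses the text table that `?v` produces (a header row plus whitespace-separated columns) and reproduces `s=docs.count:desc` and `health=green` locally. The three sample rows come from the outputs above, and `parse_cat` is a hypothetical helper. For scripts, the `_cat` APIs can also return JSON via `format=json`, which is less brittle than splitting text.

```python
CAT_OUTPUT = """\
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   kibana_sample_data_ecommerce 3krdWclZQe66NVj1TmobYQ   1   0       4675            0      3.9mb          3.9mb
green  open   kibana_sample_data_logs      wnDLe-UKThm_bxeEHttUMA   1   0      14074            0      8.8mb          8.8mb
yellow open   movies                       P54hyh_sr-6l1of05s7ihg   1   1      62424            0      7.3mb          7.3mb
"""

def parse_cat(text):
    """Turn _cat output produced with ?v (header row first) into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

rows = parse_cat(CAT_OUTPUT)
# client-side equivalent of s=docs.count:desc
by_docs = sorted(rows, key=lambda r: int(r["docs.count"]), reverse=True)
print([r["index"] for r in by_docs])
# client-side equivalent of health=green
green = [r["index"] for r in rows if r["health"] == "green"]
print(green)
```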

Basic concepts: nodes, clusters, shards and replicas

Distributed features

  • Benefits of the Elasticsearch distributed architecture
    • Storage capacity scales horizontally
    • Higher availability: if some nodes stop, the cluster as a whole keeps serving requests
  • The Elasticsearch distributed architecture
    • Clusters are distinguished by name; the default cluster name is "elasticsearch"
    • You can change the cluster name in the configuration file, or on the command line with -E cluster.name=tyron
    • A cluster can have one or more nodes

Node

  • A node is an instance of Elasticsearch
    • It is essentially a Java process
    • Multiple Elasticsearch processes can run on one machine, but in production it is generally recommended to run only one Elasticsearch instance per machine
  • Each node has a name, set in the configuration file or at startup, for example -E node.name=node1
  • When a node starts, it is assigned a UID, which is stored in its data directory
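Here is how the cluster name and node name settings fit together. The Python sketch below builds the command lines for starting three nodes of one cluster on a single development machine. It assumes the archive layout (`bin/elasticsearch` relative to the install directory) and uses the `-E` flag to override settings. Giving each node its own `path.data` keeps their data directories apart. `node_command` is a hypothetical helper, and nothing is launched.

```python
def node_command(cluster_name, node_name, data_path):
    """Build an argv list that starts one Elasticsearch node with -E setting overrides."""
    return [
        "bin/elasticsearch",                  # assumed path, relative to the install directory
        "-E", f"cluster.name={cluster_name}",
        "-E", f"node.name={node_name}",
        "-E", f"path.data={data_path}",       # each local node needs its own data directory
    ]

# Three nodes of the cluster "tyron" on one development machine
cmds = [node_command("tyron", f"node{i}", f"node{i}_data") for i in (1, 2, 3)]
for cmd in cmds:
    print(" ".join(cmd))
# To launch them: subprocess.Popen(cmd) for each cmd
```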

Master-eligible Nodes and the Master Node

  • By default, every node is master-eligible when it starts
    • You can set node.master: false to prevent a node from being master-eligible
  • Master-eligible nodes can take part in the master election and become the master
  • When the first node of a cluster starts, it elects itself as the master
  • The cluster state is stored on every node, but only the master node can modify it
    • The cluster state holds the information a cluster needs to operate
      • Information about all nodes
      • All indexes and their mappings and settings
      • Shard routing information
    • If any node could modify the cluster state, the data could become inconsistent

Data Node & Coordinating Node

  • Data Node
    • A node that can store data is called a data node. It holds shard data and plays a key role in scaling out data.
  • Coordinating Node
    • Receives client requests, forwards them to the appropriate nodes, and finally aggregates the results
    • Every node acts as a coordinating node by default

Other node types

  • Hot & Warm Node
    • Data nodes with different hardware configurations, used to build a Hot & Warm architecture and lower cluster deployment costs
  • Machine Learning Node
    • Runs machine learning jobs for anomaly detection
  • Tribe Node
    • Connects to different Elasticsearch clusters and lets you treat them as a single cluster (tribe nodes were later deprecated in favor of cross-cluster search)

Configuring the Node Type

  • In a development environment, one node can take on multiple roles
  • In a production environment, nodes should have dedicated roles

Node type           Setting       Default
master eligible     node.master   true
data                node.data     true
ingest              node.ingest   true
coordinating only   (none)        every node is a coordinating node by default; set all other types to false
machine learning    node.ml       true (requires X-Pack)
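How the defaults in the table combine can be sketched as a small Python function. It models only the per-role boolean settings listed above (newer releases replace them with a single `node.roles` list) and treats `node.ml` as available, although in reality it needs X-Pack. `node_roles` is a hypothetical name.

```python
# Defaults from the table above (legacy per-role boolean settings)
DEFAULTS = {"node.master": True, "node.data": True, "node.ingest": True, "node.ml": True}

def node_roles(settings):
    """Return the roles a node takes on, given the role settings it overrides."""
    merged = {**DEFAULTS, **settings}
    roles = [key.split(".", 1)[1] for key, enabled in merged.items() if enabled]
    # Every node coordinates requests; with all other roles off, that is all it does.
    return roles or ["coordinating_only"]

print(node_roles({}))                       # a default node: ['master', 'data', 'ingest', 'ml']
print(node_roles({"node.master": False}))   # a data/ingest/ml node that cannot become master
```

For a dedicated production node you would switch off every role except the one the node is meant for.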

Primary Shard & Replica Shard

  • Primary shards solve the problem of scaling data horizontally. With primary shards, data can be distributed across all nodes in the cluster

    • A shard is a running instance of Lucene
    • The number of primary shards is set when the index is created and cannot be changed afterwards, except by reindexing
  • Replica shards solve the problem of data high availability; a replica is a copy of a primary shard

    • The number of replicas can be adjusted dynamically
    • Adding replicas can also improve service availability to some extent (read throughput)
  • Shard distribution of the blogs index in a three-node cluster

    • Question: how would adding a node, or increasing the number of primary shards, affect the system?
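Why can the number of primary shards not change without reindexing? Elasticsearch picks a document's shard with a formula of the form `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The Python sketch below uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, and it ignores Elasticsearch's routing-shard refinement. It shows that changing the modulus would send most existing documents to a different shard. The document IDs are generated placeholders.

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Return the primary shard for a routing value (the document _id by default).

    zlib.crc32 is a stand-in; Elasticsearch uses Murmur3 and a routing-shard factor.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

ids = [f"doc-{i}" for i in range(1000)]
before = {doc_id: shard_for(doc_id, 3) for doc_id in ids}  # index created with 3 primaries
after = {doc_id: shard_for(doc_id, 4) for doc_id in ids}   # hypothetically changed to 4
moved = sum(before[d] != after[d] for d in ids)
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

With a uniform hash, going from 3 to 4 primaries keeps a document on the same shard only when its hash modulo 12 is 0, 1 or 2. That means roughly three quarters of the documents would be looked up on the wrong shard, which is why the data has to be reindexed instead.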

Shard settings

  • For production, plan shard capacity in advance
    • Too few shards
      • Nodes cannot be added later to scale out horizontally
      • Each shard holds too much data, so redistributing data takes a long time
    • Too many shards (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem)
      • Relevance scoring and the accuracy of aggregation results suffer
      • Too many shards on a single node waste resources and hurt performance
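Capacity planning usually starts from simple arithmetic. The Python sketch below estimates a primary shard count from the expected data volume. The 30 GB target per shard is an assumption (a commonly quoted guideline is roughly 10 to 50 GB per shard). Rounding up to a multiple of the data node count is one simple way to spread primaries evenly. `plan_primary_shards` is a hypothetical helper.

```python
import math

def plan_primary_shards(total_data_gb, target_shard_gb=30, data_nodes=1):
    """Estimate a primary shard count from the expected data volume.

    target_shard_gb is an assumed target size; the result is rounded up to a
    multiple of data_nodes so primaries can be spread evenly.
    """
    shards = max(1, math.ceil(total_data_gb / target_shard_gb))
    return math.ceil(shards / data_nodes) * data_nodes

print(plan_primary_shards(500, data_nodes=3))  # 500 / 30 → 17 shards, rounded up to 18
print(plan_primary_shards(5))                  # small index: 1 shard
```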

Check the cluster health status

Cluster operations will be demonstrated once the Docker environment has been set up.
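The status colours returned by `GET _cluster/health`, and shown in the `health` column of `_cat/indices`, can already be summarised in a few lines of Python. The rules are: red if any primary shard is unassigned, yellow if all primaries are assigned but some replicas are not, and green otherwise. The function name `cluster_health` is hypothetical.

```python
def cluster_health(unassigned_primaries, unassigned_replicas):
    """Map unassigned shard counts to the health colour Elasticsearch reports."""
    if unassigned_primaries > 0:
        return "red"       # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"    # all data is available, but redundancy is reduced
    return "green"

# The movies index above (1 primary, 1 replica) is yellow, presumably because the
# cluster had a single node: a replica is never allocated on the same node as its primary.
print(cluster_health(0, 1))  # → yellow
```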

Basic CRUD and batch operations on documents

Operation  Example
Index      PUT my_index/_doc/1 {"user":"mike","comment":"You know, for search"}
Create     PUT my_index/_create/1 {"user":"mike","comment":"You know, for search"}

           POST my_index/_doc {"user":"mike","comment":"You know, for search"}
Read       GET my_index/_doc/1
Update     POST my_index/_update/1 {"doc":{"user":"mike","comment":"You know, for search"}}
Delete     DELETE my_index/_doc/1
  • By convention, the type name is _doc
  • Create fails if a document with the given ID already exists
  • Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted first and the new one is created, and the version is incremented
  • Update: the document must already exist, and the update only applies incremental changes to the specified fields
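The rules above can be summarised as a small in-memory model. This Python sketch is not how Elasticsearch stores documents; it only mirrors the behaviour described above. Create fails on an existing ID (HTTP 409 in Elasticsearch), index replaces the whole document, update changes only the supplied fields (a shallow merge here), and every write, including a delete, bumps `_version`. `ToyIndex`, `ConflictError` and `NotFoundError` are hypothetical names.

```python
class ConflictError(Exception):
    """Stands in for HTTP 409 (document already exists)."""

class NotFoundError(Exception):
    """Stands in for HTTP 404 (document missing)."""

class ToyIndex:
    """In-memory model of the Index / Create / Read / Update / Delete rules."""

    def __init__(self):
        self.docs = {}       # _id -> _source
        self.versions = {}   # _id -> last _version, kept even after a delete

    def _bump(self, doc_id):
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def index(self, doc_id, source):
        # creates the document, or replaces an existing one wholesale
        self.docs[doc_id] = dict(source)
        return self._bump(doc_id)

    def create(self, doc_id, source):
        if doc_id in self.docs:
            raise ConflictError(doc_id)
        return self.index(doc_id, source)

    def get(self, doc_id):
        if doc_id not in self.docs:
            raise NotFoundError(doc_id)
        return self.docs[doc_id]

    def update(self, doc_id, partial):
        # the document must exist; only the supplied fields change (shallow merge)
        self.get(doc_id).update(partial)
        return self._bump(doc_id)

    def delete(self, doc_id):
        self.get(doc_id)
        del self.docs[doc_id]
        return self._bump(doc_id)

idx = ToyIndex()
print(idx.create("1", {"user": "mike", "comment": "You know, for search"}))  # → 1
print(idx.index("1", {"user": "Tyron"}))  # → 2, and "comment" is gone
```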

Create a document

  • You can let Elasticsearch generate the document ID automatically, or specify it yourself
  • Calling POST users/_doc makes the system generate the document ID
  • With PUT users/_create/1, the URI explicitly contains _create; if a document with that ID already exists, the operation fails
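The three ways of creating a document differ only in method, path and query string. Here is a Python sketch that builds (but does not send) the requests with the standard library, assuming a local node at `http://localhost:9200`. `make_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumed local node

def make_request(method, path, body):
    """Build a JSON request for the given method and path without sending it."""
    return urllib.request.Request(BASE + path, data=json.dumps(body).encode("utf-8"),
                                  method=method, headers={"Content-Type": "application/json"})

doc = {"user": "Tyron", "message": "trying out kibana"}
auto_id = make_request("POST", "/users/_doc", doc)                  # Elasticsearch generates the _id
explicit = make_request("PUT", "/users/_create/1", doc)             # 409 if _id 1 already exists
op_type = make_request("PUT", "/users/_doc/1?op_type=create", doc)  # same semantics as _create
for req in (auto_id, explicit, op_type):
    print(req.get_method(), req.full_url)
```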

Get a document

  • If the document is found, HTTP 200 is returned
    • Document metadata
      • Version information: the version number of a document with a given ID keeps increasing, even after the document is deleted
      • _source contains the complete original document by default
  • If the document cannot be found, HTTP 404 is returned
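A client has to handle both outcomes. The Python sketch below assumes the status code and raw response body have already been fetched, and extracts `_version` and `_source` on success. The sample body is the `GET users/_doc/1` response from the ES operation section. `read_document` is a hypothetical helper.

```python
import json

def read_document(status, body_text):
    """Return (_version, _source) for a GET <index>/_doc/<id> response, or None on 404."""
    if status == 404:
        return None
    body = json.loads(body_text)
    return body["_version"], body["_source"]

resp = '''{"_index": "users", "_type": "_doc", "_id": "1", "_version": 1,
           "_seq_no": 0, "_primary_term": 1, "found": true,
           "_source": {"user": "Milk", "poset_date": "2021-10-10T 13:10:10",
                       "message": "trying out kibana"}}'''
print(read_document(200, resp))
print(read_document(404, '{"found": false}'))  # → None
```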

Index a document

  • Index differs from Create: if the document does not exist, the new document is indexed; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1

Update a document

  • Update does not delete the original document; it applies the changes to the existing data
  • Use POST, and wrap the payload in "doc"
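The `doc` payload is merged into the existing `_source`. This Python sketch assumes merge semantics in which nested objects are merged field by field and every other value is overwritten. `apply_partial_update` is a hypothetical name, and the payload is the one from the `POST users/_update/1` example in the ES operation section.

```python
def apply_partial_update(source, partial):
    """Merge a partial "doc" into an existing _source, recursing into nested objects.

    Fields absent from `partial` are left unchanged.
    """
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged

payload = {"doc": {"user": "POST", "poset_date": "2021-10-10T 13:10:10",
                   "message": "users/_update/1"}}   # body of POST users/_update/1
print(apply_partial_update({"user": "Tyron"}, payload["doc"]))
```

To replace a document completely instead of merging into it, use the index operation (`PUT users/_doc/1`).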

ES operation

# Create a document with an automatically generated _id
POST users/_doc
{
  "user" : "Tyron",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "zTFsH30BWlRHpTG9YJXm",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

# Create a document with a given ID; if the ID already exists, an error is reported
PUT users/_doc/1?op_type=create
{
  "user" : "Milk",
  "poset_date" : "2021-10-10T 13:10:10",
  "message" : "trying out kibana"
}

# First execution
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

# Second execution
{
  "error" : {
    "root_cause" : [
      {
        "type" : "version_conflict_engine_exception",
        "reason" : "[1]: version conflict, document already exists (current version [1])",
        "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
        "shard" : "0",
        "index" : "users"
      }
    ],
    "type" : "version_conflict_engine_exception",
    "reason" : "[1]: version conflict, document already exists (current version [1])",
    "index_uuid" : "vTw-Xm05TeycobYXlwN1eA",
    "shard" : "0",
    "index" : "users"
  },
  "status" : 409
}

# Get a document by ID
GET users/_doc/1

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Milk",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "trying out kibana"
  }
}

# PUT (index) deletes the original document and indexes the new one; _version is incremented by 1
PUT users/_doc/1
{
  "user" : "Tyron"
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

# GET users/_doc/1 again
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Tyron"
  }
}

# Add fields to the existing document with _update
POST users/_update/1
{
  "doc" : {
    "user" : "POST",
    "poset_date" : "2021-10-10T 13:10:10",
    "message" : "users/_update/1"
  }
}

{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

# GET users/_doc/1 after the update
{
  "_index" : "users",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 3,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "POST",
    "message" : "users/_update/1",
    "poset_date" : "2021-10-10T 13:10:10"
  }
}

Bulk API

  • Supports operations on different indexes in a single API call

  • Four types of operations are supported

    • Index
    • Create
    • Update
    • Delete
  • The index can be specified in the URI or in the request payload

  • A failed operation does not affect the other operations

  • The response includes the result of each operation
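A `_bulk` request body is newline-delimited JSON. It has one action line per operation, followed by a source line for index, create and update (wrapped in `doc` for update) and none for delete, and the body must end with a newline. The Python sketch below builds such a body. The `test`/`test2` indexes and the field values are placeholders, and `bulk_body` is a hypothetical helper.

```python
import json

def bulk_body(actions):
    """Serialize (op, metadata, source) triples into a _bulk NDJSON body.

    delete takes no source line; for update the source should be {"doc": ...}.
    """
    lines = []
    for op, meta, source in actions:
        lines.append(json.dumps({op: meta}))
        if op != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the body must end with a newline

body = bulk_body([
    ("index",  {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)
```

The body would be sent as `POST _bulk` with `Content-Type: application/x-ndjson`. The response lists one result per action, so a single failure can be detected and handled without redoing the whole batch.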

Batch read – mget

  • Batch operations reduce the overhead of network round trips and improve performance

Batch query – msearch
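For `_mget`, the body lists the documents to fetch, and each entry may name a different index. Here is a Python sketch of the request body. `mget_body` is a hypothetical helper, and the index names and IDs are taken from the examples in this article.

```python
import json

def mget_body(pairs):
    """Build the body for POST _mget from (index, id) pairs, which may span indexes."""
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in pairs]}

body = mget_body([("users", "1"), ("kibana_sample_data_ecommerce", "mhapD30B2_nndi5vbKef")])
print(json.dumps(body, indent=2))
```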
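`_msearch` uses the same newline-delimited layout as `_bulk`: a header line naming the index, then a line with the search body, repeated for each search. Here is a Python sketch of the body builder. `msearch_body` is hypothetical and the queries are placeholders.

```python
import json

def msearch_body(searches):
    """Build an NDJSON body for POST _msearch: a header line, then a query line, per search."""
    lines = []
    for index, query in searches:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": query}))
    return "\n".join(lines) + "\n"   # like _bulk, the body ends with a newline

body = msearch_body([
    ("kibana_sample_data_ecommerce", {"match_all": {}}),
    ("users", {"match": {"user": "Tyron"}}),
])
print(body)
```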

Common error responses

Problem                       Cause
Unable to connect             Network failure or the cluster is down
Connection cannot be closed   Network failure or node failure
429                           The cluster is too busy
4xx                           The request body is malformed
500                           Internal cluster error
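Of these, 429 is the one a client can usually just retry after a pause, while other 4xx responses point to a problem in the request itself. Here is a Python sketch of retrying with exponential backoff, assuming `send` is any callable that performs the request and returns the HTTP status code. The delays and attempt count are arbitrary choices, and `send_with_retry` is a hypothetical helper.

```python
import random
import time

RETRYABLE_STATUSES = {429}  # the cluster is too busy: worth retrying after a pause

def send_with_retry(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() and retry 429 responses with exponential backoff.

    Any other status is returned immediately; after max_attempts the last status is returned.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            break
        # exponential backoff (0.1 s, 0.2 s, 0.4 s, ...) plus random jitter
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status

responses = iter([429, 429, 200])
print(send_with_retry(lambda: next(responses)))  # sleeps twice, then returns 200
```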