Elasticsearch

Elasticsearch overview

Elasticsearch is a distributed, multi-tenant, full-text search engine built on the Lucene library.
Elasticsearch is written in Java.
Elasticsearch exposes a REST API that exchanges JSON documents, so it can be called from any programming language.
Elasticsearch can search and analyze any type of data in near real time, whether structured, semi-structured, or unstructured, and it stores data efficiently for fast searching.
It offers high performance, high availability (of both data and services), horizontal scalability, and ease of use.
It supports several node types.
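
Because the interface is plain HTTP plus JSON, a request can be assembled with nothing more than a standard library. The sketch below builds, but does not send, a search request. The host localhost:9200, the movies index, and the helper name build_search_request are assumptions made for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Build (but do not send) a search request for the given index."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index; a match query on the "title" field.
request = build_search_request(
    "http://localhost:9200", "movies", {"match": {"title": "life"}}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running cluster; any other language with an HTTP client and a JSON encoder could do the same.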

Application scenarios of Elasticsearch

Add search capabilities to your application or website
Store and analyze logs, metrics, and security event data
Use machine learning to automatically model the behavior of data in real time
Automate business workflows using Elasticsearch as the storage engine
Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)
Store and process genetic data using Elasticsearch as a bioinformatics research tool

Elasticsearch family

Elasticsearch ecosystem

Logstash and Beats: data collection
Elasticsearch: data storage, analysis, and computation
Kibana: data visualization
X-Pack: mainly commercial features such as security, monitoring, alerting, graph queries, and machine learning

Logstash

Introduction

Logstash is an open-source, server-side data processing pipeline that collects data from different sources, transforms it, and sends it to different destinations.
It was originally used for log collection and processing.

Features

Parses and transforms data in real time
Extensible
- Supports more than 200 plug-ins (logs, databases, ArcSight, NetFlow)
Reliable and secure
- With persistent queues, Logstash guarantees at-least-once delivery of in-flight events
- Encrypted data transmission
Monitoring
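
The collect, transform, and send stages can be pictured with a small sketch. This is plain Python written only to illustrate the idea; it is not Logstash's configuration language, and the log format, the function names, and the in-memory lists standing in for destinations are all assumptions.

```python
def parse_line(line):
    """Filter stage: split a raw log line into structured fields."""
    timestamp, level, message = line.split(" ", 2)
    return {"timestamp": timestamp, "level": level, "message": message}


def run_pipeline(lines, outputs):
    """Input -> filter -> output: parse each line and fan it out."""
    for line in lines:                      # input stage
        event = parse_line(line.strip())    # filter stage
        for output in outputs:              # output stage
            output.append(event)


# Hypothetical log lines and two stand-in destinations.
raw_lines = [
    "2021-06-26T06:49:31 INFO node started",
    "2021-06-26T06:50:02 ERROR disk usage above 85%",
]
search_store, archive = [], []
run_pipeline(raw_lines, [search_store, archive])
print(search_store[1]["level"], "-", search_store[1]["message"])
```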

Kibana

Introduction

Kibana = Kiwi + Banana
A data visualization tool that helps users answer questions about their data
Originally a Logstash-based tool

Beats collects data and either stores it directly in Elasticsearch or sends it to Logstash for parsing and filtering before it is stored in Elasticsearch. Elasticsearch is the storage engine and provides APIs for searching, analyzing, and otherwise operating on the data. Kibana provides visualization and interaction on top of Elasticsearch.

Basic concepts of Elasticsearch

Document

Elasticsearch is document-oriented; a document is the basic unit of information that can be indexed
A document is serialized to JSON and stored in Elasticsearch
- A JSON object consists of fields
- Each field has a corresponding field type, such as string, numeric, boolean, date, binary, or range
- JSON documents are flexible: the format does not have to be defined in advance
- A field's type can be specified explicitly or inferred automatically by Elasticsearch
- Arrays and nested JSON are supported
Each document has a unique ID
- You can specify the ID yourself
- Or let Elasticsearch generate one automatically
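
How a field type might be inferred from a JSON value can be sketched as follows. This is a simplified approximation of Elasticsearch's dynamic mapping rules, not its actual implementation: it ignores date and numeric detection inside strings, and the function names are assumptions.

```python
def infer_field_type(value):
    """Guess an Elasticsearch field type from a JSON value (simplified)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    return None  # null or empty array: no field is added yet


def infer_mapping(document):
    """Infer a type for every top-level field of a document."""
    return {name: infer_field_type(value) for name, value in document.items()}


movie = {"genre": ["Drama"], "id": "37475", "year": 2005,
         "title": "Unfinished Life, An", "@version": "1"}
print(infer_mapping(movie))
```

Note that id is inferred as text because the value is a JSON string, even though it looks numeric.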

Document metadata

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "37475",
  "_score" : 1.0,
  "_source" : {
    "genre" : [
      "Drama"
    ],
    "id" : "37475",
    "year" : 2005,
    "title" : "Unfinished Life, An",
    "@version" : "1"
  }
}

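Document metadata describes the document itself

_index: the index the document belongs to
_type: the type the document belongs to (here _doc)
_id: the unique ID of the document
_source: the original JSON content of the document
@version: a version field; in this example it is a regular field inside _source rather than metadata, and Elasticsearch tracks its own document version in the _version metadata field
_score: the relevance score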
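
A short sketch of separating the metadata from the document body in the hit shown above. It assumes only that metadata keys start with an underscore and that the body lives under _source; the variable names are illustrative.

```python
import json

hit = json.loads("""
{
  "_index": "movies",
  "_type": "_doc",
  "_id": "37475",
  "_score": 1.0,
  "_source": {
    "genre": ["Drama"],
    "id": "37475",
    "year": 2005,
    "title": "Unfinished Life, An",
    "@version": "1"
  }
}
""")

# Metadata: every underscore-prefixed key except the document body itself.
metadata = {key: value for key, value in hit.items()
            if key.startswith("_") and key != "_source"}
source = hit["_source"]

print(metadata)
print(sorted(source))
```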

Index

An index is a container for documents: a collection of documents of the same kind
Each index has its own mapping, which defines the field names and field types of the documents it contains
The data in an index is spread across shards; the index settings define how the data is distributed

// index settings
{
  "settings": {
    "index": {
      "creation_date": "1624690171977",
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "uuid": "HqFyAwvOQ8Ctfwy7Cbwz-A",
      "version": {
        "created": "7090299"
      },
      "provided_name": "movies"
    }
  }
}

// index mapping
{
  "mappings": {
    "_doc": {
      "properties": {
        "@version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "genre": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "year": {
          "type": "long"
        }
      }
    }
  }
}
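
The settings and mapping above are what Elasticsearch reports for an existing index. As a minimal sketch, a body for creating a similar index could be assembled like this. It assumes the typeless 7.x request form (no _doc level under mappings), and the helper names are hypothetical.

```python
import json


def text_with_keyword():
    """A text field with a keyword sub-field, as in the mapping above."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def build_create_index_body(shards, replicas, properties):
    """Return a JSON-serializable body for creating an index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {"properties": properties},
    }


body = build_create_index_body(
    shards=1,
    replicas=1,
    properties={"title": text_with_keyword(), "year": {"type": "long"}},
)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a PUT request to the index name, for example PUT /movies.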

Type

In an index, you can define one or more types. A type is a logical classification/partition of your index, and its semantics are entirely up to you. Typically, a type is defined for documents that have the same set of fields. For example, let’s say you run a blogging platform and store all your data in an index. In this index, you can define one type for user data, another type for blog data, and, of course, another type for comment data.

Note:

Before 6.0, an index could contain multiple types
Since 6.0, an index can contain only one type
As of 7.0, types are deprecated and _doc is the only type name
In 8.0, types are removed completely

Cluster

An Elasticsearch cluster is a group of one or more nodes that together hold all of your data and provide indexing and search capabilities.
A cluster is identified by a unique name, which defaults to "elasticsearch". Different clusters are distinguished by their names. You can change the name in the configuration file or by adding -E cluster.name=es_demo to the startup command.
A cluster provides high availability
- High availability of services: the cluster keeps serving requests when some nodes stop
- High availability of data: data is not lost even if some nodes are lost
Scalability
- Handles growing request and data volumes by distributing data across all nodes
- Scales horizontally: a new node joins a cluster by specifying the cluster's name
Cluster status:
- Green: all primary shards and replicas are allocated
- Yellow: all primary shards are allocated, but at least one replica is not
- Red: at least one primary shard is not allocated, for example when a new index is created while the server's disk usage exceeds 85%

# View the cluster status
curl -i http://192.168.0.41:9200/_cluster/health

{
  "cluster_name": "es_demo",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 7,
  "active_shards": 7,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 87.5
}
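
The numbers in this response explain the yellow status. The sketch below recomputes them under the assumption that the active percentage is the number of active shards divided by active plus initializing plus unassigned shards, which reproduces the 87.5 reported above. The function name is hypothetical.

```python
import json

# The health response shown above, reduced to the fields used here.
health = json.loads("""
{
  "status": "yellow",
  "number_of_nodes": 1,
  "active_primary_shards": 7,
  "active_shards": 7,
  "initializing_shards": 0,
  "unassigned_shards": 1
}
""")


def summarize_health(health):
    """Derive a few figures that explain the reported status."""
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    return {
        "active_percent": 100.0 * active / total if total else 100.0,
        "active_replicas": active - health["active_primary_shards"],
        "unassigned": health["unassigned_shards"],
    }


summary = summarize_health(health)
print(summary)
```

All seven active shards are primaries, so no replica is running. A replica is never allocated to the same node as its primary, so on a one-node cluster the single configured replica stays unassigned and the status is yellow rather than green.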

Node

A node is an instance of Elasticsearch, which is essentially a Java process. You can run multiple Elasticsearch processes on one server by changing the ports, but in production you are advised to run only one instance per machine.
Each node has a name, set either in the configuration file or on the command line at startup with -E node.name=node1
After a node starts, it is assigned a unique ID that is stored in its data directory.
There are several types of nodes, each with a different function.
- master node & master-eligible node
- data node & coordinating node
- hot & warm node
- machine learning node
- tribe node
In a development environment a single node can play multiple roles, but in production each node should be dedicated to a single role.

Node type	Configuration parameter	Default value
master-eligible	node.master	true
data	node.data	true
ingest	node.ingest	true
coordinating only	None	Every node is a coordinating node by default; to make a node coordinating-only, set all other roles to false
machine learning	node.ml	true (requires X-Pack to be enabled)
Master Node & Master-eligible Nodes

By default, every node is a master-eligible node when it starts. This can be disabled by setting node.master: false in the configuration file.
A master-eligible node can take part in the master election and become the master node.
When the first node starts, it elects itself as the master node.
The cluster state is stored on every node, but only the master node can change it
- The cluster state maintains the essential information about the cluster
  - All node information
  - All indices and their mapping and settings information
  - Shard routing information
- If any node could modify this information, the data would become inconsistent

Data Node & Coordinating Node

A node that can store data is called a data node. It stores shard data and plays a key role in scaling data capacity.
A coordinating node accepts client requests, distributes them to the appropriate nodes, and merges the results.
Every node acts as a coordinating node by default.
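
The distribute-and-merge behaviour of a coordinating node can be illustrated with a toy scatter-gather sketch. Shards are modelled as plain dictionaries and the score is a simple term count, both assumptions made for illustration; real scoring and shard communication are far more involved.

```python
import heapq


def search_shard(shard_docs, term, size):
    """One shard's part: return its local top hits as (score, doc_id)."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in shard_docs.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(size, hits)


def coordinate_search(shards, term, size):
    """Coordinating node: fan the query out, then merge the partial results."""
    partial = []
    for shard_docs in shards:
        partial.extend(search_shard(shard_docs, term, size))
    return [doc_id for _score, doc_id in heapq.nlargest(size, partial)]


# Two hypothetical shards holding four small documents.
shards = [
    {"a": "unfinished life", "b": "life life life"},
    {"c": "a good life life", "d": "drama"},
]
top_hits = coordinate_search(shards, "life", size=2)
print(top_hits)
```

Each shard only returns its own best hits, so the coordinating node never has to look at every document to produce the global top results.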

Hot & Warm Node

Data nodes with different hardware configurations can be used to implement the Hot & Warm architecture, reducing the cost of cluster deployment.

Machine Learning Node

Runs machine learning jobs, for example for anomaly detection.

Tribe Node

A tribe node connects to several Elasticsearch clusters and lets them be treated as a single cluster. Tribe nodes were deprecated in 5.4 in favor of cross-cluster search and removed in 7.0.

Shard & Replica

Shards solve the problem of scaling data horizontally. Sharding lets you distribute data across all nodes in a cluster.
- A shard is a running instance of Lucene
- The number of primary shards is set with number_of_shards when the index is created and cannot be changed afterwards except by reindexing
Replicas solve the problem of data high availability. A replica is a copy of a primary shard.
- The number of replicas is set with number_of_replicas and can be adjusted dynamically
- Adding replicas can also improve service availability to some extent
Shard capacity planning is required in production
- If the number of shards is too small
  - Adding nodes later does not achieve horizontal scaling
  - A single shard holds too much data, so redistributing data takes a long time
- If the number of shards is too large (since 7.0 the default number of primary shards is 1, which addresses over-sharding)
  - The relevance scoring of search results and the accuracy of aggregations are affected
  - Too many shards on a single node waste resources and hurt performance
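
Why the number of primary shards is fixed becomes clearer from how a document is routed. In simplified form, the shard is chosen as hash(routing) % number_of_shards, where the routing value defaults to the document ID. The sketch below uses zlib.crc32 as a stand-in hash, which is an assumption (Elasticsearch uses a different hash function), and the function name is hypothetical.

```python
import zlib


def route_to_shard(routing, number_of_shards):
    """Pick a primary shard: hash(routing) % number_of_shards."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


doc_ids = [str(n) for n in range(1000)]
before = {doc_id: route_to_shard(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: route_to_shard(doc_id, 6) for doc_id in doc_ids}

# Count the documents that would be looked up on a different shard
# if the shard count changed from 5 to 6.
moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

If the shard count changed in place, lookups would go to the wrong shard for every document that maps differently, which is why the data has to be reindexed instead.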

Elasticsearch vs RDBMS

RDBMS	ElasticSearch
Table	Index
Row	Document
Column	Field
Schema	Mapping
SQL	DSL

Elasticsearch provides high-performance full-text search, but it does not support transactions or JOINs.
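
To make the SQL-to-DSL row of the table concrete, here is a rough sketch of a Query DSL body that corresponds to a simple SQL filter. The SQL statement is only an analogy, the helper name is hypothetical, and the field names come from the movies mapping shown earlier.

```python
import json


def filtered_search(year_from, genre, size=10):
    """Query DSL roughly matching:
    SELECT * FROM movies WHERE year >= :year_from AND genre = :genre LIMIT :size
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"year": {"gte": year_from}}},
                    # Exact matching uses the keyword sub-field, not the text field.
                    {"term": {"genre.keyword": genre}},
                ]
            }
        },
    }


body = filtered_search(2000, "Drama")
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the movies index, in the same way a SQL statement is sent to a table.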
