• Github project address

  • ELK stands for ElasticSearch, Logstash, and Kibana; together they are also called the Elastic Stack. ElasticSearch is a Lucene-based, distributed, RESTful, near-real-time search platform; it can serve as the underlying framework for large-scale search engines such as Google or Baidu and provides powerful search capabilities. Logstash is the central data-flow engine: it collects data in different formats from different sources (documents, data stores, MQ, etc.), filters it, and outputs it to different destinations (files, MQ, Redis, ElasticSearch, Kafka, and so on). Kibana presents ElasticSearch data on friendly pages and provides real-time analysis

  • Many developers refer to ELK as a log-analysis stack, but ELK is not limited to log analysis; it supports any other data-collection and analysis scenario. Log analysis and collection is simply its most representative use case, not its only one

  • Logstash collects and cleans the data, ElasticSearch stores and searches it, and Kibana displays it

ElasticSearch

Introduction of Lucene

Overview

  • Lucene is a jar package (library) for information retrieval; it is not a complete search engine. It provides the index structure, tools for reading and writing indexes, sorting, search rules, and so on (Solr is built on top of it)
  • Written in Java; its goal is to add full-text retrieval to all kinds of small and medium-sized applications

Relations with ElasticSearch

  • ElasticSearch is based on Lucene with some encapsulation and enhancements

ElasticSearch profile

Overview

  • ElasticSearch (ES for short) is an open source and highly extensible distributed full-text search engine that can store and retrieve data in near real time. It can scale to hundreds of servers and handle petabytes of data. ES is also developed in Java and uses Lucene as its core to implement all indexing and search functions, but it aims to make full-text search simple by hiding the complexity of Lucene behind a simple RESTful API
  • According to DB-Engines, in January 2016 ElasticSearch surpassed Solr to become the top-ranked search engine product

Who uses it

  • Wikipedia, full text search, highlighting, search recommendations (weight)
  • News websites such as Sohu News: user behavior logs (clicks, views, favorites, comments) plus social-network data (opinions about a given piece of news) are analyzed so that the author of each article can see the public's feedback on it (good article, filler piece, popular, etc.)
  • Stack Overflow (a programmer Q&A forum for errors and exceptions)
  • Github, search hundreds of billions of lines of code
  • E-commerce site, search for goods
  • Log data analysis: Logstash collects the logs and ES performs the analysis; together this is the ELK stack (ElasticSearch + Logstash + Kibana)
  • A commodity price monitoring site, where users set a price threshold for a product and receive a notification when the price drops below that threshold
  • BI (Business Intelligence) systems. For example, a large shopping mall analyzes the trend of user spending in an area over the past three years and the composition of its user groups, and produces data reports; ES performs the data analysis and mining, and Kibana handles the data visualization
  • Domestic: site search (e-commerce, recruitment, portal, etc.), IT system search (OA, CRM, ERP, etc.), data analysis (a popular use scenario of ES)

Solr and ES

ElasticSearch profile
  • ElasticSearch is a real-time distributed search and analysis engine. It makes it possible for you to process big data faster than ever before
  • It is used for full-text search, structured search, analytics, and any combination of the three:
    • Wikipedia uses ElasticSearch to provide full-text search with highlighted keywords, as well as search suggestions such as search-as-you-type and did-you-mean
    • The Guardian uses ElasticSearch in combination with user logs and social network data to provide their editors with real-time feedback to see how the public is responding to new posts
    • Stack Overflow combines full-text search with geolocation queries and more-like-this functionality to find relevant questions and answers
    • Github uses ElasticSearch to retrieve 130 billion lines of code
  • ElasticSearch is an open source search engine based on Apache Lucene (TM). Lucene is arguably the most advanced, high-performance, fully featured search engine library to date, whether open source or proprietary
    • But Lucene is just a library, and to use it, you have to use Java code as the development language and integrate it directly into your application. What’s worse, Lucene is so complex that you need to know a lot about retrieval to understand how it works
    • ElasticSearch is also developed in Java and uses Lucene as its core for all indexing and search functions, but it aims to hide the complexity of Lucene with a simple RESTful API to make full text search easy
Solr profile
  • Solr is a top open source project under Apache, developed in Java. It is a full-text search server based on Lucene. Solr provides a richer query language than Lucene, and at the same time realizes configurable, extensible, and optimized index and search performance
  • Solr can run independently in Servlet containers such as Jetty and Tomcat. Solr index is realized simply by sending an XML document describing the Field and its contents to Solr server using POST method. Solr adds, deletes and updates indexes according to the XML document. Solr searches only need to send HTTP GET requests, and then organize the page layout by parsing the query results returned by Solr in XML or JSON formats. Solr does not provide the UI building function. Solr provides a management page on which you can query the configuration and running status of Solr
  • Solr develops enterprise-class search servers based on Lucene, essentially encapsulating Lucene
  • Solr is an independent enterprise-level search application server. It provides an API interface similar to Web Service. Users can submit files in a certain format to search engine servers through HTTP requests and generate indexes. You can also make a lookup request and get the result back
ElasticSearch vs. Solr
  • Solr is faster when simply searching through existing data
  • When indexes are created in real time, Solr causes I/O congestion, resulting in poor query performance. ElasticSearch has an obvious advantage
  • Solr becomes less efficient as the amount of data increases, while ElasticSearch doesn’t change significantly
  • After converting our search infrastructure from Solr to ElasticSearch, we saw roughly a 50x improvement in search performance
Summary of ElasticSearch vs Solr
  • ES basically works out of the box and is very simple to set up. Solr installation is slightly more complicated
  • Solr uses Zookeeper for distributed management, while ElasticSearch provides distributed coordination management
  • Solr supports more data formats such as JSON, XML, and CSV, whereas ElasticSearch only supports JSON files
  • Solr offers a lot of features, but ElasticSearch itself focuses on core features. Advanced features are provided by third-party add-ons, such as the Kibana graphical interface
  • Solr is faster to query, but slower to update indexes (that is, slow to insert and delete), which is used in e-commerce applications with many queries
    • ES builds indexes quickly (raw query speed can be slower) and supports real-time querying; it is used for searches at Facebook, Sina, and similar sites
    • Solr is a great solution for traditional search applications, but ElasticSearch is better suited for emerging real-time search applications
  • Solr is more mature and has a much larger, more established community of users, developers, and contributors, whereas ElasticSearch has fewer developers and maintainers, updates very quickly, and costs more to learn and use

Inverted index (*)

  • Traditional search uses a forward index; full-text search uses an inverted index

  • Each entry in such an index table contains an attribute value and the addresses of the records that have that attribute value. It is called an inverted index because it is not the records that determine the attribute values, but the attribute values that determine the positions of the records

  • There are two different forms of inverted index:

    • A record-level inverted index (or inverted archive index) contains, for each word, a list of the documents that reference it
    • A word-level inverted index (or full inverted index) additionally contains the position of each word within a document
  • As shown in the following example:

    The inverted index organizes the contents of the documents above by keyword, so the keywords can be used to locate the document contents directly

ElasticSearch installation

  • Prerequisite: JDK 1.8 at minimum! Plus an ElasticSearch client/interface tool
  • ES is developed in Java; the ElasticSearch version must correspond to the Java core JAR packages we use later, and the JDK environment must be set up correctly

Download

  • Official website download address

Install the ES

Windows environment
  • Unpack the archive

  • Directory structure

    • bin: startup scripts
    • config: configuration files
      • log4j2.properties: log configuration
      • jvm.options: JVM configuration (1 GB heap by default)
      • elasticsearch.yml: ElasticSearch configuration (default port 9200, etc.)
    • lib: related JAR packages
    • logs: log files
    • modules: functional modules
    • plugins: plug-ins (*)
Linux environment
  • Decompress the .tar.gz installation package

    tar -zxvf ***.tar.gz
  • By default, ES cannot be accessed from other hosts by IP. Modify elasticsearch.yml in the config directory:

    network.host: 192.168.83.133
    cluster.initial_master_nodes: ["node-1", "node-2"]
  • Starting from the installation package requires some additional system configuration

    • Modify the kernel limits

      ## Modify the limits (ES typically requires, for example, vm.max_map_count=262144)
      sudo vi /etc/sysctl.conf
      ## Check whether the change takes effect
      sudo sysctl -p

    • If the maximum number of open files for each process is too small, change the size of the open files

      sudo vi /etc/security/limits.conf

      Add the following content:

       *               soft    nproc           4096
       *               hard    nproc           4096
       *               soft    nofile          65536
       *               hard    nofile          65536

      ## Check the soft limit with
      ulimit -Sn
      ## Check the hard limit with
      ulimit -Hn
    • Reboot the machine, then start ElasticSearch again

Start the ES

  • Double-click elasticsearch.bat to start ElasticSearch

  • The default exposed port is 9200

  • Visit 127.0.0.1:9200 in a browser

Install the visual interface Head

Download address
  • The Node environment is required

  • Head download address

Compile and run

  • A cross-domain problem occurred while accessing 9100, causing a failure to connect to 9200

  • Add the following configuration to elasticsearch.yml

    http.cors.enabled: true
    http.cors.allow-origin: "*"
    • Restart ElasticSearch
  • For starters, you can think of ES as a database. You can create indexes (tables), documents (data in tables).

Treat Head simply as a data-browsing tool; all subsequent queries will be done in Kibana

Kibana

Kibana profile

  • Kibana is an open source analysis and visualization platform for ElasticSearch that allows you to search and view interactive data stored in the ElasticSearch index. With Kibana, you can perform advanced data analysis and presentation through various charts. Kibana makes massive amounts of data easier to understand with a simple, browser-based user interface that allows you to quickly create dashboards that display Elasticsearch queries in real time. Setting up Kibana is very simple. You can install Kibana and start ElasticSearch index monitoring in minutes without coding or additional infrastructure

Kibana installation

Download

  • Kibana download address

Installation

Windows environment
  • Unpack the archive

  • It is a standard project; the startup script is bin/kibana.bat

Linux environment
  • Unzip kibana-7.6.1-linux-x86_64.tar.gz

  • Modify kibana.yml (vim kibana.yml)

  • Start it:

    cd /usr/local/elk/kibana-7.6.1-linux-x86_64/bin/
    # Start
    ./kibana --allow-root

Start the Kibana

  • Double-click bin/kibana.bat

  • Access test http://localhost:5601

  • The development tools

    • PostMan

    • curl

    • head

    • Google Chrome plug-in test (support Chinese)

      All subsequent operations are performed here

ES Core Concepts

Overview

  • We now know what ES is, and the ES service has been installed and started. So how does ES store data, what data structures does it use, and how does it implement search?

The concept of ES

  • The cluster

  • node

  • shard

    • How nodes and shards work

      • A cluster has at least one node, and a node is one ES process; a node can host multiple indexes. When you create an index it is split into primary shards (5 by default in older versions, 1 by default since ES 7.x), and each primary shard gets a replica shard

      • The figure above shows a cluster with three nodes. You can see that the master shard and the corresponding replication shard are not in the same node, so that even if a node fails, data will not be lost. In effect, a shard is a Lucene index, a directory of files with inverted indexes that are structured so that ES can tell you which documents contain a particular keyword without scanning the entire document

      • Inverted index

        ES uses a structure called an inverted index, with Lucene's inverted index as the underlying layer. This structure is suited to fast full-text search: the index consists of a sorted list of all the unique words in the documents and, for each word, the list of documents that contain it. For example, suppose there are two documents with the following contents:

        # Document 1
        Study every day, good good up to forever
        # Document 2
        Study every day, good good up

        To create the inverted index, each document is split into individual words (terms, or tokens), then a sorted list of all unique terms is built, and for each term the documents in which it appears are listed:

        Now if we search for "to forever", we only need to look at the documents that contain each of those terms

      • For another example, if we search for blog posts by blog tags, the inverted index list would look like this:

        • If you want to search for articles with the Python tag, looking them up through the inverted index is much faster than scanning all the raw data: just look at the tag column, get the relevant article IDs, and all irrelevant data is filtered out, which improves efficiency
      • Elasticsearch index vs. Lucene index

        • In ElasticSearch, the word index (library) is used a lot, this is how the term is used. In ElasticSearch, indexes are split into shards, each of which is a Lucene index. So an ElasticSearch index is made up of multiple Lucene indexes
  • Index

    • Corresponds to a database
    • An index is a container of mapping types; an index in ES is a very large collection of documents. The index stores the fields and other settings of its mapping types, which are then stored on the individual shards
  • type

    • A type is a logical container for a document. Like a relational database, a table is a container for rows, and the definitions of fields within a type are called mappings, such as name mapping to a string type
  • The document

    • To say that ES is document-oriented means that the smallest unit of index and search data is a document. In ES, documents have several important properties:
      • Self-contained, a document that contains both fields and corresponding values, i.e., key: value!
      • It can be hierarchical, with a document containing its own document, which is where complex logical entities come from
      • Flexible structure, documents do not rely on pre-defined schema, in a relational database requires pre-defined fields to use, in ES, for the field is very flexible, sometimes you can ignore the field, or dynamically add a new field
    • Although we can add or omit fields at will, each field's type matters: an age field, for example, can be either a string or an integer. ES keeps the mapping between fields and types, along with other settings; this mapping is specific to each mapping type, which is why in ES types are sometimes called mapping types
  • Mapping

    • The mapping defines how each field of a document is stored and indexed (its type, whether it is analyzed, and so on); an index's mapping can be viewed as shown below
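  • For example, the mapping of an existing index can be inspected in Kibana Dev Tools (the index name here is only an illustration):

    GET /touchair/_mapping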

MySQL vs. ElasticSearch

ElasticSearch is document-oriented; here is a comparison between a relational database (MySQL) and ElasticSearch:

MySQL                 ElasticSearch
Database (database)   Index (indices)
Tables (tables)       Types
Rows (rows)           Documents
Columns (columns)     Fields

Elasticsearch can have multiple indexes (databases), each index can have multiple types (tables), each type can have multiple documents (rows), and each document can have multiple fields (columns).

Physical design

  • Elasticsearch splits each index into shards behind the scenes, and each shard can be moved between different servers in the cluster

Logic design

  • An index type contains multiple documents, such as document 1, document 2. When we index a document, we can find it in this order: Index >> Type >> Document ID. By this combination we can index a specific document. Note: ID does not have to be an integer; it is actually a string

9200 is different from 9300

  • Port 9300: used for communication between ES nodes
  • Port 9200: used by the ES node to communicate with external clients
  • 9300 is the TCP port used for communication between ES cluster nodes; 9200 is the port that exposes the ES RESTful interface

IK word splitter plug-in

What it is

  • Word segmentation (analysis): splitting a piece of Chinese text or other keywords into individual words. When we search, ES segments both our query and the data in the index (library) and then matches them. The default tokenizer treats every Chinese character as a separate word; for example "我爱编程" ("I love programming") is split into the single characters "我", "爱", "编", "程". This obviously does not meet our needs, so we install the IK Chinese tokenizer to solve the problem
  • IK provides two segmentation algorithms: ik_smart and ik_max_word; ik_smart produces the fewest splits, while ik_max_word produces the finest-grained splits
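  • For illustration, once the plug-in is installed the two algorithms can be compared with the _analyze API in Kibana Dev Tools (the sample sentence is only an example):

    GET _analyze
    {
      "analyzer": "ik_smart",
      "text": "我爱编程"
    }

    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "我爱编程"
    }

    ik_smart returns the fewest, longest terms, while ik_max_word emits every plausible sub-word, so the same text yields more tokens.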

Installing the IK tokenizer

Download

Github download address

Windows installation

  • Unzip the downloaded files
  • Create a new folder ik under the plugins directory of es
  • Place the unzipped files in the IK folder

Linux installation

  • Basically the same as Windows

  • Copy the decompressed folder named IK to the plugins folder

Restart ES and observe

  • The startup log shows that the IK tokenizer plug-in has been loaded

  • You can also check with the command: elasticsearch-plugin list

Testing the IK tokenizer in Kibana

  • kibana Dev Tools

    • ik_smart (coarsest split)

    • ik_max_word (finest-grained split)

  • Problem found: words that should stay together get split apart. Such custom words need to be added to the tokenizer's dictionary

    Adding a custom IK dictionary

    • ik/config/IKAnalyzer.cfg.xml

    • Add the custom dictionary touchair.dic, reference it in the extension configuration, then restart ES

    • Look at the startup log and see that touchair.dic is loaded. Now test the word segmentation again

    • The test results

      • Before adding the custom dictionary, the custom word is split into separate pieces by the tokenizer

      • After configuration, you can split it into desired results

REST Style Description

  • REST is a software architectural style rather than a standard; it provides a set of design principles and constraints and is mainly used for client-server interaction software. Software designed in this style can be simpler, more layered, and can more easily implement mechanisms such as caching

  • Basic REST commands:

    Method    URL                                                        Description
    PUT       localhost:9200/index_name/type_name/document_id           Create a document (with a specified document ID)
    POST      localhost:9200/index_name/type_name                       Create a document (random document ID)
    POST      localhost:9200/index_name/type_name/document_id/_update   Update a document
    DELETE    localhost:9200/index_name/type_name/document_id           Delete the specified document
    GET       localhost:9200/index_name/type_name/document_id           Query a document by document ID
    POST      localhost:9200/index_name/type_name/_search               Query all data

Index basic operations

  • Create an index (PUT)

    PUT /index_name/type_name/document_id
    {request body}

    While creating the index, a piece of data is inserted at the same time (see the example below)
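    For instance, the following request (the index name and field values are illustrative) creates the index test1 and inserts document 1 in one step:

    PUT /test1/_doc/1
    {
      "name": "touchair",
      "age": 3
    }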

Document mapping

  • Dynamic mapping: in a relational database you must create the database and then create a table under it before you can insert data into that table. ElasticSearch does not require a mapping to be defined in advance: when a document is written, ES automatically infers the field types from the document. This mechanism is called dynamic mapping

  • Static mapping: in ElasticSearch you can also define the fields and their types for a document in advance. This is called static mapping (a sketch is given at the end of this section)

  • Type classification:

    • String types: text and keyword

      text is analyzed by the tokenizer; keyword is not analyzed

    • Numeric types: long, integer, short, byte, double, float, half_float, scaled_float

    • Date type: date

    • Boolean value type: Boolean

    • Binary type: binary

    • Array type: array

    • The complex type

      • Geographic location types (Geo Datatypes)
        • Geo-point datatype: Geo_point is used for latitude and longitude coordinates
        • Geo-shape datatype: Geo_shape is used for complex shapes similar to polygons
      • Specialised datatypes
        • IP type (IP datatype): ip, used for IPv4 addresses
        • Completion type (completion datatype): completion, provides auto-complete suggestions
        • Token count type (token_count datatype): counts the number of tokens in a string field; the count is based on the indexed tokens and is not decreased by filtering
        • mapper-murmur3 type: the mapper-murmur3 plug-in allows a murmur3 hash of field values to be computed at index time
        • Attachment type (attachment datatype): the mapper-attachments plug-in supports indexing attachments such as Microsoft Office formats, Open Document format, ePub, HTML, etc.
  • Create an index and specify the field types (PUT)

    The field types can be specified explicitly in the mapping (a sketch is given at the end of this section)

  • GET the rule: an index's mapping can be viewed with a GET request on the index

  • View the default information

    # _doc is the default type and can be omitted from the display
    PUT /test3/_doc/1
    {
      "name": "touchair-3",
      "age": "19",
      "birth": "2020-09-16"
    }

    View the result:

    If you do not specify field types for your document, ES assigns default field types!

  • Extension: You can GET a lot of current information about ES by using the GET _cat command

  • GET _cat/health Displays health information

  • GET _cat/indices?v: view all indexes

  • Modifying data (POST/PUT)

    • PUT /test3/_doc/1
      {
        "name": "touchair-3-put",
        "age": "20",
        "birth": "2020-09-15"
      }

      POST /test3/_doc/1/_update
      {
        "doc": {
          "name": "touchair-3-post"
        }
      }

    • PUT overwrites (replaces) the whole document

    • POST _update modifies only the fields supplied in doc

    • View the results

  • DELETE deletes an index or a document record depending on the requested URL
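  • Static mapping sketch, as referenced above (the index name test2 and the fields are illustrative): create the index with explicitly typed fields, then GET it to confirm the mapping

    PUT /test2
    {
      "mappings": {
        "properties": {
          "name":  { "type": "text" },
          "tag":   { "type": "keyword" },
          "age":   { "type": "long" },
          "birth": { "type": "date" }
        }
      }
    }

    GET /test2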

Basic operations of the document (*)

ElasticSearch version control

  • The _version field

  • Why version control: lock-free CAS-style concurrency control

    To ensure data correctness under concurrent (multi-threaded) operations

  • Pessimistic locking and optimistic locking

    • Pessimistic locking: assumes concurrency conflicts are certain to happen, so it blocks every operation that might violate data correctness
    • Optimistic locking: assumes no conflicts will occur and checks for data-integrity violations only at commit time
  • Internal and external versions

    • Internal version: _version increases automatically; each time the data is modified, the version is incremented by 1
    • External version: to keep the version consistent with an external version-control value, use version_type=external; ES checks that the document's current version is less than the version supplied in the request (a sketch follows)
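  • A minimal sketch of external versioning (the index name and values are illustrative; note that in ES 7.x internal optimistic locking is normally expressed with if_seq_no/if_primary_term rather than a version parameter):

    # Succeeds only if 10 is greater than the document's current version
    PUT /test_version/_doc/1?version=10&version_type=external
    {
      "name": "external-version-demo"
    }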

Simple operation

Adding test data
PUT /touchair/user/1
{
  "name": "z3",
  "age": 11,
  "desc": "This is z3",
  "tags": ["Geek", "Old straight man", "Overtime dog"]
}

PUT /touchair/user/2
{
  "name": "l4",
  "age": 12,
  "desc": "This is l4",
  "tags": ["Striver", "Womanizer", "Hangzhou"]
}

PUT /touchair/user/3
{
  "name": "w5",
  "age": 30,
  "desc": "This is w5",
  "tags": ["Handsome", "Street style", "Travel"]
}

PUT /touchair/user/4
{
  "name": "w55",
  "age": 31,
  "desc": "This is w55",
  "tags": ["Pretty girl", "Movies", "Travel"]
}

PUT /touchair/user/5
{
  "name": "Learning Java",
  "age": 32,
  "desc": "Here's learning Java",
  "tags": ["Fishing", "Reading", "Writing"]
}

PUT /touchair/user/6
{
  "name": "Learning Node.js",
  "age": 33,
  "desc": "Here's learning Node.js",
  "tags": ["Class", "Sleep", "Video games"]
}

GET data (GET)
GET touchair/user/1

Update data (POST)
POST touchair/user/2/_update
{
  "doc": {
    "name": "l4-2"
  }
}

GET touchair/user/2

Simple query
  • Conditional query

    GET touchair/user/_search?q=name:w5

A complex operation

Complex query SELECT (sort, paging, highlight, fuzzy query, precise query!)

  • The attribute _score in hits represents the matching degree. The higher the matching degree, the higher the score

  • hits contains:

    • Index and document information
    • The total number of query results
    • The concrete documents returned by the query
    • You can iterate through them all
    • You can use _score to judge which result matches best
Match query: match
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "w5"
    }
  }
}

Limiting the returned fields with _source
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "_source": ["name", "desc"]
}

Sorting

Sort by age in descending order

GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

Paging

from and size correspond to the two parameters of MySQL's LIMIT clause (from = offset, size = number of results)

GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 1
}

Multi-condition matching (bool)
  • Exact multi-condition query: must, equivalent to MySQL's AND

GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Learning"
          }
        },
        {
          "match": {
            "age": "32"
          }
        }
      ]
    }
  }
}

  • should is equivalent to MySQL's OR

    GET touchair/user/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "name": "Learning"
              }
            },
            {
              "match": {
                "age": "11"
              }
            }
          ]
        }
      }
    }

  • must_not is equivalent to MySQL's NOT

    GET touchair/user/_search
    {
      "query": {
        "bool": {
          "must_not": [
            {
              "match": {
                "age": 33
              }
            }
          ]
        }
      }
    }

Filtering results with filter
  • range intervals
  • gte: greater than or equal to
  • lte: less than or equal to
  • gt: greater than
  • lt: less than
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Learning"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 32
          }
        }
      }
    }
  }
}

Multi-keyword match query
  • match works like a fuzzy "like"; multiple keywords separated by spaces are OR-matched (the values here match the translated test data above)

    GET touchair/user/_search
    {
      "query": {
        "match": {
          "tags": "Travel Handsome"
        }
      }
    }

Exact query: term
  • Equivalent to equals

  • term queries look up the specified term directly in the inverted index, performing an exact lookup

  • keyword-type fields are not analyzed and can only be matched exactly

About analysis:

term: queries the exact term directly

match: the query is analyzed by the tokenizer first, then matched against the analyzed documents

Two string types:

text: analyzed (split) by the tokenizer

keyword: not analyzed; can only be matched exactly (a small sketch follows)
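A small sketch of the difference between the two types (the index name testdb and the values are illustrative assumptions):

    PUT /testdb
    {
      "mappings": {
        "properties": {
          "name": { "type": "text" },
          "desc": { "type": "keyword" }
        }
      }
    }

    PUT /testdb/_doc/1
    {
      "name": "touchair java name",
      "desc": "touchair java desc"
    }

    # keyword field: term matches only the exact, unanalyzed value
    GET /testdb/_search
    {
      "query": { "term": { "desc": "touchair java desc" } }
    }

    # text field: match analyzes the query, so the single token "java" hits the document
    GET /testdb/_search
    {
      "query": { "match": { "name": "java" } }
    }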

Highlighted queries
  • highlight

    GET touchair/user/_search
    {
      "query": {
        "match": {
          "name": "Learning"
        }
      },
      "highlight": {
        "fields": {
          "name": {}
        }
      }
    }

  • Custom tags around the highlighted text

    GET touchair/user/_search
    {
      "query": {
        "match": {
          "name": "Learning"
        }
      },
      "highlight": {
        "pre_tags": "<p class='key' style='color:red'>",
        "post_tags": "</p>",
        "fields": {
          "name": {}
        }
      }
    }

Term is different from Match
  • Term queries do not perform word segmentation on fields and use Equals instead.
  • Match will perform a word segmentation query (Like) based on the word segmentation of the field.

ES integration SpringBoot

The official documentation

The document address

  • ElasticSearch 7.6 client documentation

Maven dependency

  • pom.xml

  • <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.6.2</version>
    </dependency>

Initialization

Create a project

  • Create a SpringBoot project
  • Select dependencies, most notably ElasticSearch under NoSQL (a minimal client configuration sketch follows)
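  • The test class below injects a bean named restHighLevelClient. A minimal configuration sketch is shown here; the package name, host, and port are assumptions and should be adapted to your own environment:

    package com.touchair.elk.config;

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ElasticSearchClientConfig {

        // The bean name defaults to the method name, so it can be injected
        // with @Qualifier("restHighLevelClient") as in the test class below
        @Bean
        public RestHighLevelClient restHighLevelClient() {
            return new RestHighLevelClient(
                    RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
        }
    }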

API call test

Index operations

Create an index
Check whether an index exists
Get an index
Delete an index

Document CRUD

Create a document
Get the document
Update the document
Delete the document
Batch Insert documents
Document Query (*)
  • SearchRequest: the search request; SearchSourceBuilder: builds the search conditions; HighlightBuilder: highlighting; matchAllQuery(): matches all documents; termQuery(): exact lookup
  • Test class code

    package com.touchair.elk;
    
    import cn.hutool.json.JSONUtil;
    import com.touchair.elk.pojo.User;
    import org.assertj.core.util.Lists;
    import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.delete.DeleteRequest;
    import org.elasticsearch.action.delete.DeleteResponse;
    import org.elasticsearch.action.get.GetRequest;
    import org.elasticsearch.action.get.GetResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.support.master.AcknowledgedResponse;
    import org.elasticsearch.action.update.UpdateRequest;
    import org.elasticsearch.action.update.UpdateResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;
    import org.elasticsearch.client.indices.GetIndexRequest;
    import org.elasticsearch.client.indices.GetIndexResponse;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.MatchAllQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.junit.jupiter.api.Test;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.boot.test.context.SpringBootTest;
    
    import javax.annotation.Resource;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.concurrent.TimeUnit;
    
    @SpringBootTest
    class ElkApplicationTests {
    
        public static final String INDEX_NAME = "java_touchair_index";
    
        @Resource
        @Qualifier("restHighLevelClient")
        private RestHighLevelClient restHighLevelClient;
    
    
        /**
         * Test creating an index
         *
         * @throws IOException
         */
        @Test
        void testCreateIndex() throws IOException {
            // A request to create an index
            CreateIndexRequest indexRequest = new CreateIndexRequest("java_touchair_index");
            // The client executes the IndicesClient request and obtains the response
            CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
            System.out.println(createIndexResponse.toString());
        }
    
        /**
         * Test getting an index
         *
         * @throws IOException
         */
        @Test
        void testGetIndex() throws IOException {
            GetIndexRequest getIndexRequest = new GetIndexRequest("java_touchair_index");
            boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
            if (exists) {
                GetIndexResponse getIndexResponse = restHighLevelClient.indices().get(getIndexRequest, RequestOptions.DEFAULT);
                System.out.println(getIndexResponse);
            } else {
                System.out.println("Index does not exist");
            }
        }

        /**
         * Test deleting an index
         *
         * @throws IOException
         */
        @Test
        void testDeleteIndex() throws IOException {
            DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("test2");
            AcknowledgedResponse acknowledgedResponse = restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
            System.out.println(acknowledgedResponse.isAcknowledged());
        }
    
    
        /**
         * Test creating a document
         *
         * @throws IOException
         */
        @Test
        void testAddDocument() throws IOException {
            // Create an object
            User user = new User("java".23);
            // Create the request
            IndexRequest indexRequest = new IndexRequest("java_touchair_index");
    
            // Rule: PUT /java_touchair_index/_doc/1
            indexRequest.id("1");
            indexRequest.timeout(TimeValue.timeValueSeconds(1));
    
            // Put the data into the request
            indexRequest.source(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
    
            // The client sends a request to obtain
            IndexResponse indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
            System.out.println(indexResponse.toString());
            System.out.println(indexResponse.status());
        }
    
        /**
         * Test getting a document
         *
         * @throws IOException
         */
        @Test
        void testGetDocument() throws IOException {
            // Check whether the document exists: GET /index/_doc/1
            GetRequest getRequest = new GetRequest(INDEX_NAME, "1");
    // // does not get the context of the returned _source anymore
    // getRequest.fetchSourceContext(new FetchSourceContext(false));
    // getRequest.storedFields("_none_");
            boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
            if (exists) {
                GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
                System.out.println(getResponse.toString());
                // Print the contents of the document
                // Everything returned is exactly the same as the command line result
                System.out.println(getResponse.getSourceAsString());
            } else {
                System.out.println("Document does not exist");
            }
        }

        /**
         * Test updating document information
         *
         * @throws IOException
         */
        @Test
        void testUpdateDocument() throws IOException {
            UpdateRequest updateRequest = new UpdateRequest(INDEX_NAME, "1");
            updateRequest.timeout("1s");
            User user = new User("ES Search Engine", 24);
            updateRequest.doc(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
            UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
            System.out.println(updateResponse.status());
        }
    
        /**
         * Test deleting a document
         *
         * @throws IOException
         */
        @Test
        void testDeleteDocument() throws IOException {
            DeleteRequest deleteRequest = new DeleteRequest(INDEX_NAME, "1");
            deleteRequest.timeout("1s");
            DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
            System.out.println(deleteResponse.getResult());
        }
    
    
        /**
         * Note: real projects usually insert data in batches
         *
         * @throws IOException
         */
        @Test
        void testBulkRequest() throws IOException {
            BulkRequest bulkRequest = new BulkRequest();
            bulkRequest.timeout("10s");
    
            ArrayList<User> userList = Lists.newArrayList();
    
            userList.add(new User("Java", 11));
            userList.add(new User("javaScript", 12));
            userList.add(new User("Vue", 13));
            userList.add(new User("Mysql", 14));
            userList.add(new User("Docker", 15));
            userList.add(new User("MongoDB", 16));
            userList.add(new User("Redis", 17));
            userList.add(new User("Tomcat", 18));
    
            for (int i = 0; i < userList.size(); i++) {
                // Batch update and batch delete only need to modify the corresponding request here
                bulkRequest.add(new IndexRequest(INDEX_NAME)
                        .id(String.valueOf(i + 1))
                        .source(JSONUtil.toJsonPrettyStr(userList.get(i)), XContentType.JSON));
    
            }
            BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            System.out.println((bulkResponse.hasFailures())); // Success if false is returned
        }
    
    
        /**
         * Test search
         * SearchRequest: search request
         * SearchSourceBuilder: builds the search conditions
         * HighlightBuilder: highlighting
         * matchAllQuery(): matches all documents
         * termQuery(): exact lookup
         *
         * @throws IOException
         */
        @Test
        void testSearch() throws IOException {
            SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
            // Build the search criteria
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Query criteria, which can be quickly queried using the QueryBuilders tool
            // QueryBuilders.matchAllQuery(): match all documents
            // QueryBuilders.termQuery(): exact search
    // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 11);
            MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
            searchSourceBuilder.query(matchAllQueryBuilder);
            searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // searchSourceBuilder.highlighter();
    // searchSourceBuilder.size();
    // searchSourceBuilder.from();
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    
            for (SearchHit searchHit : searchResponse.getHits().getHits()) {
                System.out.println(searchHit.getSourceAsMap());
            }
        }
    }

Imitation mall search

  • Crawl web page data

  • Paging search

  • Highlighting

  • Final rendered result

Distributed Log Collection

Principles of ELK Distributed Log Collection (*)

  • Install the Logstash log-collection plug-in on every node of the server cluster

  • Each server node sends its logs to Logstash

  • Logstash formats the logs as JSON, creates a different index for each day, and outputs them to ElasticSearch

  • Use Kibana in the browser to query the log information

Environmental installation

  • 1. Install ElasticSearch
  • 2. Install Logstash
  • 3. Install Kibana

Logstash

Introduction

  • Logstash is a completely open source tool that collects, filters, analyzes your logs, supports a wide range of data capture methods, and stores them for later use (like searching). Speaking of searching, Logstash comes with a Web interface that searches and displays all logs. The client is installed on the host that needs to collect logs. The server filters and modifies the received node logs and sends them to ElasticSearch

  • Core process:

    • Logstash event processing has three stages: inputs -> filters -> outputs
    • It is a tool for receiving, processing, and forwarding logs
    • It supports system logs, web server logs, error logs, application logs, and basically any other type of log you can throw at it

Set up the Logstash environment

  • Download address
  • Unpack the archive

Logstash test

  • Feed the ElasticSearch logs into Logstash

    • Go to the config directory of the logstash directory and create touchair.conf

    • Add the following and save it

      # Read log information from a file and print it to the console
      input {
          file {
              path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
              codec => "json"
              type => "elasticsearch"
              start_position => "beginning"
          }
      }
      output {
          # standard output
          # stdout {}
          # format the output, using the ruby library to parse the log
          stdout { codec => rubydebug }
      }
    • Start Logstash from the bin directory and watch the console output

      ./logstash -f ../config/touchair.conf

Output logs to ES

  • Create the touchair.es.conf file with the following content

    # Read log information from a file
    input {
        file {
            path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
            codec => "json"
            type => "elasticsearch"
            start_position => "beginning"
        }
    }
    output {
        # standard output
        # stdout {}
        # format the output, using the ruby library to parse the log
        stdout { codec => rubydebug }
        elasticsearch {
            hosts => ["192.168.83.133:9200"]
            index => "es-%{+YYYY.MM.dd}"
        }
    }
  • Start Logstash

    ./logstash -f ../config/touchair.es.conf

Logstash integration Springboot

Single-line logs

  • # tcp -> Logstash -> Elasticsearch pipeline.
    input {
      tcp {
        mode => "server"
        host => "0.0.0.0"
        port => 4560
        codec => json_lines
      }
    }
    output {
      elasticsearch {
        hosts => ["192.168.83.133:9200"]
        index => "robot-java-%{+YYYY.MM.dd}"
      }
    }

Collecting multi-line run logs makes it easier to locate faults

input {
    tcp {
        mode => "server"
        host => "0.0.0.0"
        port => 4560
        codec => multiline {
            pattern => "^\["
            negate => false
            what => "next"
        }
    }
}
filter {
    json {
        source => "message"
    }
    mutate {
        add_field => {
            "language" => "%{[type]}"
        }
    }
}
output {
    if [language] == "java" {
        elasticsearch {
            hosts => ["172.17.0.8:9200"]
            index => "robot-java-%{+YYYY.MM.dd}"
        }
    }
    if [language] == "ros" {
        elasticsearch {
            hosts => ["172.17.0.8:9200"]
            index => "robot-ros-%{+YYYY.MM.dd}"
        }
    }
    if [language] == "rec" {
        elasticsearch {
            hosts => ["172.17.0.8:9200"]
            index => "robot-rec-%{+YYYY.MM.dd}"
        }
    }
}

ELK docker deployment

Install the ElasticSearch

Pull the image
docker pull elasticsearch:7.6.1
Run the container
  • Run the command to create and start the container:
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" elasticsearch:7.6.1
  • Copy configuration files and data directories for mounting
docker cp es:/usr/share/elasticsearch/config/ /var/elk/elasticsearch/config
docker cp es:/usr/share/elasticsearch/data/ /var/elk/elasticsearch/data
  • Set to allow cross-domain access

    • vim  /var/elk/elasticsearch/config/elasticsearch.yml
      
      # Add these two lines
      http.cors.enabled: true
      http.cors.allow-origin: "*"
  • Destroy the container and run again in mount mode

# Destroy the container
docker rm -f es

# Mount the configuration and data directories and run again
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -v /var/elk/elasticsearch/config/:/usr/share/elasticsearch/config/ \
  -v /var/elk/elasticsearch/data/:/usr/share/elasticsearch/data/ \
  -e "discovery.type=single-node" \
  elasticsearch:7.6.1
  • Access port 9200 of the host IP address and check whether the host is successfully started

Install Kibana

Pull the image
docker pull kibana:7.6.1
Run the container
  • Run the container first

    docker run -d --name kibana -p 5601:5601 kibana:7.6.1
  • Copy the configuration file and mount it later

    # Copy
    docker cp kibana:/usr/share/kibana/config/ /var/elk/kibana/config

    # View the internal IP address of the ES container
    docker exec -it es ifconfig

    # Modify the configuration (point the ES address at the ES container IP)
    vim kibana.yml

  • Run with the configuration mounted

    # Destroy the container first
    docker rm -f kibana

    # Run the container again
    docker run -d --name kibana -p 5601:5601 \
      -v /var/elk/kibana/config:/usr/share/kibana/config \
      kibana:7.6.1
  • Host IP :5601, view the Kibana graphical interface

Install the LogStash

Pull the image
docker pull logstash:7.6.1
Run the container
  • Run the container first

    docker run --name logstash -d -p 4560:4560 -p 9600:9600 logstash:7.6.1
  • Copy the configuration file and mount it later

    docker cp logstash:/usr/share/logstash/config /var/elk/logstash/config
    • Add a custom conf file

      input {
          tcp {
              mode => "server"
              host => "0.0.0.0"
              port => 4560
              codec => multiline {
                  pattern => "^\["
                  negate => false
                  what => "next"
              }
          }
      }
      filter {
          json {
              source => "message"
          }
          mutate {
              add_field => {
                  "language" => "%{[type]}"
              }
          }
      }
      output {
          if [language] == "java" {
              elasticsearch {
                  hosts => ["172.17.0.8:9200"]
                  index => "robot-java-%{+YYYY.MM.dd}"
              }
          }
          if [language] == "ros" {
              elasticsearch {
                  hosts => ["172.17.0.8:9200"]
                  index => "robot-ros-%{+YYYY.MM.dd}"
              }
          }
          if [language] == "rec" {
              elasticsearch {
                  hosts => ["172.17.0.8:9200"]
                  index => "robot-rec-%{+YYYY.MM.dd}"
              }
          }
      }
    • Modify the configuration file logstash.yml

      vim logstash.yml

  • Run with the configuration mounted

    # Remove the container
    docker rm -f logstash

    # Run the container again
    docker run --name logstash -d -p 4560:4560 -p 9600:9600 \
      -v /var/elk/logstash/config:/usr/share/logstash/config \
      logstash:7.6.1 \
      -f /usr/share/logstash/config/robot.conf

Note: update the ES service address in kibana.yml, in logstash.yml, and in the custom conf file, then restart Kibana and Logstash