GitHub project address

ELK is the acronym for ElasticSearch, Logstash, and Kibana; it is also called the Elastic Stack. ElasticSearch is a Lucene-based, distributed, RESTful, near-real-time search platform framework. It can serve as the underlying search framework for big-data search engines such as Google and Baidu, and it provides powerful search capabilities. Logstash is the central data-flow engine: it collects data in different formats from different sources (documents, data stores, MQ), filters it, and outputs it to different destinations (files, MQ, Redis, ElasticSearch, Kafka, etc.). Kibana presents ElasticSearch data on a friendly page and provides real-time analysis.

Many developers refer to ELK simply as a log-analysis stack, but ELK is not limited to log analysis; it can support any other data analysis and collection scenario. Log analysis and collection is just the most representative use case, not the only one.

Logstash collects and cleans the data -> ElasticSearch searches and stores it -> Kibana presents it.
ElasticSearch
Introduction of Lucene
Overview
- Lucene is a jar package (library) for information retrieval. It is not a search engine by itself! It provides index structures, tools for reading and writing indexes, sorting, search rules, and so on. (Solr builds on it.)
- It is written in Java, with the goal of adding full-text retrieval to all kinds of small and medium-sized applications
Relations with ElasticSearch
- ElasticSearch is based on Lucene with some encapsulation and enhancements
Introduction to ElasticSearch
Overview
- ElasticSearch (ES for short) is an open-source, highly scalable, distributed full-text search engine that can store and retrieve data in near real time. It can scale to hundreds of servers and handle petabytes of data. ES is also developed in Java and uses Lucene as its core to implement all indexing and search functions, but it aims to make full-text search simple by hiding the complexity of Lucene behind a simple RESTful API
- ElasticSearch surpassed Solr in January 2016, according to the DB-Engines ranking
Who uses it
- Wikipedia: full-text search, highlighting, search recommendations (weighting)
- News sites, such as Sohu News: user behavior logs (clicks, views, favorites, comments) plus social-network data (opinions about a given piece of news) are analyzed so that the author of each article receives the public's feedback on it (good article, filler, popular, etc.)
- Stack Overflow (a foreign Q&A forum where programmers discuss exceptions and errors)
- GitHub: searching hundreds of billions of lines of code
- E-commerce sites: product search
- Log data analysis: Logstash collects the logs and ES performs the data analysis, i.e. ELK (ElasticSearch + Logstash + Kibana)
- A commodity price monitoring site: users set a price threshold for a commodity and receive a notification when the price drops below that threshold
- BI systems (Business Intelligence): for example, a large shopping mall uses BI to analyze the trend of user spending in an area over the last three years and the composition of its user groups, and produces the corresponding reports; ES performs the data analysis and mining, and Kibana handles the data visualization
- Domestically: site search (e-commerce, recruitment, portals, etc.), IT system search (OA, CRM, ERP, etc.), and data analysis (a popular ES use case)
Solr and ES
Introduction to ElasticSearch
- ElasticSearch is a real-time distributed search and analysis engine. It makes it possible for you to process big data faster than ever before
- It is used for full-text search, structured search, analytics, and any combination of the three:
- Wikipedia uses ElasticSearch to provide full-text search and keyword highlighting, as well as search suggestions such as search-as-you-type and did-you-mean
- The Guardian uses ElasticSearch in combination with user logs and social network data to provide their editors with real-time feedback to see how the public is responding to new posts
- Stack Overflow combines full-text search with geolocation queries and more-like-this functionality to find relevant questions and answers
- Github uses ElasticSearch to retrieve 130 billion lines of code
- ElasticSearch is an open source search engine based on Apache Lucene (TM). Lucene is arguably the most advanced, high-performance, and full-featured search engine library to date, both open source and proprietary
- But Lucene is just a library, and to use it, you have to use Java code as the development language and integrate it directly into your application. What’s worse, Lucene is so complex that you need to know a lot about retrieval to understand how it works
- ElasticSearch is also developed in Java and uses Lucene as its core for all indexing and search functions, but it aims to hide the complexity of Lucene with a simple RESTful API to make full text search easy
Introduction to Solr
- Solr is a top open source project under Apache, developed in Java. It is a full-text search server based on Lucene. Solr provides a richer query language than Lucene, and at the same time realizes configurable, extensible, and optimized index and search performance
- Solr can run independently in Servlet containers such as Jetty and Tomcat. Solr index is realized simply by sending an XML document describing the Field and its contents to Solr server using POST method. Solr adds, deletes and updates indexes according to the XML document. Solr searches only need to send HTTP GET requests, and then organize the page layout by parsing the query results returned by Solr in XML or JSON formats. Solr does not provide the UI building function. Solr provides a management page on which you can query the configuration and running status of Solr
- Solr develops enterprise-class search servers based on Lucene, essentially encapsulating Lucene
- Solr is an independent enterprise-level search application server. It provides an API interface similar to Web Service. Users can submit files in a certain format to search engine servers through HTTP requests and generate indexes. You can also make a lookup request and get the result back
ElasticSearch vs. Solr
- Solr is faster when simply searching through existing data
- When indexes are created in real time, Solr causes I/O congestion, resulting in poor query performance. ElasticSearch has an obvious advantage
- Solr becomes less efficient as the amount of data increases, while ElasticSearch doesn’t change significantly
- By converting the search infrastructure from Solr to ElasticSearch, roughly 50x better search performance has been observed
Summary of ElasticSearch vs Solr
- Es is basically out of the box, very simple. Solr installation is a little more complicated
- Solr uses Zookeeper for distributed management, while ElasticSearch provides distributed coordination management
- Solr supports more data formats such as JSON, XML, and CSV, whereas ElasticSearch only supports JSON files
- Solr offers a lot of features, but ElasticSearch itself focuses on core features. Advanced features are provided by third-party add-ons, such as the Kibana graphical interface
- Solr is faster to query but slower to update indexes (i.e., slow for inserts and deletes), so it suits e-commerce applications with many queries
- ES is fast at building indexes (real-time queries are fast) and is used for searches at Facebook, Sina, and the like
- Solr is a great solution for traditional search applications, but ElasticSearch is better suited to emerging real-time search applications
- Solr is more mature and has a much larger and more established community of users, developers, and contributors, whereas ElasticSearch has fewer developers and maintainers, updates very quickly, and costs more to learn and use
Inverted index (*)
- Traditional search uses a forward index; full-text search uses an inverted index
- Each entry in an inverted index table contains an attribute value and the addresses of the records that have that attribute value. It is called an inverted index because it is not the records that determine the attribute values, but rather the attribute values that determine the positions of the records
- Inverted indexes come in two different forms:
  - A record-level inverted index (or inverted file index) contains, for each word, a list of the documents that reference it
  - A word-level inverted index (or full inverted index) additionally contains the position of each word within a document
- As shown in the following example: the inverted index classifies the contents of the documents by keyword, and those keywords can then be used to locate the document contents directly
ElasticSearch installation
- Prerequisite: JDK 1.8 is the minimum requirement! You will also need the ElasticSearch client and an interface tool
- ElasticSearch is developed in Java, so the ElasticSearch version must match the Java core JAR packages we use later, and the corresponding JDK environment must be set up correctly
download
- Official website download address
Install the ES
Windows environment
- Unpack the archive
- Directory structure
  - bin: startup scripts
  - config: configuration files
    - log4j2.properties: log configuration
    - jvm.options: JVM options (1 GB heap by default)
    - elasticsearch.yml: ElasticSearch configuration (default port 9200, etc.)
  - lib: related JAR packages
  - logs: log files
  - modules: function modules
  - plugins: plugin directory (e.g., the IK analyzer)
Linux environment
- Decompress the tar.gz installation package

```bash
tar -zxvf ***.tar.gz
```

- By default, ES does not allow access by IP address. Modify elasticsearch.yml in the config directory:

```yml
network.host: 192.168.83.133
cluster.initial_master_nodes: ["node-1", "node-2"]
```
- When started from the installation package, additional system parameters must be configured

- Modify the system limits:

```bash
# modify the restriction
sudo vi /etc/sysctl.conf
# check whether it takes effect
sudo sysctl -p
```

- If the maximum number of open files per process is too small, increase it:

```bash
sudo vi /etc/security/limits.conf
```

Add the following content:

```
* soft nproc 4096
* hard nproc 4096
* soft nofile 65536
* hard nofile 65536
```

```bash
# check the soft limit
ulimit -Sn
# check the hard limit
ulimit -Hn
```
- Restart the machine, then restart ElasticSearch

Start ES

- Double-click elasticsearch.bat to start ElasticSearch
- The default exposed port is 9200
- Visit 127.0.0.1:9200 in a browser
Install the visual interface Head
Download address
-
The Node environment is required
-
Head download address
Compile operation
- When accessing port 9100, a cross-origin problem prevents it from connecting to port 9200
- Add the following configuration to elasticsearch.yml:

```yml
http.cors.enabled: true
http.cors.allow-origin: "*"
```

- Restart ElasticSearch
- For starters, you can think of ES as a database in which you create indexes (tables) and documents (the rows in a table)
- Treat head as a data-browsing tool only; all subsequent queries will be done in Kibana
Kibana
Introduction to Kibana
- Kibana is an open source analysis and visualization platform for ElasticSearch that allows you to search and view interactive data stored in the ElasticSearch index. With Kibana, you can perform advanced data analysis and presentation through various charts. Kibana makes massive amounts of data easier to understand with a simple, browser-based user interface that allows you to quickly create dashboards that display Elasticsearch queries in real time. Setting up Kibana is very simple. You can install Kibana and start ElasticSearch index monitoring in minutes without coding or additional infrastructure
Kibana installation
download
- Kibana download address
The installation
Windows environment
- Unpack the archive
- It is a standard project; the startup script is bin/kibana.bat
Linux environment
- Unzip kibana-7.6.1-linux-x86_64.tar.gz
- Modify kibana.yml (vim kibana.yml)
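A sketch of the typical kibana.yml changes, assuming the ES address configured earlier; treat both values as placeholders for your own environment:

```yml
# listen on all interfaces so Kibana is reachable from outside (placeholder)
server.host: "0.0.0.0"
# point Kibana at the ES node configured above (placeholder address)
elasticsearch.hosts: ["http://192.168.83.133:9200"]
```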
- Go to the bin directory and start:

```bash
cd /usr/local/elk/kibana-7.6.1-linux-x86_64/bin/
# start
./kibana --allow-root
```
Start Kibana

- Double-click bin/kibana.bat
- Access test: http://localhost:5601
Development tools

- Postman
- curl
- head
- Google Chrome plugins (with Chinese-language support)

All subsequent operations are performed here (in Kibana Dev Tools)
ES Core Concepts
Overview
- By now we know what ES is, and the ES service has been installed and started. So how does ES store data, what does its data structure look like, and how does it implement search?
ES concepts

- Cluster
- Node
- Shard

How nodes and shards work
-
A cluster has at least one node, and a node is an ES process. A node can have multiple indexes. By default, if you create an index, the index will have 5 primary shards, and each primary shard will have a replica.
-
The figure above shows a cluster with three nodes. You can see that the master shard and the corresponding replication shard are not in the same node, so that even if a node fails, data will not be lost. In effect, a shard is a Lucene index, a directory of files with inverted indexes that are structured so that ES can tell you which documents contain a particular keyword without scanning the entire document
-
Inverted index
ES uses a structure called an inverted index, with Lucene's inverted index as the underlying layer. This structure is suitable for fast full-text search: an index consists of a list of all the unique words that appear in any document and, for each word, the list of documents that contain it. For example, suppose there are now two documents, each containing the following:

```
Study every day, good good up to forever   # contents of document 1
Study every day, good good up              # contents of document 2
```

To create an inverted index, we first split each document into individual words (terms or tokens), then create a sorted list of all the unique terms, and finally list which documents each term appears in:
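The resulting inverted index for these two documents looks roughly like this (√ means the term appears in that document):

| term | doc_1 | doc_2 |
| --- | --- | --- |
| study | √ | √ |
| every | √ | √ |
| day | √ | √ |
| good | √ | √ |
| up | √ | √ |
| to | √ | × |
| forever | √ | × |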
Now, if we search for "to forever", we only need to look at the documents that contain each term; both terms appear in document 1, so it matches best
-
For another example, if we search blog posts by their tags, the inverted index list would look like this:
- If you want to search for articles with the Python tag, it is much faster to find them through the inverted index than to scan all the raw data: just look up the tag to get the related article IDs, completely filtering out all irrelevant data and improving efficiency
-
Elasticsearch index vs. Lucene index
- In ElasticSearch the word index (index library) is used a lot; that is simply how the term is used there. An ElasticSearch index is split into shards, and each shard is a Lucene index, so an ElasticSearch index is made up of multiple Lucene indexes
Index

- The index is like the database
- An index is a container of mapping types; an index in ES is a very large collection of documents. The index stores the fields and other settings of its mapping types, which are then stored on the individual shards
Type

- A type is a logical container for documents, just as a table in a relational database is a container for rows. The definitions of the fields within a type are called mappings; for example, a name field might map to the string type
Document

- Saying that ES is document-oriented means that the smallest unit of indexing and searching is a document. In ES, documents have several important properties:
  - Self-contained: a document contains both the fields and their corresponding values, i.e. key: value pairs
  - Can be hierarchical: a document can contain documents inside it, which is where complex logical entities come from
  - Flexible structure: documents do not rely on a predefined schema. In a relational database, fields must be defined in advance before they can be used; in ES, fields are very flexible: sometimes you can omit a field or dynamically add a new one
- Although we can add or omit fields at will, the type of each field matters: an age field, for example, could be either a string or an integer. ES keeps the mapping between fields and types along with other settings. Such mappings are specific to each mapping type of each index, which is why types are sometimes called mapping types in ES
-
mapping
MySQL vs. ElasticSearch
ElasticSearch is document-oriented. Here is a comparison between a relational database and ElasticSearch:

| MySQL | ElasticSearch |
| --- | --- |
| Databases | Indices |
| Tables | Types |
| Rows | Documents |
| Columns | Fields |
Elasticsearch can have multiple indexes (databases), each index can have multiple types (tables), each type can have multiple documents (rows), and each document can have multiple fields (columns).
Physical design
- Elasticsearch splits each index into shards behind the scenes, and each shard can be moved between different servers in the cluster
Logic design
- An index type contains multiple documents, such as document 1, document 2. When we index a document, we can find it in this order: Index >> Type >> Document ID. By this combination we can index a specific document. Note: ID does not have to be an integer; it is actually a string
9200 is different from 9300
- Port 9300: used for communication between ES nodes
- Port 9200: used by the ES node to communicate with external devices
- 9300 is the TCP port number used for communication between ES clusters. 9200 Indicates the port number of the EXPOSED ES RESTful interface
IK analyzer plugin
What it is

- Tokenization (word segmentation): splitting a piece of Chinese text or other keywords into individual words. When we search, our input is tokenized, the data in the database or index library is tokenized, and then the two are matched. The default Chinese tokenization treats each character as a word; for example, "我爱编程" ("I love programming") would be split into "我", "爱", "编", "程", which obviously does not meet the requirement, so we install the Chinese IK analyzer to solve this problem
- IK provides two tokenization algorithms: ik_smart and ik_max_word. ik_smart does the coarsest (fewest) splits, while ik_max_word does the finest-grained splits
Installing the IK analyzer
download
Github download address
Windows installation
- Unzip the downloaded files
- Create a new folder ik under the plugins directory of es
- Place the unzipped files in the IK folder
Linux installation
-
Basically the same as Windows
-
Copy the decompressed folder named IK to the plugins folder
Restart ES and observe

- You can see the IK analyzer plugin being loaded
- You can also check with the elasticsearch-plugin list command
Testing the IK analyzer in Kibana

- Kibana Dev Tools
- ik_smart (coarsest split)
- ik_max_word (finest-grained split)
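These tests use the _analyze API in Kibana Dev Tools; a minimal sketch, with the sample sentence from the explanation above:

```
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我爱编程"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我爱编程"
}
```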
- Problem found: words that should stay together get split apart. Such personalized words need to be added to the analyzer's dictionary
Adding a custom IK dictionary
-
ik/config/IKAnalyzer.cfg.xml
-
Add a custom dictionary touchair.dic, register it in the extension configuration (see the sketch below), and then restart ES
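For reference, the extension entry in ik/config/IKAnalyzer.cfg.xml looks roughly like this; touchair.dic is the custom dictionary file mentioned above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- register the custom extension dictionary here -->
    <entry key="ext_dict">touchair.dic</entry>
    <!-- optional: custom extension stopword dictionary -->
    <entry key="ext_stopwords"></entry>
</properties>
```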
-
Look at the startup log and see that touchair.dic is loaded. Now test the word segmentation again
-
The test results
-
Before adding a custom dictionary: Touch is split into touch and reach
-
After configuration, you can split it into desired results
-
-
REST Style Description
-
A software architectural style, rather than a standard, provides a set of design principles and constraints. It is mainly used for client and server interaction class software. Software designed in this style can be simpler, more hierarchical, and easier to implement mechanisms such as caching
-
Basic REST commands:

| Method | URL | Description |
| --- | --- | --- |
| PUT | localhost:9200/index/type/id | Create a document (specify the document id) |
| POST | localhost:9200/index/type | Create a document (random document id) |
| POST | localhost:9200/index/type/id/_update | Modify a document |
| DELETE | localhost:9200/index/type/id | Delete the specified document |
| GET | localhost:9200/index/type/id | Query a document by document id |
| POST | localhost:9200/index/type/_search | Query all data |
Index basic operations
-
Create an index (PUT)

```
PUT /index_name/type_name/document_id
{ request body }
```

While creating the index, a piece of data is inserted as well
Document mapping
-
Dynamic mapping: In a relational database, you need to create a database and then create a table under that database instance before you can insert data into that table. ElasticSearch does not need to define a Mapping. When a document is written to ElasticSearch, it automatically identifies the document type based on the document field. This mechanism is called dynamic Mapping
-
Static mapping: In ElasticSearch, you can also define a map that contains the fields and types of the document. This is called static mapping
-
Type classification:

- String types: text, keyword
  - text is tokenized by the analyzer; keyword is not tokenized
- Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
- Date type: date
- Boolean type: boolean
- Binary type: binary
- Array type: array
- Complex types
  - Geographic types (Geo datatypes)
    - Geo-point datatype: geo_point, for latitude/longitude coordinates
    - Geo-shape datatype: geo_shape, for complex shapes such as polygons
  - Specialised datatypes
    - IPv4 datatype: ip, for IPv4 addresses
    - Completion datatype: completion, provides auto-complete suggestions
    - Token count datatype: token_count, counts the number of indexed tokens in a field; this value keeps increasing and does not decrease due to filtering conditions
    - mapper-murmur3 datatype: the mapper-murmur3 plugin allows a hash of the indexed value to be computed with murmur3
    - Attachment datatype: the mapper-attachments plugin supports indexing attachments such as Microsoft Office formats, OpenDocument formats, ePub, HTML, etc.
Create an index and specify the field types

When creating an index you can also declare the field types explicitly, as sketched below
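A sketch of creating an index whose field types are declared up front (the index name test2 and its fields are only illustrative):

```
PUT /test2
{
  "mappings": {
    "properties": {
      "name":     { "type": "text" },
      "age":      { "type": "long" },
      "birthday": { "type": "date" }
    }
  }
}
```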
- The rule (mapping) can then be retrieved with a GET request
- View the default information:

```
# _doc is the default type placeholder; it can be omitted
PUT /test3/_doc/1
{
  "name": "touchair-3",
  "age": "19",
  "birth": "2020-09-16"
}
```
View the result

If you do not specify field types for your document, ES assigns default field types for you!

- Extension: you can get a lot of information about the current state of ES with the GET _cat commands
  - GET _cat/health shows health information
  - GET _cat/indices?v lists all indices
Modifying data (POST/PUT)
```
PUT /test3/_doc/1
{
  "name": "touchair-3-put",
  "age": "20",
  "birth": "2020-09-15"
}

POST /test3/_doc/1/_update
{
  "doc": {
    "name": "touchair-3-post"
  }
}
```
- PUT overwrites the whole document
- POST _update updates only the given fields
- View the results
- DELETE decides whether to delete an index or a document record based on the requested URL
Basic operations of the document (*)
ElasticSearch version control
- The _version field
- Why version control (CAS, lock-free): to ensure data correctness under concurrent (multi-threaded) operations
-
Pessimistic locks and optimistic locks
- Pessimistic locking: Shielding all operations that might violate data accuracy, assuming that concurrency conflicts are certain
- Optimistic locking: Data integrity violations are checked only at commit time, assuming no concurrency conflicts will occur
-
Internal and external versioning
- Internal version: _version increases automatically. After data is modified, version is automatically increased by 1
- External version: To keep version consistent with the value of external version control, use version_type=external to check whether the current version value of the data is less than the version value in the request
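A sketch of what an external-version write looks like (the document and version number are illustrative); the write is accepted only if the supplied version is greater than the version currently stored:

```
# version_type=external: succeeds only when 5 > the current stored version
PUT /touchair/user/1?version=5&version_type=external
{
  "name": "z3",
  "age": 11,
  "desc": "This is z3"
}
```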
Simple operation
Adding test Data
PUT /touchair/user/1
{
"name":"z3"."age": 11."desc":"This is z3."."tags": ["Geek"."Old straight man.".The Overtime Dog]
}
PUT /touchair/user/2
{
"name":"l4"."age": 12."desc":"This is l4."."tags": ["Struggle force"."Men who cheat on women's affections."."Hangzhou"]
}
PUT /touchair/user/3
{
"name":"w5"."age": 30."desc":"This is W5."."tags": ["Handsome"."Rush toward street"."Travel"]
}
PUT /touchair/user/4
{
"name":"w55"."age": 31."desc":"This is W55"."tags": ["Pretty girl"."Go to the movies"."Travel"]
}
PUT /touchair/user/5
{
"name":"Learning Java"."age": 32."desc":"Here's learning Java."."tags": ["Phishing"."Literacy"."Write"]
}
PUT /touchair/user/6
{
"name":"Learning Node. Js"."age": 33."desc":"Here's learning Node.js"."tags": ["Class"."Sleep"."Play video games"]}Copy the code
GET data (GET)
```
GET touchair/user/1
```
Update data (POST)
```
POST touchair/user/2/_update
{
  "doc": {
    "name": "l4-2"
  }
}

GET touchair/user/2
```
Simple query
- Query with conditions:

```
GET touchair/user/_search?q=name:w5
```
Complex operations

Complex query select (sorting, paging, highlighting, fuzzy query, exact query!)

- The _score attribute in hits indicates the match quality; the higher the score, the better the match
- hits contains:
  - Index and document information
  - The total number of query results
  - The specific documents that were queried
  - You can iterate through them all
  - You can use _score to decide which result is more relevant
Match query

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "w5"
    }
  }
}
```
If you do not need that many fields returned, limit them with _source:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "_source": ["name", "desc"]
}
```
Sorting

Sort by age in descending order:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
```
Paging

from and size are equivalent to the two parameters of the MySQL LIMIT clause:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 1
}
```
Multi-condition (bool) queries

- must: every condition must match, equivalent to the MySQL AND operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Learning" } },
        { "match": { "age": "32" } }
      ]
    }
  }
}
```
- should: equivalent to the MySQL OR operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "Learning" } },
        { "match": { "age": "11" } }
      ]
    }
  }
}
```
- must_not: equivalent to the MySQL NOT operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "age": 33 } }
      ]
    }
  }
}
```
Filtering matched data with filter

- range intervals:
  - gte: greater than or equal to
  - lte: less than or equal to
  - gt: greater than
  - lt: less than

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Learning" } }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 32
          }
        }
      }
    }
  }
}
```
Matching multiple keywords with match (keywords separated by spaces; a document matching any of them is returned)

- match is equivalent to LIKE

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "tags": "man tech"
    }
  }
}
```
Exact query with term

- Equivalent to equals
- term queries look up the specified term directly in the inverted index
- keyword-type fields can only be matched exactly

About analysis:
- term: exact query, the query string is not analyzed
- match: the query string is analyzed first, then matched against the analyzed documents

Two string types:
- text: split by the analyzer
- keyword: not split; can only be matched exactly
Highlighted query

- highlight

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
```

- Custom highlight wrapping tags

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}
```
Term is different from Match
- term queries do not analyze the query string; they behave like Equals
- match analyzes the query string with the field's analyzer and then matches, like LIKE
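A small sketch that makes the difference visible (the testdb index and its fields are illustrative):

```
PUT /testdb
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "desc": { "type": "keyword" }
    }
  }
}

PUT /testdb/_doc/1
{
  "name": "touchair name",
  "desc": "touchair desc"
}

# name is text, so it was analyzed into terms; a single term matches
GET /testdb/_search
{
  "query": { "term": { "name": "touchair" } }
}

# desc is keyword, so it was stored as one whole term; only the exact value matches
GET /testdb/_search
{
  "query": { "term": { "desc": "touchair desc" } }
}
```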
ES integration SpringBoot
The official documentation
The document address
- ElasticSearch 7.6 client documentation
Maven dependency

- pom.xml

```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>
```
Initialization

Create a project
- Create a SpringBoot project
- Select the dependencies; the important one is ElasticSearch under NoSQL
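The test class shown later injects a bean named restHighLevelClient; a minimal configuration sketch for it (the host and port are placeholders for your own ES address):

```java
package com.touchair.elk.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    // expose a RestHighLevelClient bean; the address below is a placeholder for your ES server
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}
```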
API call tests

Index operations
- Create an index
- Check whether an index exists
- Get an index
- Delete an index

Document CRUD
- Create a document
- Get a document
- Update a document
- Delete a document
- Bulk-insert documents
Document query (*)

- SearchRequest: the search request
- SearchSourceBuilder: builds the search conditions
- HighlightBuilder: highlighting
- QueryBuilders.matchAllQuery(): matches everything
- QueryBuilders.termQuery(): exact lookup
- Test class code:

```java
package com.touchair.elk;

import cn.hutool.json.JSONUtil;
import com.touchair.elk.pojo.User;
import org.assertj.core.util.Lists;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class ElkApplicationTests {

    public static final String INDEX_NAME = "java_touchair_index";

    @Resource
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient restHighLevelClient;

    /**
     * Test creating an index
     */
    @Test
    void testCreateIndex() throws IOException {
        // request to create an index
        CreateIndexRequest indexRequest = new CreateIndexRequest("java_touchair_index");
        // the client executes the request (IndicesClient) and obtains the response
        CreateIndexResponse createIndexResponse =
                restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse.toString());
    }

    /**
     * Test getting an index
     */
    @Test
    void testGetIndex() throws IOException {
        GetIndexRequest getIndexRequest = new GetIndexRequest("java_touchair_index");
        boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        if (exists) {
            GetIndexResponse getIndexResponse =
                    restHighLevelClient.indices().get(getIndexRequest, RequestOptions.DEFAULT);
            System.out.println(getIndexResponse);
        } else {
            System.out.println("Index does not exist");
        }
    }

    /**
     * Test deleting an index
     */
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("test2");
        AcknowledgedResponse acknowledgedResponse =
                restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        System.out.println(acknowledgedResponse.isAcknowledged());
    }

    /**
     * Test creating a document
     */
    @Test
    void testAddDocument() throws IOException {
        // create an object
        User user = new User("java", 23);
        // create the request
        IndexRequest indexRequest = new IndexRequest("java_touchair_index");
        // rule: PUT /java_touchair_index/_doc/1
        indexRequest.id("1");
        indexRequest.timeout(TimeValue.timeValueSeconds(1));
        // put the data into the request
        indexRequest.source(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
        // the client sends the request and obtains the response
        IndexResponse indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status());
    }

    /**
     * Test getting a document
     */
    @Test
    void testGetDocument() throws IOException {
        // check whether the document exists: GET /index/_doc/1
        GetRequest getRequest = new GetRequest(INDEX_NAME, "1");
        // // do not fetch the _source context
        // getRequest.fetchSourceContext(new FetchSourceContext(false));
        // getRequest.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        if (exists) {
            GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            System.out.println(getResponse.toString());
            // print the content of the document; everything returned matches the command-line result
            System.out.println(getResponse.getSourceAsString());
        } else {
            System.out.println("Document does not exist");
        }
    }

    /**
     * Test updating a document
     */
    @Test
    void testUpdateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest(INDEX_NAME, "1");
        updateRequest.timeout("1s");
        User user = new User("ES Search Engine", 24);
        updateRequest.doc(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
        UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(updateResponse.status());
    }

    /**
     * Test deleting a document
     */
    @Test
    void testDeleteDocument() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest(INDEX_NAME, "1");
        deleteRequest.timeout("1s");
        DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println(deleteResponse.getResult());
    }

    /**
     * In real projects, data is usually inserted in batches
     */
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        ArrayList<User> userList = Lists.newArrayList();
        userList.add(new User("Java", 11));
        userList.add(new User("javaScript", 12));
        userList.add(new User("Vue", 13));
        userList.add(new User("Mysql", 14));
        userList.add(new User("Docker", 15));
        userList.add(new User("MongoDB", 16));
        userList.add(new User("Redis", 17));
        userList.add(new User("Tomcat", 18));
        for (int i = 0; i < userList.size(); i++) {
            // for batch updates and deletes, only the request created here needs to change
            bulkRequest.add(new IndexRequest(INDEX_NAME)
                    .id(String.valueOf(i + 1))
                    .source(JSONUtil.toJsonPrettyStr(userList.get(i)), XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulkResponse.hasFailures()); // false means success
    }

    /**
     * Query
     * SearchRequest: search request
     * SearchSourceBuilder: builds the search conditions
     * HighlightBuilder: highlighting
     * matchAllQuery(): matches everything
     * termQuery(): exact lookup
     */
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // build the search conditions
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // query conditions can be built quickly with the QueryBuilders helper
        // QueryBuilders.matchAllQuery()  matches everything
        // QueryBuilders.termQuery()      exact lookup
        // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 11);
        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(matchAllQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // searchSourceBuilder.highlighter();
        // searchSourceBuilder.size();
        // searchSourceBuilder.from();
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit searchHit : searchResponse.getHits().getHits()) {
            System.out.println(searchHit.getSourceAsMap());
        }
    }
}
```
Imitation mall search

- Crawl web page data
- Paged search
- Highlighting (see the sketch below)
- Rendering (screenshots)
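A sketch of what the paged + highlighted search can look like, written as an extra test method for the ElkApplicationTests class above; the jd_goods index name and title field are assumptions for illustration, and org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder must be imported additionally:

```java
    /**
     * Paged + highlighted search sketch (index name and field are illustrative)
     */
    @Test
    void testPageHighlightSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // paging: from + size, like MySQL LIMIT
        sourceBuilder.from(0);
        sourceBuilder.size(10);

        // fuzzy match on the title field
        sourceBuilder.query(QueryBuilders.matchQuery("title", "java"));

        // highlight the title field with custom wrapping tags
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        sourceBuilder.highlighter(highlightBuilder);

        searchRequest.source(sourceBuilder);
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());       // the original document
            System.out.println(hit.getHighlightFields());   // the highlighted fragments
        }
    }
```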
Distributed Log Collection
Principles of ELK Distributed Log Collection (*)
- Install the Logstash log collection plugin on every node of the server cluster
- Each server node sends its logs into Logstash
- Logstash formats the logs as JSON, creates a different index for each day, and outputs them to ElasticSearch
- Kibana (in the browser) is used to query the log information
Environmental installation
- 1. Install ElasticSearch
- 2. Install Logstash
- 3. Install Kibana
Logstash
introduce
-
Logstash is a completely open source tool that collects, filters, analyzes your logs, supports a wide range of data capture methods, and stores them for later use (like searching). Speaking of searching, Logstash comes with a Web interface that searches and displays all logs. The client is installed on the host that needs to collect logs. The server filters and modifies the received node logs and sends them to ElasticSearch
-
Core process:
- Logstash event processing has three phases: input–>filters–>outputs
- Is a tool for receiving, processing and forwarding logs
- Supports system logs, Webserver logs, error logs, application logs, and all types of logs that can be thrown
Set up the Logstash environment
- Download address
- Unpack the
Logstash test
- Feed the elasticsearch log into Logstash
- Go to the config directory of the Logstash installation and create touchair.conf
- Add the following content and save it:

```
input {
  # read log information from a file and output it to the console
  file {
    path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
    codec => "json"
    type => "elasticsearch"
    start_position => "beginning"
  }
}

output {
  # standard output
  # stdout {}
  # format the output, parsing the log with the ruby library
  stdout { codec => rubydebug }
}
```
- Start Logstash from the bin directory and watch the console:

```bash
./logstash -f ../config/touchair.conf
```
- Output the logs to ES
- Create and modify the touchair.es.conf file:

```
input {
  # read log information from a file
  file {
    path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
    codec => "json"
    type => "elasticsearch"
    start_position => "beginning"
  }
}

output {
  # standard output
  # stdout {}
  # format the output, parsing the log with the ruby library
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => ["192.168.83.133:9200"]
    index => "es-%{+YYYY.MM.dd}"
  }
}
```
- Start Logstash:

```bash
./logstash -f ../config/touchair.es.conf
```
Logstash integration Springboot
Single node

```
# tcp -> Logstash -> Elasticsearch pipeline
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => ["192.168.83.133:9200"]
    index => "robot-java-%{+YYYY.MM.dd}"
  }
}
```
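On the SpringBoot side, one common way (not shown in the original configuration here) to ship logs to this TCP input is the logstash-logback-encoder library (net.logstash.logback:logstash-logback-encoder); a minimal logback-spring.xml sketch, where the destination is a placeholder for your Logstash address:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- send log events to the Logstash tcp input defined above (placeholder address) -->
    <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <destination>192.168.83.133:4560</destination>
        <!-- encode each event as a JSON line, matching the json_lines codec -->
        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>

    <root level="INFO">
        <appender-ref ref="LOGSTASH"/>
    </root>
</configuration>
```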
Multiline: merging multi-line run logs makes it easier to locate faults

```
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => multiline {
      pattern => "^\["
      negate => false
      what => "next"
    }
  }
}

filter {
  json {
    source => "message"
  }
  mutate {
    add_field => {
      "language" => "%{[type]}"
    }
  }
}

output {
  if [language] == "java" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-java-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "ros" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-ros-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "rec" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-rec-%{+YYYY.MM.dd}"
    }
  }
}
```
ELK docker deployment
Install the ElasticSearch
Pull the image

```bash
docker pull elasticsearch:7.6.1
```
Run the container

- Run the command to create and start the container:

```bash
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" elasticsearch:7.6.1
```
- Copy the configuration files and data directory so they can be mounted:

```bash
docker cp es:/usr/share/elasticsearch/config/ /var/elk/elasticsearch/config
docker cp es:/usr/share/elasticsearch/data/ /var/elk/elasticsearch/data
```
- Allow cross-origin access:

```bash
vim /var/elk/elasticsearch/config/elasticsearch.yml
```

```yml
# add these two lines
http.cors.enabled: true
http.cors.allow-origin: "*"
```
- Destroy the container and run it again in mount mode:

```bash
# destroy
docker rm -f es

# mount the configuration files
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -v /var/elk/elasticsearch/config/:/usr/share/elasticsearch/config/ \
  -v /var/elk/elasticsearch/data/:/usr/share/elasticsearch/data/ \
  -e "discovery.type=single-node" \
  elasticsearch:7.6.1
```
- Access port 9200 on the host IP address and check whether ES started successfully
Install Kibana
Pull the image

```bash
docker pull kibana:7.6.1
```
Run the container
- Run the container first:

```bash
docker run -d --name kibana -p 5601:5601 kibana:7.6.1
```
- Copy the configuration file for mounting later:

```bash
# copy
docker cp kibana:/usr/share/kibana/config/ /var/elk/kibana/config

# view the internal IP address of the ES container
docker exec -it es ifconfig

# modify the configuration
vim kibana.yml
```
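The key change is pointing Kibana at the ES container; a sketch assuming the container IP found with ifconfig is 172.17.0.8 (the address used in the conf files below):

```yml
server.host: "0.0.0.0"
# placeholder: use the ES container IP you found above
elasticsearch.hosts: ["http://172.17.0.8:9200"]
```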
- Run with the mounts:

```bash
# destroy the container first
docker rm -f kibana

# run the container
docker run -d --name kibana -p 5601:5601 \
  -v /var/elk/kibana/config:/usr/share/kibana/config \
  kibana:7.6.1
```
- Visit host IP:5601 to view the Kibana graphical interface
Install the LogStash
Pull the image

```bash
docker pull logstash:7.6.1
```
Run the container
- Run the container first:

```bash
docker run --name logstash -d -p 4560:4560 -p 9600:9600 logstash:7.6.1
```
- Copy the configuration files for mounting later:

```bash
docker cp logstash:/usr/share/logstash/config /var/elk/logstash/config
```
- Add a customized conf file (robot.conf, referenced when the container is started below):

```
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => multiline {
      pattern => "^\["
      negate => false
      what => "next"
    }
  }
}

filter {
  json {
    source => "message"
  }
  mutate {
    add_field => {
      "language" => "%{[type]}"
    }
  }
}

output {
  if [language] == "java" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-java-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "ros" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-ros-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "rec" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-rec-%{+YYYY.MM.dd}"
    }
  }
}
```
- Modify the configuration file logstash.yml:

```bash
vim logstash.yml
```
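The usual change here is pointing the monitoring setting at the ES container; a sketch assuming the same container IP as in robot.conf:

```yml
http.host: "0.0.0.0"
# placeholder: use the ES container IP
xpack.monitoring.elasticsearch.hosts: ["http://172.17.0.8:9200"]
```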
- Run with the mounts:

```bash
# remove the container
docker rm -f logstash

# restart the container
docker run --name logstash -d -p 4560:4560 -p 9600:9600 \
  -v /var/elk/logstash/config:/usr/share/logstash/config \
  logstash:7.6.1 \
  -f /usr/share/logstash/config/robot.conf
```

- Note: if the ES container address changes, update the ES service address in kibana.yml, logstash.yml, and the customized conf file accordingly, then restart Kibana and Logstash