GitHub project address

ELK is the acronym for ElasticSearch, Logstash, and Kibana; it is also called the Elastic Stack. ElasticSearch is a Lucene-based, distributed, RESTful, near-real-time search platform framework. It can serve as the underlying search framework for big-data search engines such as Google and Baidu, and it provides powerful search capabilities. Logstash is the central data-flow engine: it collects data in different formats from different sources (documents, data stores, MQ), filters it, and outputs it to different destinations (files, MQ, Redis, ElasticSearch, Kafka, etc.). Kibana presents ElasticSearch data on a friendly page and provides real-time analysis.

Many developers refer to ELK simply as a log-analysis stack, but ELK is not limited to log analysis; it can support any other data analysis and collection scenario. Log analysis and collection is just the most representative use case, not the only one.

Logstash collects and cleans the data -> ElasticSearch searches and stores it -> Kibana presents it.
ElasticSearch
Introduction of Lucene
Overview
- Lucene is a jar package (library) for information retrieval. It is not a search engine by itself! It provides index structures, tools for reading and writing indexes, sorting, search rules, and so on. (Solr builds on it.)
- It is written in Java, with the goal of adding full-text retrieval to all kinds of small and medium-sized applications
Relations with ElasticSearch
- ElasticSearch is based on Lucene with some encapsulation and enhancements
Introduction to ElasticSearch
Overview
- ElasticSearch (ES for short) is an open-source, highly scalable, distributed full-text search engine that can store and retrieve data in near real time. It can scale to hundreds of servers and handle petabytes of data. ES is also developed in Java and uses Lucene as its core to implement all indexing and search functions, but it aims to make full-text search simple by hiding the complexity of Lucene behind a simple RESTful API
- ElasticSearch surpassed Solr in January 2016, according to the DB-Engines ranking
Who uses it
- Wikipedia: full-text search, highlighting, search recommendations (weighting)
- News sites, such as Sohu News: user behavior logs (clicks, views, favorites, comments) plus social-network data (opinions about a given piece of news) are analyzed so that the author of each article receives the public's feedback on it (good article, filler, popular, etc.)
- Stack Overflow (a foreign Q&A forum where programmers discuss exceptions and errors)
- GitHub: searching hundreds of billions of lines of code
- E-commerce sites: product search
- Log data analysis: Logstash collects the logs and ES performs the data analysis, i.e. ELK (ElasticSearch + Logstash + Kibana)
- A commodity price monitoring site: users set a price threshold for a commodity and receive a notification when the price drops below that threshold
- BI systems (Business Intelligence): for example, a large shopping mall uses BI to analyze the trend of user spending in an area over the last three years and the composition of its user groups, and produces the corresponding reports; ES performs the data analysis and mining, and Kibana handles the data visualization
- Domestically: site search (e-commerce, recruitment, portals, etc.), IT system search (OA, CRM, ERP, etc.), and data analysis (a popular ES use case)
Solr and ES
Introduction to ElasticSearch
- ElasticSearch is a real-time distributed search and analysis engine. It makes it possible for you to process big data faster than ever before
- It is used for full-text search, structured search, analytics, and any combination of the three:
- Wikipedia uses ElasticSearch to provide full-text search and keyword highlighting, as well as search suggestions such as search-as-you-type and did-you-mean
- The Guardian uses ElasticSearch in combination with user logs and social network data to provide their editors with real-time feedback to see how the public is responding to new posts
- Stack Overflow combines full-text search with geolocation queries and more-like-this functionality to find relevant questions and answers
- Github uses ElasticSearch to retrieve 130 billion lines of code
- ElasticSearch is an open source search engine based on Apache Lucene (TM). Lucene is arguably the most advanced, high-performance, and full-featured search engine library to date, both open source and proprietary
- But Lucene is just a library, and to use it, you have to use Java code as the development language and integrate it directly into your application. What’s worse, Lucene is so complex that you need to know a lot about retrieval to understand how it works
- ElasticSearch is also developed in Java and uses Lucene as its core for all indexing and search functions, but it aims to hide the complexity of Lucene with a simple RESTful API to make full text search easy
Introduction to Solr
- Solr is a top open source project under Apache, developed in Java. It is a full-text search server based on Lucene. Solr provides a richer query language than Lucene, and at the same time realizes configurable, extensible, and optimized index and search performance
- Solr can run independently in Servlet containers such as Jetty and Tomcat. Solr index is realized simply by sending an XML document describing the Field and its contents to Solr server using POST method. Solr adds, deletes and updates indexes according to the XML document. Solr searches only need to send HTTP GET requests, and then organize the page layout by parsing the query results returned by Solr in XML or JSON formats. Solr does not provide the UI building function. Solr provides a management page on which you can query the configuration and running status of Solr
- Solr develops enterprise-class search servers based on Lucene, essentially encapsulating Lucene
- Solr is an independent enterprise-level search application server. It provides an API interface similar to Web Service. Users can submit files in a certain format to search engine servers through HTTP requests and generate indexes. You can also make a lookup request and get the result back
ElasticSearch vs. Solr
- Solr is faster when simply searching through existing data
- When indexes are created in real time, Solr causes I/O congestion, resulting in poor query performance. ElasticSearch has an obvious advantage
- Solr becomes less efficient as the amount of data increases, while ElasticSearch doesn’t change significantly
- By converting the search infrastructure from Solr to ElasticSearch, roughly 50x better search performance has been observed
Summary of ElasticSearch vs Solr
- Es is basically out of the box, very simple. Solr installation is a little more complicated
- Solr uses Zookeeper for distributed management, while ElasticSearch provides distributed coordination management
- Solr supports more data formats such as JSON, XML, and CSV, whereas ElasticSearch only supports JSON files
- Solr offers a lot of features, but ElasticSearch itself focuses on core features. Advanced features are provided by third-party add-ons, such as the Kibana graphical interface
- Solr is faster to query but slower to update indexes (i.e., slow for inserts and deletes), so it suits e-commerce applications with many queries
- ES is fast at building indexes (real-time queries are fast) and is used for searches at Facebook, Sina, and the like
- Solr is a great solution for traditional search applications, but ElasticSearch is better suited to emerging real-time search applications
- Solr is more mature and has a much larger and more established community of users, developers, and contributors, whereas ElasticSearch has fewer developers and maintainers, updates very quickly, and costs more to learn and use
Inverted index (*)
- Traditional search uses a forward index; full-text search uses an inverted index
- Each entry in an inverted index table contains an attribute value and the addresses of the records that have that attribute value. It is called an inverted index because it is not the records that determine the attribute values, but rather the attribute values that determine the positions of the records
- Inverted indexes come in two different forms:
  - A record-level inverted index (or inverted file index) contains, for each word, a list of the documents that reference it
  - A word-level inverted index (or full inverted index) additionally contains the position of each word within a document
- As shown in the following example: the inverted index classifies the contents of the documents by keyword, and those keywords can then be used to locate the document contents directly
ElasticSearch installation
- Prerequisite: JDK 1.8 is the minimum requirement! You will also need the ElasticSearch client and an interface tool
- ElasticSearch is developed in Java, so the ElasticSearch version must match the Java core JAR packages we use later, and the corresponding JDK environment must be set up correctly
download
- Official website download address
Install the ES
Windows environment
- Unpack the archive
- Directory structure
  - bin: startup scripts
  - config: configuration files
    - log4j2.properties: log configuration
    - jvm.options: JVM options (1 GB heap by default)
    - elasticsearch.yml: ElasticSearch configuration (default port 9200, etc.)
  - lib: related JAR packages
  - logs: log files
  - modules: function modules
  - plugins: plugin directory (e.g., the IK analyzer)
Linux environment
- Decompress the tar.gz installation package

```bash
tar -zxvf ***.tar.gz
```

- By default, ES does not allow access by IP address. Modify elasticsearch.yml in the config directory:

```yml
network.host: 192.168.83.133
cluster.initial_master_nodes: ["node-1", "node-2"]
```
- When started from the installation package, additional system parameters must be configured

- Modify the system limits:

```bash
# modify the restriction
sudo vi /etc/sysctl.conf
# check whether it takes effect
sudo sysctl -p
```

- If the maximum number of open files per process is too small, increase it:

```bash
sudo vi /etc/security/limits.conf
```

Add the following content:

```
* soft nproc 4096
* hard nproc 4096
* soft nofile 65536
* hard nofile 65536
```

```bash
# check the soft limit
ulimit -Sn
# check the hard limit
ulimit -Hn
```
- Restart the machine, then restart ElasticSearch

Start ES

- Double-click elasticsearch.bat to start ElasticSearch
- The default exposed port is 9200
- Visit 127.0.0.1:9200 in a browser
Install the visual interface Head
Download address
-
The Node environment is required
-
Head download address
Compile operation
- When accessing port 9100, a cross-origin problem prevents it from connecting to port 9200
- Add the following configuration to elasticsearch.yml:

```yml
http.cors.enabled: true
http.cors.allow-origin: "*"
```

- Restart ElasticSearch
- For starters, you can think of ES as a database in which you create indexes (tables) and documents (the rows in a table)
- Treat head as a data-browsing tool only; all subsequent queries will be done in Kibana
Kibana
Introduction to Kibana
- Kibana is an open source analysis and visualization platform for ElasticSearch that allows you to search and view interactive data stored in the ElasticSearch index. With Kibana, you can perform advanced data analysis and presentation through various charts. Kibana makes massive amounts of data easier to understand with a simple, browser-based user interface that allows you to quickly create dashboards that display Elasticsearch queries in real time. Setting up Kibana is very simple. You can install Kibana and start ElasticSearch index monitoring in minutes without coding or additional infrastructure
Kibana installation
download
- Kibana download address
The installation
Windows environment
- Unpack the archive
- It is a standard project; the startup script is bin/kibana.bat
Linux environment
- Unzip kibana-7.6.1-linux-x86_64.tar.gz
- Modify kibana.yml (vim kibana.yml)
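A sketch of the typical kibana.yml changes, assuming the ES address configured earlier; treat both values as placeholders for your own environment:

```yml
# listen on all interfaces so Kibana is reachable from outside (placeholder)
server.host: "0.0.0.0"
# point Kibana at the ES node configured above (placeholder address)
elasticsearch.hosts: ["http://192.168.83.133:9200"]
```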
- Go to the bin directory and start:

```bash
cd /usr/local/elk/kibana-7.6.1-linux-x86_64/bin/
# start
./kibana --allow-root
```
Start Kibana

- Double-click bin/kibana.bat
- Access test: http://localhost:5601
Development tools

- Postman
- curl
- head
- Google Chrome plugins (with Chinese-language support)

All subsequent operations are performed here (in Kibana Dev Tools)
ES Core Concepts
Overview
- By now we know what ES is, and the ES service has been installed and started. So how does ES store data, what does its data structure look like, and how does it implement search?
ES concepts

- Cluster
- Node
- Shard

How nodes and shards work
-
A cluster has at least one node, and a node is an ES process. A node can have multiple indexes. By default, if you create an index, the index will have 5 primary shards, and each primary shard will have a replica.
-
The figure above shows a cluster with three nodes. You can see that the master shard and the corresponding replication shard are not in the same node, so that even if a node fails, data will not be lost. In effect, a shard is a Lucene index, a directory of files with inverted indexes that are structured so that ES can tell you which documents contain a particular keyword without scanning the entire document
-
Inverted index
ES uses a structure called an inverted index, with Lucene's inverted index as the underlying layer. This structure is suitable for fast full-text search: an index consists of a list of all the unique words that appear in any document and, for each word, the list of documents that contain it. For example, suppose there are now two documents, each containing the following:

```
Study every day, good good up to forever   # contents of document 1
Study every day, good good up              # contents of document 2
```

To create an inverted index, we first split each document into individual words (terms or tokens), then create a sorted list of all the unique terms, and finally list which documents each term appears in:
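The resulting inverted index for these two documents looks roughly like this (√ means the term appears in that document):

| term | doc_1 | doc_2 |
| --- | --- | --- |
| study | √ | √ |
| every | √ | √ |
| day | √ | √ |
| good | √ | √ |
| up | √ | √ |
| to | √ | × |
| forever | √ | × |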
Now, if we search for "to forever", we only need to look at the documents that contain each term; both terms appear in document 1, so it matches best
-
For another example, if we search blog posts by their tags, the inverted index list would look like this:
- If you want to search for articles with the Python tag, it is much faster to find them through the inverted index than to scan all the raw data: just look up the tag to get the related article IDs, completely filtering out all irrelevant data and improving efficiency
-
Elasticsearch index vs. Lucene index
- In ElasticSearch the word index (index library) is used a lot; that is simply how the term is used there. An ElasticSearch index is split into shards, and each shard is a Lucene index, so an ElasticSearch index is made up of multiple Lucene indexes
Index

- The index is like the database
- An index is a container of mapping types; an index in ES is a very large collection of documents. The index stores the fields and other settings of its mapping types, which are then stored on the individual shards
Type

- A type is a logical container for documents, just as a table in a relational database is a container for rows. The definitions of the fields within a type are called mappings; for example, a name field might map to the string type
Document

- Saying that ES is document-oriented means that the smallest unit of indexing and searching is a document. In ES, documents have several important properties:
  - Self-contained: a document contains both the fields and their corresponding values, i.e. key: value pairs
  - Can be hierarchical: a document can contain documents inside it, which is where complex logical entities come from
  - Flexible structure: documents do not rely on a predefined schema. In a relational database, fields must be defined in advance before they can be used; in ES, fields are very flexible: sometimes you can omit a field or dynamically add a new one
- Although we can add or omit fields at will, the type of each field matters: an age field, for example, could be either a string or an integer. ES keeps the mapping between fields and types along with other settings. Such mappings are specific to each mapping type of each index, which is why types are sometimes called mapping types in ES
-
mapping
MySQL vs. ElasticSearch
ElasticSearch is document-oriented. Here is a comparison between a relational database and ElasticSearch:

| MySQL | ElasticSearch |
| --- | --- |
| Databases | Indices |
| Tables | Types |
| Rows | Documents |
| Columns | Fields |
Elasticsearch can have multiple indexes (databases), each index can have multiple types (tables), each type can have multiple documents (rows), and each document can have multiple fields (columns).
Physical design
- Elasticsearch splits each index into shards behind the scenes, and each shard can be moved between different servers in the cluster
Logic design
- An index type contains multiple documents, such as document 1, document 2. When we index a document, we can find it in this order: Index >> Type >> Document ID. By this combination we can index a specific document. Note: ID does not have to be an integer; it is actually a string
9200 is different from 9300
- Port 9300: used for communication between ES nodes
- Port 9200: used by the ES node to communicate with external devices
- 9300 is the TCP port number used for communication between ES clusters. 9200 Indicates the port number of the EXPOSED ES RESTful interface
IK analyzer plugin
What it is

- Tokenization (word segmentation): splitting a piece of Chinese text or other keywords into individual words. When we search, our input is tokenized, the data in the database or index library is tokenized, and then the two are matched. The default Chinese tokenization treats each character as a word; for example, "我爱编程" ("I love programming") would be split into "我", "爱", "编", "程", which obviously does not meet the requirement, so we install the Chinese IK analyzer to solve this problem
- IK provides two tokenization algorithms: ik_smart and ik_max_word. ik_smart does the coarsest (fewest) splits, while ik_max_word does the finest-grained splits
Installing the IK analyzer
download
Github download address
Windows installation
- Unzip the downloaded files
- Create a new folder ik under the plugins directory of es
- Place the unzipped files in the IK folder
Linux installation
-
Basically the same as Windows
-
Copy the decompressed folder named IK to the plugins folder
Restart ES and observe

- You can see the IK analyzer plugin being loaded
- You can also check with the elasticsearch-plugin list command
Testing the IK analyzer in Kibana

- Kibana Dev Tools
- ik_smart (coarsest split)
- ik_max_word (finest-grained split)
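These tests use the _analyze API in Kibana Dev Tools; a minimal sketch, with the sample sentence from the explanation above:

```
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我爱编程"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我爱编程"
}
```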
- Problem found: words that should stay together get split apart. Such personalized words need to be added to the analyzer's dictionary
Adding a custom IK dictionary
-
ik/config/IKAnalyzer.cfg.xml
-
Add a custom dictionary touchair.dic, register it in the extension configuration (see the sketch below), and then restart ES
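For reference, the extension entry in ik/config/IKAnalyzer.cfg.xml looks roughly like this; touchair.dic is the custom dictionary file mentioned above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- register the custom extension dictionary here -->
    <entry key="ext_dict">touchair.dic</entry>
    <!-- optional: custom extension stopword dictionary -->
    <entry key="ext_stopwords"></entry>
</properties>
```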
-
Look at the startup log and see that touchair.dic is loaded. Now test the word segmentation again
-
The test results
-
Before adding a custom dictionary: Touch is split into touch and reach
-
After configuration, you can split it into desired results
-
-
REST Style Description
-
A software architectural style, rather than a standard, provides a set of design principles and constraints. It is mainly used for client and server interaction class software. Software designed in this style can be simpler, more hierarchical, and easier to implement mechanisms such as caching
-
Basic REST commands:

| Method | URL | Description |
| --- | --- | --- |
| PUT | localhost:9200/index/type/id | Create a document (specify the document id) |
| POST | localhost:9200/index/type | Create a document (random document id) |
| POST | localhost:9200/index/type/id/_update | Modify a document |
| DELETE | localhost:9200/index/type/id | Delete the specified document |
| GET | localhost:9200/index/type/id | Query a document by document id |
| POST | localhost:9200/index/type/_search | Query all data |
Index basic operations
-
Create an index (PUT)

```
PUT /index_name/type_name/document_id
{ request body }
```

While creating the index, a piece of data is inserted as well
Document mapping
-
Dynamic mapping: In a relational database, you need to create a database and then create a table under that database instance before you can insert data into that table. ElasticSearch does not need to define a Mapping. When a document is written to ElasticSearch, it automatically identifies the document type based on the document field. This mechanism is called dynamic Mapping
-
Static mapping: In ElasticSearch, you can also define a map that contains the fields and types of the document. This is called static mapping
-
Type classification:

- String types: text, keyword
  - text is tokenized by the analyzer; keyword is not tokenized
- Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
- Date type: date
- Boolean type: boolean
- Binary type: binary
- Array type: array
- Complex types
  - Geographic types (Geo datatypes)
    - Geo-point datatype: geo_point, for latitude/longitude coordinates
    - Geo-shape datatype: geo_shape, for complex shapes such as polygons
  - Specialised datatypes
    - IPv4 datatype: ip, for IPv4 addresses
    - Completion datatype: completion, provides auto-complete suggestions
    - Token count datatype: token_count, counts the number of indexed tokens in a field; this value keeps increasing and does not decrease due to filtering conditions
    - mapper-murmur3 datatype: the mapper-murmur3 plugin allows a hash of the indexed value to be computed with murmur3
    - Attachment datatype: the mapper-attachments plugin supports indexing attachments such as Microsoft Office formats, OpenDocument formats, ePub, HTML, etc.
Create an index and specify the field types

When creating an index you can also declare the field types explicitly, as sketched below
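A sketch of creating an index whose field types are declared up front (the index name test2 and its fields are only illustrative):

```
PUT /test2
{
  "mappings": {
    "properties": {
      "name":     { "type": "text" },
      "age":      { "type": "long" },
      "birthday": { "type": "date" }
    }
  }
}
```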
- The rule (mapping) can then be retrieved with a GET request
- View the default information:

```
# _doc is the default type placeholder; it can be omitted
PUT /test3/_doc/1
{
  "name": "touchair-3",
  "age": "19",
  "birth": "2020-09-16"
}
```
View the result

If you do not specify field types for your document, ES assigns default field types for you!

- Extension: you can get a lot of information about the current state of ES with the GET _cat commands
  - GET _cat/health shows health information
  - GET _cat/indices?v lists all indices
Modifying data (POST/PUT)
```
PUT /test3/_doc/1
{
  "name": "touchair-3-put",
  "age": "20",
  "birth": "2020-09-15"
}

POST /test3/_doc/1/_update
{
  "doc": {
    "name": "touchair-3-post"
  }
}
```
- PUT overwrites the whole document
- POST _update updates only the given fields
- View the results
- DELETE decides whether to delete an index or a document record based on the requested URL
Basic operations of the document (*)
ElasticSearch version control
- The _version field
- Why version control (CAS, lock-free): to ensure data correctness under concurrent (multi-threaded) operations
-
Pessimistic locks and optimistic locks
- Pessimistic locking: Shielding all operations that might violate data accuracy, assuming that concurrency conflicts are certain
- Optimistic locking: Data integrity violations are checked only at commit time, assuming no concurrency conflicts will occur
-
Internal and external versioning
- Internal version: _version increases automatically. After data is modified, version is automatically increased by 1
- External version: To keep version consistent with the value of external version control, use version_type=external to check whether the current version value of the data is less than the version value in the request
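A sketch of what an external-version write looks like (the document and version number are illustrative); the write is accepted only if the supplied version is greater than the version currently stored:

```
# version_type=external: succeeds only when 5 > the current stored version
PUT /touchair/user/1?version=5&version_type=external
{
  "name": "z3",
  "age": 11,
  "desc": "This is z3"
}
```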
Simple operation
Adding test Data
PUT /touchair/user/1
{
"name":"z3"."age": 11."desc":"This is z3."."tags": ["Geek"."Old straight man.".The Overtime Dog]
}
PUT /touchair/user/2
{
"name":"l4"."age": 12."desc":"This is l4."."tags": ["Struggle force"."Men who cheat on women's affections."."Hangzhou"]
}
PUT /touchair/user/3
{
"name":"w5"."age": 30."desc":"This is W5."."tags": ["Handsome"."Rush toward street"."Travel"]
}
PUT /touchair/user/4
{
"name":"w55"."age": 31."desc":"This is W55"."tags": ["Pretty girl"."Go to the movies"."Travel"]
}
PUT /touchair/user/5
{
"name":"Learning Java"."age": 32."desc":"Here's learning Java."."tags": ["Phishing"."Literacy"."Write"]
}
PUT /touchair/user/6
{
"name":"Learning Node. Js"."age": 33."desc":"Here's learning Node.js"."tags": ["Class"."Sleep"."Play video games"]}Copy the code
GET data (GET)
```
GET touchair/user/1
```
Update data (POST)
```
POST touchair/user/2/_update
{
  "doc": {
    "name": "l4-2"
  }
}

GET touchair/user/2
```
Simple query
- Query with conditions:

```
GET touchair/user/_search?q=name:w5
```
Complex operations

Complex query select (sorting, paging, highlighting, fuzzy query, exact query!)

- The _score attribute in hits indicates the match quality; the higher the score, the better the match
- hits contains:
  - Index and document information
  - The total number of query results
  - The specific documents that were queried
  - You can iterate through them all
  - You can use _score to decide which result is more relevant
Match query

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "w5"
    }
  }
}
```
If you do not need that many fields returned, limit them with _source:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "_source": ["name", "desc"]
}
```
Sorting

Sort by age in descending order:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
```
Paging

from and size are equivalent to the two parameters of the MySQL LIMIT clause:

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 1
}
```
Multi-condition (bool) queries

- must: every condition must match, equivalent to the MySQL AND operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Learning" } },
        { "match": { "age": "32" } }
      ]
    }
  }
}
```
- should: equivalent to the MySQL OR operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "Learning" } },
        { "match": { "age": "11" } }
      ]
    }
  }
}
```
- must_not: equivalent to the MySQL NOT operation

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "age": 33 } }
      ]
    }
  }
}
```
Filtering matched data with filter

- range intervals:
  - gte: greater than or equal to
  - lte: less than or equal to
  - gt: greater than
  - lt: less than

```
GET touchair/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "Learning" } }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 32
          }
        }
      }
    }
  }
}
```
Matching multiple keywords with match (keywords separated by spaces; a document matching any of them is returned)

- match is equivalent to LIKE

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "tags": "man tech"
    }
  }
}
```
Exact query with term

- Equivalent to equals
- term queries look up the specified term directly in the inverted index
- keyword-type fields can only be matched exactly

About analysis:
- term: exact query, the query string is not analyzed
- match: the query string is analyzed first, then matched against the analyzed documents

Two string types:
- text: split by the analyzer
- keyword: not split; can only be matched exactly
Highlighted query

- highlight

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
```

- Custom highlight wrapping tags

```
GET touchair/user/_search
{
  "query": {
    "match": {
      "name": "Learning"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}
```
Term is different from Match
- term queries do not analyze the query string; they behave like Equals
- match analyzes the query string with the field's analyzer and then matches, like LIKE
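A small sketch that makes the difference visible (the testdb index and its fields are illustrative):

```
PUT /testdb
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "desc": { "type": "keyword" }
    }
  }
}

PUT /testdb/_doc/1
{
  "name": "touchair name",
  "desc": "touchair desc"
}

# name is text, so it was analyzed into terms; a single term matches
GET /testdb/_search
{
  "query": { "term": { "name": "touchair" } }
}

# desc is keyword, so it was stored as one whole term; only the exact value matches
GET /testdb/_search
{
  "query": { "term": { "desc": "touchair desc" } }
}
```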
ES integration SpringBoot
The official documentation
The document address
- ElasticSearch 7.6 client documentation
Maven dependency

- pom.xml

```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>
```
Initialization

Create a project
- Create a SpringBoot project
- Select the dependencies; the important one is ElasticSearch under NoSQL
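The test class shown later injects a bean named restHighLevelClient; a minimal configuration sketch for it (the host and port are placeholders for your own ES address):

```java
package com.touchair.elk.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    // expose a RestHighLevelClient bean; the address below is a placeholder for your ES server
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}
```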
API call tests

Index operations
- Create an index
- Check whether an index exists
- Get an index
- Delete an index

Document CRUD
- Create a document
- Get a document
- Update a document
- Delete a document
- Bulk-insert documents
Document query (*)

- SearchRequest: the search request
- SearchSourceBuilder: builds the search conditions
- HighlightBuilder: highlighting
- QueryBuilders.matchAllQuery(): matches everything
- QueryBuilders.termQuery(): exact lookup
- Test class code:

```java
package com.touchair.elk;

import cn.hutool.json.JSONUtil;
import com.touchair.elk.pojo.User;
import org.assertj.core.util.Lists;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class ElkApplicationTests {

    public static final String INDEX_NAME = "java_touchair_index";

    @Resource
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient restHighLevelClient;

    /**
     * Test creating an index
     */
    @Test
    void testCreateIndex() throws IOException {
        // request to create an index
        CreateIndexRequest indexRequest = new CreateIndexRequest("java_touchair_index");
        // the client executes the request (IndicesClient) and obtains the response
        CreateIndexResponse createIndexResponse =
                restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse.toString());
    }

    /**
     * Test getting an index
     */
    @Test
    void testGetIndex() throws IOException {
        GetIndexRequest getIndexRequest = new GetIndexRequest("java_touchair_index");
        boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
        if (exists) {
            GetIndexResponse getIndexResponse =
                    restHighLevelClient.indices().get(getIndexRequest, RequestOptions.DEFAULT);
            System.out.println(getIndexResponse);
        } else {
            System.out.println("Index does not exist");
        }
    }

    /**
     * Test deleting an index
     */
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("test2");
        AcknowledgedResponse acknowledgedResponse =
                restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        System.out.println(acknowledgedResponse.isAcknowledged());
    }

    /**
     * Test creating a document
     */
    @Test
    void testAddDocument() throws IOException {
        // create an object
        User user = new User("java", 23);
        // create the request
        IndexRequest indexRequest = new IndexRequest("java_touchair_index");
        // rule: PUT /java_touchair_index/_doc/1
        indexRequest.id("1");
        indexRequest.timeout(TimeValue.timeValueSeconds(1));
        // put the data into the request
        indexRequest.source(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
        // the client sends the request and obtains the response
        IndexResponse indexResponse = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status());
    }

    /**
     * Test getting a document
     */
    @Test
    void testGetDocument() throws IOException {
        // check whether the document exists: GET /index/_doc/1
        GetRequest getRequest = new GetRequest(INDEX_NAME, "1");
        // // do not fetch the _source context
        // getRequest.fetchSourceContext(new FetchSourceContext(false));
        // getRequest.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        if (exists) {
            GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            System.out.println(getResponse.toString());
            // print the content of the document; everything returned matches the command-line result
            System.out.println(getResponse.getSourceAsString());
        } else {
            System.out.println("Document does not exist");
        }
    }

    /**
     * Test updating a document
     */
    @Test
    void testUpdateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest(INDEX_NAME, "1");
        updateRequest.timeout("1s");
        User user = new User("ES Search Engine", 24);
        updateRequest.doc(JSONUtil.toJsonPrettyStr(user), XContentType.JSON);
        UpdateResponse updateResponse = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(updateResponse.status());
    }

    /**
     * Test deleting a document
     */
    @Test
    void testDeleteDocument() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest(INDEX_NAME, "1");
        deleteRequest.timeout("1s");
        DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println(deleteResponse.getResult());
    }

    /**
     * In real projects, data is usually inserted in batches
     */
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        ArrayList<User> userList = Lists.newArrayList();
        userList.add(new User("Java", 11));
        userList.add(new User("javaScript", 12));
        userList.add(new User("Vue", 13));
        userList.add(new User("Mysql", 14));
        userList.add(new User("Docker", 15));
        userList.add(new User("MongoDB", 16));
        userList.add(new User("Redis", 17));
        userList.add(new User("Tomcat", 18));
        for (int i = 0; i < userList.size(); i++) {
            // for batch updates and deletes, only the request created here needs to change
            bulkRequest.add(new IndexRequest(INDEX_NAME)
                    .id(String.valueOf(i + 1))
                    .source(JSONUtil.toJsonPrettyStr(userList.get(i)), XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulkResponse.hasFailures()); // false means success
    }

    /**
     * Query
     * SearchRequest: search request
     * SearchSourceBuilder: builds the search conditions
     * HighlightBuilder: highlighting
     * matchAllQuery(): matches everything
     * termQuery(): exact lookup
     */
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
        // build the search conditions
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // query conditions can be built quickly with the QueryBuilders helper
        // QueryBuilders.matchAllQuery()  matches everything
        // QueryBuilders.termQuery()      exact lookup
        // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", 11);
        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(matchAllQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // searchSourceBuilder.highlighter();
        // searchSourceBuilder.size();
        // searchSourceBuilder.from();
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit searchHit : searchResponse.getHits().getHits()) {
            System.out.println(searchHit.getSourceAsMap());
        }
    }
}
```
Imitation mall search

- Crawl web page data
- Paged search
- Highlighting (see the sketch below)
- Rendering (screenshots)
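A sketch of what the paged + highlighted search can look like, written as an extra test method for the ElkApplicationTests class above; the jd_goods index name and title field are assumptions for illustration, and org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder must be imported additionally:

```java
    /**
     * Paged + highlighted search sketch (index name and field are illustrative)
     */
    @Test
    void testPageHighlightSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // paging: from + size, like MySQL LIMIT
        sourceBuilder.from(0);
        sourceBuilder.size(10);

        // fuzzy match on the title field
        sourceBuilder.query(QueryBuilders.matchQuery("title", "java"));

        // highlight the title field with custom wrapping tags
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        sourceBuilder.highlighter(highlightBuilder);

        searchRequest.source(sourceBuilder);
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println(hit.getSourceAsMap());       // the original document
            System.out.println(hit.getHighlightFields());   // the highlighted fragments
        }
    }
```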
Distributed Log Collection
Principles of ELK Distributed Log Collection (*)
- Install the Logstash log collection plugin on every node of the server cluster
- Each server node sends its logs into Logstash
- Logstash formats the logs as JSON, creates a different index for each day, and outputs them to ElasticSearch
- Kibana (in the browser) is used to query the log information
Environmental installation
- 1. Install ElasticSearch
- 2. Install Logstash
- 3. Install Kibana
Logstash
introduce
-
Logstash is a completely open source tool that collects, filters, analyzes your logs, supports a wide range of data capture methods, and stores them for later use (like searching). Speaking of searching, Logstash comes with a Web interface that searches and displays all logs. The client is installed on the host that needs to collect logs. The server filters and modifies the received node logs and sends them to ElasticSearch
-
Core process:
- Logstash event processing has three phases: input–>filters–>outputs
- Is a tool for receiving, processing and forwarding logs
- Supports system logs, Webserver logs, error logs, application logs, and all types of logs that can be thrown
Set up the Logstash environment
- Download address
- Unpack the
Logstash test
- Feed the elasticsearch log into Logstash
- Go to the config directory of the Logstash installation and create touchair.conf
- Add the following content and save it:

```
input {
  # read log information from a file and output it to the console
  file {
    path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
    codec => "json"
    type => "elasticsearch"
    start_position => "beginning"
  }
}

output {
  # standard output
  # stdout {}
  # format the output, parsing the log with the ruby library
  stdout { codec => rubydebug }
}
```
- Start Logstash from the bin directory and watch the console:

```bash
./logstash -f ../config/touchair.conf
```
- Output the logs to ES
- Create and modify the touchair.es.conf file:

```
input {
  # read log information from a file
  file {
    path => "/usr/local/elk/elasticsearch-7.6.1/logs/elasticsearch.log"
    codec => "json"
    type => "elasticsearch"
    start_position => "beginning"
  }
}

output {
  # standard output
  # stdout {}
  # format the output, parsing the log with the ruby library
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => ["192.168.83.133:9200"]
    index => "es-%{+YYYY.MM.dd}"
  }
}
```
- Start Logstash:

```bash
./logstash -f ../config/touchair.es.conf
```
Logstash integration Springboot
Single node

```
# tcp -> Logstash -> Elasticsearch pipeline
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => ["192.168.83.133:9200"]
    index => "robot-java-%{+YYYY.MM.dd}"
  }
}
```
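On the SpringBoot side, one common way (not shown in the original configuration here) to ship logs to this TCP input is the logstash-logback-encoder library (net.logstash.logback:logstash-logback-encoder); a minimal logback-spring.xml sketch, where the destination is a placeholder for your Logstash address:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- send log events to the Logstash tcp input defined above (placeholder address) -->
    <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <destination>192.168.83.133:4560</destination>
        <!-- encode each event as a JSON line, matching the json_lines codec -->
        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>

    <root level="INFO">
        <appender-ref ref="LOGSTASH"/>
    </root>
</configuration>
```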
Multiline: merging multi-line run logs makes it easier to locate faults

```
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => multiline {
      pattern => "^\["
      negate => false
      what => "next"
    }
  }
}

filter {
  json {
    source => "message"
  }
  mutate {
    add_field => {
      "language" => "%{[type]}"
    }
  }
}

output {
  if [language] == "java" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-java-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "ros" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-ros-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "rec" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-rec-%{+YYYY.MM.dd}"
    }
  }
}
```
ELK docker deployment
Install the ElasticSearch
Pull the image

```bash
docker pull elasticsearch:7.6.1
```
Run the container

- Run the command to create and start the container:

```bash
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" elasticsearch:7.6.1
```
- Copy the configuration files and data directory so they can be mounted:

```bash
docker cp es:/usr/share/elasticsearch/config/ /var/elk/elasticsearch/config
docker cp es:/usr/share/elasticsearch/data/ /var/elk/elasticsearch/data
```
- Allow cross-origin access:

```bash
vim /var/elk/elasticsearch/config/elasticsearch.yml
```

```yml
# add these two lines
http.cors.enabled: true
http.cors.allow-origin: "*"
```
- Destroy the container and run it again in mount mode:

```bash
# destroy
docker rm -f es

# mount the configuration files
docker run -d --name es -p 9200:9200 -p 9300:9300 \
  -v /var/elk/elasticsearch/config/:/usr/share/elasticsearch/config/ \
  -v /var/elk/elasticsearch/data/:/usr/share/elasticsearch/data/ \
  -e "discovery.type=single-node" \
  elasticsearch:7.6.1
```
- Access port 9200 on the host IP address and check whether ES started successfully
Install Kibana
Pull the image

```bash
docker pull kibana:7.6.1
```
Run the container
- Run the container first:

```bash
docker run -d --name kibana -p 5601:5601 kibana:7.6.1
```
- Copy the configuration file for mounting later:

```bash
# copy
docker cp kibana:/usr/share/kibana/config/ /var/elk/kibana/config

# view the internal IP address of the ES container
docker exec -it es ifconfig

# modify the configuration
vim kibana.yml
```
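The key change is pointing Kibana at the ES container; a sketch assuming the container IP found with ifconfig is 172.17.0.8 (the address used in the conf files below):

```yml
server.host: "0.0.0.0"
# placeholder: use the ES container IP you found above
elasticsearch.hosts: ["http://172.17.0.8:9200"]
```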
- Run with the mounts:

```bash
# destroy the container first
docker rm -f kibana

# run the container
docker run -d --name kibana -p 5601:5601 \
  -v /var/elk/kibana/config:/usr/share/kibana/config \
  kibana:7.6.1
```
- Visit host IP:5601 to view the Kibana graphical interface
Install the LogStash
Pull the image

```bash
docker pull logstash:7.6.1
```
Run the container
- Run the container first:

```bash
docker run --name logstash -d -p 4560:4560 -p 9600:9600 logstash:7.6.1
```
- Copy the configuration files for mounting later:

```bash
docker cp logstash:/usr/share/logstash/config /var/elk/logstash/config
```
- Add a customized conf file (robot.conf, referenced when the container is started below):

```
input {
  tcp {
    mode => "server"
    host => "0.0.0.0"
    port => 4560
    codec => multiline {
      pattern => "^\["
      negate => false
      what => "next"
    }
  }
}

filter {
  json {
    source => "message"
  }
  mutate {
    add_field => {
      "language" => "%{[type]}"
    }
  }
}

output {
  if [language] == "java" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-java-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "ros" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-ros-%{+YYYY.MM.dd}"
    }
  }
  if [language] == "rec" {
    elasticsearch {
      hosts => ["172.17.0.8:9200"]
      index => "robot-rec-%{+YYYY.MM.dd}"
    }
  }
}
```
- Modify the configuration file logstash.yml:

```bash
vim logstash.yml
```
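The usual change here is pointing the monitoring setting at the ES container; a sketch assuming the same container IP as in robot.conf:

```yml
http.host: "0.0.0.0"
# placeholder: use the ES container IP
xpack.monitoring.elasticsearch.hosts: ["http://172.17.0.8:9200"]
```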
- Run with the mounts:

```bash
# remove the container
docker rm -f logstash

# restart the container
docker run --name logstash -d -p 4560:4560 -p 9600:9600 \
  -v /var/elk/logstash/config:/usr/share/logstash/config \
  logstash:7.6.1 \
  -f /usr/share/logstash/config/robot.conf
```

- Note: if the ES container address changes, update the ES service address in kibana.yml, logstash.yml, and the customized conf file accordingly, then restart Kibana and Logstash