Hello everyone, I'm Xiao Cai, a rookie who hopes to become anything but a rookie in the Internet industry. I can be gentle or tough: gentle if you like the post, tough if you read for free! Remember to like, favorite, and share when you're done!
This article describes how to use ElasticSearch.
Refer to it as needed.
If it helps, don't forget to like it.
My WeChat official account, Xiao Cai Liang, is now open. If you haven't followed it yet, remember to follow!
ELK is a free, open-source log analysis stack made up of three core components: ElasticSearch, Logstash, and Kibana. In real-world development ELK is not limited to log analysis; it supports any data search, analysis, and collection scenario, with log analysis and collection being the most representative.
What is ElasticSearch?
Introduction
Put simply, ElasticSearch is a search framework. Search is a familiar idea: we enter a keyword, and the results containing that keyword come back.
The search we use most often is a database query:
SELECT * FROM user WHERE name LIKE '%Side dishes%'
However, searching with a database has several drawbacks:
- Storage: when the data volume is large, the data has to be split across multiple databases and tables.
- Performance: when the data volume is large, a LIKE query scans hundreds of millions of rows one by one, and performance suffers badly.
- Word segmentation: a search for "game computer" only returns rows that contain exactly that keyword. If the stored text phrases the same words differently, nothing is returned.
ElasticSearch was created to address these problems. It is a distributed, near-real-time search platform written in Java, built on Lucene, and accessed through a RESTful API. Its advantages are as follows:
- Distributed search engine and data analysis engine
- Full text retrieval, structured retrieval and data analysis
- Processing massive data in near real time
Introduction to Lucene
Lucene is a powerful search library, but developing directly on top of it is complicated. ElasticSearch is built on Lucene and wraps many of its basic functions, exposing an easy-to-use RESTful API and clients in many languages.
ElasticSearch core concepts
- Near Realtime (NRT)
  - Newly written data becomes searchable after about 1 second, because it first has to be analyzed (word segmentation) and indexed internally
  - ES searches and analyzes data within seconds
- Cluster
  A cluster is a group of one or more running ES instances, usually one instance per machine. On the same network, ES instances configured with the same cluster name automatically form a cluster and rebalance shards among themselves. The default cluster name is elasticsearch.
- Node
  Each ES instance is called a node. Node names are assigned automatically or can be configured manually.
- Index
  An index contains a collection of documents with a similar structure. Index naming rules:
  - Lowercase letters only
  - Cannot contain \, /, *, ?, ", <, >, |, #, or spaces
  - As of version 7.0, colons (:) are no longer allowed
  - Cannot start with -, _, or +
  - Cannot exceed 255 bytes (bytes, not characters, so multi-byte characters use up the limit faster)
- Document
  The smallest unit of data in ES. A document is like a row in a database table and is usually represented in JSON. An index stores many documents.
- Field
  Like a column in a database; fields define what each document contains.
- Type
  Each index could have one or more types. A type is a logical category of data within an index, and the documents under one type share the same fields.
  Note: versions before 6.0 had the concept of a type, which corresponds to a table in a relational database. Throughout this tutorial the type is always _doc.
- Shard
  If an index holds too much data, the data is divided into multiple shards that are distributed across servers. Sharding supports massive data volumes and high concurrency, improves performance and throughput, and makes full use of the CPUs on multiple machines.
- Replica
  In a distributed environment, any machine may go down at any time. If a machine goes down, the shards stored on it can no longer be found, and the index cannot be searched. To keep the data safe, each shard of an index is therefore backed up on another machine, so the ES cluster can still be searched when a few machines go down.
  The shards that accept writes are called primary shards; their copies are called replica shards, which can also serve searches.
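The index naming rules listed above can be checked mechanically. The following is a minimal sketch, not ElasticSearch's own validation code: the class name `IndexNameRules` and the method `isValid` are hypothetical, and measuring the 255-byte limit in UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class IndexNameRules {
    // Characters the rules above forbid anywhere in a name:
    // \ / * ? " < > | # space, plus the colon (disallowed since 7.0).
    private static final String FORBIDDEN = "\\/*?\"<>|# :";

    static boolean isValid(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // Rule: lowercase letters only.
        if (!name.equals(name.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Rule: must not start with -, _ or +.
        char first = name.charAt(0);
        if (first == '-' || first == '_' || first == '+') {
            return false;
        }
        // Rule: no forbidden characters.
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return false;
            }
        }
        // Rule: at most 255 bytes (assumed UTF-8), not 255 characters.
        return name.getBytes(StandardCharsets.UTF_8).length <= 255;
    }

    public static void main(String[] args) {
        System.out.println(isValid("employee"));  // true
        System.out.println(isValid("Employee"));  // false: uppercase letter
        System.out.println(isValid("_employee")); // false: starts with _
        System.out.println(isValid("emp:2020"));  // false: contains a colon
    }
}
```

Checking a name on the client side like this only gives an early hint; ES itself remains the authority and rejects invalid names when the index is created.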
Comparison with a relational database
Relational database | Non-relational database (Elasticsearch) |
---|---|
Database | Index |
Table | Index (formerly Type) |
Row | Document |
Column | Field |
Schema (constraints) | Mapping |
Preparation
Demo environment: Windows
ElasticSearch installation
Download ElasticSearch and decompress it. The directory structure is as follows:
- bin: script directory, containing executable scripts such as the start and stop scripts
- config: configuration file directory
- data: index data directory
- logs: log directory
- modules: module directory, containing the ES functional modules
- plugins: plugin directory; ES supports a plugin mechanism
Once the configuration file is ready, double-click elasticsearch.bat in the bin directory to start ES, then visit http://localhost:9200.
Kibana installation
Download Kibana from the download page and decompress it, modify the configuration file, and double-click kibana.bat to start it. Then visit http://localhost:5601/; if the Kibana home page loads, the startup succeeded.
Open the Kibana console and run GET / to view the ElasticSearch information.
ElasticSearch is now ready to go!
Getting started with ES
What is an Index?
An Index is the equivalent of a table in a database. ElasticSearch indexes every field and, after processing, writes the result into an inverted index. When searching, it looks the data up directly in that index. This is why the top-level unit of data management in ElasticSearch is called an Index. Note: the name of every Index must be lowercase.
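To make the inverted index idea concrete, here is a minimal sketch of one built in memory. It is an illustration only, not how Lucene or ElasticSearch stores data: the class `InvertedIndexSketch` is hypothetical, and splitting on whitespace and lowercasing stands in for a real analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Maps each term to the sorted ids of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into terms and record the document id under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Searching": a single map lookup instead of scanning every document.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "gaming laptop computer");
        index.add(2, "office computer");
        index.add(3, "gaming mouse");
        System.out.println(index.search("computer")); // [1, 2]
        System.out.println(index.search("gaming"));   // [1, 3]
    }
}
```

The lookup cost depends on the number of matching terms rather than on the number of stored documents, which is the point of writing an inverted index at indexing time.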
- Create the test index:
Syntax: PUT /${index}
- Delete the test index:
Syntax: DELETE /${index}
The delete API also supports these forms:
1. DELETE /test
2. DELETE /test1,test2
3. DELETE /test_*
4. DELETE /_all
What is a Document?
Suppose we have the following employee and department objects:
public class Employee {
    private String id;
    private String name;
    private String deptId;
}

public class Department {
    private String id;
    private String deptName;
    private String describe;
}
If we stored this data in a relational database, we would need an employee table and a department table, and a join query to fetch the department information along with an employee.
ES, however, is document-oriented: the data structure stored in a document matches the object itself, so an object can be saved directly as a Document. A Document is represented in JSON, for example:
{
  "id": "1",
  "name": "Dishes",
  "department": {
    "id": "1",
    "deptName": "Brick moving Department",
    "describe": "Try to move every brick."
  }
}
Next, we demonstrate the basic create, read, update, and delete operations on the employee index:
Create employee information: PUT /employee/_doc/1, with the JSON document above as the request body.
Get employee information: GET /employee/_doc/1
Modify employee information: PUT /employee/_doc/1 again.
This is a full replacement, so the request body must carry all of the document's fields.
Partial update:
The syntax for a partial update is POST /{index}/_update/{id}, where the fields to update are placed inside doc.
Delete employee information: DELETE /employee/_doc/1
Field information
{
  "_index": "employee",
  "_type": "_doc",
  "_id": "1",
  "_version": 4,
  "_seq_no": 7,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "id": "1",
    "name": "Dishes",
    "department": {
      "id": "1",
      "deptName": "Brick moving Department",
      "describe": "Try to move every brick."
    }
  }
}
- _index: the name of the index the document belongs to
- _type: the type. Mapping types were deprecated in ES 7 and removed in ES 8, so there is no need to worry about this field; the default is _doc
- _id: the unique identifier of a document, similar to a primary key in a table. It can be set manually or generated automatically
  - Manual generation:
    PUT /employee/_doc/1
    {"id":"1","name":"Dishes","department":{"id":"1","deptName":"Brick moving Department","describe":"Try to move every brick."}}
  - Automatic generation:
    Note: this is a POST request, and ES generates a 20-character ID for us
    POST /employee/_doc
    {"id":"3","name":"Wang","department":{"id":"1","deptName":"Brick moving Department","describe":"Try to move every brick."}}
- _version: the version number
  The version number increases by 1 on every full replacement, partial update, and delete. The employee document with ID 1 is at version 4, meaning it has gone through four write operations, counting its creation.
- _seq_no: the sequence number
  Similar to _version; it is incremented whenever data changes
- _source: all the fields and values supplied when the document was written
  To return only some of the fields instead of all of them, use:
  GET /employee/_doc/1?_source_includes=id,name
  Besides _source_includes, _source_excludes is also available.
Optimistic locking mechanism
While studying Java concurrency we learned about the CAS optimistic locking mechanism, and optimistic locking is available in ES as well.
In ES, optimistic locking is controlled through _seq_no (together with _primary_term).
Let's start by creating an employee record. At this point the version number is 1 and _seq_no is 10.
Then we delete this employee record and recreate the same employee record.
The version number is now 3 and _seq_no is 14.
This is because ES uses a delayed-deletion policy internally: a deleted document is first only marked as deleted, because physically removing it from every shard and replica right away would put too much pressure on the cluster. The version number therefore continues from the old value instead of restarting at 1.
Concurrency control:
- Step 1: check the current _seq_no, which is 17 here
- Step 2: update with that value as a precondition:
  PUT /employee/_doc/5?if_seq_no=17&if_primary_term=1
After the update, both _version and _seq_no have increased by 1.
If the _seq_no in the request does not match the current one, ES rejects the request with a version conflict error.
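The compare-then-write behavior of if_seq_no can be imitated with a small in-memory store. This is a hedged sketch of the idea only, not ES internals: `OptimisticLockSketch`, `put`, and `putIfSeqNo` are hypothetical names, the store behaves like a single shard with one sequence counter, and _primary_term is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockSketch {
    // One stored document with the bookkeeping fields discussed above.
    static final class Doc {
        final String source;
        final long version;
        final long seqNo;

        Doc(String source, long version, long seqNo) {
            this.source = source;
            this.version = version;
            this.seqNo = seqNo;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextSeqNo = 0; // single counter, as if everything lived on one shard

    Doc get(String id) {
        return docs.get(id);
    }

    // Unconditional write: always succeeds, bumps the version and takes a new seqNo.
    Doc put(String id, String source) {
        Doc old = docs.get(id);
        Doc doc = new Doc(source, old == null ? 1 : old.version + 1, nextSeqNo++);
        docs.put(id, doc);
        return doc;
    }

    // Conditional write: only succeeds if nobody has written since ifSeqNo was read.
    Doc putIfSeqNo(String id, String source, long ifSeqNo) {
        Doc current = docs.get(id);
        if (current == null || current.seqNo != ifSeqNo) {
            throw new IllegalStateException("version conflict: required seqNo " + ifSeqNo);
        }
        return put(id, source);
    }

    public static void main(String[] args) {
        OptimisticLockSketch store = new OptimisticLockSketch();
        long seen = store.put("5", "{\"name\":\"Wang\"}").seqNo; // both writers read this value

        Doc updated = store.putIfSeqNo("5", "{\"name\":\"Wang A\"}", seen);
        System.out.println("first writer ok: version=" + updated.version + ", seqNo=" + updated.seqNo);

        try {
            store.putIfSeqNo("5", "{\"name\":\"Wang B\"}", seen); // stale seqNo
        } catch (IllegalStateException e) {
            System.out.println("second writer rejected: " + e.getMessage());
        }
    }
}
```

The rejected writer is expected to read the document again, reapply its change on top of the fresh state, and retry with the new sequence number.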
Batch operations
Batch query (_mget)
All of the statements above query a single ID. To fetch several documents from the current index in one request, use:
GET /employee/_mget
{
  "docs": [
    { "_id": 1 },
    { "_id": 5 }
  ]
}
To query IDs across different indexes in one request, use:
GET /_mget
{
  "docs": [
    { "_index": "employee", "_id": 1 },
    { "_index": "employee", "_id": 5 }
  ]
}
Bulk create, delete, and update (_bulk)
Syntax:
POST /_bulk
{"action": {"metadata"}}
{"data"}
Notes:
- delete: deletes a document; it needs only the single action line, with no data line
- create: creates a document and fails if the ID already exists; equivalent to PUT /{index}/_create/{id}
- index: an ordinary PUT operation that either creates a document or fully replaces it
- update: performs a partial update of a document
The operations do not affect one another; if one fails, the failure is reported for that item only.
A single bulk request should not be too large, because the whole request is held in memory and an oversized one degrades performance.
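The action-line and data-line layout described above can be sketched by assembling a _bulk request body as a string. The helper names (`BulkBodySketch`, `indexAction`, `updateAction`, `deleteAction`) are hypothetical, and the sketch assumes index names and ids that need no JSON escaping. It only builds the body and does not send it anywhere.

```java
class BulkBodySketch {
    // Action line: {"<action>":{"_index":"...","_id":"..."}} followed by a newline.
    private static String meta(String action, String index, String id) {
        return "{\"" + action + "\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}\n";
    }

    // index: action line plus one data line holding the full document.
    static String indexAction(String index, String id, String sourceJson) {
        return meta("index", index, id) + sourceJson + "\n";
    }

    // update: action line plus one data line that wraps the changed fields in "doc".
    static String updateAction(String index, String id, String partialDocJson) {
        return meta("update", index, id) + "{\"doc\":" + partialDocJson + "}\n";
    }

    // delete: the action line alone, with no data line.
    static String deleteAction(String index, String id) {
        return meta("delete", index, id);
    }

    public static void main(String[] args) {
        String body = indexAction("employee", "8", "{\"id\":\"8\",\"name\":\"Wang\"}")
                + updateAction("employee", "8", "{\"name\":\"Wang Er\"}")
                + deleteAction("employee", "8");
        // Every line, including the last one, ends with a newline.
        System.out.print(body);
    }
}
```

Each JSON object must stay on a single line, because the bulk format uses the newline as the separator between actions and data.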
Integrating ES into development
ElasticSearch is written in Java, so using it in Java development is very convenient.
First, add the dependencies:
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.9.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.9.0</version>
</dependency>
Example: getting employee information
/** OUTPUT: {"id":"1","name":"dishes ah","department":{"id":"1","deptName":"move brick department","describe":"good efforts to move each brick"}} */
If getting only the source from getResponse isn't enough, don't worry! getResponse also exposes other information, such as the index name, document ID, and version.
As the query above shows, to get data from ES we first obtain a client connection, then build the request, and finally execute it and read the result. This sequence may feel familiar: JDBC required the same steps, and then Hibernate and MyBatis came along and greatly reduced the amount of code. In the same way, ES integrates well with the Spring framework.
The ElasticSearch host needs to be defined in the application.yaml file:
server:
  port: 8081
spring:
  application:
    name: search-service
elasticsearch:
  address: 127.0.0.1:9200 # Separate multiple nodes with commas
We then need to register RestHighLevelClient as a Spring bean.
Executing a query
We saw _source_includes and _source_excludes in the examples above; they are supported in Java as well.
ES also supports asynchronous queries in Java.
Adding a document
Here we pass the fields to add as JSON, but there are other ways as well:
// The four methods below are alternatives; use only one of them per request.

// Method 1: JSON string
Map<String, String> insertInfo = new HashMap<>();
insertInfo.put("id", "8");
insertInfo.put("name", "Wang");
request.source(JSON.toJSONString(insertInfo), XContentType.JSON);

// Method 2: Map
Map<String, String> insertInfo = new HashMap<>();
insertInfo.put("id", "8");
insertInfo.put("name", "Wang");
request.source(insertInfo);

// Method 3: XContentBuilder
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.field("id", "8");
    builder.field("name", "Wang");
}
builder.endObject();
request.source(builder);

// Method 4: key-value pairs passed directly
request.source("id", "8", "name", "Wang");
Adding a document supports asynchronous execution too, and so do updates.
Updating a document
Asynchronous support:
Deleting a document
Asynchronous support:
Batch operations
Introduction to Mapping
What is a Mapping: the data structure and related configuration of the _doc documents in an index, established automatically or manually.
In a relational database, we have to create the table and its columns before inserting data:
create table website(
    id varchar(8),
    name varchar(8)
);
In ES, we can insert data straight into the employee index:
PUT /employee/_doc/1
{
  "id": "1",
  "name": "Dishes"
}
We didn't need to define any fields, or even create the employee index by hand. A single statement did it all!
This works because ES has Dynamic Mapping, which automatically creates the index and its mapping for us. The mapping records the data type of each field, how it is analyzed (word segmentation), and other settings.
To view the mapping of an index:
GET /{index}/_mapping
Core data types
Dynamically inferred types
JSON datatype | ElasticSearch datatype |
---|---|
true or false | boolean |
123 | long |
123.45 | float |
2019-01-01 | date |
“test” | text/keyword |
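The inference rules in this table can be approximated in a few lines. This is a simplified sketch, not the ES implementation: `DynamicTypeSketch` and `guessType` are hypothetical names, only the plain yyyy-MM-dd date form is recognized, and floating-point values are reported as float, the type ES dynamic mapping assigns to them by default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

class DynamicTypeSketch {
    // jsonValue is the raw JSON token, for example true, 123, 123.45 or "test" (quotes included).
    static String guessType(String jsonValue) {
        String v = jsonValue.trim();
        if (v.equals("true") || v.equals("false")) {
            return "boolean";
        }
        if (v.matches("-?\\d+")) {
            return "long";
        }
        if (v.matches("-?\\d+\\.\\d+")) {
            return "float";
        }
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            String text = v.substring(1, v.length() - 1);
            try {
                LocalDate.parse(text); // accepts ISO dates such as 2019-01-01
                return "date";
            } catch (DateTimeParseException e) {
                return "text"; // ES also adds a keyword sub-field for such strings
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guessType("true"));           // boolean
        System.out.println(guessType("123"));            // long
        System.out.println(guessType("123.45"));         // float
        System.out.println(guessType("\"2019-01-01\"")); // date
        System.out.println(guessType("\"test\""));       // text
    }
}
```

Because the type is guessed from the first value ES sees for a field, a field whose first value happens to look like a date or number keeps that type afterwards, which is one reason to define important mappings by hand.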
Custom mapping
After creating an index, we can define the mapping manually:
Syntax: PUT ${index}/_mapping
PUT department/_mapping
{
  "properties": {
    "id": {
      "type": "text"
    },
    "description": {
      "type": "text",
      "analyzer": "english",
      "search_analyzer": "english"
    }
  }
}
The analyzer property specifies the analyzer (word segmenter). The setting above uses the english analyzer both when indexing and when searching. To use a different analyzer for searching, set the search_analyzer property.
Note: date fields are not analyzed.
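To give a feel for what an analyzer does to a text field, here is a rough approximation in plain Java. It is not the real english analyzer: `AnalyzerSketch` and its tiny stop word list are assumptions made for illustration, and the stemming step that the real analyzer performs is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {
    // A tiny illustrative stop word list; the real analyzer ships its own.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "is", "in");

    // Lowercase the text, split it on anything that is not a letter or digit, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [try, move, every, brick, department]
        System.out.println(analyze("Try to move every brick in the Department"));
    }
}
```

The same analysis has to be applied to the query text, otherwise the query terms would not line up with the indexed terms; that is why analyzer and search_analyzer are usually kept the same.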
“The END”
That's it for getting started with ElasticSearch! There is a lot more to ElasticSearch than this, and I'll take time to sort out the rest in later articles. The road is long, and Xiao Cai will explore it together with you!
The harder you work today, the less you will have to ask of others tomorrow!
I am Xiao Cai, a man who studies with you. 💋