"This is the sixth day of my participation in the November Gengwen Challenge. Check out the details: The Last Gengwen Challenge of 2021."

Basic principle of ElasticSearch document score calculation

1) Boolean model

Based on the user's query criteria, the documents containing the specified terms are filtered out first.

Query "hello world" ‐‐> hello/world/hello & world bool ‐‐> must/must not/should ‐> filter ‐‐> include/not include/may include doc ‐‐> ‐‐> To improve performance by reducing the number of doc to be computed laterCopy the code

2) Relevance score algorithm: simply put, the relevance score algorithm calculates the degree of relevance between the text in the index and the search text

Elasticsearch uses the TF/IDF (Term Frequency / Inverse Document Frequency) algorithm. Term frequency: how many times each term in the search text appears in a document's field text; the more times it appears, the more relevant the document is.

Search text: hello world
doc1: hello you, and world is very good
doc2: hello, how are you

Inverse document frequency: how often each term in the search text appears across all documents in the entire index; the more often a term appears in the index as a whole, the less relevant (less discriminating) it is.

Search text: hello world
doc1: hello, Tuling is very good
doc2: hi world, how are you

For example, if there are 10,000 documents in the index and the word hello appears 1,000 times across all of them while the word world appears only 100 times, then world is the rarer term and contributes more to relevance.

Field-length norm: the longer the field, the lower the relevance. Search request: hello world

Doc1: {" title ":" hello, article "and" content ":"... N a word "} doc2: {" title ":" my article ", "content" : "... N words, hi world"}Copy the code

hello world appears the same number of times in both documents, but doc1 is more relevant because its title field is shorter.

**2. How is _score calculated on a document?**

 GET /es_db/_doc/1/_explain 
 { 
 "query": { 
 "match": { 
 "remark": "java developer" 
 } 
 } 
 } 
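For reference, a typical _explain response has roughly the following shape; this is an abridged sketch with illustrative values, and the exact numbers and descriptions depend on your data and ES version:

 # abridged sample response; values are illustrative
 {
   "_index" : "es_db",
   "_id" : "1",
   "matched" : true,
   "explanation" : {
     "value" : 1.1027805,
     "description" : "sum of:",
     "details" : [
       { "value" : 0.6931471, "description" : "weight(remark:java in 0) [PerFieldSimilarity], result of:", "details" : [ ... ] },
       { "value" : 0.4096334, "description" : "weight(remark:developer in 0) [PerFieldSimilarity], result of:", "details" : [ ... ] }
     ]
   }
 }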

Vector space model: for the search text hello world, es computes a query vector based on how each term scores across all docs in the index. The hello term is given a score of 2 and the world term a score of 5, so the query vector is [2, 5].

Doc vectors: there are three docs, one containing only one of the terms, one containing the other term, and one containing both terms.

The three doc vectors:

doc1: contains hello           --> [2, 0]
doc2: contains world           --> [0, 5]
doc3: contains hello and world --> [2, 5]

For each doc, a score is calculated for each term (one for hello, one for world); together these term scores form the doc vector.

Plotted in a graph, each doc vector forms an angle with the query vector, and based on this angle es gives each doc a total score relative to the multiple terms in the query.

The larger the angle, the lower the score; the smaller the angle, the higher the score.

With more than two or three terms the vectors become high-dimensional, so the calculation is done with linear algebra and can no longer be drawn as a simple graph.
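To make the angle idea concrete, the score sketched above is essentially the cosine similarity between the query vector and a doc vector (a simplified illustration, not the exact formula Elasticsearch uses internally):

$$\cos\theta = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \, \lVert \vec{d} \rVert}$$

With q = [2, 5]: doc3 = [2, 5] gives cos θ = 1 (angle 0, highest score), doc2 = [0, 5] gives cos θ ≈ 0.93, and doc1 = [2, 0] gives cos θ ≈ 0.37, the lowest of the three.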

Key parameter of es production cluster deployment: dealing with the split-brain problem in a production cluster

**What is cluster split brain?** The so-called split-brain problem means that different nodes in the same cluster have different views of the cluster state; for example, there are two masters in the cluster. If a cluster is divided into two parts due to a network failure, and each part has multiple nodes and elects its own master, then there are two masters in the cluster.

Because the master plays a very important role in the cluster, controlling the maintenance of the cluster state and the allocation of shards, having two masters may lead to data corruption.

For example:



Node 1 is elected master at startup and holds the primary shard (marked 0P), while node 2 holds the replica shard (marked 0R). Now, what happens if communication between the two nodes breaks? This can happen because of a network failure, or simply because one of the nodes becomes unresponsive for a while.



Each node believes the other is dead. Node 1 does not need to do anything, because it was already elected master. But node 2 automatically elects itself as master, because it believes the part of the cluster it can see has no master.

In an Elasticsearch cluster, the master node is responsible for distributing shards evenly across the nodes. Node 2 holds the replica shard, but it believes the master is unavailable, so it automatically promotes its replica shard to a primary shard.



Now our cluster is in an inconsistent state. An indexing request that hits node 1 writes data to its primary shard, while a request that hits node 2 writes to its newly promoted shard. In this case, the two copies of the shard diverge, and it is difficult to reconcile them without a full reindex. For an indexing client that is not aware of cluster internals (for example, one using the REST interface), the problem is invisible: the index request completes successfully every time, no matter which node it hits. The problem only becomes apparent when the data is searched: the results vary depending on which node the search request hits.

The parameter discovery.zen.minimum_master_nodes tells ES not to elect a master unless enough master-eligible candidate nodes are visible; otherwise no election takes place. This parameter must be set to the quorum of master candidates in the cluster, that is, a majority: quorum = number of master candidates / 2 + 1 (integer division).

For example, if we have 10 nodes, all capable of maintaining data, and all master candidates, quorum is 10/2 + 1 = 6.

If we have three master candidates and 100 data nodes, quorum is 3/2 + 1 = 2

If we have two nodes, both master candidates, then quorum is 2/2 + 1 = 2. This is a problem: if one node fails, only one master candidate remains, which cannot satisfy the quorum, so no new master can be elected and the cluster fails completely. You could set this parameter to 1 instead, but that cannot prevent split brain.

So with 2 nodes, neither setting discovery.zen.minimum_master_nodes to 2 nor to 1 works well: with 2 the cluster cannot elect a master once either node fails, and with 1 split brain is still possible.

Therefore, an ES cluster in a production environment should have at least three master-eligible nodes, with this parameter set to the quorum, i.e. discovery.zen.minimum_master_nodes: 2.

So how does this parameter avoid the split-brain problem? Suppose we have 3 nodes and the quorum is 2. Now a network failure splits the cluster: one node is in one network zone, the other two nodes are in another zone, and the zones cannot communicate. There are two possible situations:

(1) If the current master is the isolated node and the other two nodes are master candidates, then the isolated master can no longer see the required number of master-eligible nodes, so it gives up the master role and tries to re-elect, but it cannot win an election on its own. The nodes in the other network zone, unable to reach the master, initiate a new election; because there are two master candidates there, the quorum is satisfied and a new master can be elected. There is still only one master in the cluster.

(2) If the master and one other node are in the same network zone and the third node is isolated on its own, the isolated node will try to initiate an election because it cannot reach the master; but since it cannot see enough master-eligible nodes, no master can be elected. In the other network zone, the original master continues to work. This also ensures there is only one master in the cluster. So by configuring discovery.zen.minimum_master_nodes: 2 in elasticsearch.yml, the split-brain problem can be avoided.
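A minimal sketch of the relevant elasticsearch.yml settings for such a three-node cluster (cluster and node names are placeholders; this applies to pre-7.0 versions, since from Elasticsearch 7.0 onward the setting is ignored and the voting quorum is managed automatically):

 # elasticsearch.yml on each of the three master-eligible nodes (pre-7.0 example)
 cluster.name: my-es-cluster
 node.master: true
 node.data: true
 # quorum = 3 / 2 + 1 = 2
 discovery.zen.minimum_master_nodes: 2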

Second, data modeling

1, case

**Example: design a user document type that contains an array of address data. This design is relatively complex, but it is more flexible when managing the data.**

 PUT /user_index 
 { 
 "mappings": { 
 "properties": { 
 "login_name" : { 
 "type" : "keyword" 
 }, 
 "age " : { 
 "type" : "short" 
 }, 
 "address" : { 
 "properties": { 
 "province" : { 
 "type" : "keyword" 
 }, 
 "city" : { 
 "type" : "keyword" }, 
 "street" : { 
 "type" : "keyword" 
 } 
 } 
 } 
 } 
 } 
 } 

However, this data model has an obvious defect: when searching the address data, irrelevant documents are often matched. For example, with the data below, searching for a user whose province is Beijing and whose city is Tianjin will incorrectly return a result.

PUT /user_index/_doc/1
{
  "login_name" : "jack",
  "age" : 25,
  "address" : [
    { "province" : "Beijing", "city" : "Beijing", "street" : "Maple Third Road" },
    { "province" : "Tianjin", "city" : "Tianjin", "street" : "Huaxia Road" }
  ]
}

PUT /user_index/_doc/2
{
  "login_name" : "rose",
  "age" : 21,
  "address" : [
    { "province" : "Hebei", "city" : "Langfang", "street" : "Yanjiao Economic Development Zone" },
    { "province" : "Tianjin", "city" : "Tianjin", "street" : "Huaxia Road" }
  ]
}

The search should look like this:

GET /user_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address.province": "Beijing" } },
        { "match": { "address.city": "Tianjin" } }
      ]
    }
  }
}

However, the result is not accurate: jack is returned even though none of his addresses has province Beijing and city Tianjin at the same time. In this case, the data model needs to be defined using nested objects.

2, nested object

The problem can be solved by using a nested object as the collective type of the address array. The Document model is as follows:

PUT /user_index 
{ 
"mappings": { 
"properties": { 
"login_name" : { 
"type" : "keyword" 
}, 
"age" : { 
"type" : "short" 
}, 
"address" : { 
"type": "nested", 
"properties": { 
"province" : {
"type" : "keyword" 
}, 
"city" : { 
"type" : "keyword" 
}, 
"street" : { 
"type" : "keyword" 
} 
} 
} 
} 
} 
} 

The nested search syntax is then used to perform the search:

GET /user_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "address",
            "query": {
              "bool": {
                "must": [
                  { "match": { "address.province": "Beijing" } },
                  { "match": { "address.city": "Tianjin" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Although the syntax becomes more complex, the query results are now correct, which is why this is the recommended design. The reason is that ordinary array data is flattened by ES and stored as follows (if a field is analyzed, the tokens are stored at the corresponding field positions; strictly speaking this is an inverted index, the example below is just an intuitive illustration):

{" login_name ":" jack ", "address. Province:"/" Beijing ", "tianjin", "address. City" : [" Beijing ", "tianjin"] "address. Street" : [" Xisanqi East Road ", "Ancient Culture Street"]}Copy the code

With the nested object data type, ES does not flatten the data; instead, each nested object is stored as a separate (hidden) document, as shown below, so the desired search results can be obtained.

{"address.city" : "address.street" : "address.city" : "address.street" : "address.street" : "Address. City" : "Beijing ", "address. Street" : "Xisanqi East Road ",}Copy the code

3. Parent-child relationship data modeling

Nested object modeling has drawbacks: the data is stored in a redundant, embedded way, and the maintenance cost is high when multiple objects are stored together, because after an update the entire document (including all nested objects) has to be re-indexed. ES provides a Join implementation similar to that in relational databases. Using the join data type, two objects can be separated through a parent/child relationship. The parent document and the child documents are independent documents: updating the parent document does not require re-indexing the child documents, and adding, changing, or deleting a child document does not affect the parent or the other child documents. Key points: the parent-child relationship is maintained through relationship metadata in the mapping to guarantee query-time performance, but there is a restriction: parent and child documents must reside on the same shard. Because the relationship metadata is mapped within the shard, a parent-child search never crosses shards; each shard joins its own local documents, so performance is high.

Parent and child documents

Steps to define a parent-child relationship

  • Set the index Mapping
  • Index parent document
  • Index child documents
  • Query documents as needed



Set up the Mapping

DELETE my_blogs 
#Set the Parent/Child Mapping 
PUT my_blogs
{
  "mappings": {
    "properties": {
      "blog_comments_relation": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "content": {
        "type": "text"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
} 

Index parent document

 PUT my_blogs/_doc/blog1 
 { 
 "title":"Learning Elasticsearch", 
 "content":"learning ELK is happy", 
 "blog_comments_relation":{ 
 "name":"blog" 
 } 
 } 

 PUT my_blogs/_doc/blog2 
 { 
 "title":"Learning Hadoop", 
 "content":"learning Hadoop", 
 "blog_comments_relation":{ 
 "name":"blog" 
 } 
 }

Index child documents

  • Parent and child documents must exist on the same shard
  • This ensures the performance of query-time joins
  • When indexing a child document, its parent document ID must be specified
  • Use the routing parameter to ensure the child document is allocated to the same shard as its parent



# Index child documents

PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment":"I am learning ELK",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog1"
  }
}

PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment":"I like Hadoop!!!!!",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}

PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment":"Hello Hadoop",
  "username":"Bob",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}

Queries supported by Parent/Child

  • Query all documents
  • Parent Id query
  • Has Child query
  • Has Parent query

# Query all documents

POST my_blogs/_search 
{} 

#View by parent document ID 
GET my_blogs/_doc/blog2 

#The Parent Id query 
POST my_blogs/_search 
{ 
"query": { 
"parent_id": { 
"type": "comment", 
"id": "blog2" 
} 
} 
} 

#Has Child query, returns the parent document 
POST my_blogs/_search 
{ 
"query": { 
"has_child": { 
"type": "comment", 
"query" : { 
"match": { 
"username" : "Jack" 
} 
} 
} 
} 
} 

#Has Parent query that returns related subdocuments 
POST my_blogs/_search 
{ 
"query": { 
"has_parent": { 
"parent_type": "blog", 
"query" : { 
"match": { 
"title" : "Learning Hadoop" 
}
}
}
}
}

Has Child query

  • Queries by conditions on the child documents and returns the matching parent documents
  • Parent and child documents are on the same shard, so the join is efficient

Has Parent query

  • Queries by conditions on the parent document and returns the related child documents

Parent Id query

  • Queries by the parent document id and returns all of its related child documents

Accessing child documents

  • The parent document's routing parameter needs to be specified

#Access a child document by ID 
GET my_blogs/_doc/comment2 

#Access a child document by ID and routing 
GET my_blogs/_doc/comment3?routing=blog2

Updating child documents

Updating child documents does not affect the parent document



Update subdocuments

PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment": "Hello Hadoop??",
  "blog_comments_relation": {
    "name": "comment",
    "parent": "blog2"
  }
}

|  | Nested object | Parent/Child |
| --- | --- | --- |
| Advantages | Documents are stored together, so read performance is high | Parent and child documents can be updated independently |
| Disadvantages | Updating a nested sub-document requires re-indexing the entire document | Extra memory is needed to maintain the relationship, and read performance is relatively poor |
| Application scenario | Sub-documents are updated occasionally and mainly queried | Child documents are updated frequently |

File system data modeling

Think about it: GitHub lets you search by code snippet. How is this achieved? GitHub also uses ES for full-text search of its data. There is an index in ES that records code content, and the data looks roughly like this:

{"fileName" : "helloworld.java ", "authName" :" HXL ", "authID" : 110, "productName" : "first‐ Java ", "path" : "/com/hxl/first", "content" : "package com.hxl.first; public class HelloWorld { //code... }}"Copy the code

On GitHub we can search by code snippet, and other conditions can be used to search the data as well. But what if you need to search content by file path? Then the path field needs a special analyzer (the path_hierarchy tokenizer). Create the mapping:

PUT /codes { "settings": { "analysis": { "analyzer": { "path_analyzer" : { "tokenizer" : "path_hierarchy" } } } }, "mappings": { "properties": { "fileName" : { "type" : "keyword" }, "authName" : { "type" : "text", "analyzer": "standard", "fields": { "keyword" : { "type" : "keyword" } } }, "authID" : { "type" : "long" }, "productName" : { "type" : "text", "analyzer": "standard", "fields": { "keyword" : { "type" : "keyword" } } }, "path" : { "type" : "text", "analyzer": "path_analyzer", "fields": { "keyword" : { "type" : "keyword" } } }, "content" : { "type" : "text", "analyzer": "standard" } } } } PUT /codes/_doc/1 { "fileName" : "Helloworld.java ", "authName" :" HXL ", "productName" : "first‐ Java ", "path" : "/com/hxl/first", "content" : "package com.hxl.first; public class HelloWorld { // some code... }" } GET /codes/_search { "query": { "match": { "path": "/com" } } } GET /codes/_analyze { "text": "/a/b/c/d", "field": "path" }Copy the code

Data manipulation

PUT /codes 
{ 
"settings": { 
"analysis": { 
"analyzer": { 
"path_analyzer" : { 
"tokenizer" : "path_hierarchy" 
} 
} 
} 
}, 
"mappings": { 
"properties": { 
"fileName" : { 
"type" : "keyword" 
}, 
"authName" : { 
"type" : "text", 
"analyzer": "standard", 
"fields": { 
"keyword" : { 
"type" : "keyword" 
} 
} 
}, 
"authID" : { 
"type" : "long" 
}, 
"productName" : { 
"type" : "text", 
"analyzer": "standard", 
"fields": { 
"keyword" : { 
"type" : "keyword" 
} 
} 
}, 
"path" : { 
"type" : "text", 
"analyzer": "path_analyzer", 
"fields": { 
"keyword" : {
"type" : "text", 
"analyzer": "standard" 
} 
} 
}, 
"content" : { 
"type" : "text", 
"analyzer": "standard" 
} 
} 
} 
} 

GET /codes/_search 
{ 
"query": { 
"match": { 
"path.keyword": "/com" 
} 
} 
} 

GET /codes/_search 
{ 
"query": { 
"bool": { 
"should": [ 
{ 
"match": { 
"path": "/com" 
} 
}, 
{ 
"match": { 
"path.keyword": "/com/hxl" 
} 
} 
] 
} 
} 
} 

Reference: www.elastic.co/guide/en/el… pathhierarchy-tokenizer.html

Five, paging search by keyword

When there is a lot of data, we usually need to perform paging queries: specify the page number and how many items to show per page, and Elasticsearch returns the data for that page.

1. Use from and size for paging

When executing a query, you can specify from (the offset of the first document to return) and size (the number of documents per page) to implement paging easily: from = (page - 1) * size

POST /es_db/_doc/_search
{
  "from": 0,
  "size": 2,
  "query": {
    "match": {
      "address": "guangzhou tianhe"
    }
  }
}

2. Use Scroll mode for paging

The from and size approach works fine for queries within roughly the first 10,000 to 50,000 results, but with larger offsets there are performance problems, and by default Elasticsearch will not let you page beyond 10,000 documents. To query data beyond that, use the scroll cursor provided by Elasticsearch. With deep paging, every page requires the matching data to be re-sorted, which wastes performance. With scroll, the data to be read is sorted once and then fetched in batches, so performance is much better than from + size. After a scroll query, the sorted snapshot is kept for a period of time, and subsequent paging requests fetch data from that snapshot.

2.1. The first scroll paging query

Here, we’re holding the sorted data for 1 minute, so set scroll to 1m

GET /es_db/_search?scroll=1m
{
  "query": {
    "multi_match": {
      "query": "guangzhou changsha zhang san",
      "fields": ["address", "name"]
    }
  },
  "size": 100
}

After execution, notice that the response contains an item "_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFldqc1c1Y3Y3Uzdpb3FaYTFMT094RFEAAAAAAABLkBZnY1dPTFI5SlFET1BlOUNDQ0RyZi1B" (you will get your own _scroll_id). Subsequent queries are performed using this _scroll_id.

2.2. Subsequent queries use the scroll_id directly

GET _search/scroll?scroll=1m
{
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFldqc1c1Y3Y3Uzdpb3FaYTFMT094RFEAAAAAAABLkBZnY1dPTFI5SlFET1BlOUNDQ0RyZi1B"
}

Six, Elasticsearch SQL



Elasticsearch SQL allows you to execute SQL-like queries through the REST interface, the command line, or JDBC, using SQL for data retrieval and data aggregation.

1. Elasticsearch SQL features

Native integration

Elasticsearch SQL is built specifically for Elasticsearch. Each SQL query is executed efficiently against the relevant nodes according to the underlying storage.

No extra requirements

Elasticsearch SQL runs directly inside the Elasticsearch cluster, without relying on additional hardware, processes, or runtime libraries.

Lightweight and efficient: queries can be expressed as succinctly and efficiently as SQL.

Mapping between SQL and Elasticsearch concepts:

| **SQL** | **Elasticsearch** |
| --- | --- |
| Column | Field |
| Row | Document |
| Table | Index |
| Schema | Mapping |
| Database server | Elasticsearch cluster instance |

2, Elasticsearch SQL syntax

SELECT select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]

Currently, FROM supports only single tables

3. Job query examples

3.1. Query one document from the job index

# format: specifies the format of the returned data
GET /_sql?format=txt
{
  "query": "SELECT * FROM es_db limit 1"
}

# Returned data:
       address        | age |   name    |     remark     | sex
----------------------+-----+-----------+----------------+-----
guangzhou tianhe park | 25  | zhang san | java developer | 1

Elasticsearch SQL supports the following response formats:

| format | Description |
| --- | --- |
| csv | comma-separated values |
| json | JSON format |
| tsv | tab-separated values |
| txt | CLI-like representation |
| yaml | YAML, human-readable format |

3.2. Convert SQL to DSL

GET /_sql/translate 
{ 
"query":"SELECT * FROM es_db limit 1" 
} 

The results are as follows:

{ 
"size" : 1, 
"_source" : { 
"includes" : [ 
"age", 
"remark", 
"sex" 
], 
"excludes" : [ ] 
}, 
"docvalue_fields" : [ 
{ 
"field" : "address" 
}, 
{ 
"field" : "book" 
}, 
{ 
"field" : "name" 
} 
], 
"sort" : [ 
{ 
"_doc" : { 
"order" : "asc" 
} 
} 
] 
}

3.4. Full-text search of jobs

3.4.1. Requirements

Retrieve users whose address contains Guangzhou or whose name contains Zhang San.

3.4.2 MATCH function

The MATCH function is used when performing full-text retrieval.

 MATCH( 
 field_exp, 
 constant_exp 
 [, options]) 

field_exp: the field to match
constant_exp: the constant expression to match against

3.4.3. Implementation

GET /_sql?format=txt
{
  "query": "select * from es_db where MATCH(address, 'guangzhou') or MATCH(name, 'zhang san')"
}

SQL can also be used to run aggregations against Elasticsearch, for example grouping users by age and counting them:

GET /_sql?format=txt
{
  "query": "select age, count(*) as age_cnt from es_db group by age"
}

This approach is more intuitive and concise. However, Elasticsearch SQL currently has some restrictions: for example, JOINs are not supported and complex subqueries are not supported, so some relatively complex functionality still has to be implemented with the DSL.
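For comparison, a sketch of the same group-by-age statistic written directly as a DSL terms aggregation (this assumes age is a numeric field in es_db):

GET /es_db/_search
{
  "size": 0,
  "aggs": {
    "age_cnt": {
      "terms": { "field": "age" }
    }
  }
}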

Java API operations on ES

Related dependencies:

<dependencies>
    <!-- ES high level client API -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.11.1</version>
    </dependency>
    <!-- Alibaba's library for converting Java objects to JSON and JSON to Java objects -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <version>6.14.3</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Use the Java API to operate the ES cluster

**Initializing the connection:** use a RestHighLevelClient to connect to the ES cluster.

public JobFullTextServiceImpl() {
  // Establish a connection with ES
  // 1. Use RestHighLevelClient to build the client connection.
  // 2. Build a RestClientBuilder with the RestClient.builder method.
  // 3. Use HttpHost to add the ES nodes.
  /* RestClientBuilder restClientBuilder = RestClient.builder(
         new HttpHost("192.168.21.130", 9200, "http"),
         new HttpHost("192.168.21.131", 9200, "http"),
         new HttpHost("192.168.21.132", 9200, "http")); */
  RestClientBuilder restClientBuilder = RestClient.builder(
    new HttpHost("127.0.0.1".9200."http"));
  restHighLevelClient = new RestHighLevelClient(restClientBuilder);
}

Add job data to ES

An IndexRequest object describes the request and lets you set its parameters: the document ID and the data to be sent to ES. Note that since ES manipulates data as JSON (DSL), the FastJSON library is used to convert objects to JSON strings.

@Override
public void add(JobDetail jobDetail) throws IOException {
    //1. Construct an IndexRequest object to describe the data from the ES request.
    IndexRequest indexRequest = new IndexRequest(JOB_IDX);

    //2. Set the document ID.
    indexRequest.id(jobDetail.getId() + "");

    //3. Use FastJSON to convert entity-class objects to JSON.
    String json = JSONObject.toJSONString(jobDetail);

    //4. Use the indexRequest. source method to set the document data and set the requested data to JSON format.
    indexRequest.source(json, XContentType.JSON);

    //5. Use ES High level client to call index to add a document to index.
    restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
}


Query/delete/search/paging

Add/delete/modify

@Override
public void add(JobDetail jobDetail) throws IOException {
    //1. Construct an IndexRequest object to describe the data from the ES request.
    IndexRequest indexRequest = new IndexRequest(JOB_IDX);

    //2. Set the document ID.
    indexRequest.id(jobDetail.getId() + "");

    //3. Use FastJSON to convert entity-class objects to JSON.
    String json = JSONObject.toJSONString(jobDetail);

    //4. Use the indexRequest. source method to set the document data and set the requested data to JSON format.
    indexRequest.source(json, XContentType.JSON);

    //5. Use ES High level client to call index to add a document to index.
    restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
}

@Override
public JobDetail findById(long id) throws IOException {
    // 1. Build the GetRequest request.
    GetRequest getRequest = new GetRequest(JOB_IDX, id + "");

    // 2. Use resthighLevelClient. get to send a GetRequest request and obtain a response from the ES server.
    GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);

    // 3. Convert the ES response data to a JSON string
    String json = getResponse.getSourceAsString();

    // 4. Use FastJSON to convert the JSON string into a JobDetail class object
    JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);

    // 5. Remember: Set the ID separately
    jobDetail.setId(id);

    return jobDetail;
}

@Override
public void update(JobDetail jobDetail) throws IOException {
    // 1. Check whether the document with the corresponding ID exists
    // a) Build GetRequest
    GetRequest getRequest = new GetRequest(JOB_IDX, jobDetail.getId() + "");

    // b) Run the exists method of the client to initiate a request and check whether the request exists
    boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);

    if(exists) {
        // 2. Build the UpdateRequest request
        UpdateRequest updateRequest = new UpdateRequest(JOB_IDX, jobDetail.getId() + "");

        // 3. Set the UpdateRequest document to JSON format
        updateRequest.doc(JSONObject.toJSONString(jobDetail), XContentType.JSON);

        // 4. Use the client to initiate the update request
        restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
    }
}

@Override
public void deleteById(long id) throws IOException {
    // 1. Build the DELETE request
    DeleteRequest deleteRequest = new DeleteRequest(JOB_IDX, id + "");

    // 2. Run RestHighLevelClient to execute the delete request
    restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);

}

Full-text retrieval

@Override
public List<JobDetail> searchByKeywords(String keywords) throws IOException {
    // 1. Build SearchRequest
    // API for full text search and keyword search
    SearchRequest searchRequest = new SearchRequest(JOB_IDX);

    // 2. Create a SearchSourceBuilder specifically for building query criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    // 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");

    // Set the query criteria to the query request builder
    searchSourceBuilder.query(multiMatchQueryBuilder);

    // 4. Call searchrequest. source to set the query criteria to the SearchRequest
    searchRequest.source(searchSourceBuilder);

    // 5. Execute the request with RestHighLevelClient.search
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hitArray = searchResponse.getHits().getHits();

    // 6. Iterate over the result
    ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();

    for (SearchHit documentFields : hitArray) {
        // 1) Get the result of the hit
        String json = documentFields.getSourceAsString();

        // 2) Convert the JSON string to an object
        JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);

        // 3) Use SearchHit.getId to set the document ID
        jobDetail.setId(Long.parseLong(documentFields.getId()));

        jobDetailArrayList.add(jobDetail);
    }

    return jobDetailArrayList;
}


Paging query

@Override
public Map<String, Object> searchByPage(String keywords, int pageNum, int pageSize) throws IOException {
    // 1. Build SearchRequest
    // API for full text search and keyword search
    SearchRequest searchRequest = new SearchRequest(JOB_IDX);

    // 2. Create a SearchSourceBuilder specifically for building query criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    // 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
    MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");

    // Set the query criteria to the query request builder
    searchSourceBuilder.query(multiMatchQueryBuilder);

    // How many items to display per page
    searchSourceBuilder.size(pageSize);
    // set the number from which to start the query
    searchSourceBuilder.from((pageNum - 1) * pageSize);

    // 4. Call searchrequest. source to set the query criteria to the SearchRequest
    searchRequest.source(searchSourceBuilder);

    // 5. Execute the request with RestHighLevelClient.search
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hitArray = searchResponse.getHits().getHits();

    // 6. Iterate over the result
    ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();

    for (SearchHit documentFields : hitArray) {
        // 1) Get the result of the hit
        String json = documentFields.getSourceAsString();

        // 2) Convert the JSON string to an object
        JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);

        // 3) Use SearchHit.getId to set the document ID
        jobDetail.setId(Long.parseLong(documentFields.getId()));

        jobDetailArrayList.add(jobDetail);
    }

    // 8. Encapsulate the results into a Map structure (with paging information)
    // a) total -> use SearchHits.getTotalHits().value to get the total number of records
    // b) content -> Data in the current page
    long totalNum = searchResponse.getHits().getTotalHits().value;
    HashMap hashMap = new HashMap();
    hashMap.put("total", totalNum);
    hashMap.put("content", jobDetailArrayList);


    return hashMap;
}



Use scroll paging to query (deep paging)

  1. The first query does not contain scroll_id, so you need to set the scroll timeout period
  2. Do not set the timeout period too short; otherwise, exceptions may occur
  3. The second and subsequent queries use a SearchScrollRequest with the scroll_id
@Override
public Map<String, Object> searchByScrollPage(String keywords, String scrollId, int pageSize) throws IOException {
    SearchResponse searchResponse = null;

    if(scrollId == null) {
        // 1. Build SearchRequest
        // API for full text search and keyword search
        SearchRequest searchRequest = new SearchRequest(JOB_IDX);

        // 2. Create a SearchSourceBuilder specifically for building query criteria
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
        MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");

        // Set the query criteria to the query request builder
        searchSourceBuilder.query(multiMatchQueryBuilder);

        // Set the highlight
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.field("jd");
        highlightBuilder.preTags("<font color='red'>");
        highlightBuilder.postTags("</font>");

        // Set the request to highlight
        searchSourceBuilder.highlighter(highlightBuilder);

        // How many items to display per page
        searchSourceBuilder.size(pageSize);

        // 4. Call searchrequest. source to set the query criteria to the SearchRequest
        searchRequest.source(searchSourceBuilder);

        //--------------------------
        // Set scroll query
        //--------------------------
        searchRequest.scroll(TimeValue.timeValueMinutes(5));

        // 5. Execute the request with RestHighLevelClient.search
        searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

    }
    // The scroll ID will be used for the second query
    else {
        SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
        searchScrollRequest.scroll(TimeValue.timeValueMinutes(5));

        // Use RestHighLevelClient to send a Scroll request
        searchResponse = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
    }

    //--------------------------
    // Iterate over the ES response data
    //--------------------------

    SearchHit[] hitArray = searchResponse.getHits().getHits();

    // 6. Iterate over the result
    ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();

    for (SearchHit documentFields : hitArray) {
        // 1) Get the result of the hit
        String json = documentFields.getSourceAsString();

        // 2) Convert the JSON string to an object
        JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);

        // 3) Use SearchHit.getId to set the document ID
        jobDetail.setId(Long.parseLong(documentFields.getId()));

        jobDetailArrayList.add(jobDetail);

        // Set some highlighted text to the entity class
        // Encapsulates highlights
        Map<String, HighlightField> highlightFieldMap = documentFields.getHighlightFields();
        HighlightField titleHL = highlightFieldMap.get("title");
        HighlightField jdHL = highlightFieldMap.get("jd");

        if (titleHL != null) {
            // Gets the highlighted fragment of the specified field
            Text[] fragments = titleHL.getFragments();
            // Concatenate the highlighted fragments into a full highlighted field
            StringBuilder builder = new StringBuilder();
            for(Text text : fragments) {
                builder.append(text);
            }
            // Set it to the entity class
            jobDetail.setTitle(builder.toString());
        }

        if (jdHL != null) {
            // Gets the highlighted fragment of the specified field
            Text[] fragments = jdHL.getFragments();
            // Concatenate the highlighted fragments into a full highlighted field
            StringBuilder builder = new StringBuilder();
            for(Text text : fragments) {
                builder.append(text);
            }
            // Set it to the entity class
            jobDetail.setJd(builder.toString());
        }
    }

    // 8. Encapsulate the results into a Map structure (with paging information)
    // a) total -> use SearchHits.getTotalHits().value to get the total number of records
    // b) content -> Data in the current page
    long totalNum = searchResponse.getHits().getTotalHits().value;
    HashMap hashMap = new HashMap();
    hashMap.put("total", totalNum);
    hashMap.put("scroll_id", searchResponse.getScrollId());
    hashMap.put("content", jobDetailArrayList);

    return hashMap;
}


Highlighting the query

  1. Configure the highlighting option
// Set the highlight
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
highlightBuilder.field("jd");
highlightBuilder.preTags("<font color='red'>");
highlightBuilder.postTags("</font>");
  2. The highlighted fields need to be spliced together and set into the entity class
// 1) Get the result of the hit
String json = documentFields.getSourceAsString();

// 2) Convert the JSON string to an object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);

// 3) Use SearchHit.getId to set the document ID
jobDetail.setId(Long.parseLong(documentFields.getId()));

jobDetailArrayList.add(jobDetail);

// Set some highlighted text to the entity class
// Encapsulates highlights
Map<String, HighlightField> highlightFieldMap = documentFields.getHighlightFields();
HighlightField titleHL = highlightFieldMap.get("title");
HighlightField jdHL = highlightFieldMap.get("jd");

if (titleHL != null) {
    // Gets the highlighted fragment of the specified field
    Text[] fragments = titleHL.getFragments();
    // Concatenate the highlighted fragments into a full highlighted field
    StringBuilder builder = new StringBuilder();
    for(Text text : fragments) {
        builder.append(text);
    }
    // Set it to the entity class
    jobDetail.setTitle(builder.toString());
}

if (jdHL != null) {
    // Gets the highlighted fragment of the specified field
    Text[] fragments = jdHL.getFragments();
    // Concatenate the highlighted fragments into a full highlighted field
    StringBuilder builder = new StringBuilder();
    for(Text text : fragments) {
        builder.append(text);
    }
    // Set it to the entity class
    jobDetail.setJd(builder.toString());
}