Basic principle of ElasticSearch document score calculation
1) Boolean model
Based on the user's query criteria, the documents containing the specified terms are filtered out first.
Query "hello world" --> hello / world / hello & world
bool --> must / must not / should --> filter --> include / not include / may include a doc
This step improves performance by reducing the number of docs that need to be scored later.
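As a minimal sketch of that filtering phase (es_db and its fields address, name, remark, and sex appear later in this article; the values here are only illustrative), the must / must_not / should / filter notions map onto a bool query like this:

GET /es_db/_search
{
  "query": {
    "bool": {
      "must":     [ { "match": { "address": "guangzhou" } } ],
      "must_not": [ { "match": { "name": "rose" } } ],
      "should":   [ { "match": { "remark": "java developer" } } ],
      "filter":   [ { "term": { "sex": 1 } } ]
    }
  }
}

must and must_not decide whether a doc is included at all, should only influences the score, and filter narrows the candidate set without scoring, which is exactly the "reduce the number of docs to be scored later" step described above.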
2) Relevance score algorithm: simply put, the relevance score algorithm calculates how relevant the text in an index is to the search text
Elasticsearch uses the Term Frequency / Inverse Document Frequency (TF/IDF) algorithm. Term frequency: how many times each term in the search text appears in the field text; the more often it appears, the more relevant the document is
Search text: hello world
doc1: hello you, and world is very good
doc2: hello, how are you
Inverse document frequency: how many times each term in the search text appears across all documents in the entire index; the more often a term occurs across the index, the less weight it carries
Search text: hello world
doc1: hello, tuling is very good
doc2: hi world, how are you
For example, suppose the index contains 10,000 documents, the word hello appears 1,000 times across all of them, and the word world appears only 100 times; world is then the rarer, more significant term.
Field-length norm: the longer the field, the less relevant a match in it is. Search request: hello world
doc1: { "title": "hello, article", "content": "... N words" }
doc2: { "title": "my article", "content": "... N words, hi world" }
hello and world occur the same number of times in both documents, but doc1 is more relevant because its matching field (title) is shorter.
**2. How is _score calculated on a document**
GET /es_db/_doc/1/_explain
{
"query": {
"match": {
"remark": "java developer"
}
}
}
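For a full search rather than a single document, the standard explain flag can be set in the request body (a hedged sketch reusing the same query); each hit then carries an _explanation node showing how its _score was assembled:

GET /es_db/_search
{
  "explain": true,
  "query": {
    "match": {
      "remark": "java developer"
    }
  }
}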
3) Vector space model: how to get one total score for a doc when the query contains multiple terms.
For the query hello world, ES looks at how hello and world score across all docs and builds a query vector: suppose the hello term is given a score of 2 (based on all docs) and the world term a score of 5 (based on all docs), giving the query vector [2, 5]
Doc vectors: there are three docs, one containing only hello, one containing only world, and one containing both terms
doc1: contains hello --> [2, 0]
doc2: contains world --> [0, 5]
doc3: contains hello and world --> [2, 5]
For each doc, ES calculates one score per term (a score for hello and a score for world), and these term scores together form the doc vector
Plotting these in a graph, the angle between each doc vector and the query vector determines the doc's total score for all the terms in the query.
The larger the angle, the lower the score; the smaller the angle, the higher the score.
With more than two terms the computation is done with linear algebra and can no longer be drawn as a simple graph, but the principle is the same.
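As a rough illustration only (Lucene's real scoring formula adds normalization and boost factors), the "angle" idea corresponds to cosine similarity, which for the vectors above works out to:

cos(doc, query) = (doc · query) / (|doc| × |query|)
doc1 [2, 0]: (2×2 + 0×5) / (2 × √29) ≈ 0.37
doc2 [0, 5]: (0×2 + 5×5) / (5 × √29) ≈ 0.93
doc3 [2, 5]: (2×2 + 5×5) / (√29 × √29) = 1.00

doc3, which contains both terms, ends up with the highest score.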
ES production cluster deployment: the key parameter for the split-brain problem
**What is cluster split brain?** Split brain means that different nodes in the same cluster have a different view of the cluster state, typically that the cluster ends up with two masters: if a network failure divides the cluster into two partitions, each partition contains several nodes and elects its own master, so the cluster as a whole has two masters.
Because the master plays a critical role in the cluster, maintaining the cluster state and allocating shards, having two masters can lead to data corruption.
For example:
Suppose a two-node cluster: node 1 is elected master at startup and holds the primary shard (marked 0P), while node 2 holds the replica shard (marked 0R).
Now, what happens if communication between the two nodes breaks, because of a network problem or simply because one of the nodes stops responding for a while?
Each node believes the other is dead. Node 1 does not need to do anything, since it has already been elected master. But node 2 automatically elects itself as master, because it believes its part of the cluster no longer has one.
In an Elasticsearch cluster, the master node is responsible for distributing shards evenly across the nodes. Node 2 holds only the replica shard, but because it believes the master is unavailable, it automatically promotes its replica to primary.
Now the cluster is in an inconsistent state. An index request that hits node 1 writes data to node 1's primary shard, while a request that hits node 2 writes data to node 2's newly promoted primary. The two copies of the shard drift apart, and it is hard to reconcile them without a full reindex. Worse, for an index client that is not cluster-aware (for example, one using the REST interface), the problem is invisible: every index request still completes successfully, no matter which node it hits. The problem only becomes apparent when the data is searched: the results differ depending on which node the search request hits.
The parameter discovery.zen.minimum_master_nodes tells ES not to elect a master unless enough master-eligible candidate nodes are visible; otherwise no master is elected at all. It must be set to the quorum of master candidates in the cluster, that is, the majority: quorum = number of master candidates / 2 + 1.
For example, if we have 10 nodes, all capable of maintaining data, and all master candidates, quorum is 10/2 + 1 = 6.
If we have three master candidates and 100 data nodes, quorum is 3/2 + 1 = 2
If we have two nodes, both master candidates, quorum is 2/2 + 1 = 2. This is a problem: if one node fails, only one master candidate is left, quorum can no longer be satisfied, no new master can be elected, and the cluster fails completely. The only alternative is to set the parameter to 1, but that cannot prevent split brain.
So what happens with only 2 nodes if discovery.zen.minimum_master_nodes is set to 2, or to 1? Neither setting solves the problem: with 2 the cluster cannot elect a master after either node fails, and with 1 split brain remains possible.
Therefore, an ES cluster in a production environment must have at least three master-eligible nodes, with this parameter set to quorum, i.e. discovery.zen.minimum_master_nodes: 2.
So how does this parameter avoid the split-brain problem? Say we have 3 nodes and quorum is 2. Now a network fault puts 1 node in one network zone and the other 2 nodes in another zone, and the two zones cannot communicate. There are two possible situations:
(1) If the master is the isolated single node and the other two nodes are master candidates, the isolated master can no longer see a quorum of candidate nodes in its partition, so it steps down from the master role and tries to trigger a re-election, which it cannot win. The nodes in the other network zone, unable to reach the old master, also initiate a re-election; since that zone contains two master candidates, quorum is satisfied and a new master is elected successfully. The cluster still ends up with exactly one master.
(2) If the master and one other node are in the same network zone and the third node is isolated on its own, the isolated node tries to initiate an election because it cannot reach the master, but with only one master candidate visible it cannot satisfy quorum and no master is elected. In the other network zone the original master keeps working. Again there is only one master in the cluster. So in elasticsearch.yml, configuring discovery.zen.minimum_master_nodes: 2 avoids the split-brain problem.
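A minimal elasticsearch.yml sketch of that recommendation for a three-node cluster (note: this setting only applies to Elasticsearch versions before 7.0; from 7.0 on it is ignored because master election is managed by the cluster automatically):

# elasticsearch.yml on each of the three master-eligible nodes
discovery.zen.minimum_master_nodes: 2   # quorum = 3 / 2 + 1 = 2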
Second, data modeling
1, case
**Example: design a user document type that contains an array of address data. This approach is relatively complex, but more flexible for managing the data.**
PUT /user_index
{
"mappings": {
"properties": {
"login_name" : {
"type" : "keyword"
},
"age" : {
"type" : "short"
},
"address" : {
"properties": {
"province" : {
"type" : "keyword"
},
"city" : {
"type" : "keyword" },
"street" : {
"type" : "keyword"
}
}
}
}
}
}
However, this modeling has an obvious defect: when searching on the address data, unwanted documents are often matched. For example, with the data below, a search for a user whose province is Beijing and whose city is Tianjin will still return a hit.
PUT /user_index/_doc/1
{
  "login_name": "jack",
  "age": 25,
  "address": [
    {
      "province": "Beijing",
      "city": "Beijing",
      "street": "Xisanqi East Road"
    },
    {
      "province": "Tianjin",
      "city": "Tianjin",
      "street": "Ancient Culture Street"
    }
  ]
}
PUT /user_index/_doc/2
{
  "login_name": "rose",
  "age": 21,
  "address": [
    {
      "province": "Hebei",
      "city": "Langfang",
      "street": "Yanjiao Economic Development Zone"
    },
    {
      "province": "Tianjin",
      "city": "Tianjin",
      "street": "Huaxia Road"
    }
  ]
}
The search should look like this:
GET /user_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address.province": "Beijing" } },
        { "match": { "address.city": "Tianjin" } }
      ]
    }
  }
}
However, the result is not what we intended: jack is returned even though none of his individual addresses has province Beijing and city Tianjin at the same time. In this case the data model needs to be defined with nested objects.
2, nested object
The problem can be solved by mapping the address array as a nested object type. The document model is as follows:
PUT /user_index
{
"mappings": {
"properties": {
"login_name" : {
"type" : "keyword"
},
"age" : {
"type" : "short"
},
"address" : {
"type": "nested",
"properties": {
"province" : {
"type" : "keyword"
},
"city" : {
"type" : "keyword"
},
"street" : {
"type" : "keyword"
}
}
}
}
}
}
The nested search syntax is then used to perform the search:
GET /user_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "address",
            "query": {
              "bool": {
                "must": [
                  { "match": { "address.province": "Beijing" } },
                  { "match": { "address.city": "Tianjin" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Although the syntax becomes more complex, reads and writes now behave correctly, so this is the recommended design. The reason is that an ordinary object array is flattened by ES as shown below (if a field is analyzed, its tokens are stored at the corresponding field positions; strictly speaking this lives in an inverted index, the JSON here is just an intuitive illustration):
{
  "login_name": "jack",
  "address.province": [ "Beijing", "Tianjin" ],
  "address.city": [ "Beijing", "Tianjin" ],
  "address.street": [ "Xisanqi East Road", "Ancient Culture Street" ]
}
With the nested data type, ES does not flatten the array; each address object is stored internally as its own hidden document, roughly as shown below, so the desired search result can be obtained.
{
  "address.province": "Beijing",
  "address.city": "Beijing",
  "address.street": "Xisanqi East Road"
}
{
  "address.province": "Tianjin",
  "address.city": "Tianjin",
  "address.street": "Ancient Culture Street"
}
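If you also want to see which nested address actually matched, inner_hits can be added to the nested query; a sketch based on the mapping and data above:

GET /user_index/_search
{
  "query": {
    "nested": {
      "path": "address",
      "inner_hits": {},
      "query": {
        "bool": {
          "must": [
            { "match": { "address.province": "Beijing" } },
            { "match": { "address.city": "Beijing" } }
          ]
        }
      }
    }
  }
}

The response then lists, per hit, the specific address objects that satisfied both conditions.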
3. Parent-child relationship data modeling
Nested object modeling still has drawbacks: the child data is stored redundantly inside the parent document, so maintenance cost is high when many objects live together, and after an update the entire document (the root object together with all of its nested objects) has to be re-indexed. ES therefore provides an implementation similar to a JOIN in a relational database: with the join data type, the two kinds of objects can be separated into a Parent/Child relationship. The parent document and the child document are independent documents; updating the parent does not re-index the children, and adding, changing, or deleting a child does not affect the parent or the other children. Key point: the parent-child relationship is kept as metadata in the mapping to keep query-time performance high, but there is a restriction: a parent and its children must live on the same shard. Because the relationship never crosses shards, each shard can resolve it locally, which is why performance stays high.
Steps to define a parent-child relationship
- Set the index mapping
- Index the parent documents
- Index the child documents
- Query the documents as needed
Set up the Mapping
DELETE my_blogs
#Set the Parent/Child Mapping
PUT my_blogs
{
"mappings": {
"properties": {
"blog_comments_relation": {
"type": "join",
"relations": {
"blog": "comment"
}
},
"content": {
"type": "text"
},
"title": {
"type": "keyword"
}
}
}
}
Index parent document
PUT my_blogs/_doc/blog1
{
"title":"Learning Elasticsearch",
"content":"learning ELK is happy",
"blog_comments_relation":{
"name":"blog"
}
}
PUT my_blogs/_doc/blog2
{
"title":"Learning Hadoop",
"content":"learning Hadoop",
"blog_comments_relation":{
"name":"blog"
}
}
Index child documents
** Parent and child documents must exist on the same shard **
Ensure query JOIN performance
When indexing a child document, you must specify the ID of its parent document
**Use the routing parameter to ensure the child is allocated to the same shard as its parent**
# index subdocuments
PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment":"I am learning ELK",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog1"
  }
}
PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment":"I like Hadoop!!!!!",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}
PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment":"Hello Hadoop",
  "username":"Bob",
  "blog_comments_relation":{
    "name":"comment",
    "parent":"blog2"
  }
}
Queries supported by Parent/Child
- Query all documents
- Parent Id query
- Has Child query
- Has Parent query
#Query all documents
POST my_blogs/_search
{}
#View by parent document ID
GET my_blogs/_doc/blog2
#The Parent Id query
POST my_blogs/_search
{
"query": {
"parent_id": {
"type": "comment",
"id": "blog2"
}
}
}
#Has Child query, returns the parent document
POST my_blogs/_search
{
"query": {
"has_child": {
"type": "comment",
"query" : {
"match": {
"username" : "Jack"
}
}
}
}
}
#Has Parent query that returns related subdocuments
POST my_blogs/_search
{
"query": {
"has_parent": {
"parent_type": "blog",
"query" : {
"match": {
"title" : "Learning Hadoop"
}
}
}
}
}
Use has_child to query: the query runs against the child documents and returns the parent documents whose children match. Because parent and child documents sit on the same shard, the join is efficient.
Use has_parent to query: the query runs against the parent documents and returns the child documents whose parent matches.
Use the parent_id query: given a parent document id, it returns all of that parent's child documents.
Accessing a child document
The parent document's routing value needs to be specified
#Access subdocuments by ID
GET my_blogs/_doc/comment2
#Access the child document by ID and routing
GET my_blogs/_doc/comment3?routing=blog2
Update a child document
Updating a child document does not affect its parent
#Update the child document
PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment": "Hello Hadoop??",
  "blog_comments_relation": {
    "name": "comment",
    "parent": "blog2"
  }
}
Comparison of Nested Object and Parent/Child:

**Comparison** | **Nested Object** | **Parent/Child** |
---|---|---|
Advantages | Documents are stored together, so read performance is high | Parent and child documents can be updated independently |
Disadvantages | Updating a nested child requires re-indexing the entire document | Extra memory is needed to maintain the relation, and read performance is relatively poor |
Application scenario | Child documents are updated occasionally and queries dominate | Child documents are updated frequently |
File system data modeling
Think about it: GitHub can search by code snippet. How is that implemented? GitHub also uses ES for full-text search of its data; there is an index in ES that records code content, with documents roughly like the following:
{
  "fileName": "HelloWorld.java",
  "authName": "HXL",
  "authID": 110,
  "productName": "first-java",
  "path": "/com/hxl/first",
  "content": "package com.hxl.first; public class HelloWorld { //code... }"
}
On GitHub we can search by code snippet, and other conditions can be used for searching as well. But what if you need to search content by file path? Then a special analyzer needs to be defined for the path field. Create the mapping:
PUT /codes
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "fileName": { "type": "keyword" },
      "authName": {
        "type": "text",
        "analyzer": "standard",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "authID": { "type": "long" },
      "productName": {
        "type": "text",
        "analyzer": "standard",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "path": {
        "type": "text",
        "analyzer": "path_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "content": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
PUT /codes/_doc/1
{
  "fileName": "HelloWorld.java",
  "authName": "HXL",
  "productName": "first-java",
  "path": "/com/hxl/first",
  "content": "package com.hxl.first; public class HelloWorld { // some code... }"
}
GET /codes/_search
{
  "query": {
    "match": {
      "path": "/com"
    }
  }
}
GET /codes/_analyze
{
  "text": "/a/b/c/d",
  "field": "path"
}
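With the path_hierarchy tokenizer configured above, the _analyze call should return one token per path prefix, roughly like this (response trimmed to the token values; offsets and positions omitted):

{
  "tokens": [
    { "token": "/a" },
    { "token": "/a/b" },
    { "token": "/a/b/c" },
    { "token": "/a/b/c/d" }
  ]
}

This is why a match query on path for /com can find the document whose path is /com/hxl/first.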
Data manipulation
PUT /codes
{
"settings": {
"analysis": {
"analyzer": {
"path_analyzer" : {
"tokenizer" : "path_hierarchy"
}
}
}
},
"mappings": {
"properties": {
"fileName" : {
"type" : "keyword"
},
"authName" : {
"type" : "text",
"analyzer": "standard",
"fields": {
"keyword" : {
"type" : "keyword"
}
}
},
"authID" : {
"type" : "long"
},
"productName" : {
"type" : "text",
"analyzer": "standard",
"fields": {
"keyword" : {
"type" : "keyword"
}
}
},
"path" : {
"type" : "text",
"analyzer": "path_analyzer",
"fields": {
"keyword" : {
"type" : "text",
"analyzer": "standard"
}
}
},
"content" : {
"type" : "text",
"analyzer": "standard"
}
}
}
}
GET /codes/_search
{
"query": {
"match": {
"path.keyword": "/com"
}
}
}
GET /codes/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"path": "/com"
}
},
{
"match": {
"path.keyword": "/com/hxl"
}
}
]
}
}
}
Reference: www.elastic.co/guide/en/el… pathhierarchy-tokenizer.html
Five, paging search by keyword
When there is a lot of data, we usually need to do paging queries. For example, we specify a page number and specify how many pieces of data to display per page, and then Elasticsearch returns the data for that page number.
1. Use from and size for paging
When executing a query, you can specify from (the offset at which to start) and size (the number of results per page) to implement paging easily: from = (page - 1) * size
POST /es_db/_doc/_search
{
  "from": 0,
  "size": 2,
  "query": {
    "match": {
      "address": "guangzhou tianhe"
    }
  }
}
2. Use Scroll mode for paging
The from and size approach works fine for result sets within roughly 10,000 to 50,000 documents, but with very large data sets it runs into performance problems. Elasticsearch also enforces a limit: by default you cannot page past the first 10,000 results. If you need to read data beyond that, use the scroll cursor that Elasticsearch provides. With deep paging, each page requires re-sorting all the data to be queried, which wastes performance. Scroll instead sorts the data once and then retrieves it in batches, which performs much better than from + size. After a scroll query, the sorted snapshot is kept for a period of time, and subsequent paging queries read from that snapshot.
2.1. Use scroll paging query for the first time
Here, we’re holding the sorted data for 1 minute, so set scroll to 1m
GET /es_db/_search?scroll=1m
{
  "query": {
    "multi_match": {
      "query": "guangzhou changsha zhang san",
      "fields": ["address", "name"]
    }
  },
  "size": 100
}
After execution, the response contains an item "_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFldqc1c1Y3Y3Uzdpb3FaYTFMT094RFEAAAAAAABLkBZnY1dPTFI5SlFET1BlOUNDQ0RyZi1B" (note: use the _scroll_id from your own response). The next query is performed with this _scroll_id.
2.2 The second and subsequent queries use the scroll_id directly
GET _search/scroll?scroll=1m
{
  "scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFldqc1c1Y3Y3Uzdpb3FaYTFMT094RFEAAAAAAABLkBZnY1dPTFI5SlFET1BlOUNDQ0RyZi1B"
}
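When the cursor is no longer needed, the scroll context can be released before its timeout expires (the scroll_id is the one returned by the previous request):

DELETE /_search/scroll
{
  "scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFldqc1c1Y3Y3Uzdpb3FaYTFMT094RFEAAAAAAABLkBZnY1dPTFI5SlFET1BlOUNDQ0RyZi1B"
}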
Six, Elasticsearch SQL
Elasticsearch SQL allows you to execute SQL-like queries through REST interfaces, the command line, or JDBC, and to use SQL for data retrieval and data aggregation.
1. Characteristics of Elasticsearch SQL
Native integration: Elasticsearch SQL is built specifically for Elasticsearch; each SQL query is executed efficiently against the relevant nodes and the underlying storage.
No extra requirements: it does not depend on additional hardware, processes, or runtime libraries; Elasticsearch SQL runs directly inside the Elasticsearch cluster.
Lightweight and efficient: it expresses queries as succinctly and efficiently as SQL.
Mapping between SQL and Elasticsearch concepts:
**SQL** | **Elasticsearch** |
---|---|
Column | Field |
Row | Document |
Table | Index |
Schema | Mapping |
Database server | Elasticsearch cluster instance |
2, Elasticsearch SQL syntax
SELECT select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
Currently, FROM supports only single tables
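A hedged example combining several of these clauses against the es_db index used elsewhere in this article (the WHERE condition and limit are illustrative):

GET /_sql?format=txt
{
  "query": "SELECT name, age FROM es_db WHERE age > 20 ORDER BY age DESC LIMIT 5"
}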
3. Job inquiry cases
3.1. Query a piece of data in the job index database
format: specifies the format of the returned data, e.g. format=txt
GET /_sql?format=txt
{
  "query": "SELECT * FROM es_db limit 1"
}
#Return the data
address               |age            |name           |remark         |sex
----------------------+---------------+---------------+---------------+---------------
guangzhou tianhe park |25             |zhang san      |java developer |1
Elasticsearch SQL supports the following response formats:
**format** | **Description** |
---|---|
csv | Comma-separated values |
json | JSON format |
tsv | Tab-separated values |
txt | CLI-like plain-text table |
yaml | YAML, human readable |
3.2. Convert SQL to DSL
GET /_sql/translate
{
"query":"SELECT * FROM es_db limit 1"
}
The results are as follows:
{
"size" : 1,
"_source" : {
"includes" : [
"age",
"remark",
"sex"
],
"excludes" : [ ]
},
"docvalue_fields" : [
{
"field" : "address"
},
{
"field" : "book"
},
{
"field" : "name"
}
],
"sort" : [
{
"_doc" : {
"order" : "asc"
}
}
]
}
3.4. Full text search of positions
3.4.1 Requirements
Retrieve users whose address contains Guangzhou or whose name contains Zhang San.
3.4.2 MATCH function
The MATCH function is used when performing full-text retrieval.
MATCH(
field_exp,
constant_exp
[, options])
field_exp: the field to match
constant_exp: the constant expression to match against
3.4.3 Implementation
GET /_sql?format=txt
{
  "query": "select * from es_db where MATCH(address, 'guangzhou') or MATCH(name, 'zhang san')"
}
Elasticsearch SQL can also express aggregations, for example grouping users by age:
GET /_sql?format=txt
{
  "query": "select age, count(*) as age_cnt from es_db group by age"
}
This approach is more intuitive and concise. Elasticsearch SQL currently has some restrictions; for example, JOINs and complex subqueries are not supported, so relatively complex functionality still has to be implemented with the DSL.
Operating ES with the Java API
Related dependencies:
<dependencies>
<!-- ES high level client API -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.11.1</version>
</dependency>
<!-- Alibaba library for converting Java objects to JSON and JSON to Java objects -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.62</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.14.3</version>
<scope>test</scope>
</dependency>
</dependencies>
Use the Java APIs to operate an ES cluster
**Initializing the connection**: use RestHighLevelClient to connect to the ES cluster
public JobFullTextServiceImpl() {
// Establish a connection with ES
// 1. Use RestHighLevelClient to build a client connection.
// 2. Build a RestClientBuilder with the RestClient.builder method
// 3. Use HttpHost to add the ES node addresses
/* Example for a three-node cluster:
RestClientBuilder restClientBuilder = RestClient.builder(
        new HttpHost("192.168.21.130", 9200, "http"),
        new HttpHost("192.168.21.131", 9200, "http"),
        new HttpHost("192.168.21.132", 9200, "http")); */
RestClientBuilder restClientBuilder = RestClient.builder(
new HttpHost("127.0.0.1", 9200, "http"));
restHighLevelClient = new RestHighLevelClient(restClientBuilder);
}
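Because RestHighLevelClient holds network connections, it should also be closed when the service shuts down. A minimal sketch (the close() method comes from the Closeable interface the client implements; the wrapping method name is ours):

public void close() throws IOException {
    // Release the underlying HTTP connections held by the high level client
    if (restHighLevelClient != null) {
        restHighLevelClient.close();
    }
}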
Add job data to ES
Use an IndexRequest object to describe the request and set its parameters: the document ID and the data to send to ES. Note that because ES manipulates data as JSON (DSL), the FastJSON library is used to convert the entity object into a JSON string first.
@Override
public void add(JobDetail jobDetail) throws IOException {
//1. Construct an IndexRequest object to describe the data from the ES request.
IndexRequest indexRequest = new IndexRequest(JOB_IDX);
//2. Set the document ID.
indexRequest.id(jobDetail.getId() + "");
//3. Use FastJSON to convert entity-class objects to JSON.
String json = JSONObject.toJSONString(jobDetail);
//4. Use the indexRequest. source method to set the document data and set the requested data to JSON format.
indexRequest.source(json, XContentType.JSON);
//5. Use ES High level client to call index to add a document to index.
restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
}
Query/delete/search/paging
Add / query / update / delete
@Override
public void add(JobDetail jobDetail) throws IOException {
//1. Construct an IndexRequest object to describe the data from the ES request.
IndexRequest indexRequest = new IndexRequest(JOB_IDX);
//2. Set the document ID.
indexRequest.id(jobDetail.getId() + "");
//3. Use FastJSON to convert entity-class objects to JSON.
String json = JSONObject.toJSONString(jobDetail);
//4. Use the indexRequest. source method to set the document data and set the requested data to JSON format.
indexRequest.source(json, XContentType.JSON);
//5. Use ES High level client to call index to add a document to index.
restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
}
@Override
public JobDetail findById(long id) throws IOException {
// 1. Build the GetRequest request.
GetRequest getRequest = new GetRequest(JOB_IDX, id + "");
// 2. Use resthighLevelClient. get to send a GetRequest request and obtain a response from the ES server.
GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
// 3. Convert the ES response data to a JSON string
String json = getResponse.getSourceAsString();
// 4. Use FastJSON to convert the JSON string into a JobDetail class object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);
// 5. Remember: Set the ID separately
jobDetail.setId(id);
return jobDetail;
}
@Override
public void update(JobDetail jobDetail) throws IOException {
// 1. Check whether the document with the corresponding ID exists
// a) Build GetRequest
GetRequest getRequest = new GetRequest(JOB_IDX, jobDetail.getId() + "");
// b) Run the exists method of the client to initiate a request and check whether the request exists
boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
if(exists) {
// 2. Build the UpdateRequest request
UpdateRequest updateRequest = new UpdateRequest(JOB_IDX, jobDetail.getId() + "");
// 3. Set the UpdateRequest document to JSON format
updateRequest.doc(JSONObject.toJSONString(jobDetail), XContentType.JSON);
// 4. Use the client to execute the update request
restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
}
}
@Override
public void deleteById(long id) throws IOException {
// 1. Build the DELETE request
DeleteRequest deleteRequest = new DeleteRequest(JOB_IDX, id + "");
// 2. Run RestHighLevelClient to execute the delete request
restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
}
Full-text retrieval
@Override
public List<JobDetail> searchByKeywords(String keywords) throws IOException {
// 1. Build SearchRequest
// API for full text search and keyword search
SearchRequest searchRequest = new SearchRequest(JOB_IDX);
// 2. Create a SearchSourceBuilder specifically for building query criteria
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");
// Set the query criteria to the query request builder
searchSourceBuilder.query(multiMatchQueryBuilder);
// 4. Call searchrequest. source to set the query criteria to the SearchRequest
searchRequest.source(searchSourceBuilder);
// 5. Execute the request with restHighLevelClient.search
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] hitArray = searchResponse.getHits().getHits();
// 6. Iterate over the result
ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();
for (SearchHit documentFields : hitArray) {
// 1) Get the result of the hit
String json = documentFields.getSourceAsString();
// 2) Convert the JSON string to an object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);
// 3) Use searchhit. getId to set the document ID
jobDetail.setId(Long.parseLong(documentFields.getId()));
jobDetailArrayList.add(jobDetail);
}
return jobDetailArrayList;
}
Paging query
@Override
public Map<String, Object> searchByPage(String keywords, int pageNum, int pageSize) throws IOException {
// 1. Build SearchRequest
// API for full text search and keyword search
SearchRequest searchRequest = new SearchRequest(JOB_IDX);
// 2. Create a SearchSourceBuilder specifically for building query criteria
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");
// Set the query criteria to the query request builder
searchSourceBuilder.query(multiMatchQueryBuilder);
// How many pages to display per page
searchSourceBuilder.size(pageSize);
// set the number from which to start the query
searchSourceBuilder.from((pageNum - 1) * pageSize);
// 4. Call searchrequest. source to set the query criteria to the SearchRequest
searchRequest.source(searchSourceBuilder);
// 5. Execute the request with restHighLevelClient.search
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] hitArray = searchResponse.getHits().getHits();
// 6. Iterate over the result
ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();
for (SearchHit documentFields : hitArray) {
// 1) Get the result of the hit
String json = documentFields.getSourceAsString();
// 2) Convert the JSON string to an object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);
// 3) Use searchhit. getId to set the document ID
jobDetail.setId(Long.parseLong(documentFields.getId()));
jobDetailArrayList.add(jobDetail);
}
// 8. Encapsulate the results into a Map structure (with paging information)
// a) total -> use searchhits.getTotalhits ().value to get all records
// b) content -> Data in the current page
long totalNum = searchResponse.getHits().getTotalHits().value;
HashMap hashMap = new HashMap();
hashMap.put("total", totalNum);
hashMap.put("content", jobDetailArrayList);
return hashMap;
}
Use scroll paging to query (deep paging)
- The first query does not contain scroll_id, so you need to set the scroll timeout period
- Do not set the timeout period too short; otherwise, exceptions may occur
- For the second and later queries, use SearchScrollRequest with the returned scroll_id
@Override
public Map<String, Object> searchByScrollPage(String keywords, String scrollId, int pageSize) throws IOException {
SearchResponse searchResponse = null;
if(scrollId == null) {
// 1. Build SearchRequest
// API for full text search and keyword search
SearchRequest searchRequest = new SearchRequest(JOB_IDX);
// 2. Create a SearchSourceBuilder specifically for building query criteria
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// 3. Use QueryBuilders.multiMatchQuery to build a query over the title and jd fields, and set it on the SearchSourceBuilder
MultiMatchQueryBuilder multiMatchQueryBuilder = QueryBuilders.multiMatchQuery(keywords, "title", "jd");
// Set the query criteria to the query request builder
searchSourceBuilder.query(multiMatchQueryBuilder);
// Set the highlight
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
highlightBuilder.field("jd");
highlightBuilder.preTags("<font color='red'>");
highlightBuilder.postTags("</font>");
// Set the request to highlight
searchSourceBuilder.highlighter(highlightBuilder);
// How many pages to display per page
searchSourceBuilder.size(pageSize);
// 4. Call searchrequest. source to set the query criteria to the SearchRequest
searchRequest.source(searchSourceBuilder);
//--------------------------
// Set scroll query
//--------------------------
searchRequest.scroll(TimeValue.timeValueMinutes(5));
// 5. Execute the request with restHighLevelClient.search
searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
}
// The scroll ID will be used for the second query
else {
SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
searchScrollRequest.scroll(TimeValue.timeValueMinutes(5));
// Use RestHighLevelClient to send a Scroll request
searchResponse = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
}
//--------------------------
// Iterate over the ES response data
//--------------------------
SearchHit[] hitArray = searchResponse.getHits().getHits();
// 6. Iterate over the result
ArrayList<JobDetail> jobDetailArrayList = new ArrayList<>();
for (SearchHit documentFields : hitArray) {
// 1) Get the result of the hit
String json = documentFields.getSourceAsString();
// 2) Convert the JSON string to an object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);
// 3) Use searchhit. getId to set the document ID
jobDetail.setId(Long.parseLong(documentFields.getId()));
jobDetailArrayList.add(jobDetail);
// Set some highlighted text to the entity class
// Encapsulates highlights
Map<String, HighlightField> highlightFieldMap = documentFields.getHighlightFields();
HighlightField titleHL = highlightFieldMap.get("title");
HighlightField jdHL = highlightFieldMap.get("jd");
if(titleHL != null) {
// Gets the highlighted fragment of the specified field
Text[] fragments = titleHL.getFragments();
// Concatenate the highlighted fragments into a full highlighted field
StringBuilder builder = new StringBuilder();
for(Text text : fragments) {
builder.append(text);
}
// Set it to the entity class
jobDetail.setTitle(builder.toString());
}
if(jdHL != null) {
// Gets the highlighted fragment of the specified field
Text[] fragments = jdHL.getFragments();
// Concatenate the highlighted fragments into a full highlighted field
StringBuilder builder = new StringBuilder();
for(Text text : fragments) {
builder.append(text);
}
// Set it to the entity class
jobDetail.setJd(builder.toString());
}
}
// 8. Encapsulate the results into a Map structure (with paging information)
// a) total -> use searchhits.getTotalhits ().value to get all records
// b) content -> Data in the current page
long totalNum = searchResponse.getHits().getTotalHits().value;
HashMap hashMap = new HashMap();
hashMap.put("scroll_id", searchResponse.getScrollId());
hashMap.put("content", jobDetailArrayList);
return hashMap;
}
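As with the REST API, the scroll context can be released explicitly once paging is finished. A sketch using the high level client's clearScroll method (the wrapping helper method name is ours):

public void clearScroll(String scrollId) throws IOException {
    // Build a ClearScrollRequest with the scroll id returned by the last page
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    // Ask ES to drop the scroll context instead of waiting for the timeout
    ClearScrollResponse response = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    // response.isSucceeded() reports whether the context was found and freed
}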
Highlighting the query
- Configure the highlighting option
// Set the highlight
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
highlightBuilder.field("jd");
highlightBuilder.preTags("<font color='red'>");
highlightBuilder.postTags("</font>");
- The highlighted fields need to be spliced together and set into the entity class
// 1) Get the result of the hit
String json = documentFields.getSourceAsString();
// 2) Convert the JSON string to an object
JobDetail jobDetail = JSONObject.parseObject(json, JobDetail.class);
// 3) Use searchhit. getId to set the document ID
jobDetail.setId(Long.parseLong(documentFields.getId()));
jobDetailArrayList.add(jobDetail);
// Set some highlighted text to the entity class
// Encapsulates highlights
Map<String, HighlightField> highlightFieldMap = documentFields.getHighlightFields();
HighlightField titleHL = highlightFieldMap.get("title");
HighlightField jdHL = highlightFieldMap.get("jd");
if(titleHL != null) {
// Gets the highlighted fragment of the specified field
Text[] fragments = titleHL.getFragments();
// Concatenate the highlighted fragments into a full highlighted field
StringBuilder builder = new StringBuilder();
for(Text text : fragments) {
builder.append(text);
}
// Set it to the entity class
jobDetail.setTitle(builder.toString());
}
if(jdHL != null) {
// Gets the highlighted fragment of the specified field
Text[] fragments = jdHL.getFragments();
// Concatenate the highlighted fragments into a full highlighted field
StringBuilder builder = new StringBuilder();
for(Text text : fragments) {
builder.append(text);
}
// Set it to the entity class
jobDetail.setJd(builder.toString());
}