Get started with ElasticSearch
ElasticSearch
1.1 What is ElasticSearch?
- ElasticSearch, referred to as ES, is an open source, highly scalable, distributed full-text search engine. It can store and retrieve data in near real time.
- It scales well to hundreds of servers and handles petabytes of data. ES is developed in Java and uses Lucene at its core to implement all indexing and searching functions.
- It aims to make full-text search easy by hiding Lucene’s complexity with a simple RESTful API.
1.2 How does it compare with Solr?
- Solr uses Zookeeper for distributed management, while ElasticSearch has distributed coordination built in.
- Solr supports more data formats, while ElasticSearch only supports JSON.
- Solr officially offers more features, while ElasticSearch itself focuses on core features and relies on third-party plugins for advanced features.
- Solr performs better than ElasticSearch in traditional search applications, but is significantly less efficient than ElasticSearch in real-time search applications.
1.3 Installing ElasticSearch
- Download url for ElasticSearch
- The startup scripts are in the bin directory, for both Windows and Linux.
- Start: ES runs on Java, so a Java runtime must be installed first, and the version must be JDK 1.8 or later. On Windows, run the elasticsearch script in the bin directory. On Linux, run the sh script. As shown in the figure:
- After the server is successfully started, port 9300 is the transport port used for communication between nodes and by the Java transport client, and port 9200 is the RESTful HTTP port. After successful startup, the following figure is shown:
- Page to visit:
1.4 Installing the ES GUI Plug-in
- Introduction:
- For ElasticSearch, you can install the head plug-in for ElasticSearch to view index data on the ElasticSearch GUI.
- There are two ways to install plug-ins: online installation and local installation.
- This blog uses local installation to install the head plug-in. For ElasticSearch 5.x or later, Node.js and Grunt must be installed first.
- Installation steps:
- Download the Head plugin: github.com/mobz/elasti…
- To install Node.js, refer to the blogger's previous posts or online tutorials. The head plug-in runs on Node.js, so it must be installed first. Download Node.js from the official website, install it as prompted, and enter
node -v
to view the version and check whether the installation succeeded.
- Install grunt as a global command. Grunt is a Node.js-based project build tool. Enter the following command in the CMD console:
npm install -g grunt-cli
- Go to the elasticsearch-head-master directory and start head from the command prompt:
npm install
grunt server
elasticsearch-head-master is the head plug-in you just downloaded.
- After successful startup, the following figure is shown:
- Webpage access is as shown in the figure below:
- Because head and ES run on two different ports, requests between them are cross-origin. Add the cross-domain (CORS) configuration in config/elasticsearch.yml:
- Once you have added the cross domain, restart ES, and then re-click Connect on head to complete the graphical installation.
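The cross-domain settings mentioned above might look like the fragment below. This is a minimal sketch: `http.cors.enabled` and `http.cors.allow-origin` are the standard ElasticSearch CORS settings, but allowing every origin with `"*"` is an assumption that only suits a local test setup.

```yaml
# config/elasticsearch.yml
# Enable cross-origin requests so the head page can call the HTTP port.
http.cors.enabled: true
# Assumption: allow any origin; restrict this outside of local testing.
http.cors.allow-origin: "*"
```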
1.5 ElasticSearch Related Concepts (Terminology)
- Summary:
- ElasticSearch is document oriented, which means it can store entire objects or documents.
- It doesn’t just store, it indexes the contents of each document to make it searchable. In Elasticsearch you can index, search, sort, and filter documents instead of rows and columns of data.
- ElasticSearch compares to traditional relational databases as follows:
Relational DB -> Databases -> Tables -> Rows -> Columns
ElasticSearch -> Indices -> Types -> Documents -> Fields
- ElasticSearch
- Index:
- An index is a collection of documents with somewhat similar characteristics.
- For example, you could have an index of customer data, another index of product catalogs, and another index of order data.
- An index is identified by a name (which must be all lowercase) and is used when indexing, searching, updating, and deleting documents corresponding to the index.
- In a cluster, you can define as many indexes as you want.
- Type:
- In an index, you can define one or more types.
- A type is a logical classification/partition of your index, and its semantics are entirely up to you.
- Typically, a type is defined for documents that have a common set of fields. For example, let’s say you run a blogging platform and store all your data in an index.
- In this index you can define one type for user data, another type for blog data, and of course, another type for comment data.
- Field:
- It is equivalent to a column of a database table and classifies the document data by attribute.
- Mapping:
- A mapping defines how data is processed and the rules and constraints applied to it, such as a field's data type, default value, analyzer, and whether it is indexed. Other rules for how ES handles data are also set in the mapping. Processing data under well-chosen rules improves performance considerably, which is why you need to set up mappings and think about how to design them for better performance.
- Document:
- A document is a basic unit of information that can be indexed. For example, you can have a document for a customer, a document for a product, and, of course, a document for an order. Documents are expressed in JSON (JavaScript Object Notation), a ubiquitous format for data exchange on the Internet.
- You can store as many documents as you want in an index/type. Note that although a document physically resides in an index, it must be indexed into, and assigned, a type within that index.
- Near real-time (NRT):
- ElasticSearch is a near real-time search platform. This means that there is a slight delay (typically less than 1 second) from indexing a document until it can be searched.
- Cluster:
- A cluster is a group of one or more nodes that together hold all of the data and jointly provide indexing and search capabilities. A cluster is identified by a unique name, which defaults to "elasticsearch". This name is important because a node can only join a cluster by specifying the cluster's name.
- Node:
- A node is a server in a cluster that, as part of the cluster, stores data and participates in the index and search functions of the cluster. Like clusters, a node is identified by a name, which by default is the name of a random Marvel character that is assigned to the node at startup. This name is important for administration, as you will determine which servers in the network correspond to which nodes in the ElasticSearch cluster.
- A node joins a specific cluster by configuring the cluster name. By default, every node is set to join a cluster called "elasticsearch", which means that if you start several nodes in your network and they can discover each other, they will automatically form and join a cluster called elasticsearch.
- In a cluster, you can have as many nodes as you want. Also, if no ElasticSearch nodes are currently running on your network, starting a node will by default create and join a single-node cluster called elasticsearch.
- Shards & replicas
ElasticSearch operations
2.1 Creating An Index
- Create an index using the graphical interface:
- Create it like this:
- After the index is created, click Overview to view the index:
- Create with a request:
- Create the index blog with Postman:
- Here is:
- After it is created successfully, click Overview to see that the new blog index was created:
- Use mapping to create an index with fields:
- Set the mapping information after creating the index (set the mapping for a specified type after the index exists):
2.2 Deleting an Index
- Delete directly in the graphical interface:
- Delete directly with the icon:
- Delete with a request:
- Delete the index with Postman:
Adding an index used a PUT request. To delete the index, change the request method to DELETE; the index named in the URL is then deleted.
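The PUT/DELETE symmetry described above can be sketched with the JDK's own HTTP classes (Java 11+). This is only an illustration, not the Postman setup used in this post: the class name `IndexRequests` and the base URL are assumptions, and the requests are built and printed rather than sent, so no running node is needed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IndexRequests {
    // Assumed base URL: the RESTful HTTP port of a local node.
    static final String BASE = "http://127.0.0.1:9200";

    /** Builds the PUT request that creates an index (empty body, default settings). */
    static HttpRequest createIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Builds the DELETE request that removes the same index. */
    static HttpRequest deleteIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest create = createIndex("blog");
        HttpRequest delete = deleteIndex("blog");
        // Same URL, different HTTP method.
        System.out.println(create.method() + " " + create.uri());
        System.out.println(delete.method() + " " + delete.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while a node is running.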
2.3 Adding a Document
- HTTP to add:
- View the added data:
- Note: The document ID does not have to be specified. If it is omitted, ES automatically assigns a random string:
- The generation looks like this:
- The string looks like this:
- Added through the head graphical interface:
- Add as shown:
- Use Data Browsing to verify, with field filtering, whether the data was added successfully:
If the queried data is displayed, the data was added successfully.
2.4 Deleting a Document
- Request mode: send a DELETE request with the index, type, and document ID in the URL to delete the document. The illustration is as follows:
Delete by the _id, not by the id field in the content. The _id is the actual primary key of the document.
2.5 Modifying a Document
- Here is:
Modifying a document works the same way as adding one. If a document with the given _id already exists, the old data is deleted and the new data is added. If the _id does not exist, the document is added.
2.6 Querying Documents By Document ID
Specify the index, type, and document ID in the URL (index/type/ID) to query that document's data.
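Adding, modifying, fetching, and deleting a document all address the same index/type/_id URL and differ only in the HTTP method. A minimal sketch with the JDK HTTP classes (Java 11+) follows; the class name `DocumentRequests`, the base URL, and the index and type names `blog` and `article` are illustrative assumptions. The requests are only built and printed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DocumentRequests {
    // Assumed base URL of a local node's HTTP port.
    static final String BASE = "http://127.0.0.1:9200";

    /** URL of a single document: /index/type/_id. */
    static URI docUri(String index, String type, String id) {
        return URI.create(BASE + "/" + index + "/" + type + "/" + id);
    }

    /** Add or modify: PUT the JSON document; an existing _id is overwritten. */
    static HttpRequest put(String index, String type, String id, String json) {
        return HttpRequest.newBuilder(docUri(index, type, id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Query by document ID. */
    static HttpRequest get(String index, String type, String id) {
        return HttpRequest.newBuilder(docUri(index, type, id)).GET().build();
    }

    /** Delete by document ID (the _id, not an id field inside the content). */
    static HttpRequest delete(String index, String type, String id) {
        return HttpRequest.newBuilder(docUri(index, type, id)).DELETE().build();
    }

    public static void main(String[] args) {
        String json = "{\"id\":1,\"title\":\"hello\",\"content\":\"first document\"}";
        for (HttpRequest r : new HttpRequest[] {
                put("blog", "article", "1", json),
                get("blog", "article", "1"),
                delete("blog", "article", "1")}) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```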
2.7 Querying Documents Based on Keywords
- Query:
- View the result:
2.8 Querying documents -queryString Query
- The method is shown as follows:
The query condition can be arbitrary text, and default_field is the default search field. The query text is first analyzed (split into terms) and then searched on the default search field.
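The request bodies for the two kinds of query above (term and query_string) differ only in the inner query object. The sketch below builds both bodies as JSON strings; the class name `QueryBodies` and its helper methods are hypothetical, and the escaping is deliberately minimal (quotes and backslashes only). Such a body would typically be sent with POST to the `_search` endpoint of the index.

```java
public class QueryBodies {
    /** Minimal escaping for a JSON string value: backslashes and double quotes only. */
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Term query: the keyword is matched as-is against the terms of the field. */
    static String termQuery(String field, String value) {
        return "{\"query\":{\"term\":{\"" + esc(field) + "\":\"" + esc(value) + "\"}}}";
    }

    /** query_string query: the text is analyzed first, then searched on default_field. */
    static String queryStringQuery(String defaultField, String query) {
        return "{\"query\":{\"query_string\":{\"default_field\":\"" + esc(defaultField)
                + "\",\"query\":\"" + esc(query) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(termQuery("title", "north"));
        System.out.println(queryStringQuery("title", "fast and furious"));
    }
}
```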
2.9 Query using the Head plug-in
2.10 Word segmentation
- Standard analyzer:
You can see that the standard analyzer splits Chinese text into individual characters, which is not the effect we want. The effect we need is: I, am, program, programmer.
- How to solve it?
- We need an analyzer with good support for Chinese word segmentation. Many analyzers support Chinese, such as the Word analyzer, Paoding analyzer, Pangu analyzer, and ANSJ analyzer, but the IK analyzer is the most commonly used one.
- IK analyzer
- Here is:
- Installation steps:
- Download the IK analyzer, unzip it, put it in the plugins folder of ES, and restart the ES service.
- Effect of using the IK analyzer:
- Overview: IK provides two word segmentation algorithms, ik_smart and ik_max_word. ik_smart produces the coarsest segmentation, and ik_max_word produces the finest-grained segmentation.
- Coarsest segmentation: enter the address in the browser:
http://127.0.0.1:9200/_analyze?analyzer=ik_smart&pretty=true&text=I'm a programmer
I am a program programmer
- Finest-grained segmentation: enter the address in the browser:
http://127.0.0.1:9200/_analyze?analyzer=ik_max_word&text=I'm a programmer
I am program programmer member. This shows that its results are finer-grained and contain more tokens.
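When the text to analyze contains spaces or Chinese characters, it has to be URL-encoded before it goes into the address. The sketch below builds such an `_analyze` address; it assumes an ES version that still accepts `analyzer` and `text` as query parameters, as the addresses above do, and the class name `AnalyzeUrl` is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnalyzeUrl {
    /** Builds an _analyze address with the text percent-encoded as UTF-8. */
    static String analyzeUrl(String base, String analyzer, String text) {
        // URLEncoder encodes spaces as '+'; switch to %20 so the result is unambiguous in a URL.
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return base + "/_analyze?analyzer=" + analyzer + "&pretty=true&text=" + encoded;
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:9200"; // assumed local node
        System.out.println(analyzeUrl(base, "ik_smart", "hello world"));
        System.out.println(analyzeUrl(base, "ik_max_word", "hello world"));
    }
}
```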
- Using the IK analyzer:
- Create index library:
- Add data:
- Query data:
- Query by condition:
- The querystring query:
- Use the search in head:
ElasticSearch cluster
3.1 ElasticSearch Architecture and Description
- Summary:
- Cluster architecture description:
3.2 Cluster Architecture Construction
- Preparations:
- Delete the data directory: if a standalone ES is being converted to a cluster, first check the data directory and make sure it contains no data (creating an index saves data there).
If the data is not deleted, the cluster fails to be set up.
- Rename our standalone ES to elasticsearch-cluster, configure the IK analyzer and the CORS cross-domain settings, and then make three copies, as shown in the figure:
- Modify the elasticsearch-cluster/node*/config/elasticsearch.yml file of each node:
- Node1 node:
- Node2 node:
- Node3 node:
The node name and port number are configured.
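As a rough sketch of what the node1 file might contain, assuming an ES 5.x/6.x release (matching the transport client used later in this post): the cluster name and the transport ports 9301 to 9303 are taken from the Java client code further down, while the node name, host, and HTTP port are assumptions. Node2 and node3 would differ only in `node.name`, `http.port`, and `transport.tcp.port`.

```yaml
# elasticsearch-cluster/node1/config/elasticsearch.yml (sketch; several values are assumptions)
cluster.name: my-elasticsearch    # must be identical on all three nodes
node.name: node-1                 # assumed name; must be unique per node
network.host: 127.0.0.1           # assumed: all nodes on one machine
http.port: 9201                   # assumed RESTful port for this node
transport.tcp.port: 9301          # transport port used between nodes and by the Java client
# The transport addresses of all nodes, so they can find each other.
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301", "127.0.0.1:9302", "127.0.0.1:9303"]
```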
- Start each node separately. The nodes then connect automatically to form a cluster. Connect to any node's port to verify:
- Create index library:
- Create index:
- Query fragment verification:
4. Use Java client to simply operate ES
4.1 Creating an index database and adding documents using a Java client
- General steps:
- Create a Java project
- Add the Maven coordinates of the required JAR packages
- Write a test method that creates the index library:
- Create a Settings object, which holds the configuration information, mainly the cluster name
- Create a Client object
- Create an index library using the Client object
- Close the Client object
- The steps are as follows:
- ES dependencies introduced in pom.xml:
- Create a test class and create the index library:
import java.net.InetAddress;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Test;

public class ElasticSearchClientTest {
    @Test
    public void createIndex() throws Exception {
        // 1. Create a Settings object, which holds the configuration information, mainly the cluster name.
        Settings settings = Settings.builder()
                .put("cluster.name", "my-elasticsearch")
                .build();
        // 2. Create a Client object
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9303));
        // 3. Create an index library using the client object
        client.admin().indices().prepareCreate("index_hello")
                // Perform the operation
                .get();
        // 4. Close the client object
        client.close();
    }
}
4.2 Setting Mappings Using a Java Client
- General steps:
- Create a Settings object
- Create a Client object
- Create the mapping information, which should be JSON data, either as a string or as an XContentBuilder object
- Use client to send mapping information to the ES server
- Closing the Client object
- Create a new test method in the test class above:
@Test
public void setMappings() throws Exception {
    // 1. Create a Settings object, which holds the configuration information, mainly the cluster name.
    Settings settings = Settings.builder()
            .put("cluster.name", "my-elasticsearch")
            .build();
    // 2. Create a Client object
    TransportClient client = new PreBuiltTransportClient(settings);
    client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
    client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
    client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9303));
    // 3. Create the mapping information
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("article")
                    .startObject("properties")
                        .startObject("id")
                            .field("type", "long")
                            .field("store", true)
                        .endObject()
                        .startObject("title")
                            .field("type", "text")
                            .field("store", true)
                            .field("analyzer", "ik_smart")
                        .endObject()
                        .startObject("content")
                            .field("type", "text")
                            .field("store", true)
                            .field("analyzer", "ik_smart")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    // 4. Use the client to set the mapping information on the index library
    client.admin().indices()
            // Set the index to be mapped
            .preparePutMapping("index_hello")
            // Set the type to be mapped
            .setType("article")
            // Mapping information: an XContentBuilder object or a string in JSON format
            .setSource(builder)
            // Perform the operation
            .get();
    // 5. Close the connection
    client.close();
}
4.3 Adding documents Using a Java Client
- General steps:
- Create a Settings object
- Create a Client object
- Create a document object, create a JSON-formatted string, or use XContentBuilder
- Use the Client object to add documents to the index library
- Close the Client
- Operation steps:
- We need to create the Settings and Client objects every time, so we extract this into a setup method:
public class ElasticSearchClientTest {
    private TransportClient client;

    @Before
    public void init() throws Exception {
        // Create a Settings object
        Settings settings = Settings.builder().put("cluster.name", "my-elasticsearch").build();
        // Create a TransportClient object and assign it to the field (not to a local variable)
        client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9303));
    }
}
The @Before annotation makes init() run before each test method, so the client is initialized by the time a test uses it.
- Create a document:
@Test
public void testAddDocument() throws Exception {
    // Create a document object
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .field("id", 1L)
                .field("title", "This is the title of the data")
                .field("content", "This is the content of the data.")
            .endObject();
    // Add the document object to the index library
    client.prepareIndex()
            // Set the index name
            .setIndex("index_hello")
            // Set the type
            .setType("article")
            // Set the id of the document; if it is not set, one is generated automatically
            .setId("1")
            // Set the document information
            .setSource(builder)
            // Perform the operation
            .get();
    // Close the client
    client.close();
}
- The second way to create a document (using objects to create it) :
- Introducing jackson-related dependencies in POM.xml:
- Create a POJO class:
- Add documents (use utility classes to convert POJOs to JSON strings, then write documents to the index library)
5. Use Java client to realize the es search function
5.1 Querying Information By ID
- General steps:
- Create a Client object
- Create a query object. You can create a QueryBuilder object using the QueryBuilders utility class.
- Use client to perform the query
- Get the results of the query
- Obtain the total number of records in the query result
- Obtain the list of query results
- Close the client
- Code demo:
public class SearchIndexTest {
    private TransportClient client;

    @Before
    public void init() throws Exception {
        // Create a Settings object
        Settings settings = Settings.builder().put("cluster.name", "my-elasticsearch").build();
        // Create a TransportClient object and assign it to the field (not to a local variable)
        client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9303));
    }

    @Test
    public void testSearchById() throws Exception {
        // Create a query object
        QueryBuilder queryBuilder = QueryBuilders.idsQuery().addIds("1", "2");
        // Execute the query
        SearchResponse searchResponse = client.prepareSearch("index_hello")
                .setTypes("article")
                .setQuery(queryBuilder)
                .get();
        // Get the query result
        SearchHits searchHits = searchResponse.getHits();
        // Get the total number of records of the query result
        System.out.println("Total number of query results:" + searchHits.getTotalHits());
        // Query result list
        Iterator<SearchHit> iterator = searchHits.iterator();
        while (iterator.hasNext()) {
            SearchHit searchHit = iterator.next();
            // Print the document object in JSON format
            System.out.println(searchHit.getSourceAsString());
            // Take the attributes of the document
            Map<String, Object> document = searchHit.getSource();
            System.out.println(document.get("id"));
            System.out.println(document.get("title"));
            System.out.println(document.get("content"));
        }
    }
}
5.2 Querying by Term
@Test
public void testQueryByTerm() throws Exception {
// Create a QueryBuilder object
// Parameter 1: the field to search
// Parameter 2: keyword to search
QueryBuilder queryBuilder = QueryBuilders.termQuery("title", "The north");
// Execute the query
SearchResponse searchResponse = client.prepareSearch("index_hello")
.setTypes("article")
.setQuery(queryBuilder)
.get();
// Get the query result
SearchHits searchHits = searchResponse.getHits();
// Get the total number of records of the query result
System.out.println("Total number of query results:"+ searchHits.getTotalHits());
// Query result list
Iterator<SearchHit> iterator = searchHits.iterator();
while(iterator.hasNext()){
SearchHit searchHit = iterator.next();
// Print the document object in JSON format
System.out.println(searchHit.getSourceAsString());
// Take the attributes of the document
Map<String,Object> document = searchHit.getSource();
System.out.println(document.get("id"));
System.out.println(document.get("title"));
System.out.println(document.get("content"));
}
}
5.3 Querying Information Based on queryString
@Test
public void testQueryStringQuery() throws Exception {
// Create a QueryBuilder object
QueryBuilder queryBuilder = QueryBuilders.queryStringQuery("Fast and Furious").defaultField("title");
// Execute the query
SearchResponse searchResponse = client.prepareSearch("index_hello")
.setTypes("article")
.setQuery(queryBuilder)
.get();
// Get the query result
SearchHits searchHits = searchResponse.getHits();
// Get the total number of records of the query result
System.out.println("Total number of query results:"+ searchHits.getTotalHits());
// Query result list
Iterator<SearchHit> iterator = searchHits.iterator();
while(iterator.hasNext()){
SearchHit searchHit = iterator.next();
// Print the document object in JSON format
System.out.println(searchHit.getSourceAsString());
// Take the attributes of the document
Map<String,Object> document = searchHit.getSource();
System.out.println(document.get("id"));
System.out.println(document.get("title"));
System.out.println(document.get("content"));
}
}
5.4 Paging Query
- General steps:
- Before the client object performs the query, set the paging information.
- Then execute the query:
// Execute the query
SearchResponse searchResponse = client.prepareSearch("index_hello")
        .setTypes("article")
        .setQuery(queryBuilder)
        // Set paging information: the start offset
        .setFrom(0)
        // The number of records displayed per page
        .setSize(5)
        .get();
- Note:
- Paging requires setting two values, from and size
- From: Start line number, starting from 0.
- Size: indicates the number of records displayed per page
- Code demo:
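Because from is a 0-based record offset rather than a page number, a page number has to be converted before it is passed to setFrom. A minimal sketch, assuming 1-based page numbers; the class name `Paging` and its method are hypothetical helpers, not part of the ES client:

```java
public class Paging {
    /** Converts a 1-based page number and a page size into the 0-based `from` offset. */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        int size = 5;
        for (int page = 1; page <= 3; page++) {
            // The two printed values are what would be passed to setFrom(...) and setSize(...).
            System.out.println("page " + page + ": from=" + from(page, size) + ", size=" + size);
        }
    }
}
```

With a page size of 5, page 1 starts at offset 0, page 2 at offset 5, and page 3 at offset 10.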
5.5 The query results are highlighted
- General steps:
- Highlighted configuration:
- Sets the highlighted fields
- Set the highlighted prefix
- Set the suffix to highlight
- Set the highlight information before the client object executes the query.
- You can take the highlighted results from each hit as you traverse the list of results.
- Code demo:
- Create the highlight object:
- Set the highlight information on the client query:
- Print the highlighted content when you get the query result:
- For more detailed highlighting (reworking step 3):
Operating ES with Spring Data ElasticSearch
5.1 Setting up the Spring Data ElasticSearch Environment
-
What is Spring Data?
- Spring Data is an open source framework for simplifying database access and enabling cloud services. Its main goal is to make data access easy and fast, and to support the Map-Reduce framework and cloud computing data services. Spring Data greatly simplifies JPA writing, allowing access to and manipulation of Data with little to no written implementation. In addition to CRUD, common functions such as paging and sorting are included.
- Spring Data’s official website
- The common functional modules of Spring Data are as follows:
-
What is Spring Data ElasticSearch?
- Spring Data ElasticSearch simplifies ElasticSearch operations based on the Spring Data API and encapsulates the original ElasticSearch client API. It provides Spring Data projects with integration for the ElasticSearch search engine. Its key feature is a POJO-centric model for interacting with ElasticSearch documents, which makes it easy to write a repository-style data access layer.
- The official website
-
Setting up steps:
- Spring Data ElasticSearch:
- Solution 1: Create a configuration file in the Resource directory and configure elasticSearch.
- In application.yml:
spring:
  application:
    name: es-application
  data:
    elasticsearch:
      cluster-nodes: 192.168.227.136:9300
      cluster-name: elasticsearch
This approach is recommended.
- Create entity class:
- Create a Repository:
It extends ElasticsearchRepository, which provides some common methods.
5.4 Basic create, delete, update, and query operations with Spring Data ElasticSearch
- Create index:
- Add a document:
- Delete document:
- Update document:
It deletes the original document matching the ID and then adds it again, which achieves the effect of an update.
- Query a document by a specified ID and query all documents:
5.5 Customizing Query Methods
- Common query naming rules:
- Define findByTitle according to the naming rules:
- Definition method:
- Query code:
- According to the rules, define the or query mode:
- Definition method:
- Query code:
- Define the OR query mode according to the rules with paging query:
- Definition method:
- Query code:
Custom query methods must be named according to the Spring Data ElasticSearch naming rules; the methods declared in the interface are then implemented automatically. Without paging, at most 10 results are returned by default, no matter how many documents match in ES. To specify paging, use the paging query method above.
5.6 Using NativeSearchQuery to Query Information
- A query like the one in Section 5.5 for "I am a programmer" does not split the search text into words, so it only matches the exact term "I am a programmer". To have the query text segmented, so that "I am a programmer" matches all data containing I, am, program, or programmer, we need to use a queryString query.
- General steps:
- Create a NativeSearchQuery object and set the query criteria with a QueryBuilder object
- Perform the query using the ElasticsearchTemplate object
- Fetch the query result
- Code demo:
@Test
public void testNativeSearchQuery() throws Exception {
    // Create a query object
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.queryStringQuery("Maven is a build tool").defaultField("title"))
            .withPageable(PageRequest.of(0, 15))
            .build();
    // Execute the query
    List<Article> articleList = template.queryForList(query, Article.class);
    articleList.forEach(a -> System.out.println(a));
}
Slightly more cumbersome, but more functional and flexible