Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now Elastic). According to Elastic, it is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch operations are made available through a REST API. Its main functions are:
- Store documents in indexes,
- Search the index with powerful queries to retrieve those documents, and
- Run analysis functions on the data.
Spring Data Elasticsearch provides a simple interface to perform these operations on Elasticsearch as an alternative to using the REST API directly. Here, we’ll use Spring Data Elasticsearch to demonstrate Elasticsearch’s indexing and search capabilities, and finally build a simple search application for searching products in the product inventory.
Code sample
A working code example accompanying this article is available on GitHub.
Elasticsearch concept
The easiest way to understand the Elasticsearch concept is to use a database analogy, as shown in the following table:
| Elasticsearch | -> | Database |
|---|---|---|
| Index | -> | Table |
| Document | -> | Row |
| Field | -> | Column |
Any data we want to search for or analyze is stored in the index as a document. In Spring Data, we represent a document as a POJO and decorate it with annotations to define the mapping to Elasticsearch documents.
Unlike a database, text stored in Elasticsearch is first processed by various analyzers. The default analyzer splits the text on common word separators, such as spaces and punctuation, and removes common English words.

If we store the text "The sky is blue", the analyzer stores it as a document containing the "terms" "sky" and "blue". We can then search this document with text such as "blue sky", "sky", or "blue", with the degree of match returned as a score.
Besides text, Elasticsearch can store other types of data, known as field types, as described in the mapping types section of the documentation.
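To make the analysis step concrete, here is a rough plain-Java sketch of the tokenizing behavior described above. This is illustrative only, not Elasticsearch's actual analyzer, and the stop word list is a tiny made-up subset:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AnalyzerSketch {

  // A tiny subset of English stop words; the real list is much longer.
  private static final Set<String> STOP_WORDS =
      Set.of("the", "is", "a", "an", "and", "of");

  // Roughly mimics the described behavior: split on non-letter characters,
  // lowercase each token, and drop common English stop words.
  static List<String> analyze(String text) {
    return Arrays.stream(text.split("[^\\p{L}]+"))
        .filter(token -> !token.isEmpty())
        .map(String::toLowerCase)
        .filter(token -> !STOP_WORDS.contains(token))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // "The sky is blue" is reduced to the terms "sky" and "blue".
    System.out.println(analyze("The Sky is blue")); // [sky, blue]
  }
}
```

Searching for "blue sky" then matches on the stored terms, which is why word order and capitalization do not matter for the match.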
Start the Elasticsearch instance
Before we go any further, let’s launch an instance of Elasticsearch that we’ll use to run our example. There are several ways to run an instance of Elasticsearch:
- Use a hosted Elasticsearch service
- Use managed services from cloud providers such as AWS or Azure
- Install Elasticsearch on a cluster of VMs
- Run the Docker image
We’ll use a Docker image from Docker Hub, which is sufficient for our demo application. Let’s start the Elasticsearch instance by running the docker run command:
```shell
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.10.0
```
Executing this command starts an Elasticsearch instance listening on port 9200. We can verify the instance status by opening the URL http://localhost:9200 in a browser and checking the output:
```json
{
  "name" : "8c06d897d156",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "Jkx..VyQ",
  "version" : {
    "number" : "7.10.0",
    ...
  },
  "tagline" : "You Know, for Search"
}
```
We should see the output above if our Elasticsearch instance started successfully.
Use REST APIs for indexing and searching
The Elasticsearch operation is accessed through the REST API. There are two ways to add documents to the index:
- Add one document at a time, or
- Add documents in batches.
The API for adding a single document takes a single document as a parameter.
A simple PUT request to an Elasticsearch instance is used to store the document as follows:
```json
PUT /messages/_doc/1
{
  "message": "The Sky is blue today"
}
```
This stores the message "The Sky is blue today" as a document in the index "messages".
We can retrieve this document using a search query sent to the search REST API:
```json
GET /messages/_search
{
  "query": {
    "match": {
      "message": "blue sky"
    }
  }
}
```
Here we send a query of type match to fetch documents matching the string "blue sky". Queries for searching documents can be specified in many ways; Elasticsearch provides a JSON-based query DSL (Domain Specific Language) to define them.
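As a toy illustration of the query DSL's shape, a match-query body like the one above can be assembled with plain string formatting. In real code we would use a JSON library or the query builders discussed later; the field and text values here are just examples, and no JSON escaping is done:

```java
public class MatchQueryDemo {

  // Builds a minimal match-query body like the one sent to /messages/_search.
  // Illustration only: no escaping of special JSON characters is performed.
  static String matchQuery(String field, String text) {
    return String.format("{\"query\":{\"match\":{\"%s\":\"%s\"}}}", field, text);
  }

  public static void main(String[] args) {
    System.out.println(matchQuery("message", "blue sky"));
    // {"query":{"match":{"message":"blue sky"}}}
  }
}
```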
For bulk additions, we need to supply a JSON document containing entries similar to the following snippet:

```json
POST /_bulk
{"index":{"_index":"productindex"}}
{"_class":"..Product","name":"Corgi Toys ..Car","manufacturer":"Hornby"}
{"index":{"_index":"productindex"}}
{"_class":"..Product","name":"CLASSIC TOY ..BATTERY",..,"manufacturer":"ccf"}
```
Use Spring Data for Elasticsearch operations
There are two ways to access Elasticsearch using Spring Data, as shown below:
- Repositories: we define methods in an interface, and Elasticsearch queries are generated from the method names at runtime.
- ElasticsearchRestTemplate: we create queries with method chaining and native queries, giving us finer control over the generated Elasticsearch queries in relatively complex scenarios.
We will examine both approaches in more detail in the following sections.
Create the application and add dependencies
Let’s start by creating our application with Spring Initializr, including the Web, Thymeleaf, and Lombok dependencies. The Thymeleaf dependency is added for building the user interface.

Then add the spring-data-elasticsearch dependency to the Maven pom.xml:
```xml
<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-elasticsearch</artifactId>
</dependency>
```
Connect to Elasticsearch instance
Spring Data Elasticsearch uses the Java High Level REST Client (JHLC) to connect to the Elasticsearch server. JHLC is the default client for Elasticsearch. We’ll create a Spring Bean configuration to set it up:
```java
@Configuration
@EnableElasticsearchRepositories(basePackages = "io.pratik.elasticsearch.repositories")
@ComponentScan(basePackages = {"io.pratik.elasticsearch"})
public class ElasticsearchClientConfig extends AbstractElasticsearchConfiguration {

  @Override
  @Bean
  public RestHighLevelClient elasticsearchClient() {

    final ClientConfiguration clientConfiguration =
        ClientConfiguration
            .builder()
            .connectedTo("localhost:9200")
            .build();

    return RestClients.create(clientConfiguration).rest();
  }
}
```
Here, we connect to the Elasticsearch instance we started earlier. We can further customize the connection by adding more properties, such as enabling SSL, setting timeout, and so on.
For debugging and diagnostics, we turn on transport-level request/response logging in the logging configuration in logback-spring.xml.
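Assuming the standard Spring Data Elasticsearch client logging support, a logger entry along these lines in logback-spring.xml enables the wire-level trace output shown in the following sections:

```xml
<logger name="org.springframework.data.elasticsearch.client.WIRE" level="trace"/>
```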
Representing the document
In our example, we will search for products by name, brand, price, or description. So to store a product as a document in Elasticsearch, we represent the product as a POJO and decorate it with @Field annotations to configure the Elasticsearch mapping, as shown below:
```java
@Document(indexName = "productindex")
public class Product {

  @Id
  private String id;

  @Field(type = FieldType.Text, name = "name")
  private String name;

  @Field(type = FieldType.Double, name = "price")
  private Double price;

  @Field(type = FieldType.Integer, name = "quantity")
  private Integer quantity;

  @Field(type = FieldType.Keyword, name = "category")
  private String category;

  @Field(type = FieldType.Text, name = "desc")
  private String description;

  @Field(type = FieldType.Keyword, name = "manufacturer")
  private String manufacturer;
  ...
}
```
The @Document annotation specifies the index name.

The @Id annotation makes the annotated field the _id of the document, its unique identifier within the index. The id field has a limit of 512 characters.

The @Field annotation configures the type of a field. We can also set the name to a different field name.

Based on these annotations, an index named productindex is created in Elasticsearch.
Use Spring Data Repository for indexing and searching
Repositories provide the most convenient way to access data in Spring Data using finder methods. The Elasticsearch queries are created from the method names. However, we have to be careful not to generate inefficient queries that put a high load on the cluster.
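As a toy model of how queries are derived from method names, the sketch below extracts the queried property from a finder method name. This only illustrates the naming convention, not Spring Data's actual parser, which also understands keywords such as Containing, And, Or, and Between:

```java
public class FinderNameSketch {

  // Derives the queried property from a derived-query method name,
  // e.g. "findByName" -> "name". A simplified model of the convention
  // Spring Data applies when parsing repository method names.
  static String propertyFor(String methodName) {
    String prefix = "findBy";
    if (!methodName.startsWith(prefix)) {
      throw new IllegalArgumentException("not a finder method: " + methodName);
    }
    String property = methodName.substring(prefix.length());
    return Character.toLowerCase(property.charAt(0)) + property.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(propertyFor("findByName")); // name
  }
}
```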
Let’s create a Spring Data repository interface by extending the ElasticsearchRepository interface:
```java
public interface ProductRepository
    extends ElasticsearchRepository<Product, String> {
}
```
Here the ProductRepository class inherits methods such as save(), saveAll(), findById(), and findAll() from the ElasticsearchRepository interface.
Indexing
We will now store one product in the index by calling the save() method, and multiple products by calling the saveAll() method. Before that, we wrap the repository interface in a service class:
```java
@Service
public class ProductSearchServiceWithRepo {

  private ProductRepository productRepository;

  public void createProductIndexBulk(final List<Product> products) {
    productRepository.saveAll(products);
  }

  public void createProductIndex(final Product product) {
    productRepository.save(product);
  }
}
```
When we call these methods from JUnit, we can see the index and bulk-index REST API calls in the trace log.
Searching
To satisfy our search requirements, we will add finder methods to the repository interface:
```java
public interface ProductRepository
    extends ElasticsearchRepository<Product, String> {

  List<Product> findByName(String name);

  List<Product> findByNameContaining(String name);

  List<Product> findByManufacturerAndCategory(String manufacturer, String category);
}
```
When we run the findByName() method with JUnit, we can see the Elasticsearch query generated in the trace log before it is sent to the server:
```
TRACE Sending request POST /productindex/_search?..:
Request body: {.."query":{"bool":{"must":[{"query_string":{"query":"apple","fields":["name^1.0"],..}
```
Similarly, by running the findByManufacturerAndCategory() method, we can see the generated query with two query_string parameters corresponding to the two fields "manufacturer" and "category":
```
TRACE .. Sending request POST /productindex/_search..:
Request body: {.."query":{"bool":{"must":[{"query_string":{"query":"samsung","fields":["manufacturer^1.0"],..}},{"query_string":{"query":"laptop","fields":["category^1.0"],..}}],..}},"version":true}
```
There are several method naming patterns that can generate various Elasticsearch queries.
Indexing and searching with ElasticsearchRestTemplate
The Spring Data repository approach may not be a good fit when we need more control over how we design our queries, or when the team already has expertise with Elasticsearch syntax.

In that case, we use ElasticsearchRestTemplate. It is the new HTTP-based client for Elasticsearch, replacing the TransportClient, which used a node-to-node binary protocol.

ElasticsearchRestTemplate implements the ElasticsearchOperations interface, which does the heavy lifting of the low-level search and cluster operations.
Indexing
The interface has an index() method for adding a single document and a bulkIndex() method for adding multiple documents to an index. The code snippet here shows the use of bulkIndex() to add multiple products to the index "productindex":
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public List<String> createProductIndexBulk(final List<Product> products) {

    List<IndexQuery> queries = products.stream()
        .map(product ->
            new IndexQueryBuilder()
                .withId(product.getId().toString())
                .withObject(product)
                .build())
        .collect(Collectors.toList());

    return elasticsearchOperations
        .bulkIndex(queries, IndexCoordinates.of(PRODUCT_INDEX));
  }
  ...
}
```
The documents to be stored are contained in IndexQuery objects. The bulkIndex() method takes as input a list of IndexQuery objects and the Index name contained in IndexCoordinates. When we execute this method, we get a REST API trace for bulk requests:
```
Sending request POST /_bulk?timeout=1m with parameters:
Request body: {"index":{"_index":"productindex","_id":"383..35"}}{"_class":"..Product","id":"383..35","name":"New Apple..phone",..,"manufacturer":"apple"}
..{"_class":"..Product","id":"d7a..34",..,"manufacturer":"samsung"}
```
Next, we add a single document using the index() method:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public String createProductIndex(Product product) {

    IndexQuery indexQuery = new IndexQueryBuilder()
        .withId(product.getId().toString())
        .withObject(product)
        .build();

    String documentId = elasticsearchOperations
        .index(indexQuery, IndexCoordinates.of(PRODUCT_INDEX));

    return documentId;
  }
}
```
The trace accordingly shows the REST API PUT requests for adding individual documents.
```
Sending request PUT /productindex/_doc/59d..987..:
Request body: {"_class":"..Product","id":"59d..87",..,"manufacturer":"dell"}
```
Searching
ElasticsearchRestTemplate also has a search() method for searching documents in an index. This search operation resembles Elasticsearch queries, and is built by constructing a Query object and passing it to the search() method.

The Query object comes in three variants, NativeQuery, StringQuery, and CriteriaQuery, depending on how we construct the query. Let's build a few queries for searching products.
NativeQuery
NativeQuery provides maximum flexibility for building queries using objects that represent Elasticsearch constructs such as aggregation, filtering, and sorting. This is the NativeQuery used to search for products that match a particular manufacturer:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public void findProductsByBrand(final String brandName) {

    QueryBuilder queryBuilder =
        QueryBuilders
            .matchQuery("manufacturer", brandName);

    Query searchQuery = new NativeSearchQueryBuilder()
        .withQuery(queryBuilder)
        .build();

    SearchHits<Product> productHits =
        elasticsearchOperations
            .search(searchQuery,
                Product.class,
                IndexCoordinates.of(PRODUCT_INDEX));
  }
}
```
Here we build the query with NativeSearchQueryBuilder, which uses a MatchQueryBuilder to specify a match query on the field "manufacturer".
StringQuery
StringQuery provides full control by allowing native Elasticsearch queries to be used as JSON strings, as shown below:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public void findByProductName(final String productName) {

    Query searchQuery = new StringQuery(
        "{\"match\":{\"name\":{\"query\":\"" + productName + "\"}}}");

    SearchHits<Product> products = elasticsearchOperations.search(
        searchQuery,
        Product.class,
        IndexCoordinates.of(PRODUCT_INDEX));
    ...
  }
}
```
In this snippet, we specify a simple match query to get a product with a specific name sent as a method parameter.
CriteriaQuery
With CriteriaQuery we can build queries without knowing any Elasticsearch terminology. The queries are built with method chaining on Criteria objects, each of which specifies some criteria used for searching documents:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public void findByProductPrice(final String productPrice) {

    Criteria criteria = new Criteria("price")
        .greaterThan(10.0)
        .lessThan(100.0);

    Query searchQuery = new CriteriaQuery(criteria);

    SearchHits<Product> products = elasticsearchOperations
        .search(searchQuery,
            Product.class,
            IndexCoordinates.of(PRODUCT_INDEX));
  }
}
```
In this code snippet, we use CriteriaQuery to form queries to get products with prices greater than 10.0 and less than 100.0.
Build the search application
We will now add a user interface to our application to see the product search in action. The user interface has a search input box for searching products by name or description. The input box has an autocomplete feature that shows a list of suggestions based on the available products.

We create autocomplete suggestions for the user's search input, then search for products whose name or description closely matches the text the user entered. We build two search services to implement this use case:
- Get autocomplete search suggestions
- Process the search query to search for products
The service class ProductSearchService will contain methods to search and get suggestions.
The complete application with the user interface is available in the GitHub repository.
Build product search indexes
"productindex" is the same index we used for running the JUnit tests. We first delete productindex with the Elasticsearch REST API, so that a fresh productindex can be created during application startup with products loaded from our sample dataset of 50 fashion products:
```shell
curl -X DELETE http://localhost:9200/productindex
```
If the deletion is successful, we receive the message {“acknowledged”: true}.
Now, let’s create an index for the products in inventory. We will use a sample dataset of 50 products to build our index. These products are arranged as separate lines in the CSV file.
Each row has three attributes: id, name, and description. We want the index to be created during application startup. Note that in real production environments, index creation would be a separate process. We read each line of the CSV and add it to the product index:
```java
@SpringBootApplication
@Slf4j
public class ProductsearchappApplication {
  ...
  @PostConstruct
  public void buildIndex() {
    esOps.indexOps(Product.class).refresh();
    productRepo.saveAll(prepareDataset());
  }

  private Collection<Product> prepareDataset() {
    Resource resource = new ClassPathResource("fashion-products.csv");
    ...
    return productList;
  }
}
```
In this snippet, we do some preprocessing by reading the rows from the dataset and passing them to the saveAll() method of the repository to add products to the index. On running the application, we can see the following trace logs at application startup:
```
.. Sending request POST /_bulk?timeout=1m with parameters:
Request body: {"index":{"_index":"productindex"}}{"_class":"io.pratik.elasticsearch.productsearchapp.Product","name":"Hornby 2014 Catalogue","description":"Product Desc..talogue","manufacturer":"Hornby"}
{"index":{"_index":"productindex"}}{"_class":"io.pratik.elasticsearch.productsearchapp.Product","name":"FunkyBuys..","description":"Size Name:Lar..& Smoke","manufacturer":"FunkyBuys"}
{"index":{"_index":"productindex"}}...
```
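The row-to-object mapping inside prepareDataset() might look something like the sketch below. The parseLine() helper and the Row holder are hypothetical names for illustration; real CSV data with embedded commas or quotes would need a proper CSV library:

```java
public class CsvRowSketch {

  // A simple holder mirroring the three CSV attributes (id, name, description);
  // the real application maps these onto the Product document class.
  record Row(String id, String name, String description) {}

  // Naive split-based parsing; assumes fields contain no embedded commas.
  static Row parseLine(String line) {
    String[] parts = line.split(",", 3);
    return new Row(parts[0].trim(), parts[1].trim(), parts[2].trim());
  }

  public static void main(String[] args) {
    Row row = parseLine("101,Hornby 2014 Catalogue,64-page catalogue");
    System.out.println(row.name()); // Hornby 2014 Catalogue
  }
}
```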
Use multi-field and fuzzy search to search for products
Here is how we handle a submitted search request in the processSearch() method:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public List<Product> processSearch(final String query) {
    log.info("Search with query {}", query);

    // 1. Create query on multiple fields enabling fuzzy search
    QueryBuilder queryBuilder =
        QueryBuilders
            .multiMatchQuery(query, "name", "description")
            .fuzziness(Fuzziness.AUTO);

    Query searchQuery = new NativeSearchQueryBuilder()
        .withFilter(queryBuilder)
        .build();

    // 2. Execute search
    SearchHits<Product> productHits =
        elasticsearchOperations
            .search(searchQuery, Product.class,
                IndexCoordinates.of(PRODUCT_INDEX));

    // 3. Map searchHits to product list
    List<Product> productMatches = new ArrayList<Product>();
    productHits.forEach(searchHit -> {
      productMatches.add(searchHit.getContent());
    });
    return productMatches;
  }
  ...
}
```
Here we perform the search on two fields, name and description. We also apply fuzziness() to match closely spelled text, to account for spelling errors.
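Fuzziness is based on edit distance. With Fuzziness.AUTO, Elasticsearch picks the maximum allowed edit distance from the term length: 0 for terms of 1 to 2 characters, 1 for 3 to 5 characters, and 2 for longer terms. The sketch below illustrates the idea with a plain Levenshtein distance; Elasticsearch actually uses a Damerau-Levenshtein variant that also counts transpositions as single edits:

```java
public class FuzzinessSketch {

  // Fuzziness.AUTO's length-based maximum edit distance.
  static int autoMaxEdits(String term) {
    int len = term.length();
    if (len <= 2) return 0;
    if (len <= 5) return 1;
    return 2;
  }

  // Classic Levenshtein distance (insertions, deletions, substitutions).
  static int editDistance(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(
            Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
            d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }

  public static void main(String[] args) {
    // "aple" is one edit away from "apple", within AUTO's allowed distance.
    System.out.println(editDistance("apple", "aple") <= autoMaxEdits("aple"));
  }
}
```

This is why a misspelled query such as "aple" can still match products whose name contains "apple".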
Use wildcard search to get suggestions
Next, we build autocomplete for the search text box. When we enter content in the search text field, we will get suggestions by performing a wildcard search using the characters entered in the search box.
We build this function in the fetchSuggestions() method, as follows:
```java
@Service
@Slf4j
public class ProductSearchService {

  private static final String PRODUCT_INDEX = "productindex";

  private ElasticsearchOperations elasticsearchOperations;

  public List<String> fetchSuggestions(String query) {
    QueryBuilder queryBuilder = QueryBuilders
        .wildcardQuery("name", query + "*");

    Query searchQuery = new NativeSearchQueryBuilder()
        .withFilter(queryBuilder)
        .withPageable(PageRequest.of(0, 5))
        .build();

    SearchHits<Product> searchSuggestions =
        elasticsearchOperations.search(searchQuery,
            Product.class,
            IndexCoordinates.of(PRODUCT_INDEX));

    List<String> suggestions = new ArrayList<String>();
    searchSuggestions.getSearchHits().forEach(searchHit -> {
      suggestions.add(searchHit.getContent().getName());
    });
    return suggestions;
  }
}
```
We form the wildcard query from the search input text with * appended, so that if we type "red", we get suggestions starting with "red". We limit the number of suggestions to 5 with the withPageable() method.
Conclusion
In this article, we introduced the main operations of Elasticsearch — indexing documents, bulk indexing, and search — which are provided as REST APIs. The query DSL, combined with different analyzers, makes searching very powerful.
Spring Data Elasticsearch provides convenient interfaces for accessing those operations in an application, either in the form of Spring Data repositories or ElasticsearchRestTemplate.
We ended up building an application where we saw how to use Elasticsearch’s bulk indexing and search capabilities in a near-real life application.
- Reference: Using Elasticsearch with Spring Boot (Reflectoring)
- For the ELK stack, please refer to: ELK Tutorial – Discover, Analyze and Visualize Your Data