Source: blog.csdn.net/weixin_44671737/article/details/114456257

Abstract

For a company, the volume of data keeps growing, and finding a given piece of information quickly becomes a hard problem. Computer science has a dedicated field for this, IR (Information Retrieval), which deals with obtaining and searching information.

Search engines such as Baidu belong to this field. Building a full search engine is very difficult, but information search matters to every company, and developers can use open source projects to build a search engine for their own site. This article uses ElasticSearch to build such an information retrieval project.

1 Technology selection

  • The search engine service uses ElasticSearch
  • Spring Boot Web is used for the external web service

1.1 ElasticSearch

Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch, developed in Java and released as open source under the Apache license, is a popular enterprise-level search engine. Designed for cloud computing, Elasticsearch is stable, reliable, fast, and easy to install and use.

Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. Elasticsearch is the most popular enterprise search engine according to DB-Engines’ rankings, followed by Apache Solr, which is also based on Lucene.1

ElasticSearch and Solr are the most common open source search engines on the market, and both are based on Lucene. ElasticSearch is the more heavyweight of the two and performs better in distributed environments. For small amounts of data, a search engine service may not be needed at all; searching through a relational database can be enough.
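
To make "Lucene-based full-text search" more concrete: the core idea behind such engines is an inverted index, a map from each term to the documents that contain it. The following is only a minimal sketch in plain Java and not Lucene's actual implementation; the class name `InvertedIndexSketch`, the simple tokenization rule, and the AND semantics of `search` are assumptions made for illustration.

```java
import java.util.*;

/** Toy inverted index: maps each lowercase term to the ids of documents containing it. */
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Splits text on non-letter/non-digit characters and records docId under every term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents that contain every term of the query (AND semantics). */
    List<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            SortedSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a Lucene-based search server");
        index.add(2, "Solr is also based on Lucene");
        index.add(3, "Spring Boot web service");
        System.out.println(index.search("lucene"));        // documents mentioning "lucene"
        System.out.println(index.search("lucene search")); // documents containing both terms
    }
}
```

A lookup touches only the posting lists of the query terms instead of scanning every document, which is why this structure scales better than a `LIKE '%keyword%'` scan in a relational database.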

1.2 Spring Boot

Spring Boot makes it easy to create standalone, production-grade Spring based Applications that you can “just run”.2

Spring Boot is now the mainstream choice for web development. It has advantages not only in development but also in deployment, operation and maintenance. In addition, the Spring ecosystem is so influential that mature solutions can be found for almost any need.

1.3 IK analyzer

Out of the box, ElasticSearch does not segment Chinese text into words, so a Chinese word segmentation plug-in is needed. If you need to search for information in Chinese, choose IK. After downloading IK, put it in the plugins directory of the ElasticSearch installation.
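
To illustrate what a word segmentation plug-in adds: the default standard analyzer emits Chinese text one character at a time, while a dictionary-based segmenter groups characters into words. The sketch below uses forward maximum matching, a classic textbook technique; it is not claimed to be IK's actual algorithm, and the class name `MaxMatchSegmenter` and the four-word dictionary are hypothetical.

```java
import java.util.*;

/** Toy dictionary-based segmenter using forward maximum matching. */
final class MaxMatchSegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    MaxMatchSegmenter(Collection<String> words) {
        this.dictionary = new HashSet<>(words);
        int max = 1;
        for (String w : words) {
            max = Math.max(max, w.length());
        }
        this.maxWordLength = max;
    }

    /** At each position takes the longest dictionary word; falls back to a single character. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical mini dictionary; a real analyzer ships a far larger one.
        MaxMatchSegmenter segmenter =
                new MaxMatchSegmenter(Arrays.asList("搜索", "引擎", "搜索引擎", "中文"));
        System.out.println(segmenter.segment("中文搜索引擎"));
    }
}
```

With this dictionary the sample text is split into two words rather than six single characters, so a query for a whole word can match the indexed token directly.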

2 Environment Preparation

ElasticSearch and Kibana (optional) need to be installed, and the IK segmentation plug-in is required.

  • Install ElasticSearch from the ElasticSearch official website. I used 7.5.1.
  • Download the IK plug-in from its GitHub page. Note that the IK plug-in version must match the ElasticSearch version you downloaded.
  • In the plugins directory under the ElasticSearch installation directory, create a new ik directory and extract the IK plug-in into it. The plug-in is loaded automatically when ES starts.

  • Set up the Spring Boot project: IDEA -> New Project -> Spring Initializer
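
Once ES is running with the plug-in installed, one way to check that IK is loaded is to call Elasticsearch's `_analyze` API with the `ik_max_word` analyzer. The sketch below builds that request with the JDK's `java.net.http` client (Java 11+). The base URL `http://localhost:9200` assumes a local node on the default port without security enabled, and the class name `IkAnalyzeCheck` is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Builds (and, in main, sends) a request to Elasticsearch's _analyze API. */
final class IkAnalyzeCheck {

    /** JSON body asking ES to tokenize the text with the given analyzer. */
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    /** Escapes backslashes and double quotes, which is enough for simple sample text. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static HttpRequest analyzeRequest(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed address: a local, unsecured node on the default port.
        HttpRequest request = analyzeRequest("http://localhost:9200", "ik_max_word", "中文搜索引擎");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // A 200 response listing tokens suggests the plug-in is loaded;
        // an error response that names the analyzer suggests it is not.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The same request can be sent from Kibana's Dev Tools console if Kibana is installed.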

3 Project Architecture

  • Get data using the IK word segmentation plugin
  • Store the data in the ES engine
  • Retrieve the stored data through ES search
  • Provide the external service through the ES Java client

4 Implementation Effect

4.1 Search Page

A simple search box similar to Baidu's is enough.

4.2 Search Results Page

Clicking the first search result opens one of my personal blog posts. To avoid data copyright problems, all the data stored in the ES engine comes from my personal blog.

5 Specific code implementation

5.1 Entity for full-text retrieval

The following entity class is defined from the basic information of a blog post. The key field is the URL of each post, so that a retrieved article can be opened by jumping to its URL.

package com.lbh.es.entity;

import com.fasterxml.jackson.annotation.JsonIgnore;

import javax.persistence.*;

/**
 * PUT articles
 * {
 *   "mappings": {
 *     "properties": {
 *       "author": {"type": "text"},
 *       "content": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "title": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
 *       "createDate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
 *       "url": {"type": "text"}
 *     }
 *   },
 *   "settings": {
 *     "index": {
 *       "number_of_shards": 1,
 *       "number_of_replicas": 2
 *     }
 *   }
 * }
 *
 * Copyright(c)[email protected]
 * @author liubinhao
 * @date 2021/3/3
 */
@Entity
@Table(name = "es_article")
public class ArticleEntity {

    @Id
    @JsonIgnore
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(name = "author")
    private String author;

    @Column(name = "content", columnDefinition = "TEXT")
    private String content;

    @Column(name = "title")
    private String title;

    @Column(name = "createDate")
    private String createDate;

    @Column(name = "url")
    private String url;

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getCreateDate() {
        return createDate;
    }

    public void setCreateDate(String createDate) {
        this.createDate = createDate;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }
}

5.2 Client Configuration

Configure the ES client through Java.

package com.lbh.es.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.ArrayList;
import java.util.List;

/**
 * Copyright(c)[email protected]
 * @author liubinhao
 * @date 2021/3/3
 */
@Configuration
public class EsConfig {

    @Value("${elasticsearch.schema}")
    private String schema;
    @Value("${elasticsearch.address}")
    private String address;
    @Value("${elasticsearch.connectTimeout}")
    private int connectTimeout;
    @Value("${elasticsearch.socketTimeout}")
    private int socketTimeout;
    @Value("${elasticsearch.connectionRequestTimeout}")
    private int tryConnTimeout;
    @Value("${elasticsearch.maxConnectNum}")
    private int maxConnNum;
    @Value("${elasticsearch.maxConnectPerRoute}")
    private int maxConnectPerRoute;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Parse the comma-separated "host:port" list into HttpHost objects
        List<HttpHost> hostLists = new ArrayList<>();
        String[] hostList = address.split(",");
        for (String addr : hostList) {
            String host = addr.split(":")[0];
            String port = addr.split(":")[1];
            hostLists.add(new HttpHost(host, Integer.parseInt(port), schema));
        }
        HttpHost[] httpHost = hostLists.toArray(new HttpHost[]{});
        RestClientBuilder builder = RestClient.builder(httpHost);
        // Timeout configuration for the asynchronous HTTP client
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(connectTimeout);
            requestConfigBuilder.setSocketTimeout(socketTimeout);
            requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
            return requestConfigBuilder;
        });
        // Connection count configuration for the asynchronous HTTP client
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(maxConnNum);
            httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
            return httpClientBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}
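
The configuration class reads its values from properties such as `elasticsearch.schema`, `elasticsearch.address`, `elasticsearch.connectTimeout` and so on, where `elasticsearch.address` is a comma-separated list of `host:port` entries. As a rough sketch of that format, the plain Java class below parses such a value with a little more validation than the loop in the bean. The class `EsAddressParser`, the nested `HostPort` type (standing in for `HttpHost`), and the sample value `127.0.0.1:9200, 127.0.0.1:9201` are all assumptions for illustration.

```java
import java.util.*;

/** Parses a comma-separated "host:port" list such as the elasticsearch.address property. */
final class EsAddressParser {

    /** Simple host/port pair; stands in for org.apache.http.HttpHost in this sketch. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    static List<HostPort> parse(String address) {
        List<HostPort> hosts = new ArrayList<>();
        for (String entry : address.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            hosts.add(new HostPort(trimmed.substring(0, colon),
                    Integer.parseInt(trimmed.substring(colon + 1))));
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hypothetical property value: elasticsearch.address=127.0.0.1:9200, 127.0.0.1:9201
        System.out.println(parse("127.0.0.1:9200, 127.0.0.1:9201"));
    }
}
```

Failing fast on a malformed entry gives a clearer startup error than the `ArrayIndexOutOfBoundsException` that `addr.split(":")[1]` would raise.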

5.3 Business code writing

The service below covers creating the index and adding and retrieving articles. Articles can be searched by title, content, and author.

package com.lbh.es.service;

import com.google.gson.Gson;
import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.repository.ArticleRepository;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.io.IOException;

import java.util.*;

/**
 * Copyright(c)[email protected]
 * @author liubinhao
 * @date 2021/3/3
 */
@Service
public class ArticleService {

    private static final String ARTICLE_INDEX = "article";

    @Resource
    private RestHighLevelClient client;
    @Resource
    private ArticleRepository articleRepository;

    public boolean createIndexOfArticle(){
        Settings settings = Settings.builder()
                .put("index.number_of_shards", 1)
                .put("index.number_of_replicas", 1)
                .build();
// {"properties":{"author":{"type":"text"},
// "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
// ,"title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"}
// ,"createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"}
// ,"url":{"type":"text"}
// }}
        String mapping = "{\"properties\":{\"author\":{\"type\":\"text\"},\n" +
                "\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                ",\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}\n" +
                ",\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"}\n" +
                ",\"url\":{\"type\":\"text\"}\n" +
                "}}";
        CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
                .settings(settings).mapping(mapping,XContentType.JSON);
        CreateIndexResponse response = null;
        try {
            response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response!=null) {
            System.err.println(response.isAcknowledged() ? "success" : "default");
            return response.isAcknowledged();
        } else {
            return false;
        }
    }

    public boolean deleteArticle(){
        DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
        try {
            AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
            return response.isAcknowledged();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    public IndexResponse addArticle(ArticleEntity article){
        Gson gson = new Gson();
        String s = gson.toJson(article);
        // Create the index request object
        IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
        // Document content
        indexRequest.source(s,XContentType.JSON);
        // Send the HTTP request through the client
        IndexResponse re = null;
        try {
            re = client.index(indexRequest, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return re;
    }

    public void transferFromMysql(){
        articleRepository.findAll().forEach(this::addArticle);
    }

    public List<ArticleEntity> queryByKey(String keyword){
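        // Restrict the search to the article index instead of all indices
        SearchRequest request = new SearchRequest(ARTICLE_INDEX);
        /*
         * Create the object that holds the search parameters: SearchSourceBuilder.
         * Compared with matchQuery, multiMatchQuery targets multiple fields: with a single
         * field name it behaves like matchQuery; with several field names, such as field1
         * and field2, a document matches if either field1 or field2 contains the text.
         */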
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(QueryBuilders
                .multiMatchQuery(keyword, "author","content","title"));
        request.source(searchSourceBuilder);
        List<ArticleEntity> result = new ArrayList<>();
        try {
            SearchResponse search = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit:search.getHits()){
                Map<String, Object> map = hit.getSourceAsMap();
                ArticleEntity item = new ArticleEntity();
                item.setAuthor((String) map.get("author"));
                item.setContent((String) map.get("content"));
                item.setTitle((String) map.get("title"));
                item.setUrl((String) map.get("url"));
                result.add(item);
            }
            return result;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public ArticleEntity queryById(String indexId){
        GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
        GetResponse response = null;
        try {
            response = client.get(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (response!=null&&response.isExists()){
            Gson gson = new Gson();
            return gson.fromJson(response.getSourceAsString(),ArticleEntity.class);
        }
        return null;
    }
}
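
Hand-escaped JSON strings like the mapping in `createIndexOfArticle` are easy to get wrong, since a single misplaced brace changes the structure. One alternative, sketched here under assumptions, is to build the mapping as nested maps and serialize it. The class `ArticleMappingSketch` and its tiny JSON writer are hypothetical helpers written for this example; in a real project a JSON library such as the Gson dependency already used by the service would do the serialization.

```java
import java.util.*;

/** Builds the article mapping as nested maps and serializes it, avoiding hand-escaped JSON. */
final class ArticleMappingSketch {

    /** Creates an insertion-ordered map from alternating key/value arguments. */
    static Map<String, Object> obj(Object... keyValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Minimal JSON writer for nested maps, strings, numbers and booleans (no arrays). */
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                joiner.add(toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        // Strings: escape backslashes and double quotes only, enough for this mapping.
        return "\"" + String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Same fields as the mapping string in the service, with url inside properties. */
    static Map<String, Object> articleMapping() {
        Map<String, Object> ikText = obj("type", "text",
                "analyzer", "ik_max_word", "search_analyzer", "ik_smart");
        return obj("properties", obj(
                "author", obj("type", "text"),
                "content", ikText,
                "title", ikText,
                "createDate", obj("type", "date", "format", "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"),
                "url", obj("type", "text")));
    }

    public static void main(String[] args) {
        System.out.println(toJson(articleMapping()));
    }
}
```

The resulting string can be passed to `mapping(mapping, XContentType.JSON)` in the same way as the hand-written one.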

5.4 External Interfaces

This part is the same as developing any other web application with Spring Boot.


package com.lbh.es.controller;

import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.service.ArticleService;
import org.elasticsearch.action.index.IndexResponse;
import org.springframework.web.bind.annotation.*;

import javax.annotation.Resource;
import java.util.List;

/**
 * Copyright(c)[email protected]
 * @author liubinhao
 * @date 2021/3/3
 */
@RestController
@RequestMapping("article")
public class ArticleController {

    @Resource
    private ArticleService articleService;

    @GetMapping("/create")
    public boolean create() {
        return articleService.createIndexOfArticle();
    }

    @GetMapping("/delete")
    public boolean delete() {
        return articleService.deleteArticle();
    }

    @PostMapping("/add")
    public IndexResponse add(@RequestBody ArticleEntity article) {
        return articleService.addArticle(article);
    }

    @GetMapping("/fransfer")
    public String transfer() {
        articleService.transferFromMysql();
        return "successful";
    }

    @GetMapping("/query")
    public List<ArticleEntity> query(String keyword) {
        return articleService.queryByKey(keyword);
    }
}
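
As a usage example, the `/article/query` endpoint can be called from any HTTP client. The sketch below uses the JDK's `java.net.http` client (Java 11+). The base URL `http://localhost:8080` assumes the Spring Boot application runs locally on its default port, and the class name `ArticleQueryClient` is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Calls the /article/query endpoint exposed by ArticleController. */
final class ArticleQueryClient {

    /** Builds a GET request with the keyword URL-encoded as a query parameter. */
    static HttpRequest queryRequest(String baseUrl, String keyword) {
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/article/query?keyword=" + encoded))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed address: the Spring Boot application running locally on port 8080.
        HttpRequest request = queryRequest("http://localhost:8080", "elasticsearch");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON array of matching articles
    }
}
```

Encoding the keyword matters for Chinese queries and for keywords containing spaces, which would otherwise produce an invalid URI.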

5.5 Pages

The pages use Thymeleaf. The main reason is that I really do not know front-end development and only understand basic HTML5, so I just made pages that can display the results.

Search page

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>YiyiDu</title>
    <!-- input:focus sets the blue border that appears when the input box is clicked;
         text-indent: 11px and padding-left: 11px offset the text inside the box -->
    <style>
        input:focus {
            border: 2px solid rgb(62, 88, 206);
        }
        input {
            text-indent: 11px;
            padding-left: 11px;
            font-size: 16px;
        }
    </style>
    <style>
        .input {
            width: 33%;
            height: 45px;
            vertical-align: top;
            box-sizing: border-box;
            border: 2px solid rgb(207, 205, 205);
            border-right: 2px solid rgb(62, 88, 206);
            border-bottom-left-radius: 10px;
            border-top-left-radius: 10px;
            outline: none;
            margin: 0;
            display: inline-block;
            background: url(/static/img/camera.jpg) no-repeat 0 0;
            background-position: 565px 7px;
            background-size: 28px;
            padding-right: 49px;
            padding-top: 10px;
            padding-bottom: 10px;
            line-height: 16px;
        }
    </style>
    <style>
        .button {
            width: 130px;
            vertical-align: middle;
            text-indent: -8px;
            padding-left: -8px;
            background-color: rgb(62, 88, 206);
            color: white;
            font-size: 18px;
            outline: none;
            border: none;
            border-bottom-right-radius: 10px;
            border-top-right-radius: 10px;
            margin: 0;
            padding: 0;
        }
    </style>
</head>
<body>
<!-- Div containing table -->
<div style="font-size: 14px;">
    <div align="center" style="margin-top: 0px;">
        <img src="../static/img/yyd.png" th:src="@{/static/img/yyd.png}" alt="YiyiDu" width="280px" class="pic" />
    </div>
    <div align="center">
        <form action="/home/query">
            <input type="text" class="input" name="keyword" />
            <input type="submit" class="button" value="YiyiDu" />
        </form>
    </div>
</div>
</body>
</html>

Search results page

<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
    <meta charset="utf-8">
    <title>xx-manager</title>
</head>
<body>
<header th:replace="search.html"></header>
<div class="container my-2">
    <ul th:each="article : ${articles}">
        <a th:href="${article.url}"><li th:text="${article.author}+${article.content}"></li></a>
    </ul>
</div>
<footer th:replace="footer.html"></footer>
</body>
</html>

6 Summary

I spent two days studying ES, and it turned out to be quite an interesting tool. At its core the IR field is still based on statistics, so ES performs well with large amounts of data.

Every time I write a hands-on project I am not sure how to start, because I do not know what to build. So I would like to collect some interesting ideas and put them into practice.
