This article comes from a community project (source code: front end | back end).
The LemonC community project is designed and developed with reference to the Juejin (Nuggets) community; if you like it, a Star would be much appreciated, thank you.
I will review its features as time permits, so please look out for the follow-up articles ~
Highlighted keyword search for community articles was recently implemented with ElasticSearch + Logstash (●'◡'●)!
There were plenty of pitfalls along the way (though stepping into pitfalls is the fastest way to grow).
Let's take a look at the result first (the GIF is a little fuzzy, but it is watchable).
As the figure shows, when we search articles by keyword, the matching keywords in both the title and the content are highlighted.
So without further ado, let's walk through the whole implementation in one go (●'◡'●).
0. Make preparations
We need to install ElasticSearch (with the IK analyzer plugin), Logstash, and elasticsearch-head.
(The installation process is not covered here; a quick web search will do, but I will point out the tricky details.)
Both ES and Logstash in this article are version 7.6.2, mainly to match Spring Data Elasticsearch (whose latest version supports ES 7.6.2).
1. The back end
1.1 Introducing dependencies
The back-end project uses Spring Boot (2.3.0), which requires importing some core dependencies:
<dependencies>
    <!-- other required dependencies -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-elasticsearch</artifactId>
    </dependency>
</dependencies>
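As an aside, if the project relies on Spring Boot's dependency management, the corresponding starter can pull in a matching spring-data-elasticsearch version automatically. A minimal alternative sketch (not necessarily what this project uses):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
```

With the starter, the version is inherited from the Spring Boot parent, which avoids mismatches between the Spring Data version and the ES server.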
1.2 Article entity class
The Article entity class is the concrete data we search.
Its fields are as follows (title, detail, and createdTime are the main search fields):
- id — primary key
- title — title
- detail — content
- createdTime — creation time
- updatedTime — last update time
- … author id, view count, like count, logical-delete flag, etc.

(id, createdTime, and isDeleted are all inherited from BaseEntity.)
Here are the spring-data-elasticsearch annotations used in the entity class:
@Document (indexName is the name of the index we create; the type attribute is no longer needed)
@Id marks the primary key (placed above the id field)
@Field (type is the field type; analyzer and searchAnalyzer are the word-segmentation rules; format is the date format)
Since we need to search title and detail by keyword, their type is Text, which enables word segmentation.
As for analyzer versus searchAnalyzer, the official documentation explains the difference,
but here I simply point both at the same analyzer, which serves both purposes just fine (●'◡'●).
The fields updatedTime and createdTime deserve special attention here.
When MySQL data is synchronized to ES, the date format looks like yyyy-MM-dd'T'HH:mm:ss.SSSZ,
so in @Field we need to declare the date format:
@Field(type = FieldType.Date, format = DateFormat.date_optional_time)
Using the @JsonFormat annotation, we can shape the date returned to the front end into the desired format, such as 2020-06-10 08:08:08, and declare the time zone:
@JsonFormat(shape=JsonFormat.Shape.STRING,pattern="yyyy-MM-dd HH:mm:ss",timezone="GMT+8")
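To see what this pair of annotations accomplishes together, here is a small self-contained sketch (plain SimpleDateFormat; the class and method names are made up for illustration) that converts the UTC timestamp format stored in ES into the Beijing-time format @JsonFormat produces:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeFormatDemo {
    // Mimics @JsonFormat(pattern = "yyyy-MM-dd HH:mm:ss", timezone = "GMT+8"):
    // take a UTC timestamp as stored in ES and render it in Beijing time.
    public static String toBeijing(String isoUtc) throws ParseException {
        // 'X' parses the ISO-8601 zone designator, including the trailing 'Z'
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");
        Date d = in.parse(isoUtc);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        out.setTimeZone(TimeZone.getTimeZone("GMT+8"));
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        // UTC midnight-ish becomes 08:08:08 Beijing time
        System.out.println(toBeijing("2020-06-10T00:08:08.000Z"));
    }
}
```

This is exactly the 8-hour shift that also shows up later in the Logstash SQL statement.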
Entity class code:
// Lombok annotations ...
@Document(indexName = "article", type = "_doc")
public class Article extends BaseEntity {
    // Use the IK analyzer's finest-grained segmentation
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_max_word")
    private String title;
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_max_word")
    private String detail;
    // author id
    // likes, views, etc.
    /**
     * Last update time
     */
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss", timezone = "GMT+8")
    @Field(type = FieldType.Date, format = DateFormat.date_optional_time)
    public Date updatedTime;
    // id, createdTime, and isDeleted are in the inherited BaseEntity
}
1.3 Creating Indexes and Mappings
First, start ElasticSearch.
Then inject ElasticsearchRestTemplate in the test class; we will also use this class later for the highlighted ES query.
ElasticsearchRestTemplate is a class provided by spring-data-elasticsearch, similar to the other *Template classes in Spring projects. It is built on RestHighLevelClient; if no RestHighLevelClient is configured manually, it defaults to localhost:9200.
@Autowired
ElasticsearchRestTemplate ESRestTemplate;
Write a test method and run the following code.
// Create the corresponding index according to the annotation in our Article
// Create the corresponding mapping according to the annotations in our Article
ESRestTemplate.indexOps(Article.class);
// To delete the index instead:
// ESRestTemplate.indexOps(Article.class).delete();
Open elasticsearch-head and we can see the index has been created.
Meanwhile, we can check whether the mapping matches what we wrote in the annotations.
Haha, it is exactly the same.
OK, no problems (●'◡'●), let's move on!
Now it’s time to use the Logstash synchronization.
1.4 Using Logstash to synchronize MySQL data
With the index and mapping set up, we need to synchronize the MySQL data to ElasticSearch.
(To be honest, Logstash synchronization gave me a lot of problems and I was stuck for quite a while. It was really a little uncomfortable, qaq.)
We need the Logstash plugin logstash-input-jdbc for the data synchronization.
(Supposedly Logstash 7.x does not bundle the logstash-input-jdbc plugin and it must be installed manually, but mine ran directly without installing it...)
First, open the Logstash bin directory and write a configuration file named mysql.conf.
(This configuration file is the key: it is what drives the data synchronization.)
I have annotated what each basic configuration option means.
Comments starting with *DIY mark the places you need to modify yourself.
The synchronization rule is the SQL statement we specify: I use updated_time as the basis for the incremental judgment, so only rows newer than the last recorded sync value are pulled.
input {
  jdbc {
    # *DIY MySQL connector driver path; just fill in the correct path
    jdbc_driver_library => "C:\Users\Masics\Desktop\logstash-7.6.2\lib\mysql-connector-java-8.0.19.jar"
    # *DIY driver class name
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # *DIY for MySQL 8.0+, be sure to add the serverTimezone parameter
    jdbc_connection_string => "jdbc:mysql://localhost:3306/lemonc?useSSL=false&serverTimezone=GMT%2B8&rewriteBatchedStatements=true&useUnicode=true"
    # *DIY username and password
    jdbc_user => "root"
    jdbc_password => "123456"
    # *DIY polling schedule; the fields (left to right) are minute, hour, day, month, weekday; all * means every minute
    schedule => "* * * * *"
    # *DIY SQL statement (make sure the field case matches the mapping!!)
    # ES stores times in UTC, 8 hours behind Beijing time, so add 8 hours to :sql_last_value when comparing
    statement => "SELECT id, title, detail, created_time as createdTime, updated_time as updatedTime
                  FROM article WHERE updated_time > date_add(:sql_last_value, INTERVAL 8 HOUR) AND updated_time < NOW()"
    # index type
    type => "_doc"
    # whether to lowercase field names (if true, createdTime becomes createdtime and an error is thrown)
    lowercase_column_names => false
    # whether to record the last run
    record_last_run => true
    # whether to track a column value
    use_column_value => true
    # the tracked column, matching the field name stored in ES (the SQL alias), not the database column name
    tracking_column => "updatedTime"
    # type of the tracked column (timestamp)
    tracking_column_type => "timestamp"
    # *DIY path of the last-run record file
    last_run_metadata_path => "C:\Users\Masics\Desktop\logstash-7.6.2\config\last_metadata"
    # whether to clear the last sync point on every run
    clean_run => "false"
  }
}
output {
  elasticsearch {
    # *DIY IP address and port of ES
    hosts => ["localhost:9200"]
    # *DIY index name
    index => "article"
    # use the database id as the document _id
    document_id => "%{id}"
    # index type
    document_type => "_doc"
  }
  stdout {
    codec => rubydebug
  }
}
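As an aside, the logstash-input-jdbc plugin also has a jdbc_default_timezone option that tells it which time zone the database timestamps are in. In principle that can replace the manual +8-hour arithmetic in the SQL above; I have not used it in this project, so treat the fragment below as an untested alternative sketch:

```conf
input {
  jdbc {
    # tell the plugin the DB stores Beijing time, so :sql_last_value comparisons line up
    jdbc_default_timezone => "Asia/Shanghai"
    # ... other settings as above ...
  }
}
```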
Now we can run Logstash to synchronize the data
Open the command line in the bin directory and type logstash -f yourconfig to run.
However, since I am running Windows, using the command line directly gave me an error like the one below.
I searched for a long time (and at one point tested directly on my Linux server),
and then found the correct way to launch it on Windows:
right-click and choose Git Bash Here,
then enter the command shown below.
When the SQL statements appear as in the figure below, the data is being synchronized.
Let's open elasticsearch-head and see whether the data has changed.
Knock knock!! The document count has grown to 20 (the first run is a full sync; subsequent runs are incremental (●'◡'●)).
To verify the incremental updates, I will write a new article right now so you can see the effect OwO.
Since the configuration file sets the schedule to once per minute, let's wait a minute.
(One minute later·····)
We can see that the last-sync time recorded in the SQL statement is the time of the first full synchronization (not when this new article was created!), so the new article gets pulled in.
After another minute, the recorded time has advanced to the previous sync (i.e., the moment the new article was created); since there is no newer data, nothing more is synchronized.
At this point, data synchronization is done (full plus incremental (●'◡'●)).
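The full-then-incremental behavior described above can be pictured with a toy simulation (a sketch of the idea only, not the real plugin; all names here are made up for illustration):

```java
import java.util.*;

// Toy model of logstash-input-jdbc incremental tracking: after each run,
// :sql_last_value advances to the newest updatedTime seen, so already-synced
// rows are not fetched again.
public class SyncSimulator {
    // plays the role of :sql_last_value persisted in last_run_metadata_path
    private long lastValue = 0;

    // Each row is {id, updatedTime}. One scheduled run: fetch rows newer
    // than lastValue, then advance lastValue.
    public List<long[]> poll(List<long[]> table) {
        List<long[]> fetched = new ArrayList<>();
        for (long[] row : table) {
            if (row[1] > lastValue) {
                fetched.add(row);
            }
        }
        for (long[] row : fetched) {
            lastValue = Math.max(lastValue, row[1]);
        }
        return fetched;
    }

    public static void main(String[] args) {
        SyncSimulator sim = new SyncSimulator();
        List<long[]> table = new ArrayList<>(Arrays.asList(new long[]{1, 100}, new long[]{2, 200}));
        System.out.println(sim.poll(table).size()); // first run: full sync (2 rows)
        System.out.println(sim.poll(table).size()); // nothing new (0 rows)
        table.add(new long[]{3, 300});              // "write a new article"
        System.out.println(sim.poll(table).size()); // incremental sync (1 row)
    }
}
```

Note that a deleted row never satisfies `updatedTime > lastValue` again, which is exactly why deletes cannot be synchronized this way.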
One small regret though: Logstash has no way to synchronize deletes, so if deletion is involved, you need to remove the corresponding documents from ES manually.
1.5 Implement highlighting search
With the data synchronized, it’s time to implement the core function of this article, highlighting search
Actually, I first implemented this feature with Spring Data Elasticsearch 3.2,
but while writing this article and looking things up, I found the official version had moved to 4.0… and many of the APIs changed,
so I chewed through the official documentation and re-implemented it with 4.0.
1.5.1 Controller
Here are the parameters the front end needs to pass:
- curPage — current page number, defaults to the first page
- size — number of items per page, defaults to 7
- type — time range of the query (I define -1 as all, 1 as one day, 7 as one week, and 90 as three months)
- keyword — the search keyword
/**
 * Search for articles
 */
@GetMapping("/search")
public MyJsonResult searchArticles(
        @RequestParam(value = "curPage", defaultValue = "1") int curPage,
        @RequestParam(value = "size", defaultValue = "7") int size,
        @RequestParam(value = "type", defaultValue = "1") int type,
        @RequestParam(value = "keyword") String keyword) {
    List<Article> articles = articleService.searchMulWithHighLight(keyword, type, curPage, size);
    return MyJsonResult.success(articles);
}
1.5.2 Service
We conduct business operations at the Service layer.
Based on the parameters passed from the front end, we need to implement pagination, time-range filtering, keyword highlighting, and keyword search.
Haha, but don't worry! ElasticSearch covers all of these features!!
I list everything in one method so it reads more comfortably; split it into helpers yourself if you prefer. The comments are fairly complete.
public List<Article> searchMulWithHighLight(String keyword, int type, int curPage, int pageSize) {
// Highlight the color Settings (highlighting is simply wrapping the keyword around the span tag with color)
String preTags = "<span style=\"color:#F56C6C\">";
String postTags = "</span>";
// Time range
// Time processing in ES is very convenient
// Now means the current time
// now-1d/d is the previous day 00:00:00
String from;
String to = "now";
switch (type) {
case 1:
from = "now-1d/d";
break;
case 7:
from = "now-7d/d";
break;
case 90:
from = "now-90d/d";
break;
default:
from = "2020-01-01";
break;
}
// Build the query criteria
// 1. Search for related keywords in title and detail
// 2. Time range lookup
// 3. Paginate the results
// 4. Highlight the fields title and detail
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery() // ES bool query
                // must is equivalent to AND in MySQL
                .must(QueryBuilders.multiMatchQuery(keyword, "title", "detail")) // find the keyword in title and detail
                .must(QueryBuilders.rangeQuery("createdTime").from(from).to(to))) // range query on creation time
        .withHighlightBuilder(new HighlightBuilder().field("title").field("detail").preTags(preTags).postTags(postTags)) // highlight
        .withPageable(PageRequest.of(curPage - 1, pageSize)) // paging parameters; page index starts from 0
.build();
// Perform a search to retrieve the results
// SearchHits is a new class in SpringDataES 4.0, which contains highlights and other information such as score
// Before 4.0, you need to manually write an entity mapping class for highlighting, which needs to be reflected.
SearchHits<Article> contents = ESRestTemplate.search(searchQuery, Article.class);
List<SearchHit<Article>> articles = contents.getSearchHits();
// If the length of the list is 0, return
if (articles.size() == 0) {
return new ArrayList<>();
}
// Complete the actual mapping and get the data for the displayed article.
List<Article> result = articles.stream().map(article -> {
// Get the highlighted data
Map<String, List<String>> highlightFields = article.getHighlightFields();
        // If the list is not empty, the field was highlighted
        // The highlighted result for each field is a List<String> of fragments
        // We don't need to send the whole detail to the front end; the first highlighted fragment is enough
        // article.getContent() returns the queried Article entity
        if (!CollectionUtils.isEmpty(highlightFields.get("title"))) {
            article.getContent().setTitle(highlightFields.get("title").get(0));
        }
        if (!CollectionUtils.isEmpty(highlightFields.get("detail"))) {
            article.getContent().setDetail(highlightFields.get("detail").get(0));
        }
        // Business logic operations (DTO conversion, etc.)
        // ......
        // Finally return the encapsulated data
        return article.getContent();
}).collect(Collectors.toList());
return result;
}
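The highlight post-processing in the service method boils down to "take the first highlighted fragment, if any, and substitute it into the entity". A self-contained sketch of just that step (the Article class here is a simplified stand-in for the real entity):

```java
import java.util.*;

public class HighlightMergeDemo {
    // Simplified stand-in for the real Article entity
    static class Article {
        String title, detail;
        Article(String title, String detail) { this.title = title; this.detail = detail; }
    }

    // Substitute the first highlighted fragment (if any) into the entity --
    // the same post-processing done with SearchHit#getHighlightFields above
    static Article merge(Article a, Map<String, List<String>> highlightFields) {
        List<String> t = highlightFields.get("title");
        if (t != null && !t.isEmpty()) a.title = t.get(0);
        List<String> d = highlightFields.get("detail");
        if (d != null && !d.isEmpty()) a.detail = d.get(0);
        return a;
    }

    public static void main(String[] args) {
        Article a = new Article("Learning Java", "Java is a language ...");
        Map<String, List<String>> hl = new HashMap<>();
        hl.put("title", Collections.singletonList("Learning <span style=\"color:#F56C6C\">Java</span>"));
        // title picks up the highlighted fragment; detail stays untouched
        System.out.println(merge(a, hl).title);
    }
}
```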
That completes the back-end interface implementation!!
Next comes the exciting testing part (●'◡'●)!
1.6 Interface Test
We test directly in IDEA, entering Java as the keyword.
From the result, we can see that the keyword Java has been wrapped in span tags in both title and detail.
This way, the front end can render the highlighting normally once it gets the data!!
2. The front end
I use Vue on the front end.
I won't spend much time on the front end, because the implementation is simple:
you just need v-for and v-html to complete the display (●'◡'●).
The current idea is to load the next page of data when scrolling to the bottom, which is not done yet (though it is easy if you would rather use a pagination bar).
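For reference, the rendering can be sketched as a hypothetical template fragment (field names follow the Article entity; `articles` is an assumed data property filled from the /search interface). v-html is what lets the span tags inserted by the highlighter actually render as HTML:

```html
<!-- hypothetical list rendering; v-html renders the highlight <span> tags -->
<div v-for="article in articles" :key="article.id">
  <h3 v-html="article.title"></h3>
  <p v-html="article.detail"></p>
</div>
```

(The usual v-html caveat applies: only feed it trusted, server-generated markup.)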
Please go to 👉 Github for details.
3. Project source code
Front end: GitHub
Back end: GitHub
A ⭐ Star would be very much appreciated (●'◡'●)
4. Final words
That's all for this article, hhh (●'◡'●).
While reviewing the code myself, I also hope this helps you.
Feel free to point out anything wrong in the comments.
If you think the article is good, please give it a thumbs up.
No freeloading — click like 👍, starting with me, haha.