0. Background

The company’s current business system is weighted toward the back-office side. It currently holds 5,000,000+ (500W+) records, its many list pages support all kinds of conditional queries, and a large number of those conditions are fuzzy searches. Because fuzzy queries are inefficient in MySQL, the company uses the ES search engine for conditional search. The ES versions are as follows:

Elasticsearch version: 6.3.2

Java Client version: REST-high-level Client 6.3.2

Problem: the business requires that some Chinese fields be sortable in A–Z pinyin order.

1. Implementation scheme

The elasticsearch-analysis-pinyin plugin lets Elasticsearch tokenize Chinese text into pinyin, which makes pinyin sorting possible. The principle: the pinyin tokenizer reduces a Chinese string to a string of its pinyin initials (for example, 刘德华 (Andy Lau) -> ldh), and queries then sort on that tokenized field.
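The idea can be illustrated in plain Java, independent of Elasticsearch. The first-letter keys below are hardcoded for the example; in the real setup the pinyin tokenizer produces them inside ES:

```java
import java.util.*;

// Illustration only: in Elasticsearch the pinyin tokenizer computes the
// first-letter key; here the keys are hardcoded for a few sample names.
public class PinyinSortSketch {

    // Sort names by their precomputed pinyin first-letter key
    // (e.g. 刘德华 -> "ldh"), which yields A-Z pinyin order.
    static List<String> sortByFirstLetter(Map<String, String> firstLetterKey) {
        List<String> names = new ArrayList<>(firstLetterKey.keySet());
        names.sort(Comparator.comparing(firstLetterKey::get));
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("张三", "zs");
        keys.put("刘德华", "ldh");
        keys.put("安迪", "ad");
        System.out.println(sortByFirstLetter(keys)); // [安迪, 刘德华, 张三]
    }
}
```

Sorting on the key string rather than the raw Chinese text is exactly what the sorted ES query does against the pinyin-analyzed field.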

2. Just do it

2.1 Download and install pinyin word divider

Elasticsearch – Analysis – Pinyin download url: github.com/medcl/elast…

Download the pinyin tokenizer version that matches your ES version. Since mine is 6.3.2, I downloaded the master branch.

Unzip the package, open a command line in the project directory, and run the Maven package command (download and install Maven first if you don’t have it):

mvn package

If you see BUILD SUCCESS, the build succeeded.

Go to E:\Elasticsearch\elasticsearch-analysis-pinyin-master\target\releases, where the build creates a zip package; unzip it to get the following files:

Copy these three files into a pinyin folder under the plugins directory of the ES installation (create the folder first; the name can be anything):

Restart ES, and the pinyin tokenizer is installed.
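The whole of section 2.1 can be condensed into a few commands. Paths and the plugin folder name are illustrative; adjust them to your own ES installation:

```shell
# Build the plugin from source (requires Maven)
cd elasticsearch-analysis-pinyin-master
mvn package

# Unzip the built release into a new folder under the ES plugins directory
mkdir "$ES_HOME/plugins/pinyin"
unzip target/releases/elasticsearch-analysis-pinyin-*.zip -d "$ES_HOME/plugins/pinyin"

# Restart Elasticsearch so it loads the plugin
```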

2.2 Index Setting and Mapping Settings

First create the index settings. Since we need both the pinyin tokenizer and the IK tokenizer, the analysis section below configures both analyzers:

PUT /pinyinTestIndex
{
    "index": {
        "analysis": {
            "analyzer": {
                "default": {                     // default analyzer: the IK tokenizer
                    "tokenizer": "ik_max_word"
                },
                "pinyin_analyzer": {             // custom pinyin analyzer
                    "tokenizer": "my_pinyin"
                }
            },
            "tokenizer": {
                "my_pinyin": {                   // pinyin tokenizer configuration
                    "type": "pinyin",
                    "keep_first_letter": true,
                    "keep_separate_first_letter": false,
                    "keep_full_pinyin": false,
                    "limit_first_letter_length": 20,
                    "lowercase": true,
                    "keep_none_chinese": false
                }
            }
        }
    }
}

Several options in the pinyin tokenizer configuration determine whether sorting behaves as required:

keep_first_letter: keep the first letter of each character, for example 刘德华 (Andy Lau) -> ldh. Default: true.

keep_separate_first_letter: keep the first letters as separate tokens, for example 刘德华 -> l, d, h. Default: false.

keep_full_pinyin: keep the full pinyin, for example 刘德华 -> [liu, de, hua]. Default: true.

limit_first_letter_length: the maximum length of the first_letter result. Default: 16.

lowercase: lowercase non-Chinese letters. Default: true.

keep_none_chinese: keep non-Chinese letters and digits in the result. Default: true.

With these settings, my pinyin tokenizer behaves as follows: 刘德华 (Andy Lau) becomes ldh, 刘德华A becomes ldha, and 刘德华1 becomes ldh1. This tokenization meets our business requirement; other configuration options are available for different business needs.
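The tokenizer output can be checked directly with the _analyze API (index name taken from the example above; the text value is arbitrary):

```
GET /pinyinTestIndex/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "刘德华"
}
```

With the configuration above, this should return a single token such as ldh.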

For other configuration options, see the README.md of elasticsearch-analysis-pinyin.

After that, set up the index mapping and make the field use the pinyin analyzer:

POST /pinyinTestIndex/dev/_mapping
{
    "dev": {
        "properties": {
            "name": {                              // the name field
                "type": "text",                    // text type supports analysis
                "analyzer": "pinyin_analyzer",     // analyze with the pinyin analyzer
                "fielddata": true,                 // required to sort on an analyzed text field
                "fields": {                        // also keep an un-analyzed copy
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}

At this point, the index is created.
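Before moving to the Java client, the sort can be sanity-checked over REST with a couple of hypothetical sample documents:

```
POST /pinyinTestIndex/dev
{ "name": "刘德华" }

POST /pinyinTestIndex/dev
{ "name": "安迪" }

GET /pinyinTestIndex/dev/_search
{
  "sort": [ { "name": "asc" } ]
}
```

Note that in ES 6.x, sorting on an analyzed text field requires fielddata to be enabled on that field; otherwise the search is rejected.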

2.3 Using Java Client to sort queries

 // search API
 SearchSourceBuilder source = new SearchSourceBuilder();
 // sort on the name field; use SortOrder.ASC or SortOrder.DESC
 source.sort("name", SortOrder.ASC);
 SearchRequest searchRequest = new SearchRequest("pinyinTestIndex");
 searchRequest.types("dev");
 searchRequest.source(source);
 // run the query
 SearchResponse response = client.search(searchRequest);

2.4 Viewing Results

Ascending effect:

Descending effect:

3. Summary

In summary, pinyin sorting is achieved mainly by configuring the pinyin tokenizer plugin; adjust and extend the configuration options to match your own business needs and you can produce different query results. Good luck.