Make writing a habit together! This is the fifth day of my participation in the “Gold Digging Day New Plan · April More text Challenge”.

Preface

On day 4 we looked at how to debug ES queries using Kibana. Today we explore the common searches of an online store, using ES data indexed with IK word segmentation.

Data modeling

Before development we need to pin down the requirements. What data are we searching? Beyond implementing the search itself, what else needs attention? Take Taobao as an example: a search has two main parts, matching the query words and sorting the results. From its search results we can work out which fields need to be matched and which are used for sorting.

  • Search fields
  1. Specification (specification queries are complicated, so we will not discuss how to implement them for now)
  2. Name
  3. Keywords or introduction
  • Sort fields
  1. Advertisement placement (advertised items rank higher)
  2. Sales
  3. Price
  • Other
  1. Paged search

Based on the above, we model a simple product with the following fields: id, name, keyWords, sellNum, price, sort. We do not analyze the model in depth here. A real product would carry other information as well, such as whether it is listed or delisted and whether it is recommended.
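As a minimal sketch, the field list above could be written down as a small Python class. The class name Goods and the to_doc helper are illustrative assumptions, not part of the original setup; keyWords is spelled as in the index mapping, and name is optional because the ad documents in this article carry no name.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Goods:
    """One product document (hypothetical helper class for illustration)."""
    id: str
    keyWords: str
    sellNum: int
    price: float
    sort: int
    name: Optional[str] = None  # ad documents in this article have no name

    def to_doc(self) -> dict:
        """Return a JSON-ready body, leaving out fields that are unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}


phone = Goods(id="100004", name="iPhone 001",
              keyWords="iPhone smartphone IOS", sellNum=4, price=9, sort=4)
print(phone.to_doc())
```

Keeping the model in one place makes it harder for a field name in a request body to drift away from the mapping.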

Initialize data

Our process for initializing the goods index with Kibana is as follows

Create indexes

PUT goods
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "keyWords": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "sellNum": {
        "type": "integer"
      },
      "price": {
        "type": "double"
      },
      "sort": {
        "type": "integer"
      }
    }
  }
}

For the common ES data types, refer to this blog post. Below I briefly cover a few points worth noting

  1. Shards and replicas: in a production environment, multiple shards and multiple replicas are recommended to prevent data loss. The single shard and zero replicas used here are only suitable for local testing
  2. Why id is a keyword: fields of keyword type are not analyzed, so the value is indexed as one exact term. Suppose one document has id 123456 and another has id 12345. If id were mapped as text, it would be analyzed, and a match-based delete for 12345 could also remove 123456 and other documents whose tokens match. With keyword, only the exact id matches
  3. Choosing analyzer and search_analyzer: when a document is inserted, text fields are split with the finest-grained segmentation (ik_max_word) before being written to the inverted index. At query time, the search input is first split with the coarsest-grained segmentation (ik_smart) and then looked up in the inverted index. For example, the inserted value “iPhone” is broken into as many terms as possible, such as “Apple” and “phone”. The intent is that a search for “Xiaomi phone” then does not return the iPhone data
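The difference between an analyzed text field and an unanalyzed keyword field can be sketched with a toy inverted index. This is only an illustration: the toy_analyze function below lowercases and splits on whitespace, which is an assumption standing in for IK and is not how ik_max_word or ik_smart actually segment text.

```python
from collections import defaultdict
from typing import Dict, List, Set


def toy_analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase and split on whitespace (NOT the IK algorithm)."""
    return text.lower().split()


def build_index(docs: Dict[str, str], analyzed: bool) -> Dict[str, Set[str]]:
    """Map each indexed term to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, value in docs.items():
        # A text field is split into tokens; a keyword field is kept whole.
        terms = toy_analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


docs = {
    "100004": "iPhone smartphone IOS",
    "100007": "mobile phone smartphone Kylin",
}
text_index = build_index(docs, analyzed=True)
keyword_index = build_index(docs, analyzed=False)

# Both documents share the token "smartphone" in the analyzed index.
print(sorted(text_index["smartphone"]))
# In the keyword-style index only the complete value is a term.
print(sorted(keyword_index.get("smartphone", set())))
print(sorted(keyword_index["iPhone smartphone IOS"]))
```

The same idea explains why id is mapped as keyword: an exact lookup should hit one whole value, not any document that happens to share a token.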

Initialize data

We initialize 9 documents to verify our queries: 3 ads, 3 Apple phones and 3 Huawei phones. The request bodies are shown below

PUT goods/_doc/100001
{"id":"100001", "keyWords":"mobile phone smartphone 5G mobile phone pre-ordered mobile phone", "sellNum":1, "price":12, "sort":1}

PUT goods/_doc/100002
{"id":"100002", "keyWords":"mobile phone smartphone 5G mobile phone search", "sellNum":2, "price":11, "sort":2}

PUT goods/_doc/100003
{"id":"100003", "keyWords":"mobile phone smartphone 5G mobile phone hot search pre-ordered mobile phone", "sellNum":3, "price":10, "sort":3}

PUT goods/_doc/100004
{"id":"100004", "name":"iPhone 001", "keyWords":"iPhone smartphone IOS", "sellNum":4, "price":9, "sort":4}

PUT goods/_doc/100005
{"id":"100005", "name":"iPhone 002", "keyWords":"iPhone smartphone IOS", "sellNum":5, "price":8, "sort":5}

PUT goods/_doc/100006
{"id":"100006", "name":"iPhone 003", "keyWords":"iPhone smartphone IOS", "sellNum":6, "price":7, "sort":6}

PUT goods/_doc/100007
{"id":"100007", "name":"Huawei 001", "keyWords":"mobile phone smartphone Kylin domestic mobile phone", "sellNum":7, "price":6, "sort":7}

PUT goods/_doc/100008
{"id":"100008", "name":"Huawei 002", "keyWords":"mobile phone smartphone Kylin domestic mobile phone", "sellNum":8, "price":5, "sort":8}

PUT goods/_doc/100009
{"id":"100009", "name":"Huawei 003", "keyWords":"mobile phone smartphone Kylin domestic mobile phone", "sellNum":9, "price":4, "sort":9}
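Instead of one PUT per document, the same data could be loaded in a single request through the Elasticsearch _bulk endpoint, which takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The sketch below only builds that payload as a string. The helper name to_bulk_ndjson and the two sample documents are illustrative assumptions, and sending the payload (for example with POST goods/_bulk in Kibana) is left out.

```python
import json
from typing import Dict, List


def to_bulk_ndjson(docs: List[Dict]) -> str:
    """Build a _bulk body: an index action line, then the document itself."""
    lines = []
    for doc in docs:
        # Use the document's own id as the Elasticsearch _id.
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


sample_docs = [
    {"id": "100004", "name": "iPhone 001",
     "keyWords": "iPhone smartphone IOS", "sellNum": 4, "price": 9, "sort": 4},
    {"id": "100007", "name": "Huawei 001",
     "keyWords": "mobile phone smartphone Kylin domestic mobile phone",
     "sellNum": 7, "price": 6, "sort": 7},
]
payload = to_bulk_ndjson(sample_docs)
print(payload, end="")
```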

Practical exploration of each scenario

The following request bodies are executed in Kibana

The simplest case: search everything, sort in descending order on the sort field (which mirrors sales volume in our sample data), and fetch the first page of 5. The request body is as follows

POST goods/_search
{
  "query":{"match_all": {}},
  "from":0,
  "size":5,
  "sort":{
    "sort":"desc"
  }
}

A few key attributes are worth explaining

  • query: wraps the query body, such as match, match_all, and so on
  • from: the offset of the first hit to return. Note that it starts at 0
  • size: the number of hits per page
  • sort: the outer sort is the sorting clause, and the inner sort names the field to sort on (here our own sort field) together with the order, desc

Looking at the response, the total is 9 hits while only 5 are returned, which shows that paging works
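User interfaces usually work with page numbers rather than offsets, so a small conversion is needed before building the request. The following sketch shows one way to do it. The function names page_to_from and search_body are assumptions made for this example, and the body mirrors the match_all request used in this article.

```python
import json


def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number to the 0-based from offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def search_body(page: int, size: int, sort_field: str, order: str = "desc") -> dict:
    """Build a match_all request body for the given page and sort field."""
    return {
        "query": {"match_all": {}},
        "from": page_to_from(page, size),
        "size": size,
        "sort": {sort_field: order},
    }


# Page 1 with 5 hits per page, sorted on the article's sort field.
print(json.dumps(search_body(1, 5, "sort"), indent=2))
```

With 9 documents and a size of 5, page 1 maps to from 0 and page 2 to from 5, so the second page would hold the remaining 4 hits.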

Search for a phone with the default query: default sort, match on name or keywords

For example, if we type in “smartphone”, we should find all the data (every document matches on keyWords), and with the default sort the first few results should be the ad slots

POST goods/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "smartphone"}},
        {"match": {"keyWords": "smartphone"}}
      ]
    }
  },
  "from": 0,
  "size": 5,
  "sort": {
    "sort": "asc"
  }
}

There are a few more attributes here, so let me explain them briefly

  • bool: a compound query that combines several clauses. A number of clause types can be used with bool, and the ES documentation covers them in detail
  • should: the document should match the clause. When a bool query contains only should clauses, a document is returned if at least one of them matches. In our case, the search for “smartphone” matched through the keyWords field
  • match: matches the analyzed query text against the given field
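When the same text has to be matched against several fields, the should list can be generated rather than written by hand. This is a small sketch under that assumption. The function name should_match is made up for the example, and it produces the same bool structure as the request in this article.

```python
import json
from typing import Dict, List


def should_match(text: str, fields: List[str]) -> Dict:
    """Build a bool query with one match clause per field under should."""
    return {
        "bool": {
            "should": [{"match": {field: text}} for field in fields]
        }
    }


query = should_match("smartphone", ["name", "keyWords"])
body = {"query": query, "from": 0, "size": 5, "sort": {"sort": "asc"}}
print(json.dumps(body, indent=2))
```

Adding another searchable field later, such as an introduction field, would then only mean extending the list of field names.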

I want to buy a Kylin phone and sort the results by price or by sales in descending order

The search should match three documents, so I set size to 2 and from to 2. This skips the first two hits and returns only the third

  • Query Kylin products in descending order of price
POST goods/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "Kylin"}},
        {"match": {"keyWords": "Kylin"}}
      ]
    }
  },
  "from": 2,
  "size": 2,
  "sort": {
    "price": "desc"
  }
}
  • Query Kylin products in descending order of sales
POST goods/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"name": "Kylin"}},
        {"match": {"keyWords": "Kylin"}}
      ]
    }
  },
  "from": 2,
  "size": 2,
  "sort": {
    "sellNum": "desc"
  }
}
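To see which document each of the two requests should return, the sort and paging step can be reproduced in memory. This is a rough sketch, not how Elasticsearch executes the query. It assumes the three Huawei documents are 100007 to 100009, and the values for the middle document (100008) are an assumption that follows the pattern of the other sample data.

```python
from typing import Dict, List


def sort_and_page(hits: List[Dict], field: str, order: str,
                  from_: int, size: int) -> List[Dict]:
    """Sort hits on one field, then cut out the from/size window."""
    ordered = sorted(hits, key=lambda hit: hit[field], reverse=(order == "desc"))
    return ordered[from_:from_ + size]


# In-memory stand-ins for the three documents that match "Kylin".
kylin_hits = [
    {"id": "100007", "sellNum": 7, "price": 6},
    {"id": "100008", "sellNum": 8, "price": 5},  # assumed values
    {"id": "100009", "sellNum": 9, "price": 4},
]

by_price = sort_and_page(kylin_hits, "price", "desc", from_=2, size=2)
by_sales = sort_and_page(kylin_hits, "sellNum", "desc", from_=2, size=2)
print([hit["id"] for hit in by_price])  # ['100009']
print([hit["id"] for hit in by_sales])  # ['100007']
```

In both cases the window starts at the third hit, so only one document comes back: the cheapest one when sorting by price, and the one with the lowest sales when sorting by sellNum.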

Conclusion

In this article we walked through a simple ES search implementation based on IK word segmentation. In the next article, on implementing IK word segmentation search for e-commerce scenarios with Spring Boot, we will discuss how to meet today's requirements from Spring Boot. Please look forward to it!