How to use Elasticsearch to solve real business scenarios

If you are still using SQL `LIKE '%XXX%'` for full-text search, you are likely to end up having an awkward chat with the DBA or your boss. On MySQL's InnoDB engine, for example, a leading-wildcard `LIKE` cannot use an index, so every such search degenerates into a very inefficient full table scan.

In the rest of this article, Elasticsearch is referred to as ES for short.

So this article replaces the community search interface of a Flask application: the `LIKE '%XXX%'` query is swapped for ES full-text indexing. We will then observe whether there is a significant performance improvement (and if not, what else can be done), starting with how to get from 0 to 1 quickly.

Other implementation write-ups tend to paste in the documentation of the relevant components or of ES verbatim. Personally, I think such copies go stale as versions and documents change, so in order to use the latest APIs and features, readers are advised to spend a few minutes reading the external documentation links in this article.

  • What is ES? How to play?
  • Regional problems: how to segment Chinese articles into words?
  • Business realization
  • Deployment and monitoring of online environments (service stability)

What is ES? How to play?

`brew install elasticsearch` is a quick way to install Elasticsearch on macOS. After installing the browser-based query tool Sense as described in its documentation, you can follow the official tutorial and then get down to business. Some friends may tease that this ES quick start is too simple; after all, it is a practice-oriented quick start, focused on combining ES with the business.

Regional problems: how to segment Chinese articles into words?

If you worked through the introductory tutorial mentioned in the last paragraph, you will have found that ES already ships with fuzzy matching for English search. How ES implements this is left for interested readers to research further. Chinese, however, needs a dedicated word-segmentation plugin; a popular choice is the IK plugin, which is installed by unpacking it into `$ES_HOME/plugins/`. The IK plugin's GitHub page provides test requests you can run to verify that Chinese word-segmentation queries work. IK offers two modes for the `analyzer` and `search_analyzer` settings:

  • ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” (“the national anthem of the People’s Republic of China”) is split into overlapping terms such as “中华人民共和国”, “中华人民”, “中华”, “华人”, “人民共和国”, “人民”, “共和国”, “国歌”, exhausting all possible combinations;
  • ik_smart: does the coarsest-grained split. The same “中华人民共和国国歌” becomes just “中华人民共和国” and “国歌”.
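The two modes can be compared directly with ES's `_analyze` API. The sketch below (the sample text is the IK README example above; in Sense you would POST each body to `/_analyze`) only builds the request bodies, since inspecting the tokens requires a running node:

```python
import json

# Sketch: request bodies for the _analyze API, one per IK mode.
# POST each body to /_analyze (e.g. in the Sense console) and compare
# the token lists returned.
max_word_request = {
    "analyzer": "ik_max_word",    # finest-grained segmentation
    "text": "中华人民共和国国歌",
}
smart_request = {
    "analyzer": "ik_smart",       # coarsest-grained segmentation
    "text": "中华人民共和国国歌",
}

print(json.dumps(max_word_request, ensure_ascii=False))
print(json.dumps(smart_request, ensure_ascii=False))
```

Running the same text through both analyzers is the quickest way to decide which granularity fits your search use case.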

In this way, the core word-segmentation problem is solved. Whether performance suffers with a large number of articles still needs to be verified in practice. The next thing to do is to import the existing articles into ES and create documents for new articles.

An Elasticsearch cluster can contain multiple indices, each index can contain multiple types, each type can contain multiple documents, and each document can contain multiple fields. Here is a glossary analogy between MySQL and Elasticsearch to help you understand:

MySQL      Elasticsearch
Database   Index
Table      Type
Row        Document
Column     Field
Schema     Mapping
Index      Everything is indexed by default
SQL        Query DSL

Create index `forum-index` and type `post` for the community articles.
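A minimal index-creation sketch follows. The field names mirror the sample document shown later in this article; the exact mapping (and the assumption of an ES 5.x/6.x-era schema with types) is mine, not the original author's:

```python
import json

# Sketch: mapping for forum-index/post. Chinese text fields are indexed
# with ik_max_word and searched with ik_smart; adjust field names and
# types to your real schema.
create_index_body = {
    "mappings": {
        "post": {
            "properties": {
                "title":       {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
                "summary":     {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
                "content":     {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
                "user_id":     {"type": "long"},
                "last_modify": {"type": "long"},
            }
        }
    }
}

# With the official Python client this would be roughly:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   es.indices.create(index="forum-index", body=create_index_body)
print(json.dumps(create_index_body, indent=2))
```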

ES provides a set of create, read, update and delete APIs. We can use the simplified API templates in Sense to test and verify that they behave as expected. In back-end development, however, there is usually a wrapper library that can be configured and used directly for your language and framework. If you're using another language or framework, search GitHub for a wrapper (and remember to check that the versions match).

Before taking the next step, let's lay out the work to do: import the existing community posts, and add the matching fields required by the query features exposed to the client and the admin backend. A new article needs to be added to ES; an edited article needs its corresponding ES document updated; and an article taken off the shelves needs to be removed from ES. These extra ES create, update and delete operations consume extra time and may fail, so to avoid affecting the interface, they can be wrapped into a queue and executed asynchronously to keep the data consistent. Optimizing ES query performance and stability comes after the basic features and interfaces are working.
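The queue-based consistency idea above can be sketched as follows. To keep the example runnable without a server, the ES index is simulated with a plain dict; in production the worker would call `es.index()` / `es.delete()` and retry on failure (all names here are illustrative, not from the original code):

```python
import queue
import threading

# Sketch: article create/edit/unpublish handlers push an operation onto a
# queue; a background worker applies it to ES off the request path.
es_ops = queue.Queue()
fake_index = {}  # stands in for the forum-index/post documents

def worker():
    while True:
        op, doc_id, doc = es_ops.get()
        if op in ("index", "update"):
            fake_index[doc_id] = doc      # real code: es.index(..., id=doc_id, body=doc)
        elif op == "delete":
            fake_index.pop(doc_id, None)  # real code: es.delete(..., id=doc_id)
        es_ops.task_done()

threading.Thread(target=worker, daemon=True).start()

# New article -> add; edit -> update; taken off the shelves -> remove.
es_ops.put(("index", 1, {"title": "hello"}))
es_ops.put(("update", 1, {"title": "hello v2"}))
es_ops.put(("delete", 1, None))
es_ops.join()  # wait until the worker has drained the queue
print(fake_index)  # {}
```

In a real deployment a task queue such as Celery would usually replace the hand-rolled thread, but the consistency contract is the same: the HTTP response never waits on ES writes.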

After importing the original post data, execute the following fuzzy query statement on the Sense console; the corresponding response follows:

GET /forum-index/post/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": "Client" } },
        { "match": { "title":   "Client" } },
        { "match": { "summary": "Client" } }
      ]
    }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 25.630516,
    "hits": [
      {
        "_index": "forum-index",
        "_type": "post",
        "_id": "60000000012672",
        "_score": 25.630516,
        "_source": {
          "user_id": 10144,
          "title": "Client Post test",
          "summary": "Client post test content",
          "content": "<img alt=\"3f2b7dd51ddcb7d66b22f0f06645f16e.png\" src=\"http://img-cn-hangzhou.aliyuncs.com/nb-imgs/3f2b7dd51ddcb7d66b22f0f06645f16e.png\">",
          "last_modify": 1480665329,
          "id": 60000000012672
        }
      },
      ...
    ]
  }
}

The response is in JSON format; what do `took`, `_score` and the other fields mean? Please refer to the official documentation; a brief explanation follows:

  • took – time in milliseconds for Elasticsearch to execute the search
  • timed_out – tells us if the search timed out or not
  • _shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards
  • hits – the search results
  • hits.total – total number of documents matching our search criteria
  • hits.hits – actual array of search results (defaults to the first 10 documents)
  • hits.sort – sort key for the results (missing if sorting by score)
  • hits._score and max_score – ignore these fields for now

As you can see, in the local environment a segmented full-text query over 5000+ post documents took 4 ms, which is a very good result. Using ES for complex query features can effectively reduce the load on the database. Next we can focus on optimizing how ES is used and queried; based on the current evaluation, ES is up to serving production full-text search.
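For completeness, here is how the Flask side might issue the same bool/should query with the official Python client. Executing it needs a running ES node, so the client call is shown in a comment and only the request body is built (the function name is illustrative):

```python
import json

# Sketch: the same bool/should query from the Sense example, built in Python
# as a Flask view might do before handing it to the client library.
def build_post_query(keyword):
    """Full-text query over content/title/summary for one keyword."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"content": keyword}},
                    {"match": {"title": keyword}},
                    {"match": {"summary": keyword}},
                ]
            }
        }
    }

body = build_post_query("Client")
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# resp = es.search(index="forum-index", doc_type="post", body=body)
# took_ms, total = resp["took"], resp["hits"]["total"]
print(json.dumps(body))
```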

Deployment and monitoring of online environments (service stability)

Unfortunately, the author did not get to finish this last step due to a job transfer, but here is the general idea. After online deployment and configuration, a process supervisor such as Supervisor can be used to keep the service stable, and performance/load tests should be run. To keep the service continuously available, deploying ES as a cluster is recommended, so that when a single node fails, the online service keeps running.
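A minimal supervisord program section for this could look like the fragment below; every path and user name here is a placeholder, not taken from the original deployment:

```ini
; Sketch: keep a single-node Elasticsearch process alive with supervisord.
; Adapt command/user/log paths to your installation.
[program:elasticsearch]
command=/usr/share/elasticsearch/bin/elasticsearch
user=elasticsearch
autostart=true
autorestart=true
startretries=3
redirect_stderr=true
stdout_logfile=/var/log/supervisor/elasticsearch.log
```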

ELK, short for Elasticsearch + Logstash + Kibana, is a popular log-analysis stack. What this article covers is just the tip of the iceberg for Elasticsearch and its ecosystem.