Elasticsearch tutorial live replay
1. A real-world problem
Q: How do you get the best search results?
I built a search feature using the IK analyzer together with a multi query.
Along the way I added a dictionary of terms from the customer’s field.
But the customer keeps reporting that the search experience is poor.
What else can be done to improve it?
Source: Dead Hit Elasticsearch knowledge planet
This is a very typical problem, one I have also run into in real product development.
2. What a few examples say about search experience
Example 1: a screenshot of searching for “trigger” on MOX Net.
Note: I typed “trigger”. The first result is fine, but the others, such as “touch” and “send”, have nothing to do with my search.
From a user’s perspective, the experience is poor: a lot of irrelevant data is returned.
Example 2: a question-bank app that does not support paging.
As shown below, the question bank contains 1703 questions, a mix of true/false and multiple-choice questions.
The only navigation is two buttons: previous question and next question.
In practice:
- Suppose you only want the multiple-choice questions, and they start somewhere around question 100 or 200: how many clicks does it take to reach them?
- If you quit, you have to click a few hundred times to get back to the last question you answered.
This is not a bad user experience, it is no user experience at all. The developers did not think the design through, and users are left to “doubt life”.
Example 3: searching e-commerce sites for “first long Johns of autumn”. What should be returned?
Zoom in on the screenshots and the differences stand out.
This is a matter of opinion, and each e-commerce company makes its own call on what to return.
Still, it is worth judging purely from the user’s point of view.
Ming Yi’s comments:
- Pinduoduo
“No wonder you are growing so fast”: it returned the expected results and also helpfully recommended “long Johns” listings for the region.
- Taobao
At least it returns long Johns.
- Jingdong
No matching product found, so it recommends “long Johns” instead. “Why recommend? Just return the results.”
- Dangdang
Wow! The recommendations are all “autumn” items. As a user, what would you think?
- “What on earth?”
- “Mixed feelings”
- “Baffling”
In short, the faster a company is growing, the better its search experience tends to be.
3. Where there is data, there is search
With information exploding today, search is everywhere. It can be summed up as: “where there is data, there is search”.
Search may be one of the features people use most: study, work, clothing, food, housing and travel all depend on it.
- Study
Enter keywords to find reliable free or paid online resources.
- Work
When you hit an error code, search Google for the answer.
Search WeChat chat history to dig out key information.
- Clothing
Buying clothes online is really a process of searching and choosing.
- Food
Ordering takeout at noon is a search too: close to the office + highly rated = high probability of ordering.
- Housing
Booking a hotel for a business trip means searching, comparing and picking a cost-effective one.
- Travel
Road trips: before setting off, enter the destination in Autonavi, then pick a route from the results returned.
As one analysis of search experience puts it: “The design and usability of the search box cannot be ignored.
A good search experience may not make users love your product, but a bad one can be fatal.
So whether the goal is to serve users better or to avoid a negative experience, a good search experience is crucial for a content-oriented product.”
Whether the search results meet the user’s needs is the minimum bar for judging a search experience. Beyond that, the following points, drawn from research and user concerns, contribute to a good search experience:
- Search box:
1) Make the search box visually prominent and give it a magnifying-glass icon;
2) Place the search box where users expect it;
3) Provide a search button;
4) Make it the right size.
Put bluntly: “a search box in the most prominent spot of the navigation bar is the minimum respect you owe the user”!
- Searchable-content hint: tell users what they can search for
- Put a search box on every page
- Use intelligent recommendation/matching mechanisms
Intelligent recommendation or matching saves users typing.
Most users are not good at phrasing a search: if they fail to articulate the question at the first step, they will struggle to find the right results.
When smart matching works, it helps users express their question clearly and find a satisfactory answer.
In short, a good search experience is a good user experience, and a good user experience leads naturally to retention and even company growth.
4. Breaking down the five core stages of a user search
“Search is like a conversation between the user and the App or website, where the user asks for information and the App or website responds by showing results.
Users expect a smooth search experience, and users often form quick judgments about the value of an App based on the quality of the search results.”
In the process of search, the user’s experience can be roughly divided into five parts: discovering the search, entering keywords, waiting for the results, viewing the results, and completing the search. The experience of each step is part of the overall experience and will affect the user’s final search experience.
4.1 Discovering the search
As mentioned earlier, the search box should be eye-catching, even independent of the header, and should occupy a focal position in the UI so that the user can easily find it.
4.2 Entering Keywords
- Prompt the user with keywords they could type.
- Offer “search tips” based on what the user has typed so far, as in the earlier Google search screenshot.
- Support complex combined searches, like Google Advanced Search, with auxiliary controls for date filters, excluded keywords, sort order, AND/OR/NOT expressions, and so on.
4.3 Waiting for results
- Respond quickly. Users have limited patience: if nothing comes back within 3 seconds, expect to lose them.
- If the response really is slow, show a loading animation or a friendly message.
- Interpret the user’s input, combine it with the user’s search history and habits where needed, and return the best top N results.
4.4 Viewing Results
- This is where the user reviews what the search returned.
- If there are no results, do not simply return an empty page. Offer something else, such as recommendations or a prompt to try a different keyword.
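The “never return a bare zero” rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `search_with_fallback`, `toy_search` and the in-memory `catalog` are hypothetical stand-ins for a real search backend, not anything from Elasticsearch.

```python
def search_with_fallback(search, keyword, popular_keywords):
    """Run `search`; when nothing matches, return suggestions instead of a bare zero."""
    hits = search(keyword)
    if hits:
        return {"hits": hits, "message": None, "suggestions": []}
    return {
        "hits": [],
        "message": f"No results for '{keyword}'. Try a different keyword:",
        "suggestions": popular_keywords[:3],
    }


# Hypothetical in-memory catalog standing in for a real search backend.
catalog = ["long johns", "autumn jacket", "wool socks"]


def toy_search(keyword):
    """Naive substring search over the toy catalog."""
    return [item for item in catalog if keyword.lower() in item]


print(search_with_fallback(toy_search, "sandals", ["long johns", "wool socks"]))
```

The point of the design is that the empty case still gives the user a next step, rather than leaving them at a dead end.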
4.5 Completing the Search
- If the results meet the user’s needs, the search ends.
- If they do not, the user will either keep searching with other keywords or leave for another app or website.
Improving the user experience means getting every one of these steps right.
5. Elasticsearch search logic
Elasticsearch search can be understood as two processes.
The following applies only to full-text search on the text field type.
5.1 Indexing (write) process
- Elasticsearch does not simply write the document as-is; it builds an inverted index from the tokens produced by the analyzer defined in your mapping (default: standard).
- The choice of analyzer determines the granularity of the tokens, and that granularity determines whether later searches can find what they should.
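To make the write path concrete, here is a minimal Python sketch of how analyzed text ends up in an inverted index. The lowercase-and-split `analyze` function is only a stand-in for a real analyzer such as standard or IK, and the names `analyze` and `build_inverted_index` are my own, not Elasticsearch APIs.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (stand-in for a real analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}, a simplified inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


docs = {
    1: "Elasticsearch builds an inverted index",
    2: "The analyzer decides the index terms",
}
index = build_inverted_index(docs)
print(sorted(index["index"].items()))            # documents containing "index", with positions
print("inverted" in index, "Inverted" in index)  # only the analyzed form is searchable
```

Whatever the analyzer emits is what becomes searchable, which is why the analyzer choice matters so much: here the capitalized form “Inverted” is simply not in the index.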
5.2 Search process
- At search time, it is not a case of “whatever you type is what gets matched”: different query types use different matching mechanisms.
- The query type you choose can change the results completely.
For example, match works at the granularity of individual terms, while match_phrase matches a whole phrase, and the two return very different results.
match: analyzes the keywords you type into terms first, then searches for those terms.
match_phrase: searches for what you type as a phrase, with the terms adjacent and in order.
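The difference is easy to see in a toy model. The Python sketch below imitates the two behaviors on a handful of strings; it assumes a whitespace analyzer, the default OR semantics of match, and a phrase match with no slop. The functions `match` and `match_phrase` are illustrative re-implementations, not the Elasticsearch query code.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(query, doc):
    """match-like: true if ANY query term occurs in the document (OR semantics)."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(query, doc):
    """match_phrase-like: true only if all query terms occur adjacently and in order."""
    q, d = analyze(query), analyze(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = [
    "quick brown fox",
    "brown quick fox",
    "quick start guide",
]
print([doc for doc in docs if match("quick brown", doc)])         # all three documents
print([doc for doc in docs if match_phrase("quick brown", doc)])  # only "quick brown fox"
```

The same query recalls three documents with the term-level match and only one with the phrase match, which is the recall versus precision trade-off in miniature.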
6. Quantifiable metrics for the Elasticsearch search experience
User experience is subjective, but the quality of search results needs to be quantified.
How? The two essential metrics are precision and recall.
6.1 Recall
Definition: the ratio of relevant documents in the search results to all relevant documents in the entire collection.
It measures how completely the search finds what it should.
6.2 Precision
Definition: the proportion of documents in the search results that are relevant.
It measures how accurate the search results are.
Both can be understood in terms of a confusion matrix:
| | Relevant | Not relevant |
|---|---|---|
| Returned | True positive (TP) | False positive (FP) |
| Not returned | False negative (FN) | True negative (TN) |
Given the matrix above, recall and precision are calculated as follows:
Recall = TP / (TP + FN) * 100%
Precision = TP / (TP + FP) * 100%
If that is still unclear, a popular explanation on Zhihu puts it this way:
- Recall: of all the positive samples, how many did you retrieve (how complete).
- Precision: of the samples you judged positive, how many really are positive (how accurate).
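As a quick worked example, the two definitions can be computed from a list of returned document ids and a set of ids judged relevant. The ids and the judgment list below are made up for illustration, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) from returned doc ids and relevant doc ids."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)   # relevant documents that were returned
    fp = len(returned - relevant)   # returned but not relevant
    fn = len(relevant - returned)   # relevant but missed
    precision = tp / (tp + fp) if returned else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Hypothetical judgment list: 4 documents are relevant, the search returned 5.
relevant = {"d1", "d2", "d3", "d4"}
returned = ["d1", "d2", "d7", "d8", "d9"]
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=40% recall=50%
```

Two of the five returned documents are relevant (precision 40%), and two of the four relevant documents were found (recall 50%).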
7. How to improve the Elasticsearch search experience
As noted earlier, the five stages of search are linked. Search experience involves design, front end, back end, decision-makers and management; it cannot be treated as purely a technical issue.
On the Elasticsearch back-end side:
7.1 Choose an analyzer that fits the business scenario
There is no best analyzer and no universal one for all business scenarios; choose based on yours.
- If you need fine granularity, where anything that contains the input should be recalled, ngram tokenization is a fit, or preferably the wildcard field type introduced in 7.9.
- Compare tokenization results up front to verify whether a given analyzer meets the business need. For Chinese, the choices include IK, jieba (“stutter”), ANSJ and others.
The core API for comparing tokenization is _analyze; learn to use it fluently.
POST _analyze
{
  "text": "Providing the world's leading Cloud Computing services _ Helping Enterprises to get on the Cloud without worry",
  "analyzer": "ik_smart"
}
- If you choose IK, know the difference between ik_smart and ik_max_word.
ik_smart is coarse-grained tokenization (it returns as few tokens as possible, close to how a person would segment the text);
ik_max_word is fine-grained tokenization (it returns as many tokens as possible).
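To see why the ngram option mentioned above recalls “anything that contains the input”, here is a small Python sketch of character n-gram generation. The `min_gram=2` and `max_gram=3` values are example settings chosen for this sketch, and `ngrams` is an illustrative function, not the Elasticsearch tokenizer itself.

```python
def ngrams(text, min_gram=2, max_gram=3):
    """Return all character n-grams of length min_gram..max_gram, in order."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


print(ngrams("cloud"))  # ['cl', 'clo', 'lo', 'lou', 'ou', 'oud', 'ud']
```

Every 2 or 3 character fragment of the word becomes a token, so a search for “lou” can hit “cloud”. The price is a much larger index, which is one reason to compare analyzers before committing.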
7.2 Pay attention to choosing and updating dictionaries
As the saying goes, “even a clever woman cannot cook without rice”: the analyzer is the clever woman and the dictionary is the rice.
However good the analyzer is, it is useless without a reliable dictionary.
Choose the dictionary well and tokenization becomes more accurate.
Suggestion: once the base dictionary is reasonably complete, add your own industry and domain dictionaries based on the business scenario.
Even with industry and domain dictionaries added, what about new words they do not cover?
For example, new internet slang or industry terms are missing, tokenization goes wrong, and the user experience suffers. What then?
Because IK is a plug-in, its built-in dictionary cannot be updated dynamically once configured, so dynamic updates have to be implemented through an additional mechanism.
For example, one way to update the IK dictionary dynamically: modify the IK analyzer source code and load dictionary entries from MySQL, so that updating the MySQL entries updates the dictionary.
7.3 Take data modeling in the mapping seriously
- fielddata on text fields is a big memory hog; do not enable it unless you have to.
- Whether to add a keyword type depends on whether you need sorting or aggregation.
- For fields that do not need to be searched, set index to false.
- For fields that do not need to be stored separately, set store to false.
- For large text, such as the content of Word and PDF files, consider splitting it into smaller chunks before storing.
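The modeling rules above might translate into a mapping along these lines, built here as a Python dict and printed as the JSON body for an index-creation request. Every field name (`title`, `body`, `cover_url`) is hypothetical, and `ik_max_word` assumes the IK plug-in is installed.

```python
import json

# Hypothetical index mapping; every field name here is made up for illustration.
mapping = {
    "mappings": {
        "properties": {
            # Full-text field. The keyword sub-field exists only because this
            # example assumes the title must be sorted or aggregated on.
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            # Full-text only: no keyword sub-field, fielddata left at its default (off).
            "body": {"type": "text", "analyzer": "ik_max_word"},
            # Displayed with the hit but never searched on; "store": False is
            # already the default and is spelled out only for clarity.
            "cover_url": {"type": "keyword", "index": False, "store": False},
        }
    }
}

print(json.dumps(mapping, indent=2))  # request body for: PUT <index-name>
```

The idea is to pay for indexing, sub-fields and fielddata only where a concrete search, sort or aggregation requirement justifies it.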
7.4 Choose a query type that fits your business scenario
As mentioned above, match and match_phrase suit different scenarios.
- match: high recall but low precision.
- match_phrase: matches a phrase, with high precision but low recall.
- wildcard fuzzy matching is not recommended unless necessary.
There are of course other query types, such as query_string and fuzzy, which also need to be chosen according to the business scenario.
7.5 Trade-offs are required in pursuit of the best response time
Users have very little patience; do not make them wait.
- Increase the ratio of data node memory to heap memory
- Do not return the _source field unless necessary
- Do not do complex business processing at search time
Including but not limited to:
1) Two or more levels of nested aggregation
2) wildcard or regexp queries
3) Custom highlighting
- Choose the highlighter according to the type of business
Note: FVH (fast vector highlighter) highlighting is especially suitable for large files (>1MB).
- Make business-level choices
For example, the default from + size deep-paging limit of 10000 is enough. If the product manager disagrees, discuss it and convince them.
For example, approximate aggregation results are Elasticsearch’s default behavior: either accept that or choose a different technology (such as ClickHouse), rather than agonizing over the details.
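Two of these trade-offs, trimming _source and capping from + size paging, can be enforced in the application layer before a request ever reaches the cluster. The Python sketch below is one way to do that; `build_search_body` and the `title` field are assumptions of mine, while 10000 is the default value of index.max_result_window.

```python
MAX_RESULT_WINDOW = 10000  # Elasticsearch default for index.max_result_window


def build_search_body(keyword, page, page_size=10, source_fields=None):
    """Build a paged match query body, refusing pages beyond the from + size limit."""
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {MAX_RESULT_WINDOW}; "
            "ask the user to refine the search instead of paging this deep"
        )
    return {
        "from": offset,
        "size": page_size,
        # Return only the listed fields, or no _source at all when none are needed.
        "_source": source_fields if source_fields else False,
        "query": {"match": {"title": keyword}},  # "title" is a hypothetical field
    }


print(build_search_body("long johns", page=3, source_fields=["title", "price"]))
```

Rejecting deep pages up front turns a slow or failing request into a product decision: the user is nudged to refine the query instead.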
7.6 Use intelligent recommendation and matching mechanisms
- Simple search-box suggestions can be implemented with a prefix query.
GET kibana_sample_data_ecommerce/_search
{
  "_source": "customer_full_name",
  "query": {
    "prefix": {
      "customer_full_name.keyword": "Ed"
    }
  }
}
- For somewhat more complex suggestions that need error correction, use a Suggester.
POST /blogs/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}
For more on Suggester, Uncle Wood’s article is recommended:
elasticsearch.cn/article/142
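The idea behind the term suggester request above, with "suggest_mode": "missing", can be imitated in plain Python: only terms that are absent from the index get a correction. This sketch uses difflib similarity from the standard library as a stand-in for the edit-distance logic Elasticsearch actually uses, and the vocabulary is invented for the example.

```python
import difflib


def suggest_terms(text, vocabulary, cutoff=0.7):
    """Suggest corrections only for terms missing from the vocabulary."""
    suggestions = {}
    for term in text.lower().split():
        if term in vocabulary:
            continue  # term exists in the index: nothing to suggest
        close = difflib.get_close_matches(term, sorted(vocabulary), n=3, cutoff=cutoff)
        if close:
            suggestions[term] = close
    return suggestions


# Hypothetical set of terms present in the "body" field of the index.
vocabulary = {"lucene", "rock", "elasticsearch", "search"}
print(suggest_terms("lucne rock", vocabulary))  # {'lucne': ['lucene']}
```

“rock” is left alone because it already exists in the vocabulary, while the misspelled “lucne” is corrected to “lucene”.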
- Anything more complex requires user-behavior recognition plus a recommendation engine.
A good recommendation engine leans toward personalized recommendation: before recommending, it analyzes the user’s valuable digital footprint (such as demographics, transaction details, interaction logs, purchase records and browsing history) together with information about the products (such as specifications, user feedback and comparisons with other products).
8. Summary
The search experience shapes the user experience, the user experience shapes how many users a product keeps, and that in turn decides whether the product succeeds.
Liang Ning, a well-known product expert, said in Lecture 30 of Product Thinking: “We see many new Internet companies whose system capabilities are inferior to those of traditional enterprises, yet they can take large numbers of users from them by relying on user experience. When the difference in scale is that large, user experience can become a core competitive advantage; when competing on the same level, user experience is the most important competitive advantage.”
Search is the entry point for traffic, and it is the user-experience battleground that every app and website must fight over.
Iterating on the search experience never ends, and it can never be too thorough or too careful.
If you have good ideas or suggestions, you are welcome to share them.