Question-and-answer systems: given a descriptive text entered by the user, find existing questions close to that input by computing similarity.
Recommendations: while the user is browsing an article, recommend other articles similar to it based on content similarity.
The more_like_this query helps you find more documents like a given one. To try it out, first create an index that contains title and desc fields:
PUT /search_data
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "yes"
      },
      "desc": {
        "type": "text"
      }
    }
  }
}
- term_vector: when set to yes, term vectors are stored at index time, which speeds up the similarity calculation. more_like_this can still be used on a field without term_vector configured, but the text then has to be re-analyzed at query time, which is slower.
Make recommendations based on a short paragraph or a problem description statement
GET /_search
{
  "query": {
    "more_like_this": {
      "fields": ["title", "desc"],
      "like": "Qingming Festival spring outing spring tourism school spring outing parent-child outing enterprise outing",
      "min_term_freq": 1,
      "max_query_terms": 12
    }
  }
}
- fields: the fields to query. Currently only text and keyword fields are supported.
- like: the input to find similar documents for, either free text or a reference to an indexed document (by ID).
- min_term_freq: minimum term frequency. Terms that occur in the input fewer times than this are ignored.
- max_query_terms: the maximum number of terms used in the query. Only the terms from like with the highest TF-IDF scores are kept, up to this limit; all other terms are ignored.
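How min_term_freq and max_query_terms interact can be illustrated with a small Python sketch. This is not Elasticsearch's actual implementation: the function name select_terms, the whitespace tokenizer, the toy document-frequency table and the simplified IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def select_terms(text, doc_freq, num_docs, min_term_freq=1, max_query_terms=12):
    """Pick the highest-scoring terms of `text` (illustrative only).

    `doc_freq` maps a term to the number of documents containing it;
    both this table and the scoring formula are simplifications.
    """
    # Naive whitespace tokenization; a real analyzer does much more.
    term_freq = Counter(text.lower().split())

    scored = []
    for term, tf in term_freq.items():
        if tf < min_term_freq:
            continue  # occurs too rarely in the input text
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term is unknown to the (toy) index
        idf = math.log(1 + num_docs / df)
        scored.append((tf * idf, term))

    # Highest TF-IDF first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]


# Hypothetical document frequencies in an index of 1000 documents.
doc_freq = {"spring": 50, "outing": 5, "school": 20, "the": 1000}
text = "the spring outing the school spring outing"

terms = select_terms(text, doc_freq, num_docs=1000, max_query_terms=2)
print(terms)
```

In this toy data "the" appears in every document, so it scores low even though it occurs twice in the input; rare terms such as "outing" are the ones that survive the max_query_terms cut.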
In addition, if the text is too long to pass inline, recommendations can be based on the ID of an indexed article instead:
GET /_search
{
  "query": {
    "more_like_this": {
      "fields": ["title", "desc"],
      "like": [
        {
          "_index": "search_data",
          "_id": "1"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 12
    }
  }
}
like can be an array holding multiple articles, and _index may refer to an index other than the one being searched.
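When such requests are assembled in application code, a small helper can build the body. The following is a minimal Python sketch; build_mlt_query is a hypothetical helper name and not part of any Elasticsearch client, "other_index" is a made-up index name, and the function only produces the JSON body without sending it anywhere.

```python
import json


def build_mlt_query(like, fields=("title", "desc"), min_term_freq=1, max_query_terms=12):
    """Build a more_like_this request body (hypothetical helper).

    `like` may mix free-text strings and (index, doc_id) tuples.
    """
    like_items = []
    for item in like:
        if isinstance(item, str):
            like_items.append(item)  # free text is passed through as-is
        else:
            index, doc_id = item
            like_items.append({"_index": index, "_id": str(doc_id)})

    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like_items,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Two articles, one of them from a different (made-up) index.
body = build_mlt_query([("search_data", 1), ("other_index", 2)])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET /_search line in the console, or passed as the request body by whichever client library is in use.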
Fine-tuning the results
- unlike: if you are not satisfied with the recommendations, you can fine-tune them with unlike. It takes the same kind of input as like, but here you pass in content you do not want; terms found in that content are not selected for the query, so documents resembling it score lower. Note that the effect is often not obvious among the top-ranked recommendations.
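The Elasticsearch documentation describes unlike as a way to avoid selecting terms found in the given input. The effect on term selection can be pictured with the Python sketch below. It is a deliberate simplification: candidate_terms is a hypothetical function, tokenization is a plain whitespace split, and scoring is left out entirely.

```python
def candidate_terms(like_texts, unlike_texts):
    """Terms from `like_texts` that do not occur in `unlike_texts`.

    Illustrative only: real term selection also applies analysis,
    frequency thresholds and TF-IDF ranking.
    """
    like_terms = {t for text in like_texts for t in text.lower().split()}
    unlike_terms = {t for text in unlike_texts for t in text.lower().split()}
    # Anything mentioned in the unlike input is dropped from the candidates.
    return sorted(like_terms - unlike_terms)


terms = candidate_terms(
    ["spring outing for school children"],
    ["school exam schedule"],
)
print(terms)
```

Here "school" is removed from the candidate terms because it also occurs in the unlike text, so school-related documents gain no score from that term.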
GET search_data/_search
{
  "size": 112,
  "_source": ["desc", "title"],
  "query": {
    "more_like_this": {
      "fields": ["title", "desc"],
      "unlike": [
        {
          "_index": "search_data",
          "_id": "1270715"
        },
        {
          "_index": "search_data",
          "_id": "1238991"
        },
        {
          "_index": "search_data",
          "_id": "506680"
        },
        "I'm going to block things I don't like."
      ],
      "like": [
        {
          "_index": "search_data",
          "_id": "986604"
        }
      ],
      "min_term_freq": 1
    }
  }
}
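Other optional parameters
- min_doc_freq: minimum document frequency; terms that appear in fewer documents than this are ignored. Default is 5.
- max_doc_freq: maximum document frequency; terms that appear in more documents than this are ignored, which is useful for skipping very common words.
- min_word_length: minimum length of a term; shorter terms are ignored.
- max_word_length: maximum length of a term; longer terms are ignored.
- stop_words: a list of stop words to ignore.
- analyzer: the analyzer used to tokenize free text passed in like.
- minimum_should_match: the minimum number of selected terms a document must match. Default is 30% of the terms remaining after the query text is analyzed.
- boost_terms: boost factor applied to each selected term according to its TF-IDF score.
- include: whether the input documents themselves are returned in the results.
- boost: the weight of the whole query. Default is 1.0.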
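As a worked example of the minimum_should_match default, the sketch below computes how many terms a document would have to match for a given number of selected query terms. It assumes that each selected term becomes one optional clause and that a percentage is rounded down, as the Elasticsearch documentation describes for percentage values; required_matches is a hypothetical name used only here.

```python
def required_matches(num_query_terms, percent=30):
    """Number of terms a document must match, rounding the percentage down.

    Integer arithmetic avoids floating-point surprises such as
    12 * 0.3 evaluating to 3.5999999999999996.
    """
    return (num_query_terms * percent) // 100


# With max_query_terms set to 12, as in the earlier requests:
print(required_matches(12))
# With 25 selected terms:
print(required_matches(25))
```

Under these assumptions, a query with 12 selected terms requires 3 matching terms, and one with 25 selected terms requires 7.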
Author: Yi Qixiu Engineer Yarn