When using Elasticsearch for full-text search, the search results are sorted by document relevance by default. If you want to change the default sorting rules, you can specify one or more sorting fields using sort.

However, sort simply ignores the relevance of the document itself. In many cases this does not work well, and you need to weigh several fields together to arrive at a final ranking.

In Elasticsearch, function_score is the DSL used to adjust document scores. After the query runs, it rescores each matching document, and the results are then sorted by the final score. It provides several built-in functions for calculating scores:

  • weight: multiplies the score by a fixed weight

  • field_value_factor: computes a score from the value of a field

  • random_score: generates a random score between 0 and 1

  • decay functions: also based on the value of a field; the closer the value is to a given target, the higher the score

function_score also has a boost_mode attribute that specifies how the function result is merged with the original _score. It takes the following options:

  • multiply: multiplies _score by the function result (the default)
  • sum: adds the function result to _score
  • min: takes the smaller of the function result and _score
  • max: takes the larger of the function result and _score
  • replace: replaces _score with the function result
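
The five options above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch code; the function name apply_boost_mode and its parameters are made up for this example.

```python
# Sketch of how boost_mode merges a function result with the query's _score.
# Only the five modes listed above are modelled; names here are hypothetical.

def apply_boost_mode(query_score: float, function_result: float,
                     boost_mode: str = "multiply") -> float:
    """Return the final score for one document."""
    if boost_mode == "multiply":
        return query_score * function_result
    if boost_mode == "sum":
        return query_score + function_result
    if boost_mode == "min":
        return min(query_score, function_result)
    if boost_mode == "max":
        return max(query_score, function_result)
    if boost_mode == "replace":
        # The function result is kept and the original _score is discarded.
        return function_result
    raise ValueError(f"unknown boost_mode: {boost_mode}")


if __name__ == "__main__":
    # A document with _score 2.0 and a function result of 3.0.
    for mode in ("multiply", "sum", "min", "max", "replace"):
        print(mode, apply_boost_mode(2.0, 3.0, mode))
```

Note that replace is the one mode where the relevance of the original query no longer influences the ranking.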

weight

weight is the simplest function to use: you set a number as the weight, and the score of the document is multiplied by it.

weight is most useful in combination with a filter. A filter only selects the documents that meet its criteria and does not calculate a specific score for each one, so every document that passes the filter has a score of 1, and the weight lets you change that to whatever value you want.
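
As a rough sketch of that filter-plus-weight pattern, the helper below builds a function_score request body as a plain Python dict. The helper name weighted_filter_query is hypothetical, and the title and features fields are only illustrative; adapt them to your own mapping.

```python
import json


def weighted_filter_query(base_query: dict, filter_clause: dict, weight: float) -> dict:
    """Build a function_score body that applies `weight` to documents
    matching `filter_clause`. Hypothetical helper, for illustration only."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    # The filter decides which documents this function applies to;
                    # the weight is the score those documents get from it.
                    {"filter": filter_clause, "weight": weight}
                ],
            }
        }
    }


if __name__ == "__main__":
    body = weighted_filter_query(
        base_query={"match": {"title": "Umbrella"}},    # illustrative field and term
        filter_clause={"term": {"features": "wifi"}},   # illustrative field and value
        weight=2,
    )
    print(json.dumps(body, indent=2))
```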

field_value_factor

field_value_factor computes a score from the value of a field in the document. It has the following properties:

  • field: the name of the field to read
  • factor: a number the field value is multiplied by before the modifier is applied (default: 1)
  • modifier: the function applied to the resulting value
    • log: takes the logarithm
    • log1p: adds 1 to the value, then takes the logarithm
    • log2p: adds 2 to the value, then takes the logarithm
    • square: squares the value
    • sqrt: takes the square root
    • reciprocal: takes the reciprocal
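
To make the order of operations concrete, here is a minimal Python sketch of the calculation: the field value is multiplied by factor first, and the modifier is applied to the product. It assumes that log means the common (base-10) logarithm, as the Elasticsearch documentation describes it; the function and table names are invented for this example.

```python
import math

# Modifiers from the list above. "none" (no modifier) is included as the
# fallback. Assumption: log is base 10, as in the Elasticsearch docs.
MODIFIERS = {
    "none": lambda v: v,
    "log": lambda v: math.log10(v),
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "square": lambda v: v * v,
    "sqrt": lambda v: math.sqrt(v),
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value: float, factor: float = 1.0,
                       modifier: str = "none") -> float:
    """Multiply the field value by `factor`, then apply `modifier`."""
    return MODIFIERS[modifier](factor * value)


if __name__ == "__main__":
    # A field value of 1000 with factor 0.1 and log1p gives log10(1 + 100).
    print(round(field_value_factor(1000, factor=0.1, modifier="log1p"), 4))
    # sqrt and square applied to the same raw value, for comparison.
    print(field_value_factor(16, modifier="sqrt"), field_value_factor(16, modifier="square"))
```

The logarithmic modifiers are the usual choice for counters such as sales or view counts, because they flatten very large values. Note that log fails for a value of 0, which is one reason log1p is often preferred.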

To take a simple example, suppose you have an index of items and want items with higher sales to rank higher, on top of their relevance. With the query below, the final score is _score = _score + log(1 + 0.1 * sales):

{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "Umbrella" }
      },
      "field_value_factor": {
        "field": "sales",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum"
    }
  }
}

Decay functions

Decay functions provide a more complex formula for the case where a field has an ideal value, and the further the actual value deviates from that ideal (in either direction), the less desirable the document is. They work well for numeric, date, and geo-point fields, and take the following properties:

  • origin: the ideal value for the field; a document with this value gets the full score (1.0)
  • offset: values within this distance of the origin also get the full score
  • scale: once a value falls outside the offset range around the origin, the score starts to decay; scale controls how quickly it drops
  • decay: the score a document receives at a distance of scale beyond the offset (default: 0.5); how the curve reaches that point depends on the decay mode

Decay functions come in three modes: linear, exp (exponential with base e), and gauss (Gaussian), each with a different decay curve.
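
The three curves can be compared with a short Python sketch. The formulas follow the ones given in the Elasticsearch documentation for numeric fields; the function name decay_score is made up here, and date or geo handling (units, distance calculation) is deliberately left out.

```python
import math


def decay_score(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5,
                mode: str = "gauss") -> float:
    """Score in [0, 1] for a numeric field value. Illustrative sketch only."""
    # Distance beyond the offset; anything inside the offset counts as 0.
    distance = max(0.0, abs(value - origin) - offset)
    if mode == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(distance ** 2) / (2.0 * sigma_sq))
    if mode == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * distance)
    if mode == "linear":
        s = scale / (1.0 - decay)
        return max(0.0, (s - distance) / s)
    raise ValueError(f"unknown decay mode: {mode}")


if __name__ == "__main__":
    # origin 0, offset 5, scale 10: full score up to 5, `decay` (0.5) at 15.
    for mode in ("linear", "exp", "gauss"):
        row = [round(decay_score(d, 0, 10, offset=5, mode=mode), 3)
               for d in (0, 5, 10, 15, 30)]
        print(mode, row)
```

All three modes agree at two points: the score is 1.0 inside the offset and equals decay at a distance of offset + scale. They differ in how they get there and in the tail: linear reaches 0 at a finite distance, while exp and gauss only approach it.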

Take a simple example. We want a rental near the coordinates (40, 116), where a distance within 5km is ideal and a distance within 15km is acceptable.

{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "Apartment" }
      },
      "gauss": {
        "location": {
          "origin": { "lat": 40, "lon": 116 },
          "offset": "5km",
          "scale": "10km"
        }
      },
      "boost_mode": "sum"
    }
  }
}

Use multiple functions at the same time

Use the functions attribute to specify multiple functions. It is an array, so each function keeps the same form it has when used on its own. You can also use score_mode to specify how the scores of the individual functions are combined; its values are similar to those of boost_mode described earlier.

For example, suppose the Dianping app wants to recommend good restaurants to users. The requirements are: the restaurant must be within 5km of the current location, having parking matters most, having Wi-Fi is a plus, a higher restaurant rating (1 to 5) is better, and different users should see different results to add some randomness.

The maximum score for such a restaurant is therefore 2 points (parking) + 1 point (Wi-Fi) + 6 points (rating 5 * 1.2) + 1 point (random score) = 10 points.
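
The arithmetic can be checked with a small Python sketch that adds the four function scores together (score_mode sum) and multiplies the total by the query score (boost_mode multiply). The function name, the feature labels, and the use of Python's random module as a stand-in for random_score are all assumptions made for illustration; Elasticsearch generates its random values differently.

```python
import random


def restaurant_score(features: set, rating: float, seed: str,
                     query_score: float = 1.0) -> float:
    """Illustrative scoring for one restaurant that already passed the 5km filter."""
    function_scores = []
    if "wifi" in features:
        function_scores.append(1.0)           # filter + weight 1
    if "parking" in features:
        function_scores.append(2.0)           # filter + weight 2
    function_scores.append(1.2 * rating)      # field_value_factor with factor 1.2
    # Stand-in for random_score: a value in [0, 1) that is stable per seed,
    # so the same user sees the same ordering on every request.
    function_scores.append(random.Random(seed).random())
    combined = sum(function_scores)           # score_mode: sum
    return query_score * combined             # boost_mode: multiply


if __name__ == "__main__":
    best = restaurant_score({"wifi", "parking"}, rating=5, seed="user-42")
    plain = restaurant_score(set(), rating=3, seed="user-42")
    print(round(best, 3), round(plain, 3))
```

Because the filter-only query gives every restaurant the same query score, multiplying by it leaves the ranking entirely to the summed function scores.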

{
  "query": {
    "function_score": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": $lat,
            "lon": $lng
          }
        }
      },
      "functions": [
        {
          "filter": {
            "term": { "features": "wifi" }
          },
          "weight": 1
        },
        {
          "filter": {
            "term": { "features": "Parking space" }
          },
          "weight": 2
        },
        {
          "field_value_factor": {
            "field": "score",
            "factor": 1.2
          }
        },
        {
          "random_score": {
            "seed": "$id"
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}