In the previous article “Elasticsearch: An Introduction to Elasticsearch SQL and examples”, we briefly introduced the new Elasticsearch SQL functionality and the _translate API. This particular article continues the series by exploring more complex features. If you don’t have your own data ready, read the article I pointed out earlier.

 

Complex examples and benefits of Elasticsearch

Grouping

Elasticsearch’s aggregation framework (able to aggregate billions of data points) represents one of the strongest and most popular features in the stack. From a functional point of view, it has a natural equivalence to the GROUP BY operator in SQL. In addition to providing some examples of GROUP BY functionality, we will again use the Translation API to show the equivalent aggregation.

“Find the average flight time to London for each country of origin destination. In alphabetical order by country.”

sql> SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY OriginCountry LIMIT 5; Avg_Flight_Time | OriginCountry -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 9.342180244924574 13.49582274385201 | | AE AR 4.704097126921018 | AT | | AU 7.998943401875511 15.081367354940724 CACopy the code

Checking the DSL for this query will show the use of “composite aggregation”.

GET flights/_search
{
 "size": 0,
  "_source": false,
  "stored_fields": "_none_",
  "aggs": {
    "groupby": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "3471": {
              "terms": {
                "field": "OriginCountry.keyword",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggs": {
        "3485": {
          "avg": {
            "field": "FlightTimeHour"
          }
        }
      }
    }
  }
}
Copy the code

This is composite aggregation. It can help me implement scroll function in aggregration. If you don’t understand this, please see my other article “Composite Aggregation in Elasticsearch”. The result of the above query returns:

{ "took" : 21, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 10000, "relation" : "gte" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "groupby" : { "after_key" : { "3471" : "ZA" }, "buckets" : [ { "key" : { "3471" : "AE" }, "doc_count" : 385, "3485" : {" value ": 9.342180244924574}}, {" key" : {" 3471 ":" AR "}, "doc_count:" 258, "3485" : {" value ": 13.49582274385201}}, {" key ": {" 3471" : "AT"}, "doc_count:" 120, "3485" : {" value ": 4.704097126921018}}, {"key" : {"3471" : "AU"}, "doc_count" : 518, "3485" : {"value" : 15.081367354940724}},...Copy the code

We can also use functions to group alias fields defined in select.

“Find the number of flights per month and the average flight time.”

POST /_sql? format=txt { "query":"SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, AVG(FlightTimeHour) AS Avg_Flight_Time FROM flights GROUP BY month_of_year" }Copy the code

The result of the query above is:

COUNT (*) | month_of_year | Avg_Flight_Time -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 5687 | | 8.578573065474027 4 7372 | | 8.472684454688286Copy the code

The use of Composite aggregation has one major advantage – it ensures that the GROUP BY implementation is extensible even for high-cardinality fields, and provides a mechanism to stream all buckets for a particular aggregation, similar to scrolling through documents. This also ensures that the implementation does not suffer from the same memory limitations as using the term aggregation. We can translate the corresponding composite aggregation by using the following command:

POST /_sql/translate
{
  "query":"SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY Avg_Flight_Time"
}
Copy the code

The corresponding translation results are:

{
  "size" : 0,
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "bee1e422" : {
              "terms" : {
                "field" : "OriginCountry.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      },
      "aggregations" : {
        "803ccc93" : {
          "avg" : {
            "field" : "FlightTimeHour"
          }
        }
      }
    }
  }
}
Copy the code

Filtering Groups

To filter groups, we can use the HAVING operator, which can also use an alias specified in the SELECT clause. This may be unusual for some SQL specialists because it is usually impossible in an RDBMS based implementation because SELECT is executed after HAVING. In this case, the HAVING clause uses an alias declared at execution time. However, our parser was smart enough to look ahead and choose the declarations to use in HAVING.

“Find the number of flights per departure city, the average distance traveled and the 95th percentile, with the average distance between 3,000 and 4,000 miles.”

sql> SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance  BETWEEN 3000 AND 4000; OriginCityName | avg_distance | c |percentile_distance -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Verona | | 120 | 7927.0 3078.0 Vienna | | 3596.0 120 | 7436.0 Xi 'an | | 114 | 7964.0 3842.0Copy the code

To implement the HAVING function, SQL Elasticsearch uses Bucket Selector pipe aggregation, using parameterized Painless scripts to filter values. Note below that the keyword variant of the OriginCityName field is automatically selected for aggregation, rather than trying to use the standard text variant, which may fail because field data is not enabled. Avg and Percentile metric aggregations provide functionality equivalent to SQL variants.

POST /_sql/translate { "query": """ SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance  BETWEEN 3000 AND 4000 """ }Copy the code

The above translation result is:

{ "size" : 0, "_source" : false, "stored_fields" : "_none_", "aggregations" : { "groupby" : { "composite" : { "size" : 1000, "sources" : [ { "ff6ca116" : { "terms" : { "field" : "OriginCityName.keyword", "missing_bucket" : true, "order" : "asc" } } } ] }, "aggregations" : { "b54e054" : { "avg" : { "field" : "DistanceKilometers" } }, "7171c519" : {"percentiles" : {"field" : "DistanceKilometers", "Percents" : [95.0], "keyed" : true, "tdigest" : {" bucketS_path ": {" A0" : "b54e054", "A1" : "b54e054" }, "script" : { "source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.and(InternalSqlScriptUtils.gte(InternalSqlScriptUtils.roun d(params.a0,params.v0), params.v1), InternalSqlScriptUtils.lte(InternalSqlScriptUtils.round(params.a1,params.v2), params.v3)))", "lang" : "painless", "params" : { "v0" : null, "v1" : 3000, "v2" : null, "v3" : 4000 } }, "gap_policy" : "skip" } } } } } }Copy the code

Literal operators and dependencies

One of the unique features of Elasticsearch as a search engine compared to a traditional RDBMS is its ability to score matches by taking into account properties of text data using correlation calculations, rather than simply “yes/no”. Extending the SQL syntax allows us to expose this functionality beyond what a traditional RDBMS might offer.

Therefore, we introduced two new operators: QUERY and MATCH. For those familiar with Elasticsearch, these are equivalent to the basic multi_match and query_string operators. Users of Kibana will be familiar with the behavior of the Query_string operator because it is used to power the default search bar. It provides intelligent parsing capabilities and allows natural language-style queries. The details of these two operators are beyond the scope of this blog, but the authoritative guide entries provide a good introduction to these concepts.

For example, consider the following:

“Find all delayed flights to and from Kastrup Airport sorted by date between June 06, 2018 and June 17, 2018.”

Edmonton International Airport is an International Airport serving Edmonton, Alberta, Canada and the surrounding area. Using the QUERY operator, we simply search for Edmonton.

sql> SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp; timestamp | FlightNum |OriginCityName | DestCityName -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- the 2020-04-14 T22: this. The 000 z | 1 c0zwe9 | says Dr. | aboard the 2020-04-16 T04:55:07. 48 DVRFT 000 z | | aboard | Torino 2020-04-16 T19: immediately. The 000 z | 14 KTFQB | aboard | Oslo The 2020-04-19 T06: therefore. 000 z | EN9FHUD | Detroit | aboard the T20:2020-04-21 and. 000 z | H5Y0MJK | aboard | Palermo T02:2020-04-23 from the 000 z | KCNMKVI | aboard | at Erie T09:2020-04-23 34:02. 000 z | XH9H5H3 | Paris | aboard The T04:2020-04-25 ". 000 z | GJTJ47T | aboard | Bangalore T13:2020-04-26 23:09. 000 z | PPZN0Y7 | aboard | of Indianapolis The 2020-04-27 T00:20:57. 000 z | IKFEGFL | aboard | Warsaw T22:2020-04-27. 11:51 000 z | 300 JHDQ | Green Bay | aboard The 2020-04-30 T15:02:33. 000 z | PK1ETRA | Rome | aboard the 2020-05-01 T17:52:50. 000 z | A2NRDPQ | aboard | Manchester The 2020-05-01 T22: cometh. 000 z | S9AY152 | aboard | Buenos Aires T15:2020-05-03 52:05. 000 z | PJXXO9P | aboard | Buenos Aires The 2020-05-05 T09:00:47. 000 z | QTPABGR | aboard | the Jeju City 2020-05-05 T18:49:49. 000 z | YVEUZNO | aboard | Ottawa The 2020-05-06 T12: upon. 000 z | TCPDEBY | aboard | Bergamo T00:2020-05-07 00:00. 000 z | SW1HB5M | Abu Dhabi | aboard The 2020-05-07 T12: so. 000 z | 0 hz3phm | Cape Town | aboard the T15:2020-05-08 and. 000 z | T5YFSWW | Paris | aboard The T16:2020-05-08 and. 000 z | E92FNK2 | aboard | Vienna T02:2020-05-09 34:40. 000 z | PB8BSSH | aboard | Tokyo The 2020-05-10 T14:06:58. 000 z | ADWMNQL | aboard | Zurich T15:2020-05-11 skilful. 000 z | YB4FNOI | aboard | Vienna The 2020-05-12 T22:16:10. 000 z | TCE99LO | Copenhagen | aboard the 2020-05-14 T00:19:45. | 000 z RBJT1ZG | aboard | Palermo The T12:2020-05-15 35:39. 000 z | M1NHZTB | aboard | through the 2020-05-17 T15:23:49. 000 z | WC862JS | Dublin | aboard The 2020-05-18 T19:39:08. 000 z | 99 r1vxk | aboard | Naples T05:2020-05-21 divers. 000 z | PJP5R9L | aboard | Portland The 2020-05-21 T07:59:04. 000 z | PK7R8IF | aboard | Winnipeg T00:2020-05-22 00:00. 000 z | RLMOSMO | aboard | Rome T17:2020-05-22 10:22. 000 z | K0SUJFG | Tokoname | aboard the 2020-05-22 T19:06:34. 000 z | ECEIAND | aboard | Treviso The 2020-05-23 T01:20:52. 000 z | VG2K3M9 | level | aboard the 2020-05-23 T22:34:45. 000 z | 8 fxirfy | aboard | MiamiCopy the code

Note that there is no requirement to specify this field. A simple search for “Edmonton” using the QUERY operator is sufficient. Also, please note that our flights to and from Kastrup have been delayed. Elasticsearch queries are available here:

POST /_sql/translate
{
  "query": """
    SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp
   """
}
Copy the code
{ "size" : 1000, "query" : { "bool" : { "must" : [ { "bool" : { "must" : [ { "query_string" : { "query" : "Edmonton", "fields" : [ ], "type" : "best_fields", "default_operator" : "or", "max_determinized_states" : 10000, "enable_position_increments" : true, "fuzziness" : "AUTO", "fuzzy_prefix_length" : 0, "fuzzy_max_expansions" : 50, "phrase_slop" : 0, "escape" : false, "auto_generate_synonyms_phrase_query" : true, "fuzzy_transpositions" : Adjust_pure_negative ": adjust_pure_negative" : adjust_pure_negative" : adjust_pure_negative" : adjust_pure_negative True, "boost" : 1.0}}, {" range ": {" timestamp" : {" from ":" 2018-06-20 ", "to" : "2020-06-27", "include_lower" : Adjust_pure_negative ": true, adjust_pure_negative" : true," Adjust_source ": adjust_pure_negative" : true," Adjust_pure_negative ": true," Adjust_source ": adjust_pure_negative" : true { "includes" : [ "FlightNum", "OriginCityName", "DestCityName" ], "excludes" : [ ] }, "docvalue_fields" : [ { "field" : "timestamp", "format" : "epoch_millis" } ], "sort" : [ { "timestamp" : { "order" : "asc", "missing" : "_last", "unmapped_type" : "date" } } ] }Copy the code

This represents a relatively complex query for new users of Elasticsearch. We have a Boolean query with nested ranges, term constraints, and query string operators. This can traditionally be a fairly daunting task for users migrating applications from SQL, even before they worry about whether the final query is functionally correct and optimal. The actual Query_string operator is nested in the filter because no correlation is required (we sort by date), allowing us to take advantage of the filter cache, skip scores and shorten response times.

The parameters of these operators are also exposed in SQL. The final example shows how a MATCH query can be used with multiple search terms across multiple fields to limit results.

“Find a sunny flight to and from Barcelona”

For example purposes, we also use the Score() function to sort and display the correlation Score.

sql> SELECT Score(), timestamp, FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM flights WHERE MATCH('*Weather,*City*', 'Lightning Barcelona', 'type=cross_fields; operator=AND') ORDER BY Score() DESC LIMIT 5; Score() | timestamp | FlightNum |OriginCityName | DestCityName | DestWeather | OriginWeather ---------------+------------------------+---------------+---------------+---------------+---------------+--------------- - | 2020-04-16 T06 6.917009:00:41. 000 z | L637ISB | Barcelona | Santiago | Rain | Thunder & from 6.917009 | | 2020-04-16 T01:58:51. 000 z ZTOD7RQ | Barcelona | Dubai | Sunny | Thunder & from 6.917009 | | the T14:2020-04-22 02:34. 000 z QSQA5CT | Barcelona | Naples | Rain | Thunder & from 6.917009 | the T12:2020-04-29 declared. 000 z | 0 gihb62 | Barcelona | Buenos Aires | Clear | Thunder & from 6.917009 | | 2020-04-30 T07: well. 000 z L09W9TV | Barcelona | Dubai | Cloudy | Thunder & 20Copy the code

We use the wildcard pattern to specify fields to match AND require a Boolean AND match. Cross-field parameters do not require the terms to all appear in one field, but allow them to appear in different fields, provided both fields exist. Given the structure of the data, this is crucial for matching.

The example here returns columns and groups. However, the QUERY and MATCH operators can also be used in conjunction with GROUP BY – effectively filtering aggregates into Elasticsearch.

Cross-index search and aliases

So far, our queries have been for a single table/index. If we copy the flights index and copy the document to the new named version through the REindex request, both indexes can be queried simultaneously as long as they have the same mapping. Any differences in the map can cause the query to fail when parsed. To query multiple indexes together, the user can add them to the Elasticsearch alias or use wildcards in the WHERE clause. If you remember, from the last article “Elasticsearch: SQL > select kibanA_sample_data_flights from kibanA_sample_data_flights from flight1; Now we can also copy this index into index flight2 by doing the following.

POST _reindex
{
  "source": {
    "index": "flight1"
  },
  "dest": {
    "index": "flight2"
  }
}
Copy the code

We can set flight1 and flight2 to f_alias as follows:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "flight1",
        "alias": "f_alias"
      }
    },
    {
      "add": {
        "index": "flight2",
        "alias": "f_alias"
      }
    }
  ]
}
Copy the code

So we can use the following method to query:

sql> SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM f_alias ORDER BY timestamp DESC LIMIT 2;
   FlightNum   |OriginCityName | DestCityName  |  DestWeather  | OriginWeather 
---------------+---------------+---------------+---------------+---------------
GDZWNB0        |London         |Shanghai       |Rain           |Clear          
GDZWNB0        |London         |Shanghai       |Rain           |Clear  
Copy the code

JOINs

Joins in traditional RDBMS SQL implementations allow the merging of different tables by related columns in a single table response. This allows relational modeling of data compared to the options available locally for Elasticsearch, and represents an important topic. Although Elasticsearch SQL does not currently support the JOIN operator, it does allow users to leverage nested documents that provide simple one-to-many relationship modeling. Queries for nested documents are transparent to the user. To demonstrate this functionality, we need an index that contains such data. The documents for this index represent orders for e-commerce sites and contain fields such as ORDER_DATE, billing_city, and CUSTOMer_LAST_name. In addition, the “Products” field contains nested subdocuments for each product in the order. In order to load this document, we will install the Elasticsearch SQL and eCommerce data as described in the previous article.

Once the data is loaded, we can find an index in Kibana called KibanA_sample_datA_ecommerce. An example of its documentation:

{ "category" : [ "Men's Clothing" ], "currency" : "EUR", "customer_first_name" : "Eddie", "customer_full_name" : "Eddie Underwood", "customer_gender" : "MALE", "customer_id" : 38, "customer_last_name" : "Underwood", "customer_phone" : "", "day_of_week" : "Monday", "day_of_week_i" : 0, "email" : "[email protected]", "manufacturer" : [ "Elitelligence", "Oceanavigations" ], "order_date" : "2020-05-04T09:28:48+00:00", "order_ID" : 584677, "Products" : [{"base_price" : 11.99, "discount_percentage" : 0, "quantity" : 1, "manufacturer" : "Elitelligence", "tax_amount" : 0, "product_id" : 6283, "category" : "Men's Clothing", "SKu" : "ZO0549605496", "taxLESS_price" : 11.99, "unit_discount_amount" : 0, "min_price" : 6.35, "_id" : "sold_product_584677_6283", "discount_amount" : 0, "created_on" : "2016-12-26T09:28:48+00:00", "product_name" : "Basic T-shirt - dark blue/white", "price" : 11.99, "taxful_price" : 11.99, "base_unit_price" : 11.99}, {"base_price" : 24.99, "discount_percentage" : 0, "quantity" : 1, "manufacturer" : "Oceanavigations", "tax_amount" : 0, "product_id" : 19400, "category" : "Men's Clothing", "sku" : "ZO0299602996", "taxLESS_price" : 24.99, "unit_discount_amount" : 0, "min_price" : 11.75, "_id" : "sold_product_584677_19400", "discount_amount" : 0, "created_on" : "2016-12-26T09:28:48+00:00", "product_name" : Sweatshirt-grey multicolor", "price" : 24.99, "taxful_price" : 24.99, "base_unit_price" : 24.99}], "sku" : ["ZO0549605496", "ZO0299602996"], "taxFUL_total_price" : 36.98, "taxLESS_total_price" : 36.98, "total_quantity" : 2, "total_unique_products" : 2, "type" : "order", "user" : "eddie", "geoip" : { "country_iso_code" : "EG", "location" : {" LON ": 31.3," LAT ": 30.1}," region_NAME ": "Cairo Governorate", "continent_name" : "Africa", "city_name" : "Cairo" } }Copy the code

In general, querying these documents will require users to understand why we use nested data types for product fields, as well as the nested query syntax. However, with Elasticsearch SQL, we were able to query these nested documents as if each nested document had a single row represented by its parent field (that is, we effectively flattened the presentation structure). Consider an order for two products above. When a field is requested from a product subdocument, it is displayed as two lines when queried. Each line can also contain the fields of the parent order if desired. Such as:

“Look for the billing name and product purchased on flight 584677.”

If we look at kibanA_sample_data_ecommerce, we find that the products field of this index is not of the nested type we imagined. To do this, we need to redefine its mapping:

PUT orders
{
  "mappings": {
    "properties": {
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "currency": {
        "type": "keyword"
      },
      "customer_birth_date": {
        "type": "date"
      },
      "customer_first_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_full_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_gender": {
        "type": "keyword"
      },
      "customer_id": {
        "type": "keyword"
      },
      "customer_last_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "customer_phone": {
        "type": "keyword"
      },
      "day_of_week": {
        "type": "keyword"
      },
      "day_of_week_i": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"
      },
      "geoip": {
        "properties": {
          "city_name": {
            "type": "keyword"
          },
          "continent_name": {
            "type": "keyword"
          },
          "country_iso_code": {
            "type": "keyword"
          },
          "location": {
            "type": "geo_point"
          },
          "region_name": {
            "type": "keyword"
          }
        }
      },
      "manufacturer": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "order_date": {
        "type": "date"
      },
      "order_id": {
        "type": "keyword"
      },
      "products": {
        "type": "nested",
        "properties": {
          "_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "base_price": {
            "type": "half_float"
          },
          "base_unit_price": {
            "type": "half_float"
          },
          "category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "created_on": {
            "type": "date"
          },
          "discount_amount": {
            "type": "half_float"
          },
          "discount_percentage": {
            "type": "half_float"
          },
          "manufacturer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "min_price": {
            "type": "half_float"
          },
          "price": {
            "type": "half_float"
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            },
            "analyzer": "english"
          },
          "quantity": {
            "type": "integer"
          },
          "sku": {
            "type": "keyword"
          },
          "tax_amount": {
            "type": "half_float"
          },
          "taxful_price": {
            "type": "half_float"
          },
          "taxless_price": {
            "type": "half_float"
          },
          "unit_discount_amount": {
            "type": "half_float"
          }
        }
      },
      "sku": {
        "type": "keyword"
      },
      "taxful_total_price": {
        "type": "half_float"
      },
      "taxless_total_price": {
        "type": "half_float"
      },
      "total_quantity": {
        "type": "integer"
      },
      "total_unique_products": {
        "type": "integer"
      },
      "type": {
        "type": "keyword"
      },
      "user": {
        "type": "keyword"
      }
    }
  }
}  
Copy the code

Above, we made the following changes to the original mapping:

      "products": {
        "type": "nested",
        "properties": {
          "_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "base_price": {
            "type": "half_float"
          },
          "base_unit_price": {
            "type": "half_float"
          },
          "category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "created_on": {
            "type": "date"
          },
          "discount_amount": {
            "type": "half_float"
          },
          "discount_percentage": {
            "type": "half_float"
          },
          "manufacturer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "min_price": {
            "type": "half_float"
          },
          "price": {
            "type": "half_float"
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            },
            "analyzer": "english"
          },
          "quantity": {
            "type": "integer"
          },
          "sku": {
            "type": "keyword"
          },
          "tax_amount": {
            "type": "half_float"
          },
          "taxful_price": {
            "type": "half_float"
          },
          "taxless_price": {
            "type": "half_float"
          },
          "unit_discount_amount": {
            "type": "half_float"
          }
        }
      }
Copy the code

Above this I added the following sentence:

     "type": "nested",
Copy the code

Thus we set the products field to the nested data type. See my previous post “Elasticsearch: Nested objects” if you are still confused about nested data types. We use the following command to do reindex:

POST  _reindex
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "dest": {
    "index": "orders"
  }
}
Copy the code

We continue the query in the following way:

sql> SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677; customer_last_name|customer_first_name| products.price |products.product_id -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Underwood | Eddie | | 11.989999771118164 6283 Underwood | Eddie | | 24.989999771118164 19400Copy the code

The _translate API shows how to construct this query using nested queries:

POST /_sql/translate
{
  "query": """
     SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677
  """
}
Copy the code

The result is as follows:

{ "size" : 1000, "query" : { "bool" : { "must" : [ { "term" : { "order_id" : { "value" : 584677, "boost" : 1.0}}}, {" nested ": {" query" : {" match_all ": {" boost", 1.0}}, "path" : "products", "ignore_unmapped" : False, "score_mode" : "none", "boost" : 1.0, "inner_hits" : {"name" : "products_1", "ignore_unmapped" : false, "from" : 0, "size" : 99, "version" : false, "seq_no_primary_term" : false, "explain" : false, "track_scores" : false, "_source" : { "includes" : [ "products.product_id", "products.price" ], "excludes" : []}}}}], "adjust_pure_negative" : true, "boost" : 1.0}}, "_source" : {" includes ": [ "customer_last_name", "customer_first_name" ], "excludes" : [ ] }, "sort" : [ { "_doc" : { "order" : "asc" } } ] }Copy the code

Conversely, if only the parent field is queried, only one line is displayed:

“Find the billing name for flight for order 584677”

sql> SELECT customer_last_name, customer_first_name FROM orders WHERE order_id=584677;
customer_last_name|customer_first_name
------------------+-------------------
Underwood         |Eddie     
Copy the code