Elasticsearch usually lets you search large amounts of data quickly. In some cases, however, a search runs against many shards, possibly against frozen indices and across multiple remote clusters, and results cannot be expected within milliseconds. When a search takes a long time to run, waiting synchronously for its results is not ideal. Asynchronous search instead lets you submit a search request that is executed asynchronously, monitor its progress, and retrieve the results at a later stage. You can also retrieve partial results as they become available, before the search has completed.

You can submit an asynchronous search request using the submit async search API. The get async search API lets you monitor the progress of an asynchronous search and retrieve its results. An ongoing asynchronous search can be deleted using the delete async search API.

 

Async search

The asynchronous search API enables you to execute search requests asynchronously, monitor their progress, and retrieve partial results as they become available.

Submit async search API

Executes a search request asynchronously. It accepts the same parameters and request body as the search API.

POST /sales*/_async_search?size=0
{
  "sort": [
    { "date": { "order": "asc" } }
  ],
  "aggs": {
    "sale_date": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "1d"
      }
    }
  }
}

The response contains an identifier for the search being executed. You can use this ID to retrieve the final results of the search later. The search results that are currently available are returned as part of the response object.

{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",  (1)
  "is_partial" : true, (2) 
  "is_running" : true, (3)
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986,
  "response" : {
    "took" : 1122,
    "timed_out" : false,
    "num_reduce_phases" : 0,
    "_shards" : {
      "total" : 562, (4)
      "successful" : 3, (5)
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 157483, (6)
        "relation" : "gte"
      },
      "max_score" : null,
      "hits" : [ ]
    }
  }
}
  1. Identifier of the asynchronous search, which can be used to monitor its progress, retrieve its results, and/or delete it
  2. When the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is being executed, is_partial is always set to true
  3. Whether the search is still being executed or has completed
  4. How many shards the search will be executed on in total
  5. How many shards have successfully completed the search
  6. How many documents currently match the query, counting only the shards that have already completed the search

Note: Even when the query is no longer running, and is_running is therefore set to false, the results may still be partial. This happens if the search fails after some shards have returned their results, or if the node coordinating the asynchronous search dies.
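The combinations of is_running and is_partial described above can be read off a response with a few lines of Python. This is only a minimal sketch: the function name and the state labels are made up for illustration, and it assumes nothing beyond the response fields shown in the example.

```python
def summarize_async_search(resp):
    """Classify an async search response by its is_running / is_partial flags."""
    shards = resp.get("response", {}).get("_shards", {})
    if resp["is_running"]:
        # While the query is executing, is_partial is always true.
        state = "running"
    elif resp["is_partial"]:
        # Finished, but the search failed after some shards returned results
        # or the coordinating node died, so the results are incomplete.
        state = "finished_partial"
    else:
        state = "finished_complete"
    return {
        "state": state,
        "shards_done": shards.get("successful", 0),
        "shards_total": shards.get("total", 0),
    }


# Trimmed-down version of the example response above.
example = {
    "is_partial": True,
    "is_running": True,
    "response": {"_shards": {"total": 562, "successful": 3, "skipped": 0, "failed": 0}},
}
print(summarize_async_search(example))
```

A caller would typically keep polling while the state is "running" and treat "finished_partial" as a result that may need to be resubmitted.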

By providing the wait_for_completion_timeout parameter (default: 1 second), you can block and wait for the search to complete, up to the given timeout. When the asynchronous search completes within this timeout, the response does not include the ID, because the results are not stored in the cluster. Set the keep_on_completion parameter (default: false) to true to request that the results be stored for later retrieval even when the search completes within wait_for_completion_timeout.

You can also specify how long the asynchronous search should remain available with the keep_alive parameter (default: 5d, that is, five days). After this period, ongoing asynchronous searches and any saved search results are deleted.
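As an illustration of how these three parameters fit into a submit request, here is a small Python sketch that only builds the HTTP request with the standard library and does not send it. The helper name build_submit_request and the base URL http://localhost:9200 are assumptions made for the example; the parameter names and defaults are the ones described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_submit_request(base_url, index, body, *,
                         wait_for_completion_timeout="1s",
                         keep_on_completion=False,
                         keep_alive="5d",
                         size=None):
    """Build (but do not send) a POST <index>/_async_search request."""
    params = {
        "wait_for_completion_timeout": wait_for_completion_timeout,
        # Booleans are sent as lowercase strings in the query string.
        "keep_on_completion": str(keep_on_completion).lower(),
        "keep_alive": keep_alive,
    }
    if size is not None:
        params["size"] = size
    url = f"{base_url}/{index}/_async_search?{urlencode(params)}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same search body as the submit example above; the host is hypothetical.
body = {
    "sort": [{"date": {"order": "asc"}}],
    "aggs": {
        "sale_date": {
            "date_histogram": {"field": "date", "calendar_interval": "1d"}
        }
    },
}
req = build_submit_request("http://localhost:9200", "sales*", body, size=0)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually submit it; setting keep_on_completion=True is what guarantees an ID in the response even for searches that finish within the timeout.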

Note: When the primary sort of the results is an indexed field, shards are sorted by the minimum and maximum values they hold for that field, so partial results become available in the order of the requested sort criteria.

The submit async search API supports the same parameters as the search API, although some of them have different default values:

  • batched_reduce_size defaults to 5: this affects how often partial results become available, which happens whenever shard results are reduced. A partial reduction is performed every time the coordinating node has received a certain number of new shard responses (5 by default).
  • request_cache defaults to true
  • pre_filter_shard_size defaults to 1 and cannot be changed: this enforces a pre-filter round trip that retrieves statistics from each shard, so that shards that definitely hold no documents matching the query are skipped.
  • ccs_minimize_roundtrips defaults to false, which is also the only supported value

Warning: Asynchronous search does not support scroll, nor search requests that contain only the suggest section. Cross-cluster search is supported only with ccs_minimize_roundtrips set to false.
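A client could check these restrictions before submitting a request. The following Python sketch is a hypothetical pre-flight check, not part of Elasticsearch or any client library: the function name and the messages are invented, and it encodes only the limitations listed above.

```python
def find_unsupported_options(params, body=None):
    """Return the reasons why a request cannot be run as an async search."""
    problems = []
    if "scroll" in params:
        problems.append("scroll is not supported")
    # false is the default and the only supported value.
    if str(params.get("ccs_minimize_roundtrips", "false")).lower() != "false":
        problems.append("ccs_minimize_roundtrips must be false")
    # 1 is the default and cannot be changed.
    if str(params.get("pre_filter_shard_size", "1")) != "1":
        problems.append("pre_filter_shard_size is fixed at 1")
    # A body whose only top-level key is "suggest" is a suggest-only request.
    if body and set(body) == {"suggest"}:
        problems.append("suggest-only requests are not supported")
    return problems


print(find_unsupported_options({"size": 0, "scroll": "1m"}))
```

An empty list means none of the documented restrictions apply to the request.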

 

Get async search

Given its ID, the get async search API retrieves the results of a previously submitted asynchronous search request. If the Elasticsearch security features are enabled, access to the results of a specific asynchronous search is restricted to the user who submitted it in the first place.

GET /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
  "is_partial" : true, (1)
  "is_running" : true, (2)
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986, (3)
  "response" : {
    "took" : 12144,
    "timed_out" : false,
    "num_reduce_phases" : 46, (4)
    "_shards" : {
      "total" : 562, (5)
      "successful" : 188,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 456433,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : { (6)
      "sale_date" : {
        "buckets" : []
      }
    }
  }
}
  1. When the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is being executed, is_partial is always set to true
  2. Whether the search is still being executed or has completed
  3. When the asynchronous search will expire
  4. Indicates how many reductions of the results have been performed. If this number has increased since the last retrieved results, you can expect additional results to be included in the search response
  5. Indicates how many shards have executed the query. Note that for shard results to be included in the search response, they need to be reduced first.
  6. Partial aggregation results, coming from the shards that have already completed executing the query.

The wait_for_completion_timeout parameter can also be provided when calling the get async search API, to wait for the search to complete up to the provided timeout. If the final results become available before the timeout expires, they are returned; otherwise, the currently available results are returned once the timeout expires. By default, no timeout is set, meaning that the currently available results are returned without any additional wait.
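A typical client polls the get async search API until is_running becomes false, using num_reduce_phases to detect when new partial results have arrived. The loop below is a sketch of that pattern under some assumptions: fetch is a caller-supplied, hypothetical function that performs GET /_async_search/<id> and returns the decoded JSON as a dict, and the names poll_until_done, on_progress, interval_s and max_polls are invented for the example.

```python
import time


def poll_until_done(fetch, search_id, *, interval_s=1.0, max_polls=60,
                    on_progress=None, sleep=time.sleep):
    """Call fetch(search_id) repeatedly until the search stops running.

    Returns the last response seen, which may still be running if
    max_polls was reached first.
    """
    last_phases = -1
    resp = None
    for _ in range(max_polls):
        resp = fetch(search_id)
        phases = resp.get("response", {}).get("num_reduce_phases", 0)
        # A higher num_reduce_phases means more shard results have been
        # reduced into the response since the previous poll.
        if phases > last_phases:
            if on_progress is not None:
                on_progress(resp)
            last_phases = phases
        if not resp["is_running"]:
            break
        sleep(interval_s)
    return resp


# Demo with canned responses standing in for real HTTP calls.
canned = iter([
    {"is_running": True, "is_partial": True, "response": {"num_reduce_phases": 1}},
    {"is_running": True, "is_partial": True, "response": {"num_reduce_phases": 1}},
    {"is_running": False, "is_partial": False, "response": {"num_reduce_phases": 3}},
])
seen = []
final = poll_until_done(
    lambda _id: next(canned),
    "hypothetical-id",
    on_progress=lambda r: seen.append(r["response"]["num_reduce_phases"]),
    sleep=lambda _s: None,  # skip real sleeping in the demo
)
print(final["is_running"], seen)
```

Passing wait_for_completion_timeout on each GET instead of sleeping on the client side would move the waiting to the server; the sketch keeps the wait on the client only to stay independent of any HTTP library.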

The keep_alive parameter specifies how long the asynchronous search should remain available in the cluster. When not specified, the keep_alive set with the corresponding submit async search request is used. Otherwise, you can override the value and extend the validity of the request. When this period expires, the search is cancelled if it is still running; if the search has completed, its saved results are deleted.
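The effect of keep_alive is visible in the response timestamps. In the example response above, expiration_time_in_millis minus start_time_in_millis is 432,000,000 ms, which is exactly the default keep_alive of five days. The Python sketch below does that arithmetic; the function names are invented for illustration, and the first function only reflects the keep_alive set at submission as long as the expiration has not been extended by a later request.

```python
from datetime import timedelta


def retention_window(resp):
    """Time between the start of the search and its expiration."""
    return timedelta(
        milliseconds=resp["expiration_time_in_millis"] - resp["start_time_in_millis"]
    )


def time_left(resp, now_millis):
    """Time remaining before the search expires, never negative."""
    return timedelta(
        milliseconds=max(0, resp["expiration_time_in_millis"] - now_millis)
    )


# Timestamps copied from the example response above.
example = {
    "start_time_in_millis": 1583945890986,
    "expiration_time_in_millis": 1584377890986,
}
print(retention_window(example))
```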

 

Delete async search

You can use the delete async search API to manually delete an asynchronous search by ID. If the search is still running, the search request is cancelled. Otherwise, the saved search results are deleted.

DELETE /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
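For completeness, the same call can be prepared from Python with the standard library. This is a sketch only: the helper name and the base URL http://localhost:9200 are assumptions, and the request is built but not sent.

```python
from urllib.request import Request


def build_delete_request(base_url, search_id):
    """Build (but do not send) a DELETE /_async_search/<id> request."""
    return Request(f"{base_url}/_async_search/{search_id}", method="DELETE")


# ID copied from the example above; the host is hypothetical.
search_id = "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc="
req = build_delete_request("http://localhost:9200", search_id)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it to the cluster.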

Reference

[1] www.elastic.co/guide/en/el…