In our last article, “Elasticsearch: Scroll interface for better pagination of large amounts of data”, we explained how to paginate large amounts of data effectively using the Scroll interface. In that article, we covered two approaches:
- Paging with from + size
- Paging with the Scroll interface
For large amounts of data, we try to avoid from + size. The default value of index.max_result_window is 10,000, so from + size cannot exceed 10,000. This limit exists because a search request takes heap memory and time proportional to from + size. For example, to return hits 990 to 1000, every shard has to collect its own top 1,000 documents and hand them to the coordinating node, which merges them and throws away everything except the last 10.
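The cost of a deep from + size page can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Elasticsearch's actual implementation: the three shards, their contents, and the function name `from_size_page` are all made up for illustration.

```python
import heapq

def from_size_page(shards, from_, size):
    """Simulate assembling one from+size page across several shards."""
    window = from_ + size
    # Every shard has to return its own top `window` documents, because
    # any of them could end up in the requested global page.
    per_shard = [heapq.nsmallest(window, shard) for shard in shards]
    collected = sum(len(top) for top in per_shard)
    # The coordinating node merges all candidates, keeps the global top
    # `window`, and then discards the first `from_` of them.
    merged = heapq.nsmallest(window, heapq.merge(*per_shard))
    return merged[from_:], collected

# Three hypothetical shards holding interleaved sort keys 0..8999.
shards = [list(range(i, 9000, 3)) for i in range(3)]
page, collected = from_size_page(shards, from_=990, size=10)
print(page)       # the 10 hits of the requested page
print(collected)  # how many candidates were gathered to produce them
```

In this toy setup, 3,000 candidates are gathered to return 10 hits, and the number grows with both from and the shard count.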
To avoid overloading the cluster, the Scroll interface is usually recommended for deep pagination. However, keeping a scroll context alive is also expensive, so Scroll is not recommended for real-time user requests. The search_after parameter solves this problem by providing a live cursor: it uses the sort values from the previous page to retrieve the next page.
Let’s start by indexing the following documents into the twitter index:
POST _bulk
{ "index" : { "_index" : "twitter", "_id" : 1 } }
{ "user" : "ShuangYuShu - zhang SAN", "DOB" : "1980-01-01", "message" : "the weather is good today, Walk to", "uid" : 2, "age" : 20, "city" : "Beijing", "province" : "Beijing", "country" : "China", "address" : "haidian district in Beijing, China", "location" : { "lat" : "39.970718", "lon" : "116.325747" } }
{ "index" : { "_index" : "twitter", "_id" : 2 } }
{ "user" : "dongcheng district - liu", "DOB" : "1981-01-01", "message" : "in yunnan, the next stop!", "uid" : 3, "age" : 30, "city" : "Beijing", "province" : "Beijing", "country" : "China", "address" : "China Beijing dongcheng district stylobate factory three 3", "location" : { "lat" : "39.904313", "lon" : "116.412754" } }
{ "index" : { "_index" : "twitter", "_id" : 3 } }
{ "user" : "dongcheng district - li si", "DOB" : "1982-01-01", "message" : "happy birthday!", "uid" : 4, "age" : 30, "city" : "Beijing", "province" : "Beijing", "country" : "China", "address" : "China Beijing dongcheng district", "location" : { "lat" : "39.893801", "lon" : "116.408986" } }
{ "index" : { "_index" : "twitter", "_id" : 4 } }
{ "user" : "chaoyang district - old jia", "DOB" : "1983-01-01", "message" : "123, gogogo", "uid" : 5, "age" : 35, "city" : "Beijing", "province" : "Beijing", "country" : "China", "address" : "China Beijing chaoyang district jianguomen", "location" : { "lat" : "39.718256", "lon" : "116.367910" } }
{ "index" : { "_index" : "twitter", "_id" : 5 } }
{ "user" : "chaoyang district - Lao wang", "DOB" : "1984-01-01", "message" : "Happy BirthDay to My Friend!", "uid" : 6, "age" : 50, "city" : "Beijing", "province" : "Beijing", "country" : "China", "address" : "chaoyang district in Beijing, China international trade", "location" : { "lat" : "39.918256", "lon" : "116.467910" } }
{ "index" : { "_index" : "twitter", "_id" : 6 } }
{ "user" : "hongqiao - laowu", "DOB" : "1985-01-01", "message" : "today is my birthday, happy birthday, happy birthday!", "uid" : 7, "age" : 90, "city" : "Shanghai", "province" : "Shanghai", "country" : "China", "address" : "China Shanghai minhang district", "location" : { "lat" : "31.175927", "lon" : "121.383328" } }
There are six documents. Suppose the query to retrieve the first page looks like this:
GET twitter/_search
{
  "size" : 2,
  "query" : {
    "match" : {
      "city" : "Beijing"
    }
  },
  "sort" : [
    { "DOB" : { "order" : "asc" } },
    { "user.keyword" : { "order" : "asc" } }
  ]
}
The result displayed is:
{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "user" : "ShuangYuShu - zhang SAN",
          "DOB" : "1980-01-01",
          "message" : "the weather is good today, Walk to",
          "uid" : 2,
          "age" : 20,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China",
          "address" : "haidian district in Beijing, China",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          }
        },
        "sort" : [
          315532800000,
          "ShuangYuShu - zhang SAN"
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "user" : "dongcheng district - liu",
          "DOB" : "1981-01-01",
          "message" : "in yunnan, the next stop!",
          "uid" : 3,
          "age" : 30,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China",
          "address" : "China Beijing dongcheng district stylobate factory three 3",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          }
        },
        "sort" : [
          347155200000,
          "dongcheng district - liu"
        ]
      }
    ]
  }
}
The response includes an array of sort values for each document. Passing these values in the search_after parameter makes the next request return only documents that come after them in the sort order. For example, we can take the sort values of the last document on the page above and pass them to search_after to retrieve the next page:
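Deriving the next request from a response is mechanical, and can be sketched as a small helper. The function name `next_page_body` is hypothetical, and the dictionaries below are trimmed stand-ins for the request and response bodies rather than output from a real cluster.

```python
import copy

def next_page_body(body, response):
    """Return the request body for the page after `response`, or None."""
    hits = response["hits"]["hits"]
    if not hits:
        return None  # an empty page means there is nothing left to fetch
    nxt = copy.deepcopy(body)
    # search_after replaces the offset, so drop any `from` value.
    nxt.pop("from", None)
    # The cursor is the sort array of the last hit on the current page.
    nxt["search_after"] = hits[-1]["sort"]
    return nxt

first_body = {
    "size": 2,
    "query": {"match": {"city": "Beijing"}},
    "sort": [{"DOB": {"order": "asc"}}, {"user.keyword": {"order": "asc"}}],
}
# Trimmed stand-in for the first page of results.
first_response = {"hits": {"hits": [
    {"_id": "1", "sort": [315532800000, "ShuangYuShu - zhang SAN"]},
    {"_id": "2", "sort": [347155200000, "dongcheng district - liu"]},
]}}

second_body = next_page_body(first_body, first_response)
print(second_body["search_after"])
```

The query and sort clauses are carried over unchanged; only the cursor differs between pages.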
GET twitter/_search
{
  "size" : 2,
  "query" : {
    "match" : {
      "city" : "Beijing"
    }
  },
  "search_after" : [
    347155200000,
    "dongcheng district - liu"
  ],
  "sort" : [
    { "DOB" : { "order" : "asc" } },
    { "user.keyword" : { "order" : "asc" } }
  ]
}
Here, search_after carries the sort values of the last hit from the previous search. The result displayed is:
{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "user" : "dongcheng district - li si",
          "DOB" : "1982-01-01",
          "message" : "happy birthday!",
          "uid" : 4,
          "age" : 30,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China",
          "address" : "China Beijing dongcheng district",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          }
        },
        "sort" : [
          378691200000,
          "dongcheng district - li si"
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "user" : "chaoyang district - old jia",
          "DOB" : "1983-01-01",
          "message" : "123, gogogo",
          "uid" : 5,
          "age" : 35,
          "city" : "Beijing",
          "province" : "Beijing",
          "country" : "China",
          "address" : "China Beijing chaoyang district jianguomen",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          }
        },
        "sort" : [
          410227200000,
          "chaoyang district - old jia"
        ]
      }
    ]
  }
}
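The first entry of each sort array is the DOB value expressed as milliseconds since the Unix epoch, which is how Elasticsearch reports sort values for date fields. The snippet below checks this for the values seen in the two responses; it assumes DOB was dynamically mapped as a date field, and the helper name `sort_value_to_date` is made up for illustration.

```python
from datetime import datetime, timezone

def sort_value_to_date(millis):
    """Convert an epoch-milliseconds sort value to an ISO date string (UTC)."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).date().isoformat()

# Sort values taken from the two result pages.
for value in (315532800000, 347155200000, 378691200000, 410227200000):
    print(value, "->", sort_value_to_date(value))
```

The four values map to 1980-01-01, 1981-01-01, 1982-01-01 and 1983-01-01, matching the DOB fields of documents 1 to 4.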
Note: When we use search_after, the from parameter must be set to 0 or -1.
search_after is not a way to jump freely to an arbitrary page; rather, it lets you scroll through many queries in parallel. It is very similar to the Scroll API, but unlike Scroll, the search_after parameter is stateless: it is always resolved against the latest version of the searcher. Therefore, the sort order may change during a walk, depending on updates to and deletions from the index.
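The stateless behavior can be seen in a small in-memory sketch. Nothing here talks to a real cluster: `fake_search` is a hypothetical stand-in for GET twitter/_search, documents are reduced to user and DOB, and DOB is compared as an ISO string instead of epoch milliseconds to keep the example short.

```python
def fake_search(docs, body):
    """In-memory stand-in for a search sorted by (DOB, user); not a real client."""
    ordered = sorted(docs, key=lambda d: (d["DOB"], d["user"]))
    after = body.get("search_after")
    if after is not None:
        # Keep only documents that sort strictly after the cursor.
        ordered = [d for d in ordered if [d["DOB"], d["user"]] > after]
    page = ordered[: body["size"]]
    return {"hits": {"hits": [
        {"_source": d, "sort": [d["DOB"], d["user"]]} for d in page
    ]}}

def scan(search, size):
    """Yield every hit by following search_after cursors page by page."""
    body = {"size": size}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body = {**body, "search_after": hits[-1]["sort"]}

docs = [
    {"user": "ShuangYuShu - zhang SAN", "DOB": "1980-01-01"},
    {"user": "dongcheng district - liu", "DOB": "1981-01-01"},
    {"user": "dongcheng district - li si", "DOB": "1982-01-01"},
    {"user": "chaoyang district - old jia", "DOB": "1983-01-01"},
    {"user": "chaoyang district - Lao wang", "DOB": "1984-01-01"},
]
walk = scan(lambda body: fake_search(docs, body), size=2)
seen = [next(walk)["_source"]["user"], next(walk)["_source"]["user"]]
# A document indexed in the middle of the walk (hypothetical).
docs.append({"user": "new user", "DOB": "1982-06-01"})
seen += [hit["_source"]["user"] for hit in walk]
print(seen)
```

Because each page is resolved against the current data, the document added after the first page shows up later in the same walk. A Scroll context, by contrast, keeps serving the snapshot taken when the scroll started.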