How can you improve Elasticsearch query efficiency when the data volume reaches billions of records?
Frankly, this question tests whether you have actually used ES in production. Why? Because ES performance is not as good as you might think.
In many cases the data volume is large; with hundreds of millions of records, you may be baffled to find that a single search takes 5~10 seconds. Painful.
The first search takes 5 to 10 seconds, but after that it gets faster, maybe a few hundred milliseconds.
Then you're confused: why is every user's first query so slow? If you have never used ES in production, or have only played with a demo on your own, this question can easily stump you, and it exposes that you don't really know ES.
To be honest, there is no silver bullet for ES performance optimization. What does that mean? Don't expect to tweak a single parameter and magically handle every slow-query scenario.
There may be scenarios where changing a parameter or adjusting the query syntax helps, but certainly not all of them.
The trump card of performance optimization: Filesystem Cache
The data you write to ES is actually written to disk files, and the operating system automatically caches the contents of those files in the Filesystem Cache.
ES search relies heavily on the underlying Filesystem Cache. If you give the Filesystem Cache enough memory to hold all of the index segment files, your searches will run almost entirely from memory, and performance will be very high.
How big can the performance gap be? In many of our past tests and stress tests, a search that goes to disk generally takes at least a second; search performance is on the order of seconds: 1 second, 5 seconds, 10 seconds.
Going through the Filesystem Cache, i.e. pure memory, performance is an order of magnitude higher than going to disk: basically milliseconds, anywhere from a few milliseconds to a few hundred milliseconds.
Performance analysis
Consider a real case: a company's ES cluster has 3 machines, each with what seems like plenty of memory, 64 GB, for a total of 64 × 3 = 192 GB.
The ES JVM heap on each machine takes 32 GB, leaving only 32 GB per machine for the Filesystem Cache, so cluster-wide the Filesystem Cache gets 32 × 3 = 96 GB.
Meanwhile, the index data files occupy a total of 1 TB of disk across the three machines, i.e. roughly 300 GB per machine.
Will performance be good?
The Filesystem Cache holds only 96 GB, so just about one-tenth of the data fits in memory and the rest sits on disk. Most searches then hit the disk, and performance is poor.
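To make that ratio concrete, here is a back-of-the-envelope calculation, a minimal sketch using the figures from the example above:

```python
# Back-of-the-envelope cache math for the example cluster above.
machines = 3
ram_per_machine_gb = 64
jvm_heap_gb = 32          # ES JVM heap per machine
total_data_gb = 1024      # ~1 TB of index data across the cluster

# Whatever RAM the heap doesn't take is (roughly) what the OS can use
# for the Filesystem Cache.
cache_per_machine_gb = ram_per_machine_gb - jvm_heap_gb
total_cache_gb = cache_per_machine_gb * machines        # 96 GB

ratio = total_cache_gb / total_data_gb
print(f"Filesystem Cache total: {total_cache_gb} GB")
print(f"Fraction of data that fits in cache: {ratio:.0%}")  # ~9%
```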
At the end of the day, if you want ES to perform well, in the best case your machines' memory can hold at least half of your total data volume.
Based on our own production experience, it is best to store in ES only the small subset of data that you actually search.
That is, if the Filesystem Cache has 100 GB of memory available for the indexes you search, keep the index data within 100 GB. Then almost all of your searches run from memory, and performance is very high, generally under 1 second.
Say each row of data has 30 fields: id, name, age, and so on. But your searches only ever need to filter on three of them: id, name, and age.
If you naively write entire rows with all 30 fields into ES, then 90% of that data is never used for searching.
Yet it still occupies Filesystem Cache space on the ES machines; the larger each document, the less data the Filesystem Cache can hold.
Instead, write into ES only the few fields used for retrieval, e.g. just the three fields id, name, and age.
The remaining fields can live in MySQL/HBase; we generally recommend an ES + HBase architecture.
HBase is a column-oriented database suited to online storage of massive data: you can write huge volumes into it, but you should only perform simple lookups by key or by range, not complex searches.
Search ES by name and age and you might get back 20 doc ids; then, for each doc id, fetch the complete row from HBase and return it to the front end.
The amount of data written to ES should ideally be no larger than, or only slightly larger than, the Filesystem Cache's capacity.
Retrieving the doc ids from ES might take 20 ms, and fetching the 20 full rows from HBase by id another 30 ms.
If instead you stuffed 1 TB of data into ES as before, each query might take 5 to 10 seconds; with this setup, each query takes maybe 50 ms.
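Here is a minimal sketch of this ES + HBase pattern, assuming the elasticsearch-py 8.x client and the happybase HBase client; the "users" index/table, field names, and hosts are placeholders for illustration:

```python
# Minimal sketch of the ES + HBase pattern: search on the small fields in
# ES, then batch-fetch the full rows from HBase by row key.
from elasticsearch import Elasticsearch
import happybase

es = Elasticsearch("http://localhost:9200")
hbase = happybase.Connection("localhost")   # HBase Thrift server
users_table = hbase.table("users")

# Step 1: search ES, which holds only the three searchable fields.
resp = es.search(
    index="users",
    query={"bool": {"must": [
        {"match": {"name": "alice"}},
        {"range": {"age": {"gte": 18, "lte": 30}}},
    ]}},
    size=20,
)
doc_ids = [hit["_id"] for hit in resp["hits"]["hits"]]

# Step 2: fetch the complete 30-field rows from HBase by row key.
# rows() is a cheap point lookup; no searching happens in HBase.
rows = users_table.rows([doc_id.encode() for doc_id in doc_ids])
for row_key, columns in rows:  # columns maps b"family:qualifier" -> value
    print(row_key, columns)
```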
Data preheating
Suppose that even after doing all of the above, the data on each machine in the ES cluster is still double the size of its Filesystem Cache.
For example, you write 60 GB of data to a machine whose Filesystem Cache is 30 GB; 30 GB of data is left on disk.
In that case, you can preheat the data. Take Weibo as an example: for the data of big-V accounts (influencers) that many people read, you can build a dedicated background system in advance.
Every so often, this background system searches the hot data itself, pulling it into the Filesystem Cache, so that when users actually request it later, it is served straight from memory, very fast.
Or take e-commerce: a background program that, say, every minute automatically queries the hottest products, such as the iPhone 8, pulling their data into the Filesystem Cache.
In short, for data you know is hot and frequently accessed, it is a good idea to build a dedicated cache warm-up subsystem.
It accesses the hot data at regular intervals so that the data stays in the Filesystem Cache, and the next time someone requests it, performance is much better.
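A minimal sketch of such a warm-up job, assuming the elasticsearch-py 8.x client and a hypothetical "products" index; in practice the hot keywords would come from your own access statistics rather than being hard-coded:

```python
# Minimal cache warm-up ("preheating") loop. The query results are
# discarded; the point is the side effect of pulling the relevant index
# segments into the OS Filesystem Cache.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
HOT_KEYWORDS = ["iphone 8", "macbook", "airpods"]  # placeholder hot items

while True:
    for keyword in HOT_KEYWORDS:
        es.search(index="products",
                  query={"match": {"title": keyword}},
                  size=10)
    time.sleep(60)  # re-warm every minute
```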
Hot and cold separation
ES can do a horizontal split similar to MySQL's: put the large volume of infrequently accessed cold data in one index, and the frequently accessed hot data in a separate index.
For example, suppose you have 6 machines and 2 indices, one for cold data and one for hot data, with 3 shards each. Put the hot-data index on 3 of the machines and the cold-data index on the other 3.
Most of the time you are accessing the hot index, and hot data might account for only 10% of the total volume; with so little data, almost all of it fits in the Filesystem Cache, which keeps hot-data access fast.
Cold data lives in a different index on different machines, so it never interferes with the hot index.
When someone does access cold data, much of it sits on disk and performance is low; but since only 10% of users touch cold data while 90% hit hot data, that doesn't matter much.
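One way to pin the two indices to different machines is shard allocation filtering. A minimal sketch, assuming the 8.x client and that each node is tagged with a custom attribute (e.g. `node.attr.box_type: hot` or `cold` in its elasticsearch.yml); the index names are examples:

```python
# Minimal hot/cold index separation via shard allocation filtering.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hot index: 3 shards pinned to the 3 machines tagged box_type=hot.
es.indices.create(index="orders_hot", settings={
    "number_of_shards": 3,
    "index.routing.allocation.require.box_type": "hot",
})

# Cold index: 3 shards pinned to the 3 machines tagged box_type=cold.
es.indices.create(index="orders_cold", settings={
    "number_of_shards": 3,
    "index.routing.allocation.require.box_type": "cold",
})

# The application then writes frequently accessed data to orders_hot and
# rarely accessed data to orders_cold; most queries hit orders_hot only.
```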
Associative queries in ES
In MySQL we often run complex associative (join) queries. How do we handle them in ES?
Avoid complex associative queries inside ES as much as possible; once you use them, performance is generally poor. It is better to perform the association in your Java application first and write the pre-joined data straight into ES; then at search time there is no need for join-like search syntax in ES.
Document Model design
Document model design is very important. For many operations, don't try to perform all sorts of complicated processing at search time.
ES supports only so many operations; don't plan on using ES for things it can't do well. If you genuinely need such an operation, try to complete it when you design the document model, at write time.
Also, avoid complex operations like join, nested, and parent-child searches wherever possible; their performance is poor.
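A minimal sketch of doing the "join" in application code at write time, so no join is needed at search time; the order/user data, field names, and the "orders" index are hypothetical:

```python
# Denormalize at write time: embed the user fields you search or display
# on directly in the order document, instead of joining at query time.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

order = {"order_id": "o-1001", "user_id": "u-42", "amount": 99.5}
user = {"user_id": "u-42", "name": "alice", "city": "Beijing"}  # e.g. from MySQL

doc = {**order, "user_name": user["name"], "user_city": user["city"]}
es.index(index="orders", id=doc["order_id"], document=doc)

# Searching by user_city is now a flat query: no join/nested/parent-child.
resp = es.search(index="orders", query={"match": {"user_city": "Beijing"}})
```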
Paging performance optimization
ES pagination is notoriously poor. Why? Say you show 10 items per page and want to query page 100: ES actually fetches the first 1,000 items stored on every shard onto a coordinating node.
With five shards, that is 5,000 items; the coordinating node then merges and processes those 5,000 items to arrive at the final 10 items for page 100.
Because the system is distributed, you cannot simply fetch 2 items from each of the 5 shards and have the coordinating node merge them into the 10 items of page 100.
Each shard must return its first 1,000 items, which the coordinating node sorts, filters, and re-paginates according to your query to finally reach page 100.
The deeper you page, the more data each shard returns and the longer the coordinating node takes, which is very frustrating. So pagination in ES gets slower and slower as you go deeper.
We have run into this ourselves: with ES pagination, the first few pages take tens of milliseconds, but by page 10 or a few dozen pages in, a single page takes 5 to 10 seconds.
Is there a solution? Two ideas:
1. Disallow deep paging (the default deep-paging performance is poor). Tell the product manager that the system simply doesn't allow paging that deep, because by default the deeper the page, the worse the performance.
2. For feeds like an app's endless stream of recommended products, or Weibo's timeline that you scroll through page by page, you can use the Scroll API; look it up online to learn how to use it.
So how does Scroll do that?
Scroll takes a snapshot of your data up front, and each subsequent page is fetched by moving a cursor forward via the scroll_id. Its paging performance is far better than what was described above, basically milliseconds per page.
The only catch is that it suits a Twitter-style pull-down, page-after-page scenario; you cannot jump to an arbitrary page. That is, you can't go to page 10, then page 120, then back to page 58.
That's why many products now don't let you jump to arbitrary pages: you can only pull down, page by page.
Note that at initialization you must set the scroll parameter to tell ES how long to keep the search context. Make sure users won't keep scrolling for hours on end, or the request may fail when the context times out.
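A minimal sketch of Scroll-based paging, assuming the elasticsearch-py 8.x client and a hypothetical "tweets" index:

```python
# Scroll paging: snapshot the result set, then walk it page by page.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Initialize the scroll: keep the search context alive for 2 minutes.
resp = es.search(index="tweets", scroll="2m", size=100,
                 query={"match_all": {}})
scroll_id = resp["_scroll_id"]

while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        print(hit["_id"])  # handle one document
    # Each call moves the cursor forward one page and renews the 2m timeout.
    resp = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = resp["_scroll_id"]

# Release the search context as soon as you are done.
es.clear_scroll(scroll_id=scroll_id)
```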
Besides the Scroll API, you can also use search_after. The idea of search_after is to use the results of the previous page to help retrieve the next page.
Obviously this method also doesn't allow random page jumps; you can only page forward, one page at a time. At initialization, you need a field with unique values to use as the sort field.
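A minimal sketch of search_after paging, again assuming the 8.x client and a hypothetical "tweets" index; the dedicated unique "id" field and the "created_at" timestamp are illustrative assumptions:

```python
# search_after paging: feed the sort values of the last hit of one page
# back in to fetch the next page. The sort must end in a unique field so
# every page boundary is unambiguous.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
sort = [{"created_at": "desc"}, {"id": "asc"}]  # unique tiebreaker last

resp = es.search(index="tweets", size=10, sort=sort,
                 query={"match_all": {}})
while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        print(hit["_id"])
    last_sort = resp["hits"]["hits"][-1]["sort"]
    resp = es.search(index="tweets", size=10, sort=sort,
                     search_after=last_sort, query={"match_all": {}})
```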