The data reading process of an ES client
Client -> Shard -> Filesystem Cache -> Disk files
Query performance optimization for massive data
Search performance depends heavily on memory. If memory is large enough, the operating system's filesystem cache holds the index files: a query served from the filesystem cache completes in milliseconds, while a query that has to read the index files from disk takes on the order of seconds.
For example, suppose the index data files occupy 1 TB of disk in total across three machines, i.e. the ES data volume is 1 TB with a bit over 300 GB per machine. For good performance, the memory available for the filesystem cache should be able to hold at least half of the total data volume.
In production, it is best to store only a small amount of data in ES and size it against the memory reserved for the filesystem cache of the indexes that are actually searched. For instance, if about 100 GB of memory is left to the filesystem cache, keep the indexed data within roughly 100 GB; then almost all queried data is served from memory, performance is very high, and results come back within about 1 second.
Another note: only the fields you actually need to query should be stored in ES. Do not write every field of the whole record into ES. If all fields go in, the index files far exceed what the filesystem cache can hold, many queries fall back to reading disk files, and query performance suffers.
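A minimal sketch of this idea, assuming the official Python client (elasticsearch-py 8.x) and a hypothetical `orders_search` index; the full record would live in the primary store and only the searchable fields are written to ES:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical full record: the complete order lives in another store
# (e.g. a relational database); ES only needs the fields used for search.
full_order = {
    "order_id": "20230001",
    "user_id": 42,
    "status": "paid",
    "created_at": "2023-01-01T12:00:00",
    # ...plus many large fields (addresses, item snapshots, remarks, ...)
}

# Index only the searchable fields so the filesystem cache can hold
# far more documents than if every field were stored in ES.
searchable = {k: full_order[k] for k in ("order_id", "user_id", "status", "created_at")}
es.index(index="orders_search", id=full_order["order_id"], document=searchable)
```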
Data preheating
A background job automatically searches the hot data in advance so that it is loaded into the filesystem cache. When a client then queries that hot data, it is served directly from the filesystem cache, which gives high performance.
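A minimal preheating sketch, assuming the Python client and a hypothetical list of hot product ids; the queries themselves are throwaway, their only purpose is to pull the relevant index data into the filesystem cache:

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical list of hot document ids; in practice this would come
# from access statistics collected by the background system.
hot_product_ids = ["p1001", "p1002", "p1003"]

def preheat():
    # Query each hot document so the OS loads the corresponding index
    # data into the filesystem cache; the results are discarded.
    for pid in hot_product_ids:
        es.search(index="products", query={"term": {"product_id": pid}}, size=1)

while True:
    preheat()
    time.sleep(60)  # repeat periodically to keep the hot data warm
```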
Hot and cold separation
Similar to hot and cold separation in MySQL, put the large volume of rarely accessed (cold) data into one index and the frequently queried (hot) data into a separate index. The hot index is small enough to stay in the filesystem cache, which improves query performance.
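A rough sketch of the split, assuming the Python client and hypothetical `orders_hot` / `orders_cold` indexes:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Route frequently queried (hot) documents into one index and rarely
# queried (cold) documents into another at write time.
def index_order(order: dict, is_hot: bool) -> None:
    target = "orders_hot" if is_hot else "orders_cold"
    es.index(index=target, id=order["order_id"], document=order)

# Most queries then only touch the small hot index, which fits in the
# filesystem cache, instead of the full data set.
resp = es.search(index="orders_hot", query={"term": {"user_id": 42}})
```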
Model design
When writing to an index, denormalize: write the associated data directly into the document at write time rather than joining at search time, because complex association queries in ES are costly.
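A small sketch of write-time denormalization, assuming the Python client and hypothetical order/user field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Instead of storing orders and users separately and "joining" them at
# query time, copy the user fields needed for search into the order
# document when it is written.
order_doc = {
    "order_id": "20230001",
    "amount": 99.5,
    # denormalized user fields copied in at write time
    "user_id": 42,
    "user_name": "alice",
    "user_city": "Shanghai",
}
es.index(index="orders", id=order_doc["order_id"], document=order_doc)

# The search is then a single flat query with no join-like logic in ES.
resp = es.search(index="orders", query={"term": {"user_city": "Shanghai"}})
```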
Paging query
ES is distributed: to return page 100 with 10 results per page, the coordinating node has to pull a batch of candidate documents from every shard and then page through them in memory, so the deeper the page, the worse the query performance. Optimization strategies: 1. Do not allow deep paging. 2. For pull-down, feed-style paging (like scrolling a microblog timeline), use the scroll API. It takes a snapshot of the data when the scroll is opened and then moves through it with a cursor, page after page, so no matter how deep you go each page returns in milliseconds. The trade-off is that scroll can only move forward one page at a time, which matches the pull-down access pattern naturally.
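A minimal scroll sketch, assuming the Python client and a hypothetical `tweets` feed index:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Open a scroll: ES snapshots the matching data and returns the first
# page together with a cursor (_scroll_id).
resp = es.search(
    index="tweets",                      # hypothetical feed index
    query={"term": {"user_id": 42}},
    sort=[{"created_at": "desc"}],
    size=10,                             # page size
    scroll="2m",                         # keep the scroll context alive for 2 minutes
)
scroll_id = resp["_scroll_id"]
hits = resp["hits"]["hits"]

while hits:
    for hit in hits:
        print(hit["_source"])
    # Fetch the next page through the cursor; each call stays fast no
    # matter how deep we are, but we can only move forward page by page.
    resp = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = resp["_scroll_id"]
    hits = resp["hits"]["hits"]

# Release the scroll context when done.
es.clear_scroll(scroll_id=scroll_id)
```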