Original: https://mp.weixin.qq.com/s/FVbjGTvZP97u5CAydDx7dw
preface
Elasticsearch (ES) is a popular open source distributed search and analysis engine. It can easily implement real-time log analysis, full-text search, and structured data analysis, and greatly reduce the cost of data mining.
ES has rich large-scale landing scenes within Tencent and among Tencent cloud users, which continuously promote the optimization of native ES towards high availability, high performance and low cost. This paper will introduce Tencent’s challenges, optimization ideas, optimization results and future exploration directions in the implementation of ES application, hoping to provide reference for developers.
1. Application scenarios of ES
1.Tencent internal application scenarios
Operation logs: for example, slow logs and exception logs (used to locate service problems).
Business logs: such as user click and access logs (used to analyze user behavior);
Audit logs: Used for security analysis.
Elastic ecosystem Integrity: Any developer can build a complete log analytics system in real time by simply deploying mature components.
High timeliness: it takes 10 seconds for logs to be generated and accessible. Compared with traditional big data solutions that take tens of minutes to several hours, the timeliness is improved by hundreds of times.
Flexible search and analysis capabilities: ES has very flexible search and analysis capabilities due to the support for inverted index, column storage and other data structures.
Short search response time: ES supports interactive analysis, with search response times in seconds, even with trillions of logs.
2.【Search Service Scenario】
The typical scenario is as follows:
Commodity search: namely, commodity search in major e-commerce platforms;
High performance: the maximum value of a single service is 10w+ QPS, the average response time is about 20ms, and the P95 delay is less than 100ms.
Strong correlation: The search experience mainly depends on the degree of matching between “search results” and “user intentions”, which can be evaluated by the accuracy rate, recall rate and other indicators.
High availability: In search scenarios, four nines are required and disaster recovery (Dr) is supported.
3.【Sequence data analysis scenario】
Metrics: Traditional server monitoring.
APM: monitors application performance.
Sensor data: generated by iot data, intelligent hardware, industrial iot, etc.
High concurrent write: the maximum size of a single online cluster is 600+ nodes, and the write throughput is 1000w/s.
High query performance: the query delay of a single curve or time line is 10ms.
Multi-dimensional analysis: Requires flexible and multi-dimensional statistical analysis capabilities. For example, you can perform statistical analysis based on different dimensions, such as regions and service modules.
2. Industry application scenarios
In the industry, ES is mainly used in e-commerce platforms, such as searching commodities, labels and stores.
M, an e-commerce website with an annual GMV of 20 billion, is a typical example. Its application scenarios in ES are also very extensive, including product search and recommendation, log analysis and monitoring, statistical analysis, etc. Its business teams run dozens of ES clusters and are growing.
Ii. Challenges encountered
1.Tencent’s challenges
Search class】
High availability: Take e-commerce search, APP search, and intra-site search as examples. They attach importance to availability and have service SlAs of more than 4 9’s. Disaster recovery scenarios, such as single-node failures and network failures in single-node rooms, need to be considered.
High performance: They have very high standards for performance, such as 20W QPS, 20ms flat sound, P95 delay 100ms.
High availability and high performance.
The sequential class】
Timing scenarios include logging, Metrics, APM, and so on. Compared with search services, which focus on high availability and high performance, sequential services pay more attention to cost and performance.
2. Challenges in the industry
The stability ofAnd of the cluster
Disaster backup.
3. ES optimization practice
1 system robustness: refers to the robustness of the ES kernel itself, which is also a common problem faced by distributed systems. Such as:
2. Disaster recovery solution: For example, a management and control system is used to quickly recover equipment room services when a network fault occurs, prevent data loss when a natural disaster occurs, and quickly roll back after misoperations.
3. System defects: These are inevitable problems in the development of any system, such as Master node congestion, distributed deadlock, slow rolling restart, etc.
1. System robustness:
Service instability: Traffic limiting is implemented to tolerate exceptions such as machine network faults and abnormal query.
Cluster scalability: by optimizing the metadata management and control logic of the cluster, the cluster expansion capacity has been improved by 10 times, supporting thousand-level node cluster and millions of fragments.
Cluster balancing: Optimize the fragment balancing among nodes and multiple disks to balance the pressure of a large-scale cluster.
2. Dr Scheme:
Data recoverable:
By extending the plug-in mechanism of ES to support backup file, ES data backup file back to cheap storage, to ensure the recovery of data;
Fault tolerance:
Tencent CLOUD ES supports cross-availability zone Dr. Users can deploy multiple availability zones as required to tolerate the failure of a single room.
In addition, Tencent CLOUD ES also supports COS backup function. Users can directly back up the underlying data files to COS of Tencent cloud object storage by operating THE API of ES, realizing low-cost and easy data backup function.
Abnormal recovery:
The garbage can can be used to quickly recover a cluster in the event of overpayment or misoperation.
3. System defects:
In terms of service stream limiting, Tencent has carried out stream limiting work at four levels:
Permissions: Optimized, ES supports XPack and custom permissions to prevent attacks and misoperations;
Queue level: by optimizing task execution speed, repetition, priority and other details, it can solve the problems that users often encounter, such as accumulation of Master task queue and task starvation.
Memory level: Starting from ES 6.x, memory flow limiting can be implemented on all links (including HTTP portals, coordination nodes, and data nodes). JVM memory and gradient statistics are used for precise control.
Multi-tenant layer: The CVM/Cgroups scheme is used to ensure resource isolation between multiple tenants.
This section describes the problem of traffic limiting in aggregation scenarios. When using ES for aggregation analysis, the memory is often exhausted due to too many aggregation buckets. The max_buckets parameter is provided in ES 6.8 to control the maximum number of buckets for aggregation, but this method has very limitations: Whether or not the memory is exhausted also depends on the size of the single bucket (in some scenarios 200,000 buckets will work, but in others 100,000 buckets will be exhausted), so the user is not sure exactly what to set this parameter to. At this time, Tencent adopts gradient algorithm for optimization, and checks THE JVM memory once for every 1000 buckets allocated. When the memory is insufficient, the request is interrupted in time to ensure the high availability of ES cluster.
Based on these bottleneck analysis and data access features, the cost optimization solution is as follows:
(1) Use cold and hot separation architecture, and use mixed storage solutions to balance cost and performance;
(2) Since historical data is usually accessed by statistics, storage and performance are traded through prediction calculations;
(3) If the historical data is not used at all, it can be backed up to a cheaper storage system;
(4) Based on the access characteristics of sequential data, the memory cost can be optimized by Cache.
(5) Other optimization methods: such as storage tailoring, life cycle management, etc.
Here’s the Rollup section. Rollup is similar to Cube and materialized view in big data scenarios. The core idea of Rollup is to generate statistics in advance through prediction and release original granularity data, thus reducing storage cost and improving query performance. Here is a simple example: in a machine monitoring scenario, the raw granularity of monitoring data is 10 seconds, whereas monitoring data from a month ago is generally viewed in an hour granularity, which is a Rollup application scenario. Rollup officially started with ES 6.x, and Tencent has actually started working on it in 5.x.
As mentioned above, when many users use large storage models, memory becomes the bottleneck first and hard disks cannot be fully utilized. The main bottleneck of this kind of problem is that indexes occupy a large amount of memory. Considering that historical data is rarely accessed in sequence scenarios and some fields are not used in some scenarios, Tencent introduces Cache to improve memory utilization efficiency.
(1) Write optimization: For the primary key deduplication scenario, index clipping is used to accelerate the primary key deduplication process and improve write performance by 45%. In addition, write performance is optimized through quantization execution, branch hops and instruction Miss are also one of the directions Tencent is exploring, and the performance is expected to be improved by 1 times.
(2) CPU utilization optimization: Aiming at the problem that CPU cannot be fully utilized in partial pressure measurement scenarios, the performance can be improved by 20% by optimizing resource preemption when ES refresh Translog.
(3) Query optimization: Optimize the Merge strategy to improve query performance. Query pruning based on the min/ Max index of each Segment record improves query performance by 30%. Using the CBO policy, the Cache query operation avoids burrs that take 10 times more time. In addition, optimizing performance with some new hardware (such as Intel’s AEP, Optane, QAT, etc.) is a good direction to explore.
The ES native Merge strategy focuses on size similarity (Segments of similar size should be selected) and maximum upper limit (Segments should be pieced together to 5GB). It is possible that a Segment contains data from the whole month of January or March 1. When users query data from a certain hour of March 1, they must scan a large amount of useless data, resulting in serious performance loss.
Iv. Future planning and open source contribution
Taking the big data Atlas as an idea, the whole big data field can be divided into three parts according to the characteristics of data volume and delay requirements.
(1) DataEngineering, including batch calculation and flow calculation;
(2) DataDiscovery, including interactive analysis, search, etc.;
(3) DataApps, mainly used to support online services.
Although ES is considered a search technology, ES also supports online search, document services, and other scenarios; In addition, there are a number of mature OLAP systems that are also based on technology stacks such as inverted indexes and column and column blending. It can be seen that ES has a very strong feasibility to develop in these two fields. Therefore, Tencent will focus on “online services” and “OLAP analysis” and other directions to explore ES technology.
The last
Welcome everyone to pay attention to my public account [programmer Chase wind], 2019 many companies Java interview questions sorted out more than 1000 400 pages of PDF documents, articles will be updated in it, sorted information will also be placed in it.
If you like the article, remember to pay attention to me. Thank you for your support!