Introduction: As an early adopter of Phoenix Eye and later one of its core members, the author lived through four years of the project's evolution, watching its difficult beginnings, its quiet accumulation in the middle period, and its vigorous growth later on. Each architectural change tracked a wave of technology, and each reflects the project members continuously innovating with limited resources to solve practical problems.
Phoenix Eye is the application performance monitoring (APM) system of Baidu's commercial business systems. It focuses on monitoring Java applications and covers most of Baidu's Java applications (thousands of business applications on tens of thousands of containers). It automatically instruments the mainstream middleware frameworks (Spring Web, RPC, databases, caches, etc.) to deliver full-stack performance monitoring and full-link tracing diagnosis, and it provides each of Baidu's business lines with microservice performance indicators, business golden metrics, health status, monitoring alarms, and more.
△ Phoenix Eye product flow chart
- Data collection: the Phoenix Eye probe is automatically implanted into the business process to collect relevant performance information; the business process is completely unaware of it.
- Data computation and analysis: time series data is stored, by type, in TSDB, the time series database of Baidu's SIA intelligent monitoring platform, and used to generate visual reports and anomaly alarms. Call chain data is stored in the Palo (open-sourced as Doris) big-data warehouse for topology analysis and call-chain retrieval.
- Application scenarios: as mentioned above, Phoenix Eye provides stability reports, exception alarms, error-stack analysis, service latency analysis, call-topology analysis, and service-log correlation analysis.
▽ Timeline of Phoenix Eye's architecture changes
01 Phoenix Eye project
By 2016, the middleware of Baidu's Phoenix Nest advertising business system (the distributed RPC framework Stargate, the configuration center, database middleware, etc.) had matured. As services were split apart further, the overall scale of online Java deployments kept growing, exposing more and more problems.
Typical problems included:
- Locating problems in core services took too long: when multiple modules reported large numbers of errors at once, it took a long time to trace the root cause.
- Collecting logs across a cluster was very expensive, and some problems could not be diagnosed at all because the logs lacked call-chain context.
- Viewing exception logs required logging in to online instances one by one; with so many instances deployed, troubleshooting was slow.
The Phoenix Nest business side urgently needed a distributed tracing system to stitch together the logs of the whole business side. So Baidu's Business Platform infrastructure group launched the Phoenix Eye project, named for "the eye of the Phoenix Nest."
02 Phoenix Eye 1.0
In distributed link tracing, probe collection is done in either an intrusive or a non-intrusive way. The 1.0 probe took the intrusive route: the business developer first introduces a JAR that the probe depends on, which automatically collects call relationships and performance data through interceptors; the developer then adds hard-coded supplementary business data.
△ Encoding example
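The original encoding example is not reproduced here. As a minimal sketch of what such intrusive instrumentation might have looked like (the `Tracer`/`Span` API and all names below are illustrative assumptions, not Phoenix Eye's actual API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical manual-instrumentation API; names are illustrative only.
final class Span {
    final String name;
    final long startNanos = System.nanoTime();
    long endNanos;
    final List<String> tags = new ArrayList<>();
    Span(String name) { this.name = name; }
}

final class Tracer {
    private static final Deque<Span> stack = new ArrayDeque<>();
    static final List<Span> finished = new ArrayList<>();

    static Span start(String name) {
        Span s = new Span(name);
        stack.push(s);
        return s;
    }
    static void stop() {
        Span s = stack.pop();
        s.endNanos = System.nanoTime();
        finished.add(s);
    }
}

public class OrderService {
    // Business method with hard-coded instrumentation, as in the 1.0 era:
    public String placeOrder(String userId) {
        Span span = Tracer.start("OrderService.placeOrder");
        span.tags.add("userId=" + userId);   // hard-coded business data
        try {
            return "order-for-" + userId;    // the actual business logic
        } finally {
            Tracer.stop();                   // must remember to close the span
        }
    }
}
```

The pain point is visible at a glance: every business method carries this boilerplate, which is exactly what the later probe versions set out to eliminate.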
Data collected by the probe is written to disk and shipped through Kafka. Storm, HBase, and other then-popular data systems handled processing and storage underneath, leaving the back end with a complex architecture.
△ Phoenix Eye 1.0 architecture diagram
03 Phoenix Eye 2.0
Phoenix Eye 2.0 first reduced the cost of probe adoption. The 2.0 probe used Java Agent technology together with Cglib-based AOP annotations, cutting the number of dependency JARs a business had to introduce from N down to one, and replacing large blocks of hand-written call-chain code with AOP wherever possible. On the transport side, the probe switched to a more efficient protocol (Protobuf + gzip) sent directly to Kafka over HTTP, greatly reducing disk I/O overhead.
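The 2.0 probe intercepted calls with Cglib-generated proxies; as a self-contained sketch of the same around-advice idea, the snippet below uses a JDK dynamic proxy (which only works on interfaces) in place of Cglib. All names are illustrative, not Phoenix Eye's actual code:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// A business interface and its implementation, standing in for a real bean.
interface UserService {
    String findUser(String id);
}

class UserServiceImpl implements UserService {
    public String findUser(String id) { return "user:" + id; }
}

public class MonitorProxy {
    static final List<String> recorded = new ArrayList<>();

    // Wraps a target so every call is timed and recorded, then delegated.
    @SuppressWarnings("unchecked")
    static <T> T monitor(T target, Class<T> iface) {
        InvocationHandler h = (proxy, method, args) -> {
            long t0 = System.nanoTime();
            try {
                return method.invoke(target, args);   // the real business call
            } finally {
                long micros = (System.nanoTime() - t0) / 1_000;
                recorded.add(method.getName() + " took " + micros + "us");
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, h);
    }
}
```

The business code calls `findUser` as usual; the proxy collects timing data on every invocation, which is the essence of the AOP-based collection 2.0 introduced.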
Compared with 1.0, the 2.0 probe was easier to adopt and faster at transport. But it was still up to the business side to add the AOP code. Across hundreds of business applications, adoption remained a big project and promotion remained difficult.
04 Phoenix Eye 3.0
In the architecture design of Phoenix Eye 3.0, the project team kept returning to two questions:
- How can the business side adopt the probe quickly with as few changes as possible, or even with "imperceptible access"?
- How can the operational burden of the architecture be reduced, so that it can handle massive data while remaining cheap to operate and maintain?
To solve problem 1, the 3.0 probe abandoned the intrusive approach entirely in favor of a non-intrusive, bytecode-enhancement approach.
Several monitoring and diagnostic tools popular at the time were investigated:
△ Survey of the New Relic, Pinpoint, and Greys monitoring probes
The 3.0 probe borrows Greys' support for runtime enhancement and the plugin-oriented extension design of Pinpoint and New Relic. The end result: the probe automatically weaves monitoring code into the business process, while the specific monitoring work is done by a plugin system, fully oriented toward aspect-based monitoring.
△ Schematic diagram of probe active loading
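As a skeleton of how such a plugin-driven, non-intrusive probe can hang off the JVM's instrumentation API (the `ProbePlugin` interface and the plugin wiring are illustrative assumptions; the actual bytecode rewriting, typically done with a library such as ASM or Javassist, is elided):

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;

// Each monitoring concern (Spring Web, RPC, JDBC, ...) becomes one plugin.
interface ProbePlugin {
    boolean matches(String className);       // which classes to enhance
    byte[] enhance(byte[] classfileBuffer);  // weave monitoring bytecode in
}

public class ProbeTransformer implements ClassFileTransformer {
    private final List<ProbePlugin> plugins;

    public ProbeTransformer(List<ProbePlugin> plugins) { this.plugins = plugins; }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain pd, byte[] classfileBuffer) {
        for (ProbePlugin p : plugins) {
            if (p.matches(className)) {
                return p.enhance(classfileBuffer);
            }
        }
        return null;  // null tells the JVM to keep the original bytecode
    }

    // Entry point the JVM calls when started with -javaagent:probe.jar;
    // the business application itself needs no code change at all.
    public static void premain(String args, Instrumentation inst) {
        inst.addTransformer(new ProbeTransformer(List.of(/* load plugins */)));
    }
}
```

Because enhancement happens at class-load time inside the agent, "imperceptible access" follows: deploying the probe is a launch-flag change, not a code change.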
The back-end storage switched to Doris. Doris is an MPP-based interactive SQL data warehouse developed at Baidu; it is compatible with the MySQL protocol, has a low learning cost, and serves both storage and analytical computation. In this initial stage, technologies such as Spark and Storm were deliberately avoided to keep system complexity down.
▽ Architecture design is shown in the figure
With the architecture upgrade, a small team could deploy probes quickly in batches, and compute and storage capacity kept up with demand. By 2017, Phoenix Eye 3.0 was live on more than 100 applications, running on more than 1,000 containers.
05 Phoenix Eye 4.0
In 2018 the wave of microservices and virtualization arrived. With continuous upgrades of the deployment platform and the maturing of the Spring Boot ecosystem, monoliths could be split quickly into large numbers of microservices, relying on the platform for efficient operations and deployment. Phoenix Eye, as a basic component, was integrated into the microservice hosting platform and promoted company-wide. The number of applications on board grew from the hundreds into the thousands, and deployed containers from the thousands into the tens of thousands.
At this stage many problems erupted at once. Two core technical problems stood out:
- A probe upgrade only takes effect after the business application restarts, but online applications restart infrequently. This made it hard to ship probe versions frequently or roll out new features quickly.
- With 15 billion real-time writes per day and peak traffic of 3 million records per second, data imports were prone to losing data, and retrieving a single call chain took more than 100 seconds.
In 2019, Phoenix Eye underwent further transformation and upgrading, making technical breakthroughs on both problems.
At the probe level, the team studied how to support hot swapping, i.e., letting the probe upgrade itself automatically while the business process keeps running. To make probe plugin classes visible to business classes, the probe classes were initially placed in the system classloader. But the system classloader is the JVM default and cannot be unloaded. The alternative, putting all probe classes into a custom classloader, makes them completely invisible to business classes, so bytecode enhancement cannot be completed.
△ Probe hot swap Classloader system
To solve the visibility problem, the probe introduced a bridge class. Through the code pegs and plugin-library projection the bridge class provides, business classes can reach the probe classes they actually need, completing the monitoring transformation. Different plugins are placed in different custom classloaders, making plugins invisible to one another, and a single plugin can be hot-swapped on its own. The specific design details will be covered in a follow-up article.
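The essence of the bridge idea can be sketched as follows. A stable `ProbeBridge` class (the name and API below are illustrative, not Phoenix Eye's actual design) would live where business classes can see it, i.e. the system classloader, while each plugin version lives in its own disposable classloader; a hot swap amounts to installing a new handler and letting the old plugin's classloader become unreachable:

```java
import java.util.function.UnaryOperator;

// The only probe class that enhanced business bytecode refers to directly.
public class ProbeBridge {
    // No-op by default, so enhanced business code is safe to run
    // even before any plugin has been loaded.
    private static volatile UnaryOperator<String> handler = s -> s;

    /** Called by the plugin framework when a (new) plugin version loads. */
    public static void install(UnaryOperator<String> newHandler) {
        handler = newHandler;  // old plugin's classloader can now be collected
    }

    /** The code peg woven into business methods calls only this stable entry. */
    public static String onEnter(String methodName) {
        return handler.apply(methodName);
    }
}
```

Because business bytecode only ever references the bridge, swapping the implementation behind it never requires re-enhancing or restarting the business process.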
It is worth mentioning that, as far as we know, the Phoenix Eye probe is the only hot-swappable monitoring probe technology in the industry, and we have applied for a patent for it. Its correctness and performance have been verified under large-scale online traffic.
The second breakthrough was optimizing the performance of call-chain retrieval.
Let’s first analyze our underlying storage structure:
Analysis of slow queries revealed two main causes of slow retrieval: first, many queries hit no index at all, and full-table scans over massive data are very slow; second, compaction triggered on the read path, typical of an LSM-tree, made file compaction extremely slow. To address this, the call-chain storage layer rebuilt its table structure and optimized query time with a large number of rollups on the base table. By this time Doris supported streaming import, so the system also took the opportunity to switch from small-batch import to streaming import.
△ Call chain processing architecture
△ The figure above shows the real-time microservice panorama built by Phoenix Eye. As of January 2020, the online traffic topology covered dozens of product lines, with interfaces, i.e. functions in Java applications, as the finest-grained nodes. As the figure shows, the full platform hosts more than 500,000 non-isolated interface nodes and more than 2,000,000 connections between them.
06 Data processing architecture separation
As the architecture continued to evolve, the amount of data collected by Phoenix Eye kept increasing, and so did the demands of the business side.
There are two main problems:
- Visualization depended on front-end development, so the large volume of multidimensional visual-analysis requirements was hard to satisfy.
- Call-chain sampling made statistics inaccurate, which could not meet the needs of statistical reports.
Both issues come down to how the data is stored and presented, which involves two fundamental concepts of the distributed tracing world: time series data and call chain data. Time series data is a sequence of data points indexed by time, used to view metric values and trends. Call chain data records the entire flow of a single request, showing where the request failed and where the system's bottlenecks lie.
Time series data need not store details, only time, dimensions, and metric data points, and can be kept in a dedicated Time Series Database. In practice, Phoenix Eye does not maintain its own time series database; it connects to TSDB, the distributed time series database of Baidu's SIA intelligent monitoring platform. The SIA platform also provides rich multidimensional visual analysis reports, meeting users' needs for visual, multidimensional data analysis.
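The distinction between the two data shapes can be made concrete with two sketch records (field names are illustrative, not Phoenix Eye's actual schema):

```java
// A time series point: time + dimensions + one metric value.
// It answers "how is this indicator trending?"
record MetricPoint(long timestampMs, String metric,
                   String dimensions, double value) {}

// A call-chain span: one hop of one request.
// It answers "what happened to this request, and where?"
record SpanRecord(String traceId, String spanId, String parentSpanId,
                  String service, long startMs, long durationMs,
                  boolean error) {}

public class DataShapes {
    static MetricPoint samplePoint() {
        return new MetricPoint(1_577_836_800_000L,
                "rpc.latency.avg", "service=order,idc=bj", 12.5);
    }
    static SpanRecord sampleHop() {
        // parentSpanId links this hop back into the request's call tree.
        return new SpanRecord("trace-1", "span-2", "span-1",
                "order-service", 1_577_836_800_000L, 35, false);
    }
}
```

Points like the first go to TSDB for trend reports and alarms; records like the second go to Doris for topology analysis and call-chain retrieval.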
▽ Current overall structure
07 Epilogue
The Phoenix Eye project spanned four years, through countless difficulties and setbacks. Through the continuous efforts of its members, it finally reached this milestone. This article has briefly introduced the business background, technical architecture, and product form of Phoenix Eye. In the future we will continue publishing articles on the related implementation details. We welcome your continued attention.
Baidu Commercial's Large-scale Microservice Distributed Monitoring System — Fengjing (Phoenix Eye)
Recommended reading
The cloud-native transformation of Baidu's search and recommendation engines
Finally, welcome to follow our official account, "Baidu Geek Talk"!