Background

Amap is a national travel and lifestyle service platform with more than 100 million daily active users, and its back end runs on very large clusters to serve that volume. From the user's point of view, any failure is a serious matter. Data centers deployed across three remote sites make the online environment and call links complicated. Under these conditions, the team has to prevent faults from harming users, plan capacity and disaster recovery across complex links, detect problems as early as possible, and respond to emergencies through traffic control and contingency drills. None of this work can wait until something goes wrong. It needs a means of verification that establishes a performance baseline in advance, and that means is full-link stress testing: letting realistic traffic arrive ahead of time.

As an important means of ensuring the stability of online services, full-link stress testing matters a great deal to Amap. Amap's full-link stress testing platform, TestPG, was built from scratch. After being put through routine stress tests, it now supports essentially all of Amap's full-link and daily stress testing, meeting the platform's initial goal of fast and accurate full-link stress testing. Corpus production (traffic processing) is an important part of full-link stress testing, and it is the subject of this article.

A full-link stress test can be summarized in three steps: traffic processing before the test (that is, producing the corpus), choosing the pressure model and starting the test, and analyzing results and locating problems after the test. In every full-link stress test, traffic processing is the most time-consuming step. In the past, operations staff collected logs and handed them to testers, who wrote scripts to process them. This was slow and costly, and it suffered from problems such as expired requests. To address this, TestPG standardized Amap's corpus format and unified its traffic processing flow. As Amap's full-link stress testing evolved, however, two major problems emerged:

Lack of unified control over the corpus production process. Although the platform standardized the corpus format early on, each business line only processed its traffic to match that format. The production process itself had no unified, standardized control, so producing a corpus remained expensive. For full-link stress testing in particular, corpus preparation was the most time-consuming part.

Interface-level pressure control is not precise enough. As a national travel application, Amap sees traffic that is strongly affected by weather, terrain, and holidays. Take driving navigation: on ordinary days most routes are short, while during National Day and the Spring Festival most are long-distance. The back-end computing power that long-distance routes require grows nonlinearly, even doubling. To the stress testing platform, however, long- and short-distance navigation are the same interface. The platform's precise pressure control works only at the interface level and cannot simulate pressure at the level of interface features. To address these two problems, the Amap full-link stress testing team set up a dedicated corpus intelligence project.

Approach to the solution

Standardizing traffic diversion

At that time, Amap's full-link stress testing already covered most services, but it was still evolving. Each business line processed its own corpus before using it for stress testing, and the processing sources were inconsistent: logs, ODPS, recorded traffic, and others were all in common use. To bring corpus production under unified control, the first step was to unify the source. The input to corpus production had to be low-cost and efficient, and traffic recording fit well. An investigation showed that other Amap business scenarios also had strong demand for traffic recording. Traffic recording methods, however, had never been unified, and each service line copying traffic on its own often caused problems such as unstable online machines. So the first task was to unify Amap's traffic recording, that is, to standardize traffic diversion.

Platform-based corpus production

The corpus production process needs unified control. With the input unified, the next step is to convert recorded traffic into a corpus that conforms to the platform specification, and to move that whole conversion onto the platform. Each Amap business line has its own characteristics, however. If the platform wrote custom processing logic for every business line, the cost would be huge, and because the platform team does not know each business in depth, mistakes would be likely. At the same time, some processing logic is common to all corpus production. The solution therefore had to support both business-specific customization and the platform's common logic. We ultimately chose Flink for the entire traffic processing logic.

With traffic diversion standardized, each business team only needs to look at the format of the recorded traffic and write a Flink UDF (user-defined function) to handle its own customization requirements. The common logic that follows, such as storing the corpus, is handled by a Flink sink plug-in. This design provides general processing logic, supports each business line's special needs, and scales well.
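The division of labor between business teams and the platform can be sketched in plain Python. This is only an analogy, not the Flink API: a business-specific function plays the role of the UDF, and a shared function plays the role of the platform's sink. The record fields (`interface`, `params`), the `session_token` parameter, and the names `navigation_udf`, `common_sink`, and `produce_corpus` are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]  # one recorded request (hypothetical shape)
Udf = Callable[[Record], Optional[Record]]


def navigation_udf(record: Record) -> Optional[Record]:
    """Business-specific step: keep only navigation requests and drop a
    hypothetical session token that should not be replayed."""
    if record.get("interface") != "/navigation/route":
        return None
    params = dict(record.get("params", {}))
    params.pop("session_token", None)
    return {**record, "params": params}


def common_sink(records: Iterable[Record]) -> List[str]:
    """Platform-wide step: serialize every record into one corpus line."""
    lines = []
    for r in records:
        query = "&".join(f"{k}={v}" for k, v in sorted(r["params"].items()))
        lines.append(f'{r["interface"]}?{query}')
    return lines


def produce_corpus(traffic: Iterable[Record], udf: Udf) -> List[str]:
    """Run the business UDF first, then the shared sink logic."""
    transformed = (udf(r) for r in traffic)
    return common_sink(r for r in transformed if r is not None)
```

A new business line would supply only its own function in place of `navigation_udf`; the serialization step stays shared.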

Corpus intelligence

As noted above, Amap, as a national travel application, is strongly affected by external conditions. Achieving precise pressure control at the interface feature level was the other major problem at the time. The platform already supports precise pressure control at the interface level, so it is enough to split each interface into classes by feature and supply the feature distribution of real traffic. That distribution changes in real time, however. Providing a feature distribution that matches peak traffic is the ultimate goal of corpus intelligence.
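As a minimal sketch of what "classifying an interface by feature" could look like, the following Python labels route requests as short or long and computes the share of each class in a batch of recorded traffic. The 100 km threshold, the class labels, and the function names are assumptions for illustration; the article does not state how Amap defines its feature classes.

```python
from typing import Dict, List


def classify_distance(distance_km: float, threshold_km: float = 100.0) -> str:
    """Label a route request using a hypothetical distance threshold."""
    return "long" if distance_km >= threshold_km else "short"


def feature_distribution(distances_km: List[float]) -> Dict[str, float]:
    """Share of each feature class in a batch of recorded requests."""
    total = len(distances_km)
    if total == 0:
        return {}
    counts: Dict[str, int] = {}
    for d in distances_km:
        label = classify_distance(d)
        counts[label] = counts.get(label, 0) + 1
    return {label: n / total for label, n in counts.items()}
```

Each class can then be treated like a separate interface, so the existing interface-level pressure control applies to it unchanged.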

Corpus intelligence is realized in three stages. The first stage is traffic feature analysis. We need to identify the factors that change traffic, how they show up in the distribution of specific request parameters, and which parameters vary with the external environment. Most Amap business lines already have rough analysis results that can be used directly at first; finer-grained feature analysis is needed later.

The second stage is traffic feature extraction. Once the feature parameters are known, they must be extracted and counted so that the statistics can feed intelligent prediction. Where should the extraction happen? After weighing the options, corpus production turned out to be the best place: traffic diversion copies the traffic, and corpus production already processes it, so this is the natural step at which to extract feature parameters. Corpus production is also extensible, with users' special needs handled through UDFs, so feature extraction can be done entirely in the common logic.

The third stage is intelligent prediction and machine learning. With statistics on the feature parameters, the platform can combine Amap's traffic features from ordinary days and from past National Day or Spring Festival peaks with this year's business traffic trend to predict the feature distribution for this year's National Day or Spring Festival. That prediction enables precise stress testing at the interface feature level, which is full-link stress testing in the true sense, and safeguards the stability of Amap's services. Later, machine learning can also automatically discover the feature parameters that drive traffic changes and collect and analyze them, achieving true corpus intelligence.
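The article does not describe the prediction model itself. As a deliberately naive placeholder, and purely an assumption, the sketch below scales last year's holiday volume for each feature class by the year-over-year growth of ordinary daily volume, then normalizes the result into shares. The function name and the input dictionaries are hypothetical.

```python
from typing import Dict


def predict_holiday_shares(
    last_holiday: Dict[str, float],
    last_daily: Dict[str, float],
    this_daily: Dict[str, float],
) -> Dict[str, float]:
    """Scale last year's holiday volume per class by the growth of ordinary
    daily volume, then normalize to shares (a naive stand-in for a model)."""
    projected = {
        label: volume * this_daily[label] / last_daily[label]
        for label, volume in last_holiday.items()
    }
    total = sum(projected.values())
    return {label: v / total for label, v in projected.items()}
```

For example, if last year's holiday mix was 60% short and 40% long, and ordinary long-distance volume has doubled since then while short-distance volume is flat, the sketch shifts the predicted holiday mix toward long-distance requests.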

Overall plan

All traffic diversion is handled by a unified traffic diversion platform. The platform uses a diversion plug-in to cache traffic in Kafka and ultimately writes it to ODPS. The corpus production service connects directly to the diversion platform and processes the traffic from ODPS.

The corpus production service does all of its processing in Flink. Users simply write a Flink UDF to meet their business line's customization requirements. Flink UDFs also accept multiple parameters, so users can write them flexibly and pass in the relevant parameters dynamically at execution time, which solves the problem of request expiration.
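One way a multi-parameter function can address request expiration is to rewrite a time-sensitive field at execution time. The sketch below is plain Python rather than a Flink UDF signature, and it assumes the expiring value is a Unix timestamp stored in a parameter named `ts`; both the field name and the function name are hypothetical.

```python
import time
from typing import Dict, Optional


def refresh_timestamp(
    params: Dict[str, str],
    field: str = "ts",
    now: Optional[int] = None,
) -> Dict[str, str]:
    """Return a copy of the request parameters with the timestamp field
    replaced by the current time, so the replayed request is not rejected
    as expired. Requests without the field are returned unchanged."""
    current = int(time.time()) if now is None else now
    refreshed = dict(params)
    if field in refreshed:
        refreshed[field] = str(current)
    return refreshed
```

Passing `field` and `now` as arguments mirrors the idea of supplying parameters dynamically when the job runs, instead of hard-coding them in the function body.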

The Flink sink is a plug-in developed by the platform that parses the Flink source table. It mainly analyzes and extracts traffic features and writes the produced corpus to OSS, organized by interface name, for the platform's stress tests. At present, each business line defines its own traffic features and adds them to the platform. During execution, the sink calls the platform's open API to fetch the features to collect, and at the end reports the statistics back to the platform. The platform then applies machine learning to this data to predict traffic features that match peak traffic for full-link stress testing.
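The two jobs of the sink, grouping corpus lines by interface name and counting features, can be illustrated with an in-memory sketch. It makes no real OSS or Flink calls; the object key layout (`corpus/<interface>.txt`), the tuple shape, and the function name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (interface name, feature label, request line); a hypothetical shape
CorpusLine = Tuple[str, str, str]


def sink(
    lines: Iterable[CorpusLine], prefix: str = "corpus"
) -> Tuple[Dict[str, List[str]], Counter]:
    """Group request lines under one hypothetical object key per interface
    and count how often each (interface, feature) pair occurs."""
    objects: Dict[str, List[str]] = defaultdict(list)
    features: Counter = Counter()
    for interface, feature, request in lines:
        key = f"{prefix}/{interface.strip('/').replace('/', '_')}.txt"
        objects[key].append(request)
        features[(interface, feature)] += 1
    return dict(objects), features
```

In a real deployment the grouped lines would be uploaded to storage and the counts reported to the platform; here both are simply returned.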

Core Functions

Iflow traffic diversion platform

Based on the analysis above, the Amap Engineering Efficiency team took up the challenge and built the Iflow traffic diversion platform in just a few months to bring Amap's traffic diversion under unified control, as shown in the figure below:

Iflow manages Amap's traffic diversion as tasks. Currently, traffic is copied by a traffic diversion plug-in (more will be supported later). Traffic is cached in Kafka and finally written to ODPS for shared use, so users only need to extract the data they need from ODPS. Starting a diversion task requires approval from the responsible owner, which keeps the affected business teams informed and reduces the cost of troubleshooting if diversion causes an incident.

TestPG corpus intelligence

Corpus intelligence on Amap's full-link stress testing platform consists of three modules: business line management, stress test list management, and interface ratio management. Business line management handles the data for each Amap link, including associating diversion tasks, starting diversion, diversion records, corpus paths, stress test header management, and triggering corpus production. A business line corresponds to one stress test link; everything from diversion to corpus production and corpus feature analysis is done per business line. The details are shown in the figure below:

Function introduction:

  • Associating traffic diversion tasks: associate a task on the traffic diversion platform with the business line and configure the related parameters.

  • Starting the diversion task: start the task on the diversion platform. Corpus production is triggered automatically when diversion ends. Corpus production and feature parameter extraction are carried out by the user's Flink UDF and the platform's Flink plug-in.

  • Corpus path: each time diversion is started and corpus production is triggered, the platform automatically generates a corpus path, which users can select when creating a corpus.

  • Header management: each business line has its own characteristics, so its headers differ. This function manages the header content sent with HTTP requests.

  • Triggering corpus production: there are two ways. First, once a diversion task is associated and diversion is started, corpus production, including feature parameter extraction and related operations, is triggered automatically. Second, after a successful diversion, users who have modified the UDF or other parameters can trigger corpus production again with this button.

Stress test list management handles the interfaces covered by stress testing. When a company starts stress testing, its services have to be adapted to support it, and that adaptation is a long process. To make this manageable, Amap's full-link stress testing platform manages Amap's interfaces in one place. The details are shown in the figure below:

The stress test list is reported automatically during traffic diversion. If diversion finds an interface that is not on the list, it reports the interface to the stress testing platform. The platform finds the responsible owner through the associated application and asks for confirmation. If the interface can be stress tested, it is added to the list and serves as a whitelist entry for diversion in the next corpus production. If it cannot, it is classified either as exempt from stress testing or as pending follow-up. For pending interfaces, the platform pushes the business line to adapt through message notifications, with the eventual goal of full-link stress testing that truly covers every interface and every link.
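The triage flow for the stress test list can be sketched as a small state table. The status names and the functions `report_new_interfaces` and `whitelist` are hypothetical; the article describes the three outcomes (testable, exempt, pending follow-up) but not how they are stored.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    TESTABLE = "testable"        # on the whitelist, included in diversion
    EXEMPT = "exempt"            # does not need stress testing
    PENDING = "pending"          # needs business-side adaptation first
    UNCONFIRMED = "unconfirmed"  # newly reported, awaiting owner confirmation


def report_new_interfaces(
    seen: Iterable[str], registry: Dict[str, Status]
) -> List[str]:
    """Register interfaces seen in traffic that are not yet on the list and
    return them so their owners can be asked to confirm."""
    new = []
    for name in seen:
        if name not in registry:
            registry[name] = Status.UNCONFIRMED
            new.append(name)
    return new


def whitelist(registry: Dict[str, Status]) -> List[str]:
    """Interfaces whose traffic is used in the next corpus production."""
    return sorted(n for n, s in registry.items() if s is Status.TESTABLE)
```

Once an owner confirms a newly reported interface as testable, it appears in `whitelist` the next time corpus production runs.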

Interface ratio management stores the interface ratio data that BI provides and that is adjusted for each full-link stress test so that it is close to the real situation; this data serves as a reference for later tests. In the future, traffic feature statistics extracted during corpus production will be analyzed to predict interface ratios that match real traffic, and the prediction will be used directly in full-link stress testing, as shown in the figure below:
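To show how interface ratios might translate into a test plan, the sketch below splits a total QPS target across interfaces in proportion to their ratios. The function name and the idea of a single total QPS target are assumptions; the article does not say how the platform applies the ratios.

```python
from typing import Dict


def target_qps(total_qps: float, ratios: Dict[str, float]) -> Dict[str, float]:
    """Split a total QPS target across interfaces in proportion to their
    ratios. The ratios need not sum to 1 because they are normalized."""
    weight = sum(ratios.values())
    if weight <= 0:
        raise ValueError("ratios must contain at least one positive value")
    return {name: total_qps * r / weight for name, r in ratios.items()}
```

Adjusting a ratio for one interface then automatically rebalances the pressure assigned to the others.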

Platform advantages

Platform-based corpus production

Corpus production as a whole is connected to the traffic diversion platform and carried out in Flink. It supports both the customization requirements of business teams and the platform's general processing logic, and it scales well. The general logic is implemented in the Flink sink, with functions such as traffic feature extraction added to move corpus intelligence forward. Users only need to learn enough Flink to write UDFs and then complete the relevant configuration on the platform. This greatly improves the efficiency and quality of corpus production and marks a leap from standardizing the corpus format to standardizing the production process.

Corpus intelligence

Throughout corpus production, the platform computes summary statistics of the feature parameters in its Flink plug-in. At present, users only need to configure the relevant features on the platform, and the platform analyzes and summarizes them during corpus production. These statistics will support later intelligent analysis and prediction, enable precise pressure control at the interface feature level, and ultimately deliver full-link stress testing in the full sense.

At present, the platform has automated corpus production and begun the work on corpus intelligence. The stress test list is reported automatically through traffic diversion, and in the future message notifications will automatically push business lines to complete their adaptation. The interface ratio management module already supports displaying and adjusting interface ratios. Finally, by intelligently predicting corpus features, the platform will be able to produce a corpus that matches the real features of peak traffic. All of this drives the intelligent evolution of Amap's full-link stress testing.

Future

Corpus intelligence on Amap's full-link stress testing platform has been under development for some time. Through the team's sustained effort, it has automated corpus production and the extraction and summarization of feature parameters, laying the foundation for what follows. In the future, the platform will apply machine learning to the collected feature data. From the features of past traffic peaks and this year's traffic trend, it will predict the features of this year's peak traffic, apply precise pressure control at the interface feature level, and fully simulate real traffic, achieving full-link stress testing in the true sense.

In addition, the platform will use machine learning to automatically discover the parameters that affect traffic changes and to extract and analyze them, improving the accuracy of corpus production.

The platform will also include a confidence evaluation system that compares real and predicted traffic features, analyzes the causes of errors, further improves prediction accuracy, and works toward fully realistic traffic production. Combined with the platform's precise stress testing, pressure models, and monitoring, this will make unattended, intelligent full-link stress testing possible.
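The article does not say which metric the confidence evaluation will use. One common way to compare a predicted feature distribution with the observed one is total variation distance, sketched here as an assumption rather than the platform's actual method; the function name is hypothetical.

```python
from typing import Dict


def total_variation(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Half the sum of absolute differences between two share distributions:
    0.0 means identical, 1.0 means completely disjoint."""
    labels = set(predicted) | set(actual)
    return 0.5 * sum(
        abs(predicted.get(label, 0.0) - actual.get(label, 0.0)) for label in labels
    )
```

A score near zero would indicate that the predicted corpus mix closely matched real peak traffic, while a larger score would point to feature classes whose share was misjudged.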