Takeaway: Across all stages of the model testing process, we unify the problem definition into a general consistency problem. Consistency is the foundation of model stability. By dimension it can be divided into data inconsistency, time-delay inconsistency, policy-mechanism inconsistency, performance inconsistency, and so on. In terms of consequences, inconsistency leads to jitter in model stability indicators and prediction results that do not meet expectations. Consistency testing therefore plays a very important role in the current testing process for large-scale machine learning, but a global solution is difficult to achieve in the model test scheme, and a trade-off between effectiveness and efficiency is involved. For consistency, the strategy and effect of core nodes should be guaranteed, and stop-change and rollback events should be covered to the greatest extent. At present, the consistency scheme has achieved good benefits in Baidu's business, and many optimizations have been made in both effect and efficiency, so that the overall scheme can meet the needs of the business lines.

The full text contains 6,948 words; estimated reading time is 12 minutes.

I. Background and overview

The click-through rate (CTR) model plays an important role in advertising ranking and truncation.

The CTR model system is divided into two parts: online prediction and offline training. Offline training mainly covers the training and evaluation of the model, while online prediction mainly covers the application of the model and its feedback. The specific closed-loop link is shown in Figure 1: the online click and display logs of advertisements pass through the anti-cheat system and feature extraction to produce samples for offline model training; after evaluation and selection, the trained model is applied to click-through rate prediction in the online advertising coarse-ranking stage; the predicted Q values then act on advertising ranking and truncation, forming a complete processing link.

△ Figure 1: Online/offline closed-loop model link

However, because of the complexity of the system, including feature engineering, training, and precision loss in storage, policy iteration often shows discrepancies between offline and online model effects. For example, a new strategy is applied to the model in the offline training stage and the model performs excellently on the evaluation data, but online advertising effectiveness indicators such as clicks and value per thousand impressions are unsatisfactory.

As shown in Figure 2, online advertising prediction goes through the following steps: prediction sample → online feature extraction → discrete embedding vector query → discrete embedding vector aggregation → DNN network calculation → Q value prediction. Offline training goes through the corresponding steps: training sample → offline feature extraction → discrete model query → embedding vector aggregation → DNN network calculation → Q value / model output. If there is a difference in any step of the Q value calculation, the prediction results will not match the results of model training, which will affect the online effect of the model and, in turn, the advertising effectiveness indicators.

△ Figure 2: Complete online/offline processing flow of the model

II. Definition and objectives of consistency assurance

2.1 Definition of the consistency problem

Machine learning can be divided into supervised and unsupervised learning according to whether labels exist. Advertising click-through rate estimation belongs to the ranking problems in supervised learning, which is a common kind of learning business on the Internet. The basic steps of the iterative process of classification learning are shown in the figure below. Each box represents a data form, the legend gives the meaning of each letter, and the subscripts represent the upstream source and version of the data.

△ Figure 3: Classification learning mechanism

Step 1 shows the training process: the training set produces the model through feature extraction and the training procedure. In the evaluation stage of Step 2, the model output by training is used, and samples that did not enter the training set are input for prediction, so as to evaluate the effect of the model. When the evaluated model indicators meet expectations, the model is applied to online prediction (Step 3), and the estimated values are applied to the specific business problem (Step 4). After the application, a new training set is generated, which returns to Step 1.

Model inconsistency may occur in any processing step of the classification learning iteration described in 2.1: offline training, offline model evaluation, online prediction, feature extraction, and prediction application. It mainly involves data consistency and processing logic consistency.

Data consistency includes sample consistency and model consistency. The sample consistency mentioned here is logical: it does not mean that exactly the same sample data is used for training, evaluation, and prediction, but that the same input data set has the same values after processing in these stages. Model consistency refers to the consistency between the model output by offline training and the model used for online prediction, with the precision loss from model transformations kept within the expected range. This is also logical consistency, because after offline training the model passes through three states before being used by the prediction system: the offline discrete model, the intermediate compressed data, and the discrete model loaded online. After the filtering of some features and the conversion of precision, there will be expected losses.

As shown in Figure 4 below:

△ Figure 4: Data consistency

After keeping the data consistent, it is also important to keep the processing logic consistent. As circled in Figure 5, logical consistency includes the logical consistency of feature extraction in Steps 1-3 and the consistency of the logic used in evaluating and predicting with the output model.

△ Figure 5: Logical consistency

The logical consistency of feature extraction refers to the consistency of the feature signatures produced after the same input data passes through the feature extraction library. The logical consistency of prediction and evaluation with the output model mainly ensures the effect of the model by ensuring that the calculation logic is the same. The general hypothesis function in classification learning is:

y = H(w · x)

Here y is the learning target output, x is the feature vector, w is the weight vector of the features, and H is the calculation function between the features and the target value; different business models correspond to different forms and solving processes of H. The training process of the model is the process of obtaining w by minimizing the error function given y and x. Both evaluation and prediction can be regarded as the process of computing y when the input x and the model w are known. Therefore, ensuring the consistency of the calculation logic is an important measure to ensure the application effect of the model.
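To make the two directions concrete, here is a hedged sketch in formula form (the squared error is only an illustrative choice of error function, not necessarily the one used by the production model):

\[
w^{*} = \arg\min_{w} \sum_{i} \bigl( y_i - H(w \cdot x_i) \bigr)^{2}
\qquad \text{(training: solve for } w \text{ given } x_i, y_i \text{)}
\]

\[
\hat{y} = H(w^{*} \cdot x)
\qquad \text{(evaluation / prediction: compute } y \text{ given } x \text{ and the trained } w \text{)}
\]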

It is worth mentioning that data consistency and logical consistency also affect each other. Different processing logic at any stage will lead to different input data at the next stage; conversely, even if the processing logic matches, different input data will lead to inconsistent final output Q values. Therefore, ensuring the consistency of model data and processing logic is a key step in ensuring healthy model iteration.

2.2 Objectives for consistency issues

According to the definition of the consistency problem, three objectives for the implementation of the consistency work can be identified:

  1. Verify whether inconsistency exists, to judge whether the iterative update of the current model system is healthy;

  2. Locate the causes of inconsistency and the key steps where it needs to be solved;

  3. Assess the impact of inconsistency on the system and the cost of repairing it, providing a basis for weighing the benefits.

How can each of these goals be verified? A supplementary verification flow can be used.

As shown in Figure 6 below, when Qp1 predicted by the prediction set P1 acts on the strategy and generates a new training set T2, we can use T2 to replace E1 in the evaluation step as the input of the supplementary flow:

△ Figure 6: Supplementary verification flow

The prediction flow represents the data and logic of the online system, and the supplementary verification flow represents the data and logic of the offline system. Since the inputs of prediction and verification are unified (T2 is theoretically a subset of P1, where P1 contains all estimated samples and T2 contains the samples that were displayed and clicked), by matching the verification targets with the actual processing stages, the verification content of the model consistency problem is transformed into the following four checks (see the sketch after this list):

  1. Whether the contents of the online and offline samples P1 and T2 are consistent;

  2. Whether the online and offline feature extraction results Fp1 and Ft2 are consistent;

  3. Whether the models Mp1 and Mt2 used online and offline are consistent;

  4. Whether, when the same model is loaded online and offline, the final predicted outputs Qp1 and Qt2 are consistent.
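As a toy illustration of these four checks (the field names and the tolerance are hypothetical, and the real comparison works on full data sets rather than a single record), they can be expressed as boolean checks over one spliced online/offline record:

# Hedged sketch: the four verification contents as boolean checks over one spliced record.
# All field names and the tolerance eps are assumptions for illustration only.
def verify(record, eps=1e-6):
    return {
        "1_sample_content_consistent":     record["ins_p1"] == record["ins_t2"],
        "2_feature_extraction_consistent": record["fea_sign_p1"] == record["fea_sign_t2"],
        "3_model_consistent":              record["model_version_p1"] == record["model_version_t2"],
        "4_q_value_consistent":            abs(record["q_p1"] - record["q_t2"]) <= eps,
    }

record = {"ins_p1": "a", "ins_t2": "a", "fea_sign_p1": 123, "fea_sign_t2": 123,
          "model_version_p1": "v7", "model_version_t2": "v7", "q_p1": 0.42, "q_t2": 0.42}
print(verify(record))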

III. Technical solutions

3.1 Full-link Consistency Scheme

The calculation of the click-through rate Q is divided into several steps. Taking one Q value as an example, its processing steps are shown in Figure 7 below. They mainly include multiple operators such as parameter parsing, feature extraction, discrete embedding vector query, DNN network calculation, hidden layer information writing, and result filling. The execution of each operator depends on the output of the previous one.

△ Figure 7: Estimated processing steps

We consider the whole prediction process, cover every link of the prediction by means of controlled variables and step-by-step replacement, compare whether any inconsistency exists, and find the location and cause of the inconsistency. Since the discrete model data is at the TB level, directly comparing the data diff is not feasible, so we also indirectly assess whether the models used are consistent by comparing the output Q values. First, the specific meaning of each part in Figure 8 is explained:

  • Online_fea: the online prediction sample after feature extraction;

  • Offline_ins: the offline sample used for training;

  • Offline_table: the discrete model stored after offline training, which stores the embedding vectors corresponding to different features;

  • Mid_compress: the data obtained after some features are filtered out of the discrete large model according to a specific threshold; after training it is stored in the same path as the discrete large model;

  • Online_table: the discrete model actually used for online prediction, obtained by filtering mid_pb with a compression script and loading it online;

  • Cvm_fea: the network input data obtained after the large model is queried and the results are aggregated;

  • Online_calc_dnn: the small model loaded online for DNN calculation;

  • Offline_calc_dnn: the model loaded in offline training, used for the forward DNN calculation.

△ Figure 8: Calculation and verification flow

Considering the three parts of feature extraction, large model query, and DNN calculation, a total of five streams are calculated and verified.

The first stream is the original online prediction, producing Q1. The second stream takes as input the offline samples spliced to correspond to the online prediction samples, loads offline the large and small models corresponding to the time of online prediction, and computes Q2. The third stream takes as input the discrete features from online prediction, loads offline the large and small models corresponding to the time of online prediction, and computes Q3. The fourth stream takes as input the discrete features from online prediction, loads the discrete large model converted from mid_pb, and computes Q4. The fifth stream takes as input the aggregated network input data from online prediction, loads the small model used for online prediction, and computes Q5.

Comparing Q2 and Q3 verifies that the feature extraction logic and its output data are consistent; comparing Q3 → Q4 → Q5 verifies that the discrete models used in the calculation are consistent; comparing Q5 and Q1 verifies that the DNN calculation is consistent. By comparing Q1 through Q5, we can analyze whether any inconsistency exists between the online and offline systems and where it occurs.
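As a rough illustration of this step-by-step localization (a minimal sketch: the field names, the tolerance, and the example values are hypothetical, not the production comparison tool):

# Hedged sketch: locate the first stage whose adjacent Q values differ beyond tolerance.
TOLERANCE = 1e-6  # assumed acceptable precision loss; the real threshold is business specific

STAGE_OF_PAIR = {
    ("q2", "q3"): "online/offline feature extraction",
    ("q3", "q4"): "offline_table -> mid_pb conversion",
    ("q4", "q5"): "mid_pb -> online_table conversion",
    ("q5", "q1"): "online vs offline DNN calculation",
}

def locate_inconsistency(record):
    """Return the first stage whose Q pair differs beyond tolerance, or None if consistent."""
    for (left, right), stage in STAGE_OF_PAIR.items():
        if abs(record[left] - record[right]) > TOLERANCE:
            return stage
    return None

sample = {"q1": 0.1234512, "q2": 0.1234567, "q3": 0.1234567, "q4": 0.1234512, "q5": 0.1234512}
print(locate_inconsistency(sample))  # -> "offline_table -> mid_pb conversion"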

Key question 1: Whether there is inconsistency

If a diff exists between Q1 and Q2 and the diff is not within the expected range, it indicates that a diff exists between the online and offline models. This is because the inputs of Q1 and Q2 are spliced from the same time period to correspond to the same determined samples; if the inputs are the same but the final output Q values differ, a diff does exist between online and offline prediction. What then needs to be evaluated is whether the magnitude of the diff is within the acceptable range. If the degradation of the online effect caused by the current diff cannot be tolerated, the position and cause of the diff should be further analyzed and determined.

Key question 2: Where the inconsistency occurs

By comparing the output Q value step by step, the position where the inconsistency occurs can be determined, as shown in Figure 9 below.

△ Figure 9: Q value comparison process

  • Q2 and Q3 are inconsistent: Q2 and Q3 differ only in their input ins. Inconsistent output means the online and offline feature extraction results are inconsistent.

  • Q3 and Q4 are inconsistent: Q3 and Q4 have the same input, but one uses the original offline table when querying the large model and the other uses the table converted from mid_pb. Inconsistent output indicates a problem in the conversion from the offline table to mid_pb.

  • Q4 and Q5 are inconsistent: the inputs of Q4 and Q5 are the same, and the large models used are mid_pb and the online table respectively. Inconsistent output indicates a problem in the conversion from mid_pb to the online table.

  • Q5 and Q1 are inconsistent: Q5 and Q1 have the same input, queried large model, and aggregated network input data; one DNN network calculation runs online and the other offline. Inconsistent output means a diff exists between the online and offline DNN calculations of the model.

3.2 Key Steps

3.2.1 Obtaining Online Data

The online data needs to be collected from the prediction module: synchronize the online prediction module to the test environment, copy online traffic, enable the debugging function, and export all the data produced during prediction, including the feature extraction results, the aggregated network input data, and the actual estimated Q value. In general, the prediction module uses multithreading to process prediction requests; a typical debug log print format is shown in Figure 10 below:

△ Figure 10: Debug log format

In Figure 10, a-e are the serial numbers of samples in different pieces of information. It can be seen that the debug logs have the following characteristics:

1) Multiple threads print alternately;

2) Within each thread, printing is in order;

3) The complete information of one sample spans multiple lines;

4) The complete processing information of a sample may span files, while the logs that can be compared are sequential.

Therefore, before replacement and calculation, the debug logs collected online need to be parsed and formatted. Because the amount of log data to be processed is large, single-node execution takes a long time. In this case, a real-time processing strategy of diverting traffic and parsing at the edge can be adopted, as shown in Figure 11 below, to improve data processing efficiency.
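As a rough sketch of the formatting step (the "tid=" line prefix is an assumption; the real debug log layout differs), grouping lines by thread ID restores each sample's complete processing record:

# Hedged sketch: split interleaved multi-thread debug logs into per-thread sub-files.
import os
import re
from collections import defaultdict

TID_PATTERN = re.compile(r"tid=(\d+)\s+(.*)")  # hypothetical line format: "tid=<id> <payload>"

def split_by_thread(log_path):
    """Group log lines by thread id; within one thread the original order is preserved."""
    per_thread = defaultdict(list)
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            m = TID_PATTERN.match(line.strip())
            if m:
                per_thread[m.group(1)].append(m.group(2))
    return per_thread

def write_subfiles(per_thread, out_dir):
    """Write each thread's lines to its own sub-file for later splicing and comparison."""
    os.makedirs(out_dir, exist_ok=True)
    for tid, lines in per_thread.items():
        with open(os.path.join(out_dir, f"thread_{tid}.log"), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")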

△ Figure 11: Debug log processing

3.2.2 Splicing Data Offline

After the online data is obtained, the sample data corresponding to the same period needs to be found. The relationship between the offline data sets is shown in Figure 12 below.

△ Figure 12: Relationships in offline data sets

Among the predicted request samples, some will be filtered out by the policy in the sample stage, and the training samples may contain some samples that were never predicted online. Therefore, the valid data for online/offline comparison is the intersection of the two sample sets, which needs to be spliced together. Offline data used for training generally carries a unique sample line number field for convenient storage and query, but this field does not exist on the online side, so it cannot be used for splicing; other fields must be found instead. The fields used for splicing are called the "primary key", and the choice of primary key should follow these principles:

1) It is included in the offline sample;

2) It can uniquely identify a sample;

3) It is unlikely to produce inconsistency itself;

4) ID-class fields are usually selected rather than text-class fields.

After an appropriate "primary key" is selected, the corresponding online and offline samples can be uniquely matched. These spliced samples are the original input for the overall Q comparison and calculation.
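A minimal sketch of the splicing itself, assuming hypothetical ID-class key fields and toy data (the real primary key is chosen according to the principles above):

# Hedged sketch: splice online and offline samples on a chosen "primary key".
import pandas as pd

KEY = ["request_id", "ad_id"]  # assumed ID-class fields present in both data sets

# Online samples parsed from the diversion debug logs (toy data).
online = pd.DataFrame({"request_id": [1, 2, 3], "ad_id": [10, 20, 30],
                       "q_online": [0.12, 0.45, 0.78]})
# Offline training samples of the same period (toy data); request 3 was filtered out by policy.
offline = pd.DataFrame({"request_id": [1, 2, 4], "ad_id": [10, 20, 40],
                        "label": [0, 1, 0]})

# The inner join keeps only samples present on both sides; these spliced samples are the
# original input for the overall Q comparison and calculation.
spliced = online.merge(offline, on=KEY, how="inner")
print(spliced)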

3.2.3 Estimating Output Offline

As introduced in 2.1, training is the process of solving for the parameters w given known inputs x and outputs y. During the training of a deep neural network, the input x and the model parameters are used to calculate the network output; the network output is compared with the actual y to compute the error; the error is used to compute the gradients of the model parameters; and the parameters are updated through back-propagation, eventually minimizing the system error. The structure of the DNN network is shown in the figure below. It is a four-layer DNN network: one input layer, one normalization layer, one hidden layer in the middle, and one output layer. The overall structure is shown in Figure 13 below:

△ Figure 13: Structure of the DNN network

During model training, the DNN network parameters and the discrete feature information are updated separately. In offline training, to output Q values consistent with the online prediction logic, the test mode needs to be enabled, that is, only the forward propagation of the network is performed without gradient back-propagation. At the same time, in order to get the final Q value, the output of the DNN network after the sigmoid function should be printed, as shown in Figure 14 below.

△ Figure 14: DNN network output
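To show what this forward-only "test mode" pass looks like, here is a minimal sketch (the layer sizes, the ReLU activation, and the random parameters are assumptions; in reality the normalization statistics and weights come from the trained model):

# Hedged sketch: forward-only Q calculation through a small 4-layer network
# (input -> normalization -> hidden -> sigmoid output), with no gradient computation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_q(x, params):
    # Normalization layer: scale the aggregated embedding input with trained statistics.
    h = (x - params["norm_mean"]) / params["norm_std"]
    # Hidden layer (ReLU is an assumed activation).
    h = np.maximum(0.0, h @ params["w_hidden"] + params["b_hidden"])
    # Output layer followed by sigmoid yields the estimated Q value.
    return sigmoid(h @ params["w_out"] + params["b_out"])

rng = np.random.default_rng(0)
params = {
    "norm_mean": np.zeros(16), "norm_std": np.ones(16),
    "w_hidden": rng.normal(size=(16, 8)), "b_hidden": np.zeros(8),
    "w_out": rng.normal(size=(8, 1)), "b_out": np.zeros(1),
}
q = forward_q(rng.normal(size=(1, 16)), params)
print(q.item())  # the value printed after the sigmoid is the Q used for comparison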

3.2.4 Q comparison and result presentation

From the analysis of the whole model process, the test report needs to show Q values from multiple perspectives, covering both a statistical perspective and a detailed perspective. Statistical analysis shows the overall distribution and variation of the data, which better reflects whether a consistency problem exists and how severe it is; detailed analysis shows more clues about the diff, which better assists in identifying its source. A toy sketch of the statistical part follows the list below.

  • Q value diff statistical report

1) The absolute count and proportion of online/offline Q value diffs in different precision intervals (measuring the diff range);

2) Q distribution chart (visually showing the deviation of the Q distribution);

  • Q diff details

1) The Q value of each sample at each stage;

2) The primary key used for sample splicing (convenient for further manual investigation).
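A toy sketch of how the statistical part of the report could be assembled (the precision interval boundaries and the sample values are illustrative only):

# Hedged sketch: count online/offline Q diffs falling into each precision interval.
import numpy as np

def diff_report(q_a, q_b, bounds=(1e-6, 1e-5, 1e-4, 1e-3, 1e-2)):
    """Return, for each precision interval, the absolute count and proportion of diffs."""
    diffs = np.abs(np.asarray(q_a) - np.asarray(q_b))
    edges = [0.0, *bounds, np.inf]
    report = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        n = int(np.sum((diffs >= lo) & (diffs < hi)))
        report.append({"interval": f"[{lo:g}, {hi:g})", "count": n, "ratio": n / len(diffs)})
    return report

q_online = [0.120000, 0.450000, 0.780000]
q_offline = [0.120000, 0.450001, 0.800000]
for row in diff_report(q_online, q_offline):
    print(row)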

3.3 Complete implementation (overall series, test, use)

3.3.1 Series execution

According to the key steps described in 3.2, the end-to-end consistency execution is divided into six phases:

1) Traffic diversion: obtain an environment for online prediction, divert traffic to the environment in real time, enable debug logs, and obtain the intermediate processing results of each prediction phase, including online features and the aggregated network input;

2) Log formatting: parse and format the debug logs from the diversion phase, that is, split the complete processing record of each ins into different sub-file logs according to thread ID;

3) Log splicing: splice and filter the diversion logs of the online prediction module and the offline ins of the corresponding period according to the selected primary key, determining the unique online/offline ins pairs used to calculate the Q values in subsequent stages;

4) Online parsing: obtain the aggregated network input data of specific ins from the online debug logs according to the primary key;

5) Q value replacement and calculation: according to the full-link consistency design, cover every step of feature extraction, large model query, and DNN calculation in the Q value estimation, and calculate Q2 to Q5;

6) Report presentation: show Q1 to Q5 calculated in steps 1-5, compare the differences between the Q values, and refine the location of the diff cause to features, large model conversion, or network calculation.

The specific implementation of each stage is shown in Figure 15 below:

△ Figure 15: End-to-end consistency complete execution steps

3.3.2 Task Testing

The above processing flow has been platformized to support self-service task testing and task viewing. According to the test parameters, traffic is diverted to obtain online Q values, ins, discrete vectors, and other information, which is used to replace the ins / discrete model / DNN calculation logic of offline training, calculate the Q values, and produce the final report.

The parameters to be filled in on the test page are as follows; they are mainly used for input data processing and Q value calculation, as shown in Figure 16 below:

△ Figure 16: Test input parameters

In order to improve processing and troubleshooting efficiency, we decoupled the steps so that specific steps of a task can be selectively executed according to troubleshooting requirements, as shown in Figure 17 below. For a newly added task, all of the following steps are required: traffic diversion / log splitting → log formatting → log splicing → online parsing → replacing the estimated Q value → computing the Q value → Q distribution report → log comparison → feature consistency report → complete report.

If traffic diversion logs already exist, online traffic diversion can be skipped and the subsequent steps (2-10) executed directly. If the traffic diversion logs have already been formatted, both online traffic diversion and log formatting can be skipped, and so on.

It is worth mentioning that, because the offline ins is a filtered version of the raw input and is therefore more likely to produce a diff, the feature consistency report is output by default. If, after analyzing the Q value distribution report, the features are found to be consistent or within the expected range, the feature consistency report output can be disabled to improve task execution efficiency.

△ Figure 17: Task execution setting and division
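A minimal sketch of the selective execution described above (stage names and artifact paths are hypothetical): a stage is skipped whenever its output artifact already exists, so a task can resume from any completed step.

# Hedged sketch: skip pipeline stages whose output artifacts already exist.
import os

STAGES = [
    ("traffic_diversion",  "artifacts/raw_debug.log"),
    ("log_formatting",     "artifacts/formatted_logs"),
    ("log_splicing",       "artifacts/spliced_ins.csv"),
    ("online_parsing",     "artifacts/network_input.csv"),
    ("q_replace_and_calc", "artifacts/q1_to_q5.csv"),
    ("report",             "artifacts/report.html"),
]

def run_pipeline(run_stage):
    """run_stage(name) executes one stage; stages with existing artifacts are skipped."""
    for name, artifact in STAGES:
        if os.path.exists(artifact):
            print(f"skip {name}: found {artifact}")
            continue
        print(f"run  {name}")
        run_stage(name)

# Example: plug in the real stage implementations; a placeholder is used here.
run_pipeline(lambda name: None)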

3.3.3 Task Viewing and Report Example

There are two ways to view tasks:

  • Page viewing mode: for each stage of the execution process, the running status of the task can be queried according to the test time and tester. The default status is Pending; if a stage fails, its status becomes Failed, and on success it becomes Succeed.

  • Report receiving mode: for each phase of end-to-end consistency, an email is sent to the task owner and tester when a task fails. If no email is received, the task is either still running or has run successfully. The email lists the possible causes of the failure and marks the contact person for the failed stage. If problems are encountered during self-troubleshooting, the contact person indicated in the email can be reached via Hi to help troubleshoot.

An example output report is shown in Figure 18 below, including the Q precision diff and the Q value distribution. The position of the diff can be determined from the comparison between different stages.

△ Figure 18: Example end-to-end consistency report

IV. Effects and follow-up

  • Q value troubleshooting is supported for any model that uses the same online prediction and offline training architecture, with strong scalability; currently, PC-related click-through rate Q values are supported;

  • Frequently iterated Q value investigations support platform-based execution: when a policy effect does not meet expectations, self-service testing and investigation can be carried out on the platform. The end-to-end consistency tool has so far supported the troubleshooting of multiple cases where model policy effects did not meet expectations; it found multiple inconsistencies, including inconsistent features and unaligned network structures, and promoted their resolution and fixing.

———- END ———-
