Author: Xianyu Technology — Hui-Ning

Background

Opening the data monitoring dashboard and checking the day's business KPIs is part of the daily routine for anyone working in the Internet industry. For a business in an unstable period, fluctuation in data metrics is normal. The essence of fluctuation analysis is to use statistics and comparison to reveal the patterns and problems behind the fluctuation. Business metrics have fixed fluctuation cycles, and the data in each cycle should change in a stable way; when a metric in the daily, weekly, or monthly reports of the monitoring system suddenly deviates from its expected stable pattern, we call it an abnormal fluctuation. This article walks through the analysis in the following order: data accuracy, data anomaly judgment, attribution analysis, and impact measurement.

Data accuracy

When facing a fluctuation, first verify that the data itself is accurate. Accuracy is the foundation of data-driven decisions. Problems such as server anomalies and changes to upstream dependency tables are unavoidable in daily work and produce outliers on the monitoring platform. The way to check accuracy differs by indicator type:

• Absolute-value indicators: for data in the intermediate or detail layer, trace whether log reporting is abnormal, whether the reporting logic has changed, or whether the client interface is stable. For application-layer data, check whether upstream dependencies have changed or whether metric definitions are misaligned.
• Ratio indicators: split the ratio into numerator and denominator and investigate each separately, following the approach for absolute-value indicators.

Data anomaly

Once accuracy is confirmed, judge the amplitude of the fluctuation and whether it is abnormal. Volatility is usually measured by year-over-year and period-over-period comparisons. Periodicity is a major factor in volatility; comparing week over week, month over month, year over year, and day over day removes the periodic component and exposes the real nature of the fluctuation. The most commonly used test for abnormality is the 3-sigma rule: in a normal distribution, about 99.7% of values fall within three standard deviations of the mean, 95% within two, and 68% within one. In a specific business, data trends and experience are also used as judgment aids.
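As a minimal sketch of the 3-sigma check (the window length, threshold, and numbers below are all assumptions for illustration):

```python
import numpy as np

def is_anomalous(history, today, k=3.0):
    """Flag today's value if it sits more than k standard deviations
    away from the mean of the recent history."""
    mean, std = np.mean(history), np.std(history)
    return abs(today - mean) > k * std

# Four weeks of synthetic daily orders, then a suspicious new value.
history = [10_200, 9_950, 10_480, 10_030, 10_310, 9_870, 10_150] * 4
print(is_anomalous(history, today=12_900))  # True: well beyond 3 sigma
```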

Attribution analysis

After accuracy and anomaly have been established, attribution analysis and impact measurement are carried out, and the results ultimately drive business decisions. Most data fluctuations can be classified along two dimensions:

• Scope dimension: the product itself, competitors, and the overall environment of the product's market. The rationale for this split is that our product and its competitors share the same broader market environment, and a change on either side will move our own product's data.
• Content dimension: product, technology, users, and operations, which together cover the important components of an Internet product. Daily fluctuations are mostly caused by internal modules. For concrete attribution, we can drill down through the people/goods/venue (人货场) framework: the technology dimension is usually a data-accuracy problem, product and operations changes act on the venue, and the user factor breaks down into buyers, sellers, and suppliers.

External causes are usually hard to verify: competitor data is difficult to obtain, and so-called third-party reports are not reliable. If a competitor is a listed company, we can examine its disclosed financials, but for a fluctuation at a specific moment or over a short period, insight from financial statements is usually too delayed. Setting external factors aside, internal attribution can be split into attribution for absolute-value indicators and for ratio indicators.

• For absolute-value indicators, attribution methods fall into horizontal analysis, vertical analysis, and cross analysis.

• Horizontal analysis: the people/goods/venue model is a typical horizontal analysis, with MECE as the splitting principle. From the people perspective, buyers are traffic, i.e. visiting users; they can be broken down by basic profile, source channel, activity level, purchase tier, purchase intent, and so on, to check whether users in any segment fluctuate noticeably. From the goods perspective, check for changes in new listings, inventory, and distribution. The venue reflects the impact of product or operations strategy.

• Vertical analysis: the funnel model is a typical vertical analysis. It is a flow-based method of fluctuation analysis that reflects user behavior, the conversion rate, and the loss rate at each stage from entry point to endpoint.

• Cross analysis: the intersection of horizontal and vertical analysis. In real business, fluctuation attribution is usually a continual alternation between the two.

Take order volume as an example.

Horizontal analysis:
• People: orders = natural traffic + external paid + push + second-party + SMS = low activity + low-medium activity + medium activity + medium-high activity + high activity = purchasing-power tiers L1 + L2 + L3 + L4 + L5 = …
• Goods: orders = new listings + old listings = women's wear + 3C + mobile phones + coupons + … = the sum of orders across different sellers
• Venue: orders = search + guess-you-like + same-city + …

Vertical analysis: orders = DAU × exposure conversion rate × click conversion rate × inquiry conversion rate × payment conversion rate

Cross analysis: orders = (natural + paid + push + second-party + SMS) × (the funnel conversion under each group)
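As a minimal illustration of the horizontal split, the sketch below (channel names and numbers are made up) apportions a day-over-day order change across traffic channels; the channel with the largest share of the total change is the first place to drill down.

```python
# Hypothetical day-over-day order counts per traffic channel.
yesterday = {"natural": 5000, "paid": 1200, "push": 800, "sms": 300}
today     = {"natural": 4600, "paid": 1300, "push": 650, "sms": 310}

total_change = sum(today.values()) - sum(yesterday.values())
for channel in yesterday:
    delta = today[channel] - yesterday[channel]
    share = delta / total_change
    print(f"{channel}: {delta:+d} orders ({share:.0%} of the total change)")
```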

• For ratio indicators, the change is ultimately decomposed into a combination of absolute-value terms, using the identity below.


$$\Delta \frac{M}{N}=\frac{M_1}{N_1}-\frac{M_0}{N_0}=\frac{M_1 N_0 - M_0 N_1}{N_1 N_0}=\frac{M_1 N_0 - M_1 N_1 + M_1 N_1 - M_0 N_1}{N_1 N_0}=\frac{M_1 (N_0 - N_1)}{N_1 N_0}+\frac{M_1 - M_0}{N_0}$$
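In this identity, the first term isolates the effect of the denominator changing and the second the effect of the numerator, so each part can then be attributed with absolute-value methods. A small sketch of the computation (the click/exposure numbers are made up):

```python
def rate_change_decomposition(m0, n0, m1, n1):
    """Split the change in a ratio M/N into a denominator term and a
    numerator term, per the identity above (0 = before, 1 = after)."""
    denominator_term = m1 * (n0 - n1) / (n1 * n0)  # effect of N moving
    numerator_term = (m1 - m0) / n0                # effect of M moving
    assert abs(denominator_term + numerator_term - (m1/n1 - m0/n0)) < 1e-12
    return denominator_term, numerator_term

# Example: CTR, with clicks M over exposures N on two days.
print(rate_change_decomposition(m0=1200, n0=40_000, m1=1500, n1=42_000))
```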

Impact measurement

After the influencing factors are identified, the next step is to measure their degree of influence. When a single factor is involved, its impact is obvious; but when multiple factors act simultaneously and no A/B experiment has been run, their individual impacts cannot be cleanly separated. In a real business, factors such as industry trends, promotions, and festivals overlap, and A/B experiments rarely cover all of them, so the impact of each factor is hard to quantify. The common methods are as follows:

• Control variable method

This method selects multiple time windows, compares each against a fixed baseline date, and treats the influencing factors as additive increments. Take IPV per user as an example, with the influencing factors being a campaign pop-up (POP), an operations slot, and external paid traffic. Assume the baseline day's influence is 0, so each factor contributes an increment on top of it; solving the resulting system of equations gives each factor's influence weight.
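A sketch of the equation-solving step, with hypothetical factors and lift values; it assumes effects are additive and that enough days with distinct factor combinations exist for the system to be solvable:

```python
import numpy as np

# Rows: observed days; columns: whether each factor was active that day
# (campaign pop-up, operations slot, external paid traffic).
active = np.array([
    [1, 0, 0],   # only the pop-up ran
    [1, 1, 0],   # pop-up plus the operations slot
    [1, 1, 1],   # all three factors active
], dtype=float)
ipv_lift = np.array([0.8, 1.3, 2.1])  # made-up IPV-per-user increments
                                      # over the fixed baseline date

weights = np.linalg.solve(active, ipv_lift)
print(weights)  # influence weight of each factor: [0.8, 0.5, 0.8]
```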

• Slot analysis

Slot analysis is mainly used to measure the impact of operations slot (坑位) campaigns on the feed. Using the principle of exclusion, first quantify the slot's own impact; the remaining factors then account for 1 minus that impact. Take IPV per user as an example: slot IPV per user = slot exposure PV × slot CTR ÷ daily exposed UV = slot click PV ÷ daily exposed UV, and slot contribution rate = slot change ÷ total change × 100%.
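The two formulas translate directly into code; all numbers below are made up for illustration:

```python
def slot_ipv_per_user(slot_exposure_pv, slot_ctr, daily_exposed_uv):
    # slot click PV = exposure PV * CTR, normalized by daily exposed UV
    return slot_exposure_pv * slot_ctr / daily_exposed_uv

def slot_contribution_rate(slot_change, total_change):
    # share of the total metric change explained by the slot, in percent
    return slot_change / total_change * 100.0

# 500k slot exposures at 4% CTR over 800k daily exposed users.
slot_ipv = slot_ipv_per_user(500_000, 0.04, 800_000)        # 0.025
print(slot_contribution_rate(slot_ipv, total_change=0.10))  # 25.0
```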

• Prior judgment method

Prior judgment estimates the impact of the current activity from the measured effect of comparable past activities, taking the past effect as this activity's impact.

• Marginal effect attribution

If a strategy affects the KPI, then increasing or decreasing the strategy's intensity should produce a corresponding change in the observed metric. The cost of this method is time: the intensity usually has to be raised or lowered over a long period before the metric's movement can be judged, and the comparison must use matching periods to avoid periodic effects.

• Difference-in-differences

The idea is to construct a control group. The two groups must satisfy the common-trend assumption: under the same external influences, the strategy group and the control group would follow the same trend. The strategy group is affected by the strategy plus other factors, so its observed change is A1 - B1; the control group is affected only by the other factors, so its change is A2 - B2. The strategy effect is therefore (A1 - B1) - (A2 - B2). The drawback is that the control group's sample must be as similar as possible to the experimental group's.
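A minimal sketch of the computation, with made-up cohort numbers; it only holds if the two cohorts really do share a common trend:

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Strategy effect = (A1 - B1) - (A2 - B2), where the control
    group's change estimates what would have happened anyway."""
    treated_change = treated_after - treated_before   # A1 - B1
    control_change = control_after - control_before   # A2 - B2
    return treated_change - control_change

# Daily orders for two similar cohorts, before and after the strategy.
print(difference_in_differences(10_000, 11_500, 9_800, 10_300))  # 1000
```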

Conclusion

This article summarizes the methods and ideas Xianyu uses to analyze business metric fluctuations, focusing on fluctuation judgment, attribution, and impact measurement. In practice, a given day's fluctuation often demands a quick decision, which requires analysts to be familiar with business changes, keep in close touch with colleagues in product, operations, algorithms, and engineering, and locate problems quickly. The dimensions above can also serve as the decomposition dimensions for automated attribution.