Respecting experimental data and experimental culture is respecting the product itself.
High income often means high stress, and programmers are no exception. While the whole world envies programmers who earn more than 5 million a month, those same programmers may well be humming to themselves in the middle of the night:
You will never understand my sorrow; the day does not understand the dark of the night.
The changelog for v2.8.0 of the iOS international version of Sina Weibo lists two updates: one is adaptation for the iPhone X, and the other is fixing some bugs that appeared during development, along with the developer who wrote those bugs.
This is just like the neighboring app's changelog that "fixed the crash bug, and sacrificed a programmer to heaven while at it"! Lines like these tend to make good stories in Internet circles. That is why, in the Xiami Music "poor VIP" incident, netizens mostly responded with self-mockery and nobody was genuinely angry: after all, being poor is a kind of temperament, not merely a matter of not buying VIP.
However, a careful look at the whole incident turns up some problems. The trigger was that the product department kept swinging its requirements back and forth, asking development to redo the work every few days, and the developers ended up venting their grievances in the log (setting aside the behavior itself for now). This reflects a deeper question: product, operations and technology cannot convince one another, so the final product direction becomes "Schrödinger's cat", and who is right and who is wrong depends on the will of God ~
Ah, this is fate ~
At the "Super Heroes of the Internet Era" event organized by Holling-Jun, Yang Guoguo, head of growth at Modao, explained to us who holds the discourse power in companies with different business models: in technology-driven companies, the engineers control the discourse; in product-first companies, it is normal for product managers to have the louder voice; and in operations-heavy companies, the operations people are the ones who walk tall. For a technology and Internet company wearing a halo, relying on "rule by man" and "palace intrigue" to keep product iteration going: is that a style, or a regression?
Differences in discourse power between departments under different business models
Data has penetrated every industry and business function today and become an important factor of production. The mining and application of massive data heralds a new wave of productivity growth and consumer surplus. (McKinsey, the world-renowned management consulting firm and the first to put forward the concept of "big data")
In such an era, the mining and application of data should not stop at the level of investment; thinking and working modes have to change as well. The only way to resolve the deep contradictions between product, operations and technology is data: let the data speak for itself. However, data as a yardstick only solves the first half of the problem. How do we make data a prior input rather than an afterthought?
“The answer is A/B testing.”
There are so many famous laws in the Internet industry, so why is the answer A/B testing? Probably because A/B testing is older than those laws: it is not a native product of the Internet age. A/B testing came from the medical industry, where it is known as the randomized controlled trial (RCT). RCTs have long been widely used by the FDA and other drug and medical regulatory bodies, and results based on this kind of experiment are regarded as the highest standard of testing and verification in that industry.
While you may not care what an RCT is or why the medical profession relies on A/B testing, there is a more convincing way to explain it: using statistical theory to show why A/B testing is more scientific, efficient and stable than laws, experience and other tools.
Here we draw on material provided by the cloud platform Appadhoc A/B Testing. An A/B test is essentially a comparison test. Its working principle is to collect statistics on two samples (sample size, sample mean, variance, and so on), one from the control version and one from the test version, and then use a statistical formula based on the normal distribution to calculate whether the population parameter (the mean) of the test version shows a definite improvement over that of the control version.
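To make that comparison concrete, here is a minimal sketch of the kind of calculation described above; it is not the Appadhoc implementation, it assumes the metric is a simple conversion rate, and the traffic and conversion counts are invented for illustration.

```python
import math

# Hypothetical traffic and conversion counts for the control (A)
# and test (B) versions; all numbers are invented for illustration.
visitors_a, conversions_a = 10_000, 1_200   # control: 12.0% conversion
visitors_b, conversions_b = 10_000, 1_290   # test:    12.9% conversion

p_a = conversions_a / visitors_a            # sample mean of version A
p_b = conversions_b / visitors_b            # sample mean of version B

# Pooled standard error under the assumption that A and B are identical.
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))

# z statistic: how many standard errors the observed lift is away from zero.
z = (p_b - p_a) / se
print(f"lift = {p_b - p_a:.4f}, z = {z:.2f}")
```

The normal-distribution formula mentioned above then turns this z statistic into a judgment about whether the lift is real, which is exactly the hypothesis-testing machinery discussed next.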
In principle, an A/B test is a hypothesis test (a significance test). There are two hypotheses in the experiment: the null hypothesis and the alternative hypothesis. The null hypothesis is the one we hope the experimental results will disprove; here you can loosely equate it with the original version. The alternative hypothesis is the one we hope the experimental results will support; here you can loosely equate it with the experimental version.
The null hypothesis and the alternative hypothesis together cover all possible outcomes and are mutually exclusive: in a hypothesis test exactly one of them holds, and if one is rejected, the other must be accepted. In this simplified reading, the original version and the experimental version are identical in every condition except for the metric being optimized, and the statistics rest on the idea that small-probability events do not happen in a single trial.
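Written out in symbols (a generic textbook formulation, not quoted from the source), with μ_A the population mean of the metric under the original version and μ_B under the experimental version:

```latex
\begin{aligned}
H_0 &: \mu_B = \mu_A \quad \text{(null hypothesis: the experimental version changes nothing)} \\
H_1 &: \mu_B \neq \mu_A \quad \text{(alternative hypothesis: the experimental version makes a difference)}
\end{aligned}
```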
No experiment can avoid "error"; you could even say that experimentation is trial and error. To obtain scientific results, however, we need to minimize the probability that these errors lead to misjudgment. Two types of error are involved: Type I errors (rejecting a true null hypothesis) and Type II errors (accepting a false null hypothesis).
A Type I error is the event of rejecting the null hypothesis when it is actually true. The probability we allow for such an event (denoted α) is called the significance level of the experiment, and 1 - α is called the confidence level. The significance level is a threshold set in advance; the value actually computed from the experiment is the p-value, the probability of observing data at least as extreme as the trial data under the assumption that there is no real difference between the two versions.
[Figure: the shaded area under the curve represents the p-value]
It follows that when p ≤ α, the test has produced a statistically significant result. The smaller p is, the stronger the grounds for treating the observed outcome as a small-probability event that should not have occurred under the null hypothesis, and therefore for rejecting the null hypothesis and accepting the alternative hypothesis.
P-value calculation involves sample mean, sample size, and standard deviation.
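As a hedged illustration of that sentence, the sketch below derives a two-sided p-value from exactly those three quantities per version; the metric, all numbers and the use of SciPy are assumptions, not details from the article.

```python
import math
from scipy.stats import norm

# Hypothetical per-version summaries for a continuous metric
# (say, average session length in seconds): size, mean, standard deviation.
n_a, mean_a, sd_a = 8_000, 310.0, 95.0   # control version
n_b, mean_b, sd_b = 8_000, 314.5, 97.0   # test version

# Standard error of the difference in means for a two-sample z-test.
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = (mean_b - mean_a) / se

# Two-sided p-value: probability of a difference at least this extreme
# if the two versions were in fact identical.
p_value = 2 * (1 - norm.cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant: {p_value <= alpha}")
```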
A Type II error means accepting the null hypothesis when it is actually false. An intuitive, if not rigorous, way to picture it: the original version is clearly behind the test version on the core metric, yet we still stick with the original version. The probability of this error is denoted β, and it is allowed to be relatively higher; the usual standards are 10% or 20%.
As with the significance level, in order to keep Type II errors effectively in check, we compute another reference parameter from β: statistical power. Just as the confidence level is obtained as 1 - α, statistical power is obtained as 1 - β (statistical power = 1 - β).
Statistical power is the probability that a significance test correctly detects a version difference (effect) of a specified size. In short, it is the probability that we correctly reject the null hypothesis and obtain a statistically significant result (at the 95% confidence level).
Calculating statistical power involves the sample size, the variance, α, and the minimum detectable effect (or the lower bound of the confidence interval).
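A minimal sketch of how those ingredients combine, using the standard normal-approximation formula for the per-group sample size of a two-sample test; the variance and minimum detectable effect below are assumptions chosen for illustration.

```python
import math
from scipy.stats import norm

def required_sample_size(sigma, mde, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sample z-test.

    sigma: standard deviation of the metric (assumed equal in both groups)
    mde:   minimum detectable effect, the smallest lift worth detecting
    alpha: allowed Type I error rate (significance level)
    power: desired statistical power, i.e. 1 - beta
    """
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2)

# Example: detecting a 2-second lift on a metric whose standard deviation is
# 40 seconds needs roughly 6,300 users per group at 95% confidence, 80% power.
print(required_sample_size(sigma=40.0, mde=2.0))
```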
It follows that the experimental data only have reference value when the Type I error rate is kept within 5% and the Type II error rate within 10%-20%. In other words, an A/B test result is meaningful to us, and usable as a basis for decisions, only when it comes with 95% confidence and 80%-90% statistical power.
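Expressed as a tiny decision helper (an illustrative convention based on the thresholds above, not a rule prescribed by any A/B testing platform), that criterion might look like this:

```python
def result_is_decision_grade(p_value, power, alpha=0.05, min_power=0.8):
    """Treat an A/B result as a basis for decisions only when both error
    types are controlled: Type I error at most alpha (95% confidence) and
    Type II error at most 1 - min_power (at least 80% statistical power)."""
    return p_value <= alpha and power >= min_power

print(result_is_decision_grade(p_value=0.02, power=0.85))  # True: safe to act on
print(result_is_decision_grade(p_value=0.02, power=0.55))  # False: test is underpowered
```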
What’s possible if you do it right
A/B testing is not a simple, crude matter of running several test versions in parallel. It rests on the product team's clear product thinking and clear optimization goals, on the continuously improved core algorithms of the A/B testing platform, and on rigorous logic and statistical principles. If every iteration of a product is a big exam, then A/B testing is the teacher who helps you predict the questions: you hand your target audience a good answer ahead of time, while the rest of the class wastes its chances on guessing tricks like "three long and one short, pick the short one" and still fails to scrape a C. To reject A/B testing is to reject the most reliable path to getting it right.
Bottom line: products can't run on intuition alone; you need a meso perspective
It is often said that even a top product manager can only beat the A/B test about half the time. This is not to deny the great role of experience, drive and insight in product development. Between the micro-level optimization and the macro-level conception of a product, there needs to be a meso perspective to balance and coordinate the two. A/B testing is exactly such a meso tool, a meso perspective well suited to Internet product optimization. No single individual or team should have the final say in a company, but experimental data can; and respecting experimental data and experimental culture is respecting the product itself.