As a mid-level manager, I often read data analysis reports from various teams that sound well-reasoned and factually eloquent, yet turn out to be sophistry, whether deliberate or unintentional. Like a naive, sweet girl courted by eloquent young men, I have to stay on guard when facing these seemingly rigorous analyses.
1. Misleading visualization
As the saying goes, a chart lends instant authority to an argument, at least when drawn properly, but charts are also easy to manipulate.
To show that China's rising urbanization rate and the trend toward smaller households support housing prices, the author of the figure below laid out two bar charts. To make both trends look strong, however, the Y-axes do not start from zero, so the charts are visually striking but misleading. (Then again, being misled here is harmless: in first- and second-tier cities riding this trend, buying a few years earlier was hardly a bad thing.)
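The distortion is easy to quantify. Here is a minimal sketch, using made-up figures rather than the report's actual numbers, comparing how much taller one bar looks than another when the Y-axis baseline is truncated:

```python
# Hypothetical urbanization-rate figures (illustrative only, not the
# report's actual numbers): the rate rises from 58% to 60%.
values = [58.0, 60.0]

def bar_height_ratio(values, baseline):
    """Ratio of the tallest drawn bar to the shortest on a chart
    whose Y-axis starts at `baseline`."""
    heights = [v - baseline for v in values]
    return max(heights) / min(heights)

# Honest chart: Y-axis starts at 0 -> the bars look almost identical.
print(round(bar_height_ratio(values, baseline=0), 2))   # 1.03

# Truncated chart: Y-axis starts at 57 -> the second bar looks 3x taller.
print(round(bar_height_ratio(values, baseline=57), 2))  # 3.0
```

A 2% change in the data becomes a 200% change in ink, which is exactly the trick at work.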
2. Proving a point with absolute values is unreliable
Arguing from absolute values is an unreliable form of "proof," yet it is common in large companies, where product managers ride huge traffic to make any new feature look like a success.
For example, a community feature in a recent product launch was questioned as irrelevant to the product's main direction. The product manager immediately countered that n users of the feature had already found jobs (job hunting is one of the product's core features). But with tens of millions of users active in the product every day, nothing is unusual; what does a handful of anecdotes prove? It brings to mind the famous line from Zhihu: discussing toxicity without mentioning the dose is pure hooliganism.
So if you want to impress, just swap the old saying "bigger is better" for "bigger numbers are better."
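The "dose" point can be made concrete. A toy calculation with invented numbers (the post gives no real figures): an absolute count that sounds impressive becomes a tiny fraction once divided by the user base.

```python
# Invented figures for illustration only.
daily_active_users = 20_000_000   # hypothetical user base
users_who_found_jobs = 3_000      # the "n users" the PM cites

rate = users_who_found_jobs / daily_active_users
print(f"{rate:.4%}")  # 0.0150% -- the big number is a vanishing fraction
```

Three thousand sounds like a triumph; 0.015% sounds like background noise. Both describe the same feature.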
3. Chaotic reasoning logic
Many data analyses come loaded with data, yet their logical reasoning is a mess.
Some time ago, I came across a product that launched a new feature X at a first-level entry point, even though it conflicted with an existing feature Y at a second-level entry: X simply poached Y's users.
As everyone knows, product work at big companies often just shuffles users from one pocket to the other, herding them around like ducks; yet someone always claims credit for the resulting bump in a metric. The most obvious example: many teams are praised for growth in their mini-program's user numbers while the main App's numbers decline.
Voiceover: So what were the 40% of users now using X instead of Y doing before? Shouldn't they be counted as feature Y's loss?
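Cannibalization is simple arithmetic once you ask where X's users came from. A sketch with invented numbers: subtracting the users who merely switched from Y leaves far less genuine growth.

```python
# Invented figures: feature X's headline growth vs. what it poached from Y.
x_new_users = 1_000_000     # users X gained after launch (hypothetical)
switched_from_y = 400_000   # users who merely moved over from Y (assumed)

genuinely_new = x_new_users - switched_from_y
print(genuinely_new)  # 600000 -- the honest number to claim credit for
```

The headline metric is a gross figure; credit should only ever be claimed on the net one.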
4. Small-scale tests that can't hold up
For rigor, product managers often observe new features through small-scale rollouts or even A/B tests. Yet a strange phenomenon recurs: a feature that performs well at small scale turns out unsatisfactory at full scale. The cause is often sampling bias, which product managers, in their eagerness to win, can easily introduce when sampling, intentionally or not. Two common forms are survivorship bias and Simpson's paradox.
Survivorship bias. Some time ago, our company held a user open day and ran a focus-group interview with the invited users. In the interviews, users were far more satisfied with our product than we expected. Beyond the fact that people are too polite to criticize us face to face, the main reason is that the invitation was promoted through our own App: the users who showed up were the ones we had not yet driven away, so of course they felt warmly toward us. Test a new feature on a small sample drawn like this and bias is guaranteed.
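The mechanism fits in a few lines. A toy model (all numbers invented): dissatisfied users tend to churn, the in-App invitation only reaches users who are still active, and the focus group's average satisfaction overshoots the true population average.

```python
# Toy model of the open-day interview (all numbers invented): each user
# has a satisfaction score; dissatisfied users tend to churn, and the
# invitation only reaches users still active in the App.
population = [
    # (satisfaction 1-5, still_active)
    (1, False), (2, False), (2, True), (3, False),
    (3, True), (4, True), (5, True), (5, True),
]

def mean(xs):
    return sum(xs) / len(xs)

everyone = mean([s for s, _ in population])
survivors = mean([s for s, active in population if active])

print(everyone)   # 3.125 -- true average satisfaction
print(survivors)  # 3.8   -- what the focus group reports
```

The survivors' average is inflated not because the product improved, but because the unhappy users were never in the room.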
Simpson's paradox. Below is an A/B comparison of kidney-stone treatment plans, taken from the Internet. In every individual subgroup, plan A was superior to plan B; overall, however, the conclusion reverses. This surprising result stems from the difference between the samples: the proportions of large- and small-stone cases under A and B differed significantly, forming two completely different samples and producing the reversal.
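The numbers below reproduce the widely circulated kidney-stone example (originally from Charig et al., 1986), the standard illustration of Simpson's paradox; I cannot vouch that they are the exact figures in the chart the author found online, but the reversal works the same way.

```python
# The classic kidney-stone figures (Charig et al., 1986), widely used
# to illustrate Simpson's paradox.  Format: (successes, total).
treatment_a = {"small": (81, 87),   "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

def rate(successes, total):
    return successes / total

# Within every subgroup, A beats B ...
for group in ("small", "large"):
    assert rate(*treatment_a[group]) > rate(*treatment_b[group])

# ... yet aggregated over unbalanced samples, B appears to win:
a_total = rate(81 + 192, 87 + 263)   # 273/350
b_total = rate(234 + 55, 270 + 80)   # 289/350
print(round(a_total, 2), round(b_total, 2))  # 0.78 0.83
```

The flip happens because A was given mostly the hard (large-stone) cases and B mostly the easy ones, so the aggregate compares workloads, not treatments.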
5. Mistaken causality
It is said that the hardest relationship in the world to prove is causality, and it is all too easy to stumble by mistaking correlation for it. Here is a good example of a mistake I made myself.
Faced with an obscure question ("Why does the Baidu Index for the word 'poetry' show a sharp rise around November 20 every year? Is there some important date then?" – Baidu Index – Zhihu), I smugly used Baidu Index's correlation feature to find a strong correlation between "poetry" and "Thanksgiving," and then came up with all sorts of reasons to believe the two were causally related.
Although something always felt off, I couldn't resist posting, perhaps fishing for upvotes. The answer got slapped down: the more likely cause and effect is that sixth-grade pupils, slogging through their Chinese textbook, reach the comprehensive learning unit "Knocking on the Door of Poetry" at about this point in the school calendar, so large numbers of pupils search "poetry" to finish their homework.
There are many such cases of mistaken causality in everyday life, worth collecting as you notice them. One funny website ("15 Insane Things That Correlate With Each Other") has done exactly that for laughs, listing pairs that correlate strongly and look logically related, but where any causal story is clearly wrong. I share it here in a spirit of self-deprecation.
(1) Nicolas Cage’s film appearances are highly correlated with the number of people who drown in swimming pools.
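A shared trend is all it takes to manufacture such a correlation. A sketch with fabricated series (the names and numbers are mine, not real data): two unrelated quantities that both drift upward over ten periods end up with a Pearson correlation near 1.

```python
# Two made-up series that both simply trend upward -- no causal link,
# yet their Pearson correlation is nearly 1.
periods = list(range(10))
searches_for_poetry = [100 + 8 * t for t in periods]            # hypothetical
sixth_grade_enrollment = [50 + 3 * t + (t % 2) for t in periods]  # hypothetical

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(searches_for_poetry, sixth_grade_enrollment)
print(round(r, 3))  # above 0.99 -- a shared trend masquerading as a link
```

Any two series that both grow with time will correlate this way, which is why correlation mining tools like the one I misused will happily pair "poetry" with "Thanksgiving."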
For more answers, see the home page of Hemingke.
More articles are available at the Data Iceberg – Zhihu column.