Have you ever encountered the following scenarios?
Boss: “Why are the numbers in the report you submitted different from the ones I checked?”
Sales department: “The ERP back end shows 687 closed orders, so why are you telling me 620?”
Operation: “Why is the conversion rate you gave me lower than the actual conversion rate?”
Clearly, data accuracy is often a source of corporate civil war.
In the data-driven era, data accuracy is becoming a hard requirement for further digitalization and refined operations. Especially today, as user behavior analysis draws more and more attention, Shence Data believes that striving for excellence in accuracy is both the trend of history and the way of the future.
In fact, even setting aside human factors and limits of technical capability, data that is reported late or lost due to factors beyond anyone’s control generally accounts for about 1% on the App side and 5% on the Web side. This may not have mattered much amid the flood of data in the past, but in today’s world of refined operations, losing even 1% of the data in a user’s behavior path can skew analysis or cause missed opportunities.
For example, funnel analysis, retention analysis, attribution analysis, and other models combine multiple steps, so the loss of data at any step may affect the final results.
Take the funnel model of an e-commerce company: browse the product details page, add to the shopping cart, submit the order, pay for the order. Data is generally collected and reported in real time, but in extreme cases the data for some users is lost at the submit-order step, so the conversion rates of the adjacent steps become inaccurate and the analysis results contain errors.
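The effect on a funnel can be illustrated with a little arithmetic. The sketch below uses made-up step counts (not figures from any real product) and a hypothetical helper, `step_conversion`, to show how losing some submit-order events distorts two adjacent conversion rates at once.

```python
def step_conversion(counts):
    """Return the conversion rate between each pair of adjacent funnel steps."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


# Hypothetical counts: view details -> add to cart -> submit order -> pay order
true_counts = [10000, 2000, 1000, 800]

# Assume 50 submit-order events were never reported (illustrative number)
observed_counts = [10000, 2000, 950, 800]

true_rates = step_conversion(true_counts)
observed_rates = step_conversion(observed_counts)

for name, rates in (("true", true_rates), ("observed", observed_rates)):
    print(name, [f"{r:.1%}" for r in rates])
```

With these numbers, the cart-to-order rate drops from 50% to 47.5%, while the order-to-payment rate rises from 80% to about 84.2%, even though no user actually behaved differently.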
As another example, if the unreported 1% includes critical or even decisive event data, there are knock-on effects, such as damage to the integrity of the data.
For instance, Shence Data supports linking the anonymous behavior of a user who is not logged in with that user’s behavior after login, restoring the complete user chain. This process depends on a critical event that binds the user’s pre-login behavior to the login ID. If this event is lost, the pre-login behavior cannot be matched, because every link in the user behavior chain depends on the others.
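Why a single lost binding event breaks the chain can be sketched as follows. This is a simplified illustration, not Shence Data’s actual implementation: the event name `bind_id`, the field names, and the function `resolve_users` are all assumptions made for the example.

```python
def resolve_users(events):
    """Attribute each event to a user, linking anonymous IDs via binding events."""
    # First pass: collect anonymous_id -> login_id links from binding events
    bindings = {
        e["anonymous_id"]: e["login_id"]
        for e in events
        if e["event"] == "bind_id"
    }
    # Second pass: prefer the login ID, then a bound ID, then the anonymous ID
    resolved = []
    for e in events:
        if e["event"] == "bind_id":
            continue
        user = e.get("login_id") or bindings.get(e["anonymous_id"], e["anonymous_id"])
        resolved.append((user, e["event"]))
    return resolved


# Hypothetical event stream for one person before and after login
events = [
    {"event": "view_product", "anonymous_id": "anon-1"},
    {"event": "bind_id", "anonymous_id": "anon-1", "login_id": "user-42"},
    {"event": "pay_order", "anonymous_id": "anon-1", "login_id": "user-42"},
]

with_binding = resolve_users(events)
without_binding = resolve_users([e for e in events if e["event"] != "bind_id"])

print(with_binding)
print(without_binding)
```

When the binding event is present, both behaviors belong to `user-42`. When it is missing, the pre-login `view_product` stays attached to `anon-1`, and the same person appears as two users.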
To sum up, in the era of refined big data, losing even less than 1% of the data can affect the whole. For this reason, Shence Data insists on making data accurate and on ensuring that the data matches what actually happened and when it happened, leaving nothing to chance or accident.
One, what you need to know about data accuracy
Looking at most data applications, data processing can be divided into the following five steps, each of which may affect data accuracy:
Figure 1. Five steps of data processing
Generally speaking, excluding human factors, data accuracy can be abstracted into three situations:
1. Differences in statistical definitions
For example, when counting App launches, many analytics tools count users by device ID, so the same login ID used on multiple devices yields UV = N. Shence Data counts by Shence ID, so the same login ID used on multiple devices yields UV = 1.
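The difference between the two counting definitions can be shown with a minimal sketch. The field names `device_id` and `user_id` and the helper `count_uv` are assumptions for illustration; `user_id` simply stands in for a unified per-user identifier.

```python
def count_uv(events, key):
    """Count unique visitors by the given identifier field."""
    return len({e[key] for e in events})


# Hypothetical launch events: one login ID used on three devices
launches = [
    {"device_id": "phone-A", "user_id": "user-42"},
    {"device_id": "tablet-B", "user_id": "user-42"},
    {"device_id": "phone-C", "user_id": "user-42"},
]

uv_by_device = count_uv(launches, "device_id")  # counts devices
uv_by_user = count_uv(launches, "user_id")      # counts people

print(uv_by_device, uv_by_user)
```

The same three launches give UV = 3 under the device-based definition and UV = 1 under the user-based one, so two reports can disagree without either containing a calculation error.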
2. Errors in the collection code
For example, the client reports data with the anonymous ID while the server uploads data with the login ID. As a result, the two sets of data are not associated, the system identifies the same user as two users, and user totals become inconsistent. Shence Data uses the same ID to report data across all ends.
3. Data is reported late or lost
Data is usually reported through HTTP or HTTPS requests, so network stability and abnormal App usage greatly affect the timeliness of reporting.
Of these three situations, delayed or lost reporting is caused by non-technical factors, as shown in the table below.
Table 1 Data latency scenarios
In the scenarios above, the data generated by the user is lost or delayed for reasons other than the data collection technology. In such cases, do you backtrack and fill in the data after the fact, or do you ignore the data that was not reported in real time?
The intuitive answer for most people is “use backtracking and keep the data accurate,” but there are two big problems with backtracking:
First, limits of technical capability often leave no option but to let the data stay delayed or lost;
Second, the data for the same day may differ depending on when it is queried. How do you explain this to users?
Therefore, most data analysis platform providers choose to sacrifice data accuracy. Shence Data, by contrast, invests heavily in technology and uses data backtracking and backfilling to help enterprises hold the red line of data accuracy.
Two, interpretability vs. accuracy: what Shence Data stands by
In the face of changing realities, data accuracy and interpretability are always at odds in extreme cases: accepting the loss of data is easy, while understanding the technical complexity behind changing data is hard.
Technical barriers were never the problem for Shence Data. The real question is what to choose in the face of customers’ potential confusion.
Shence Data insists on data accuracy.
1. A seemingly correct “mistake”: should data never change?
In the data analysis industry, to keep data interpretable, enterprises generally settle each day’s data at 23:59:59. Because of the limitations of the technical architecture of older data analysis systems, the data is not backfilled even when delayed data arrives. Over time, enterprises have grown used to treating the settled daily figures as final: no matter how the time period changes, the data for a given day is expected to stay fixed. This has become the default rule, and when the data changes, it is often labeled “inaccurate.”
“Ignoring unreported data in special circumstances” is a relic of history that has been mistaken for “the right thing to do”. Against this background, changing data undoubtedly increases the cost of explanation. What is harder is that the data is not read by one person only, and anyone can challenge the changes.
First, the data analyst will be confused by changes in the data;
Second, even if the change is clearly explained to the data analyst, new interpretability problems may arise when the data is reported upward. For example, data analyst A gave the boss a report that recorded Monday’s daily active users as 14,000, but when the boss checked on Friday, Monday’s daily active users had become 14,500. Analyst A may then be questioned by the boss, which causes trouble for A, and an explanation may not even dispel the negative feeling.
The legacy of history and the cost of explanation have deterred many data analysis companies, whether because of technical limitations or because they cannot face the challenge to “convention”.
But Shence Data chooses to stick to its beliefs and do only the right thing.
Three, dare to be first: Shence Data meets change with change
“Do things to the best of your ability” is a principle of Shence Data. On data accuracy, despite the additional technical resources and explanation costs, Shence Data insists on “bringing value to customers”: taking responsibility, facing doubts, and resolving them.
Shence Data chooses to let the data change. At present, enterprise data in Shence Data has a 10-day backtracking period after it is generated, during which query results for the relevant data may change. After the 10 days of backtracking, the data no longer changes.
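How a backtracking period makes a day’s figure change and then settle can be sketched as below. This is a minimal illustration under stated assumptions, not Shence Data’s actual logic: the function names, the tuple layout, and the choice to measure the window from the event time to the receive time are all hypothetical.

```python
from datetime import date, datetime, timedelta

BACKTRACK_WINDOW = timedelta(days=10)  # backtracking period stated in the text


def accept_late_event(event_time, received_time, window=BACKTRACK_WINDOW):
    """Return True if a late event may still be backfilled into its original day."""
    delay = received_time - event_time
    return timedelta(0) <= delay <= window


def daily_active(events, day, as_of):
    """Count distinct users active on `day`, using only events received by `as_of`."""
    return len({
        user
        for user, event_time, received_time in events
        if event_time.date() == day
        and received_time <= as_of
        and accept_late_event(event_time, received_time)
    })


# Hypothetical events: (user, when it happened, when the server received it)
events = [
    ("u1", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 0)),    # on time
    ("u2", datetime(2024, 1, 1, 23, 50), datetime(2024, 1, 3, 8, 0)),  # cached, 2 days late
    ("u3", datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 20, 12, 0)), # outside the window
]

day = date(2024, 1, 1)
dau_same_day = daily_active(events, day, as_of=datetime(2024, 1, 1, 23, 59, 59))
dau_after_backfill = daily_active(events, day, as_of=datetime(2024, 1, 5))
dau_final = daily_active(events, day, as_of=datetime(2024, 1, 31))

print(dau_same_day, dau_after_backfill, dau_final)
```

In this sketch the figure for the day grows from 1 to 2 once the late event arrives and then stays at 2, because the event received 19 days later falls outside the window. The `window` parameter can be set to a different period.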
The following are scenarios in which the Shence Data SDK caches data and reports it later:
1. The user force-kills the App
For Android users, a common way to close an App is to send it to the background and swipe it away. In this scenario, tracking data is cached locally and not uploaded in time. For example, the exit event has to wait until the user next opens the App to be reported.
2. Multiple processes trigger data collection
For Android Apps, multi-process scenarios are common. For example, in push scenarios, or in Apps that provide incoming-call features, a child process is often active. In such scenarios, the child process tracks a lot of business data, and the locally cached data is reported only when the main process starts.
3. There is no network or the network signal is poor
When the network signal is poor, such as in an elevator or on the subway, tracking data may fail to be reported and is cached locally. The data is reported again once the network condition is good.
4. The App exits abnormally
Abnormal exits are the most common scenario. When the App exits abnormally, some tracking data may fail to be uploaded in time and is uploaded again when the user next opens the App.
5. The iOS App is launched passively
On iOS, when an App is launched passively for some reason (such as a silent push), all the events collected are uploaded the next time the App starts.
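The scenarios above share one pattern: events are cached locally when they cannot be sent and are uploaded on a later attempt. The toy class below sketches that pattern only. It is not the Shence Data SDK: `CachingTracker`, its methods, and the `send` callback are hypothetical, and an in-memory list stands in for on-device storage.

```python
class CachingTracker:
    """Toy event tracker that caches events locally and retries on the next flush."""

    def __init__(self, send):
        self._send = send   # callable that uploads one event; raises on failure
        self._cache = []    # stands in for persistent on-device storage

    def track(self, event):
        """Record an event, then try to upload everything that is pending."""
        self._cache.append(event)
        self.flush()

    def flush(self):
        """Upload cached events in order; keep whatever could not be sent."""
        while self._cache:
            try:
                self._send(self._cache[0])
            except ConnectionError:
                return  # stop here; the remaining events stay cached
            self._cache.pop(0)

    @property
    def pending(self):
        """Return a copy of the events still waiting to be uploaded."""
        return list(self._cache)


uploaded = []
network = {"online": False}  # simulated network state


def send(event):
    if not network["online"]:
        raise ConnectionError("no network")
    uploaded.append(event)


tracker = CachingTracker(send)
tracker.track({"event": "app_exit", "time": "2024-01-01T23:58:00"})
pending_while_offline = tracker.pending  # the event is cached, not lost

network["online"] = True
tracker.flush()  # for example, triggered the next time the App starts
```

Because each cached event keeps its original timestamp, a server that accepts late data can still place it on the day it actually happened rather than the day it was uploaded.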
As the saying goes, “If you don’t solve the problem, you become the problem.” Facing the historical challenge of data accuracy, Shence Data has chosen the more difficult road. Customers may not always understand it along the way, but it leads to greater value for customers, and Shence Data will follow it unswervingly. At the same time, if a customer understands the background and, for some special reason, still insists that the data stay unchanged, we also offer a flexible option that shortens the backtracking period to 24 hours. In addition, Shence Data will keep optimizing the product, for example by adding in-product tips to help customers understand.
In the past 5 years, Shence Data has served more than 1000 enterprises, and will solve the fundamental problems of data for more enterprises in the future.
Carrying the future of its customers, Shence Data faces the problems left by history, persists in meeting change with change, and strives to lead the coming shift in data accuracy in the big data era, safeguarding more customers.
Based on the vision of rebuilding the data foundation of the Chinese Internet, we will defend data accuracy to the end and never leave the slightest room for chance or accident.