Translated from an article by Google engineer Philip Walton. It has 3,754 words and takes about seven minutes to read. Good engineers recognize that data matters as much as functionality, because accurate data collection is the basis for decisions about product iteration and marketing. This article explains why your usual Page View statistics are probably wrong, and then suggests possible solutions.
To put it bluntly, the Page View tools on the market today (Page View, commonly abbreviated PV, is the number of times users view a page) do not accurately measure how modern sites are used, and they are completely out of touch with the evolution of web technology.
For the most part, these tools assume that every Page View corresponds to a Page Load, and they run some tracking code after each Page Load to send a Page View event to the back-end server. Any site that doesn’t fit this pattern requires extra work by engineers to get the results right, yet most front-end engineers either lack the analytics expertise or simply don’t have the time.
The reality is that web technology has changed dramatically over the past 10 years. More and more websites no longer conform to the traditional page-load model, and our analytics tools have not kept up.
What’s the problem?
As a concrete example, consider mail.google.com (Gmail). Most people who use Gmail open it once, leave it running in a background tab, and switch back to it at regular intervals to check for new messages, all without refreshing the page.
Since the vast majority of Gmail users hardly ever refresh the page, this raises some interesting and important questions from an analytics perspective:
- If a user opens Gmail once and uses it hundreds of times over the next few days, but never refreshes the page, should that count as just one Page View?
- If the user clicks the logo in the upper left corner of Gmail to refresh the page content, or pulls down to refresh on mobile, should that count as a new Page View? How is this action different from a complete page refresh?
- If a user opens and reads a new email without actually refreshing the Page, should it count as a new Page View?
- If two users visit Gmail exactly the same number of times per day, but one refreshes every time while the other keeps the Page running in the background, should the Page View statistics for the two usage patterns be significantly different?
The point of listing these questions is that, for some sites, traditional Page View tracking produces inaccurate numbers, and the numbers can become wildly inaccurate as the technical implementation of a web application evolves over time.
Imagine that you add tracking code to a traditional website, and a few months later you rebuild the site as a Single Page Application (SPA) without changing the tracking code. A few months after that, you turn the site into a Progressive Web Application (PWA) that reloads content in the background and works offline, and you still haven’t updated your tracking code. If the number of visitors and their usage patterns didn’t change over that time, you would expect the statistics not to fluctuate wildly either.
Unfortunately, in this case, even if you improve the user experience, the Page View statistics will definitely go down. This is a very bad situation: you want to improve the interactive experience of your site, but you can’t convince anyone with data that it’s worth it, because statistics tell you the opposite.
How to solve it?
Every technical problem has a solution, and the solution proposed in this article is to go back to what Page View means as a metric. We need to track not how many times a page is loaded, but how many times it is viewed.
This can be done using the Page Visibility API, which has been around for a long time and is supported by almost all major desktop and mobile browsers.
It turns out that counting the number of times a page is viewed rather than loaded is an elegant way to solve many problems that traditional statistics can’t:
- When a user opens a tab, lets it go into the background, and switches back to it hours or days later without reloading the page;
- When a user keeps a page open as a reference and switches back and forth to glance at its content, but never reloads the page;
- When a user opens a page in a background tab and then forgets about it, never actually viewing the content.
The Page Visibility API consists of the document.visibilityState property and the visibilitychange event. With these two APIs, you can make sure a Page View is sent only when the page’s visibilityState is visible. In addition, you can listen for visibilitychange events and send a new Page View when a user switches back to an application that has been running in the background for a while. The Page Visibility API is a great solution to the Page View problem for web applications that rarely need to be reloaded.
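A minimal sketch of this idea in TypeScript. The `VisibilitySource` interface and the `sendPageViewWhenVisible` helper are hypothetical names introduced for illustration; the interface only mirrors the property and event named above, so that a browser’s `document` can be passed in and the logic can also be exercised outside a browser. The `send` callback stands in for whatever call your analytics tool actually uses.

```typescript
// Assumption: a minimal shape covering only the parts of `document`
// that this sketch relies on.
interface VisibilitySource {
  visibilityState: string; // "visible" or "hidden"
  addEventListener(type: "visibilitychange", listener: () => void): void;
  removeEventListener(type: "visibilitychange", listener: () => void): void;
}

// Hypothetical helper: calls `send` exactly once, as soon as the page
// is actually visible to the user.
function sendPageViewWhenVisible(doc: VisibilitySource, send: () => void): void {
  if (doc.visibilityState === "visible") {
    // The page was loaded in the foreground: count the view right away.
    send();
    return;
  }
  // The page was loaded in a background tab: wait until it is shown.
  const onChange = (): void => {
    if (doc.visibilityState === "visible") {
      doc.removeEventListener("visibilitychange", onChange);
      send();
    }
  };
  doc.addEventListener("visibilitychange", onChange);
}
```

In a browser this might be called as `sendPageViewWhenVisible(document, () => { /* report the Page View */ })`.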
The second part of the solution is the History API, which underpins SPA routing and is supported by all major browsers today. With it, analytics tools can listen for URL changes and send Page Views much as they would on a traditional website.
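One possible way to notice URL changes made through the History API is to wrap `pushState`, sketched below in TypeScript. This is an illustration under assumptions, not necessarily how any particular analytics tool does it: `HistoryLike` and `notifyOnPushState` are hypothetical names, and the interface mirrors only the `pushState` method so that `window.history` can be passed in from a browser. A fuller version would also need to cover `replaceState` and the `popstate` event fired on back and forward navigation.

```typescript
// Assumption: a minimal shape covering only the pushState method
// of the browser's `history` object.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
}

// Hypothetical helper: wraps pushState so that `onUrlChange` runs
// after every call, once the URL has been updated.
function notifyOnPushState(history: HistoryLike, onUrlChange: () => void): void {
  const originalPushState = history.pushState.bind(history);
  history.pushState = (data: unknown, unused: string, url?: string | null): void => {
    originalPushState(data, unused, url); // keep the original behavior
    onUrlChange();                        // then let the tracker react
  };
}
```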
How do you do that?
The basic idea for using the Page Visibility and History APIs to measure Page Views accurately is as follows (it applies to traditional websites, SPAs, and PWAs):
1. When the page loads, if the page’s visibilityState is visible, send a Page View;
2. If the page’s visibilityState is hidden at load time, listen for the visibilitychange event and send the Page View when the visibilityState becomes visible;
3. If the visibilityState changes from hidden to visible, and enough time has elapsed since the last user interaction, send a new Page View;
4. If the URL changes (only when the pathname or search part changes; the hash part should be ignored, because it is used for in-page jumps), send a new Page View.
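The URL rule in the last item of the list above comes down to a small comparison, sketched here in TypeScript. `UrlParts` and `isNewPage` are hypothetical names for illustration; the three fields carry the same names as on the browser’s `location` and `URL` objects, so either could be passed in.

```typescript
// Assumption: only the URL fields this sketch compares.
interface UrlParts {
  pathname: string;
  search: string;
  hash: string;
}

// Returns true when the change should count as a new Page View:
// the pathname or the search part differs. Hash-only changes are
// ignored because they mark in-page jumps.
function isNewPage(previous: UrlParts, next: UrlParts): boolean {
  return previous.pathname !== next.pathname || previous.search !== next.search;
}
```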
Step 3 above is the most important, but also the most ambiguous and debatable. The key question is: how long is “enough time”? On the one hand, you probably don’t want to send a new Page View every time the visibilityState changes, because users switch between tabs all the time, and some apps are most convenient to use in several tabs at once, which involves a lot of tab switching. On the other hand, you do want to capture users who come back to your application after not interacting with it for a while, which means you need a way to tell separate uses apart.
Fortunately, all analytics tools already define a way to distinguish separate uses: the session. A session is a group of user interactions that occur within a given window of time, and it ends once a preset period passes without interaction. For example, by default a Google Analytics session ends after 30 minutes of no interaction. Most analytics tools let you customize the session duration.
So going back to step 3 in the list above, my suggestion is to send the new Page View statistics if the user session has ended and the Page’s visibilityState has changed from hidden to visible. Changes to visibilityState that occur within a session should not be treated as different Page Views.
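The suggestion above can be sketched as a small decision function in TypeScript. The function and constant names are hypothetical, timestamps are plain milliseconds such as those returned by `Date.now()`, and the default timeout simply reuses the 30-minute Google Analytics default mentioned earlier. Treating a gap of exactly the timeout as expired is an assumption of this sketch.

```typescript
// Assumption: 30 minutes, the Google Analytics default session timeout.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// True when no interaction has happened for at least `timeoutMs`.
function isSessionExpired(
  lastInteractionMs: number,
  nowMs: number,
  timeoutMs: number = SESSION_TIMEOUT_MS,
): boolean {
  return nowMs - lastInteractionMs >= timeoutMs;
}

// Send a new Page View only when the page went from hidden to visible
// AND the previous session has already ended.
function shouldSendPageViewOnVisible(
  previousState: string,
  currentState: string,
  lastInteractionMs: number,
  nowMs: number,
  timeoutMs: number = SESSION_TIMEOUT_MS,
): boolean {
  const becameVisible = previousState === "hidden" && currentState === "visible";
  return becameVisible && isSessionExpired(lastInteractionMs, nowMs, timeoutMs);
}
```

With these defaults, a user who returns to the tab 31 minutes after their last interaction produces a new Page View, while a user who returns after 5 minutes does not.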
Note: If you use Autotrack (specifically pageVisibilityTracker and urlChangeTracker plug-ins), you don’t need to implement the above logic yourself. These plug-ins handle all of these situations automatically, although you can customize the behavior of the plug-in using configuration items.
How to reduce false positives?
While creating the pageVisibilityTracker plug-in for Autotrack, I thoroughly tested various implementations based on the Page Visibility API and found that some heuristics are necessary to avoid false positives.
For example, when a user uses the keyboard to cycle quickly through a row of open tabs, the visibilityState of many pages goes from hidden to visible, only to revert to hidden shortly after. In my tests, a significant percentage of the Page Views triggered by the visibilityState becoming visible after a session timeout were immediately followed by the visibilityState reverting to hidden, and 99% of those reverted within 5 seconds.
When I looked at my own usage patterns, this was not surprising. It is common to switch to a tab by accident and then leave; to pass through a tab on the way to another one when switching tabs with the keyboard; or to switch to a tab just to close it. In all of these cases, sending a new Page View makes no sense, and waiting 5 seconds before reporting the Page View prevents more than 99% of these false positives.
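The 5-second rule can be sketched as a tiny state holder in TypeScript. This is an illustration under assumptions, not the pageVisibilityTracker implementation: `createPendingPageView` and its methods are hypothetical names, and the sketch works on explicit millisecond timestamps instead of real timers so that the logic is easy to check.

```typescript
interface PendingPageView {
  onVisible(nowMs: number): void;     // page became visible after a session timeout
  onHidden(): void;                   // page was hidden again
  shouldReport(nowMs: number): boolean;
}

// Hypothetical factory: the default threshold is the 5 seconds
// suggested in the text.
function createPendingPageView(thresholdMs: number = 5000): PendingPageView {
  let visibleSinceMs: number | null = null;
  return {
    onVisible(nowMs: number): void {
      visibleSinceMs = nowMs; // start waiting
    },
    onHidden(): void {
      visibleSinceMs = null;  // the user left: cancel the pending view
    },
    // Returns true once, when the page has stayed visible long enough.
    shouldReport(nowMs: number): boolean {
      if (visibleSinceMs === null || nowMs - visibleSinceMs < thresholdMs) {
        return false;
      }
      visibleSinceMs = null;
      return true;
    },
  };
}
```

In a browser, a timer started when the page becomes visible and cancelled when it becomes hidden would serve the same purpose.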
Page View and Page Load
Sometimes you might want to know how often your site loads but is never viewed. You might also want to know if Page views are triggered by initial Page loads or by visibilityState or URL changes.
You can of course create a custom dimension to track Page Loads (in fact I usually do), but the question makes clear that what we really need are two separate metrics: Page View and Page Load. Fortunately, most analytics tools today let users define custom metrics, and Autotrack has configuration options to help you separate Page Views from Page Loads.
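To make the distinction concrete, here is a minimal TypeScript sketch that keeps the two metrics apart and records what triggered each Page View. All names (`Metrics`, `PageViewTrigger`, `recordPageLoad`, `recordPageView`) are hypothetical and do not correspond to any analytics tool’s API.

```typescript
// What caused a Page View to be counted.
type PageViewTrigger = "pageload" | "visibilitychange" | "urlchange";

interface Metrics {
  pageLoads: number;
  pageViews: number;
  viewsByTrigger: Record<PageViewTrigger, number>;
}

function createMetrics(): Metrics {
  return {
    pageLoads: 0,
    pageViews: 0,
    viewsByTrigger: { pageload: 0, visibilitychange: 0, urlchange: 0 },
  };
}

function recordPageView(metrics: Metrics, trigger: PageViewTrigger): void {
  metrics.pageViews += 1;
  metrics.viewsByTrigger[trigger] += 1;
}

// A load always counts as a Page Load, but only counts as a Page View
// if the page was visible when it loaded.
function recordPageLoad(metrics: Metrics, visibleAtLoad: boolean): void {
  metrics.pageLoads += 1;
  if (visibleAtLoad) {
    recordPageView(metrics, "pageload");
  }
}
```

With this split, a page loaded in a background tab and never looked at adds one Page Load and zero Page Views.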
By decoupling Page View from Page load, we can fully grasp what Page View really means: it measures how many times the user actually views the Page, regardless of how many times the Page is loaded.
Page View and the Session
Some readers may wonder: as long as you correctly track all user interactions after the first page load, what is wrong with counting a Page View only for that first page load? Why is it important to get the Page View count right?
While this may seem like a reasonable question, once you understand the data model used by most analytics tools, you will quickly see that its premise does not hold.
Most analytics tools assume that each session contains at least one Page View, which is used to determine metrics such as the landing page and exits. If you count only the initial page load, all subsequent sessions contain nothing but events, and most session reports become a mess. Almost all traditional web analytics tools calculate their reports with this model, which is itself a limitation of the traditional model.
Tool limitations aside, another compelling argument is that all sessions that involve user interaction events should include at least one Page View. After all, how can you interact with a Page without opening it? Sending a new Page View when the session times out and visibilityState becomes visible is a good way to solve this problem.
Summary and Tips
Hopefully, after reading this article, you will rethink the right way to measure Page Views. If you use analytics tools in your own projects, you can apply this advice to make your statistics more accurate.
Statistical tools should measure user engagement, not be coupled to a site’s technical implementation. When the user experience improves, we should be able to prove it through analysis reports from statistical tools. This is the most straightforward way to use technology to drive your business.
If you are using Google Analytics, you can apply this solution to your project by using Autotrack (highly recommended for SPA or PWA projects). For an example of how to configure Autotrack, see the AnalyticsJS-Boilerplate repository.
One More Thing
This article was translated by Wang Shijun. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.