Build a reliable front-end exception monitoring service - collection section

In today's complex world of web applications, a page can contain not only text, images and hyperlinks, but also complex forms, many animations and heavy interaction. Many web applications are entirely single-page, with an experience and complexity comparable to native applications. This is a huge challenge for developers. Even with unit tests, code review, and black-box and white-box testing, there is no guarantee that the code will work once it goes online and faces thousands of users and unknown data in all kinds of browsers. For an Internet product with a large number of users, a reliable platform for collecting, reporting, processing, monitoring and alerting on front-end exceptions is therefore essential. Today, I want to talk about how to collect the exception data.

Note: In this document, "error" and "exception" both refer to error messages reported during script execution. The wording differs, but the two can be treated as equivalent.

Collectable exceptions

There are many kinds of page exceptions: HTML tag exceptions, CSS display exceptions, failed requests for styles, images and script files, and script execution exceptions. Some stem from the user's own network environment, such as a slow connection or tags and scripts forcibly injected by carriers, and we can hardly avoid them. Some are purely visual, such as misaligned text and buttons or wrapped text; users can notice and work around them, and normal use is not affected. Others, such as an error in the interaction logic or a script error triggered by submitted form data, immediately block the user's next step; these are the most harmful. The user is not a developer and does not know what caused the script exception, or even that one occurred. That is the kind of exception we want to capture. So what specific data needs to be collected?

Global error

Open the browser's developer tools, and when an error occurs we are notified immediately and can see where it occurred along with the call stack. window.onerror can be used to catch script execution exceptions on a page and gives us much of the same information. This approach has compatibility issues: the data is not fully consistent across browsers, and some outdated browsers provide only part of it. Its standard function signature looks like this:

	
window.onerror = function (message, url, lineNo, columnNo, error) { ... }

There are 5 parameters in total:

message {String} The error message. It usually describes the error directly, but sometimes it is hard to interpret, especially when the script is minified.
url {String} The path of the script in which the error occurred.
lineNo {Number} The line number where the error occurred.
columnNo {Number} The column number where the error occurred.
error {Object} The specific error object, an instance of window.Error. Some of its properties overlap with the previous items, but it also carries the detailed call stack, which is very helpful for locating errors.

If all of this information is available, we can locate the error quickly. But different browsers treat the same error differently. IE versions below 10 provide only message, url and lineNo; columnNo and the error object are not available. The window.event object, however, exposes errorLine and errorCharacter, which correspond to the line and column numbers. The call stack is not available directly, but it can be walked recursively through arguments.callee.caller inside onerror. The way the call stack is retrieved also differs between versions of IE. You can see examples here. This information is the most direct error data and must be captured and reported. The following table compares the parameters available by default in different browsers under the same-origin policy:

Browser	Message & Url	Line numbers	Column numbers	Stack trace
Chrome	✓	✓	✓	✓
Firefox	✓	✓	✓	✓
Edge	✓	✓	✓	✓
IE11	✓	✓	✓	✓
IE10	✓	✓	✓	✓
IE 9	✓	✓
IE 8	✓	✓
Safari 6+	✓	✓	✓	✓
iOS Safari 6+	✓	✓	✓	✓
Opera 15+	✓	✓	✓	✓
Android Browser 4.4	✓	✓	✓	✓
Android Browser 4 to 4.3	✓	✓
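
To make the parameter handling concrete, here is a minimal sketch of a handler factory that collects the five arguments and falls back to window.event.errorCharacter when no column number is passed. The names createOnErrorHandler, report and getLegacyEvent are hypothetical and chosen only for illustration; this is one possible shape, not the implementation used by any particular library.

```javascript
// Hypothetical helper: builds a window.onerror-compatible handler.
// `report` is an assumed callback that receives the collected record.
// `getLegacyEvent` is optional and only exists to make the fallback testable.
function createOnErrorHandler(report, getLegacyEvent) {
  const readEvent = getLegacyEvent || function () {
    return typeof window !== 'undefined' ? window.event : undefined;
  };
  return function onError(message, url, lineNo, columnNo, error) {
    // Old IE passes no column number; try window.event.errorCharacter instead.
    if (columnNo === undefined) {
      const evt = readEvent();
      if (evt && evt.errorCharacter !== undefined) {
        columnNo = evt.errorCharacter;
      }
    }
    report({
      message: String(message),
      url: url || '',
      line: lineNo === undefined ? null : Number(lineNo),
      column: columnNo === undefined ? null : Number(columnNo),
      // A stack is only available when the browser passes an Error object.
      stack: error && error.stack ? String(error.stack) : null
    });
    // Returning false leaves the browser's default error handling in place.
    return false;
  };
}

// In a browser: window.onerror = createOnErrorHandler(sendToServer);
```

Passing the reporting callback in, rather than hard-coding it, keeps the collection step separate from the decision of how and when to send the data.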

Ajax context

Think back to the times when a program passed development and testing without a problem, only to fail in production on some request with unexpected data. Sometimes the location of the exception is clear, but it cannot be reproduced in regression testing with known data. If you have the data context in which the error occurred, troubleshooting becomes much easier. The Ajax request context is therefore valuable. If you have used the fake helpers in unit-testing tools such as Sinon, you are probably familiar with overriding the methods on XMLHttpRequest.prototype. In the same way, we can hook the open and send methods of XMLHttpRequest and record the status code, the statusText, and even the responseText of the response (capturing the latter is not recommended, because it can be large).
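
The hooking idea can be sketched roughly as follows. The function name hookXhr, the record callback and the __monitor property are hypothetical; the sketch also assumes the browser fires the loadend event on XMLHttpRequest, which very old browsers do not.

```javascript
// Hypothetical helper: wraps open/send on the given XMLHttpRequest constructor.
// `record` is an assumed callback that receives one entry per finished request.
function hookXhr(XHR, record) {
  const originalOpen = XHR.prototype.open;
  const originalSend = XHR.prototype.send;

  XHR.prototype.open = function (method, url) {
    // Remember the request line on the instance so send() can report it.
    this.__monitor = { method: String(method), url: String(url) };
    return originalOpen.apply(this, arguments);
  };

  XHR.prototype.send = function () {
    const xhr = this;
    const info = xhr.__monitor || {};
    xhr.addEventListener('loadend', function () {
      record({
        method: info.method,
        url: info.url,
        status: xhr.status,
        statusText: xhr.statusText
        // responseText is deliberately left out: it may be large.
      });
    });
    return originalSend.apply(this, arguments);
  };

  // Return a function that restores the original methods.
  return function unhook() {
    XHR.prototype.open = originalOpen;
    XHR.prototype.send = originalSend;
  };
}

// In a browser: hookXhr(window.XMLHttpRequest, function (entry) { /* keep the last few entries */ });
```

Taking the constructor as a parameter makes it possible to exercise the hook against a fake object in tests, in the same spirit as the Sinon fakes mentioned above.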

Operational context

Besides failed requests, exceptions can also arise from interactions. Imagine a form in which every field is validated and processed, or a control whose onchange triggers a chain of logic that might throw. It is easier to locate the error if information about the form controls involved is available. Form controls fall into two general classes: click and input.

Click class: a, button, input[button], input[submit], input[radio], input[checkbox]
Input class: input[text], input[password], textarea, select

For every control, record common information such as its tagName and attributes, plus details such as the value and text where they apply (for example for a select or a textarea). This auxiliary information may help us locate the error.
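
As an illustration of the two classes, the sketch below summarizes a control into a small record. The function describeControl and the shape of its result are hypothetical. Recording only the length of an input's value, rather than the value itself, is an assumption made here to avoid reporting passwords or personal data; the original text does not prescribe it.

```javascript
// Input types that belong to the click class listed above.
const CLICK_TYPES = ['button', 'submit', 'radio', 'checkbox'];

// Hypothetical helper: summarizes a form control, or any object with the same fields.
function describeControl(el) {
  const tagName = String(el.tagName || '').toLowerCase();
  const type = String(el.type || '').toLowerCase();
  let kind = 'other';
  if (tagName === 'a' || tagName === 'button' ||
      (tagName === 'input' && CLICK_TYPES.indexOf(type) !== -1)) {
    kind = 'click';
  } else if (tagName === 'textarea' || tagName === 'select' ||
      (tagName === 'input' && (type === 'text' || type === 'password'))) {
    kind = 'input';
  }
  const summary = {
    kind: kind,
    tagName: tagName,
    type: type,
    id: el.id || '',
    name: el.name || ''
  };
  if (kind === 'input') {
    // Assumption: store only the length, never the typed text.
    summary.valueLength = String(el.value == null ? '' : el.value).length;
  }
  return summary;
}
```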

Page dependencies

Today's systems are almost always built on top of popular libraries. Libraries such as jQuery, Angular, React, Vue.js, Backbone, Underscore and Knockout each expose their version number at a known property:

`jQuery`: `jQuery.fn.jquery`
`jQuery UI`: `jQuery.ui.version`
`lodash (underscore)`: `_.VERSION`
`Backbone`: `Backbone.VERSION`
`knockout`: `ko.version`
`Angular`: `angular.version.full`
`React`: `React.version`
`Vue`: `Vue.version`

Some exceptions appear when a library is upgraded. If many of your pages use jQuery and you cannot regression test all of them after an upgrade, pages that rely on a changed or deprecated method may fail. In that situation, knowing which library versions a page depends on can help a great deal.

Besides the libraries above, most other libraries also expose a reference directly on the window object for developers to call. You can simply loop through the properties of window and check which of them carry a version, Version or VERSION property. Some libraries will escape this detection, but it is a useful supplement.
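
A rough sketch of both steps, the known lookups and the fallback scan, might look like this. The names KNOWN_LIBRARIES and detectLibraryVersions are hypothetical; the lookup paths are the ones listed in this section. Property access is wrapped in try/catch because reading some properties of window can throw.

```javascript
// Known lookups, taken from the list in this section.
const KNOWN_LIBRARIES = [
  { name: 'jQuery', global: 'jQuery', read: function (lib) { return lib.fn && lib.fn.jquery; } },
  { name: 'jQuery UI', global: 'jQuery', read: function (lib) { return lib.ui && lib.ui.version; } },
  { name: 'lodash/underscore', global: '_', read: function (lib) { return lib.VERSION; } },
  { name: 'Backbone', global: 'Backbone', read: function (lib) { return lib.VERSION; } },
  { name: 'knockout', global: 'ko', read: function (lib) { return lib.version; } },
  { name: 'Angular', global: 'angular', read: function (lib) { return lib.version && lib.version.full; } },
  { name: 'React', global: 'React', read: function (lib) { return lib.version; } },
  { name: 'Vue', global: 'Vue', read: function (lib) { return lib.version; } }
];

// Hypothetical helper: `root` is the global object to inspect (window in a browser).
function detectLibraryVersions(root) {
  const found = {};
  const knownGlobals = {};
  KNOWN_LIBRARIES.forEach(function (entry) {
    knownGlobals[entry.global] = true;
    try {
      const lib = root[entry.global];
      const version = lib ? entry.read(lib) : undefined;
      if (typeof version === 'string') found[entry.name] = version;
    } catch (e) { /* ignore properties that throw on access */ }
  });
  // Fallback scan: any other global that carries a string version property.
  Object.keys(root).forEach(function (key) {
    if (knownGlobals[key]) return;
    try {
      const value = root[key];
      if (!value || (typeof value !== 'object' && typeof value !== 'function')) return;
      const version = value.version || value.Version || value.VERSION;
      if (typeof version === 'string') found[key] = version;
    } catch (e) { /* ignore properties that throw on access */ }
  });
  return found;
}

// In a browser: detectLibraryVersions(window);
```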

Custom data

In addition to the default data collection, a custom data interface is a necessary feature. Business requirements differ so much between products that you never know which data will be useful to a developer. Custom data lets developers distinguish between types of exceptions. On an internationalized site, for example, a single lang field might be enough to tell apart exceptions in different locales.

Browser data

You have probably had the experience of developing code, testing it in every major browser, and then going live only to have a user complain that it fails in their browser. Such browser-specific exceptions are hard to detect when test coverage is insufficient before launch. For example, IE8 rejects reserved words such as catch and default when they are used as property names, whereas Chrome accepts them. You can compare the JScript and ECMAScript 6 keywords in "JScript Reserved Words" and "Reserved Keywords as of ECMAScript 6". If the monitoring system shows that an error occurs only on pages viewed in IE8, you can locate it quickly. In short, the browser data only needs osType, browserType and browserVersion, all of which can be derived from navigator.userAgent.
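
Deriving the three fields from the user agent string could be sketched as below. The function parseUserAgent and the rule tables are hypothetical and intentionally small; user agent strings are notoriously inconsistent, so a production system would need many more rules or a maintained parsing library.

```javascript
// Order matters: Edge and Opera user agents also contain "Chrome" and "Safari".
const BROWSER_RULES = [
  ['Edge', /Edge?\/([\d.]+)/],
  ['Opera', /OPR\/([\d.]+)/],
  ['IE', /MSIE ([\d.]+)/],
  ['IE', /Trident\/.*rv:([\d.]+)/],
  ['Firefox', /Firefox\/([\d.]+)/],
  ['Chrome', /Chrome\/([\d.]+)/],
  ['Safari', /Version\/([\d.]+).*Safari/]
];

// Order matters: Android contains "Linux", and iOS contains "like Mac OS X".
const OS_RULES = [
  ['Windows', /Windows/],
  ['Android', /Android/],
  ['iOS', /iPhone|iPad|iPod/],
  ['macOS', /Mac OS X/],
  ['Linux', /Linux/]
];

// Hypothetical helper: returns the three fields named in the text.
function parseUserAgent(userAgent) {
  const ua = String(userAgent || '');
  const result = { osType: 'Unknown', browserType: 'Unknown', browserVersion: '' };
  for (let i = 0; i < BROWSER_RULES.length; i++) {
    const match = BROWSER_RULES[i][1].exec(ua);
    if (match) {
      result.browserType = BROWSER_RULES[i][0];
      result.browserVersion = match[1];
      break;
    }
  }
  for (let j = 0; j < OS_RULES.length; j++) {
    if (OS_RULES[j][1].test(ua)) {
      result.osType = OS_RULES[j][0];
      break;
    }
  }
  return result;
}

// In a browser: parseUserAgent(navigator.userAgent);
```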

Other data

In addition to the important data above, the screen resolution and the client time at which the error occurred can sometimes help locate errors. Provide as much environmental data as traffic allows.

The ideal is beautiful, the reality is cruel, the difficulties are numerous

With as much information as possible about the error and its environment, we should be able to locate errors very quickly, but reality is much more complicated. Browser compatibility, security settings, static server configuration and so on are sometimes beyond our control. Many differences and uncertainties make it difficult to get the ideal data.

Same Origin Policy & ‘Script Error.’

The first problem is cross-domain scripts. On today's sites, static files are mostly served from a separate domain name. This works around the browser's limit on concurrent connections per domain and allows a CDN to speed up resource access. By default, however, when an error thrown by a cross-domain script is caught on the page, the only information available is the message 'Script error.'. There is no file information, no line or column number, and no detailed error object, which makes the other collected information useless. To solve this, you not only need to send the Access-Control-Allow-Origin header from the static server, but also need to add the crossorigin="anonymous" attribute to the script tag on the client. The browser will then expose the detailed error data.

Note: crossorigin has compatibility problems, and there are pitfalls in using it. If crossorigin is set but the server does not send Access-Control-Allow-Origin, some browsers will not load the script file at all. IE versions before Edge do not support crossorigin, so even with both settings in place you still get only 'Script error.'. The following table uses Chrome as an example to compare the errors returned in different situations.

crossorigin set	CORS enabled on the server	Script file loaded	Error content
✗	✗	✓	'Script error.'
✓	✗	✗	–
✓	✓	✓	Contains all the information
✗	✓	✓	'Script error.'
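
On the client side, the two pieces involved are recognizing an opaque 'Script error.' record and loading scripts with the crossorigin attribute set. The sketch below shows both under stated assumptions: isOpaqueScriptError and loadScriptWithCors are hypothetical names, and the static server is assumed to already send a matching Access-Control-Allow-Origin header.

```javascript
// Hypothetical helper: true when a record carries no usable location data,
// which is what browsers report for errors in cross-domain scripts without CORS.
function isOpaqueScriptError(message, url, lineNo) {
  const text = String(message || '').toLowerCase();
  return text.indexOf('script error') === 0 && !url && !lineNo;
}

// Hypothetical helper: injects a script tag that opts in to CORS.
// `doc` is the page's document, passed in to keep the function testable.
function loadScriptWithCors(doc, src) {
  const script = doc.createElement('script');
  // Set before the element is inserted, so the request is made in CORS mode.
  script.crossOrigin = 'anonymous';
  script.src = src;
  doc.head.appendChild(script);
  return script;
}

// In a browser: loadScriptWithCors(document, 'https://static.example.com/app.js');
```

The URL in the usage comment is a placeholder. Counting how many reported records are opaque can also serve as a rough indicator of how many scripts are still loaded without CORS.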

(uglifyjs + combo) vs sourcemap

At present, most sites minify and obfuscate their static script files before going online. So when an error occurs, the line number you get is 1 and the column number is a huge value, and you can only rely on the error message and the file path to locate the error. Fortunately, we have sourcemaps, which allow us to map back to the source code. You can learn more about sourcemap. Mozilla has an open-source tool for sourcemaps, which can be used to generate a sourcemap or to look up original names, lines and columns from one.

The generated sourcemap is kept offline rather than published. This protects the source code while still letting us locate the problem quickly when an error occurs.

Many sites also use a combo service to request multiple script files at once, which makes the error information harder to interpret. If the combo server merges all files into a single line, the line number becomes useless and you only have the message to go on. You can set a combo policy instead, for example giving each file n lines of its own, separated by a few blank lines. In addition, when packaging we add banner information to each individual file, so that even after combining we know which file the error is in. Below is the merged file returned by the JD Combo server:

Throttling

Given the types of data described above, the amount of data to report when an error occurs is quite large. If an exception is triggered repeatedly, it bombards the server continuously, producing both redundant data and wasted traffic. Reporting should therefore be limited in both content and frequency.

On the content side, pluggable configuration lets each page report only the data types it needs. You can also filter by keywords in the error message, or by page, so that errors which do not need reporting are never sent.

To limit the reporting frequency, the following schemes are generally feasible.

Random reporting. Not every error is reported; given a condition, only errors that meet it are sent.
Merged reporting. When an exception occurs, it is placed in a queue with a delay. Any new error that occurs during the delay is added to the queue. When the delay expires or the queue reaches its maximum capacity, everything in the queue is reported together. Another benefit of merged reporting is that common information, such as dependency information, browser data and custom data, only needs to be sent once.
Server restrictions. Things on the client side are always out of our control. The server should also monitor traffic and return 429 when the number of errors reported by a client per unit of time exceeds a limit. Note that other users on the same network segment may be affected as well.
Data compression. If little data is collected, it can be reported in plain text. With merged reporting, however, a single payload may exceed ten kilobytes, and for a site with a high daily PV the resulting traffic is considerable. It is therefore worth compressing the data before reporting. lz-string is a very good string compression library: good compatibility, little code, a high compression ratio and short compression times. The following figure compares the length of a single error message before and after compression: 1860 before and 501 after, which takes 6 milliseconds and reduces the size by about 70%.
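
Random and merged reporting can be combined in one small client-side queue. The following is a minimal sketch under stated assumptions: createBatchReporter and all of its option names (sampleRate, maxQueue, delayMs, common, send, random) are hypothetical, and compression and the server-side 429 limit are left out.

```javascript
// Hypothetical helper: samples errors, queues them, and sends them in batches.
function createBatchReporter(options) {
  const config = Object.assign(
    { sampleRate: 1, maxQueue: 10, delayMs: 2000, common: {}, random: Math.random },
    options
  );
  let queue = [];
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (queue.length === 0) return;
    const batch = queue;
    queue = [];
    // Shared data (dependencies, browser, custom) is sent once per batch.
    config.send({ common: config.common, errors: batch });
  }

  function push(error) {
    // Random reporting: drop the error unless it falls inside the sample.
    if (config.random() >= config.sampleRate) return false;
    queue.push(error);
    if (queue.length >= config.maxQueue) {
      flush(); // the queue is full, report immediately
    } else if (timer === null) {
      timer = setTimeout(flush, config.delayMs); // otherwise wait for more errors
    }
    return true;
  }

  return { push: push, flush: flush };
}
```

A page would typically also call flush when it is about to unload, so that queued errors are not lost; how reliably that request is delivered depends on the transport used.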

Data differentiation

The differences show up mainly as inconsistent error information: the message, the columnNo and the error object (which some browsers do not provide at all) vary between browsers. A normalize function is needed to smooth out these differences as far as possible. This article has some implementations for consistency.

Other resources

For dynamically loaded img, link and script resources, you can add an onerror callback to the tag to check whether the resource loaded successfully. But there is no global control, because libraries may have their own implementation for loading external resources, and we cannot add onerror callbacks to every dynamic resource without intrusive changes.

Carrier injection

Most domestic sites are still served over HTTP, so carriers can easily inject scripts into their pages. Such scripts can cause errors, but they are outside our control and there is very little we can do about them in the browser. However, a blacklist or whitelist can be maintained on the reporting server to filter out errors from abnormal script domains. Of course, the best solution is to support HTTPS directly, which largely eliminates this kind of error.

Cross-domain reporting

Once you have the exception data, the next step is to report it. Because many types of data are collected, the payload is not small, and a simple GET request cannot carry it. The data can be submitted with an Ajax POST, which requires the reporting server to set Access-Control-Allow-Origin to allow cross-domain submission. Low-end browsers (IE6-7) do not support cross-domain Ajax, so iframe + POST is a workable alternative there.

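Because of browser security restrictions, a request sent from a page on one protocol to an endpoint on another is treated as insecure; an HTTPS page, for example, may be blocked from posting to an HTTP endpoint. The reporting server must therefore provide both HTTP and HTTPS entry points to match the protocol of the reporting page.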
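
Putting the two requirements together, a reporting call might be sketched like this. The names buildReportUrl and postReport, and the host in the usage comment, are hypothetical. The sketch assumes the reporting server answers with a suitable Access-Control-Allow-Origin header and accepts JSON sent with a text/plain content type; the iframe fallback for IE6-7 is not shown.

```javascript
// Hypothetical helper: picks the entry point that matches the page's protocol.
function buildReportUrl(pageProtocol, host, path) {
  const protocol = pageProtocol === 'https:' ? 'https:' : 'http:';
  return protocol + '//' + host + path;
}

// Hypothetical helper: POSTs the payload as JSON text.
// `XHR` is the XMLHttpRequest constructor, passed in to keep the function testable.
function postReport(XHR, url, payload) {
  const xhr = new XHR();
  xhr.open('POST', url, true);
  // text/plain keeps this a "simple" CORS request, so no preflight is sent.
  xhr.setRequestHeader('Content-Type', 'text/plain;charset=UTF-8');
  xhr.send(JSON.stringify(payload));
  return xhr;
}

// In a browser:
// postReport(XMLHttpRequest, buildReportUrl(location.protocol, 'monitor.example.com', '/report'), data);
```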

Design principles

Providing as much error information and context as possible is indeed very helpful for locating errors. In practice, however, we cannot be sure that we control the remote static server or the user's browser. There are therefore several aspects to focus on when designing the collection module.

The configuration is pluggable

Pluggability is reflected in two ways:

The report content is configurable. Data can be configured at the page level and sent on demand. I have covered many types of data (exception data, Ajax context, interaction context, dependency library information, custom data and browser information), and not every page needs to report all of them. A high-PV portal page is only interested in the exception data itself and does not want to generate extra traffic, so the other data need not be sent. An internal enterprise ERP system, by contrast, needs complete data: every error should be looked at, so all the exception data should be reported.
The reporting frequency is configurable. Whether errors are reported one by one, sampled at random, or merged and delayed can be customized entirely according to the needs of the page.
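
One way to picture such page-level configuration is a set of defaults that each page overrides. The sketch below is only illustrative: DEFAULT_CONFIG, resolveConfig, enabledCollectors and every option name in them are hypothetical, with collector names that mirror the data types listed above.

```javascript
// Assumed defaults: only the exception itself and the browser data are sent.
const DEFAULT_CONFIG = {
  collectors: {
    error: true,
    ajax: false,
    interaction: false,
    dependencies: false,
    custom: false,
    browser: true
  },
  frequency: { mode: 'each', sampleRate: 1, delayMs: 0 }
};

// Hypothetical helper: merges a page's overrides into the defaults without mutating them.
function resolveConfig(pageConfig) {
  const page = pageConfig || {};
  return {
    collectors: Object.assign({}, DEFAULT_CONFIG.collectors, page.collectors),
    frequency: Object.assign({}, DEFAULT_CONFIG.frequency, page.frequency)
  };
}

// Hypothetical helper: lists the collectors that are switched on.
function enabledCollectors(config) {
  return Object.keys(config.collectors).filter(function (name) {
    return config.collectors[name];
  });
}

// A high-PV portal page: default content, sampled at 10%.
const portalConfig = resolveConfig({ frequency: { mode: 'random', sampleRate: 0.1 } });

// An internal ERP page: every data type, every error.
const erpConfig = resolveConfig({
  collectors: { ajax: true, interaction: true, dependencies: true, custom: true }
});
```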

Compatible with multiple platforms and browsers

Many exceptions are hard to find in testing because they are specific to one platform or browser. When collecting exceptions, the collector must therefore be compatible with as many platforms and browsers as possible, so that error information can be reported from all of them. The browser battlefield has now moved from the desktop to mobile: mobile traffic is growing by leaps and bounds, and mobile browsers are re-creating the compatibility chaos that desktop browsers went through a few years ago. The collection system therefore also needs to consider performance, compatibility and data types on mobile devices.

Supports custom data

Even with many built-in data types, there is no way to cover everything, so a custom data interface is necessary. Users can report custom data from individual pages for exception analysis. For example, suppose that for historical reasons two versions of a script are deployed in a running system. I can add the version number as custom data and use monitoring to compare the proportion of errors occurring in each version.

Finally

"Talk is cheap. Show me the code." Flextracker is a simple implementation of the ideas described above, and you can repackage it to suit your needs.

