Background
Error and business logging is a common practice in back-end services: it helps developers quickly understand system status, track bugs, and learn about anomalies. In the front-end world, by contrast, a well-developed logging or monitoring system is still rare.
So how do you build a front-end anomaly monitoring system that suits your own company's business needs? This article summarizes the process of landing an actual project.
The problem
Some time ago, Liu Xiaojie of the Renren team published a front-end talk and article, "Browser-side JavaScript anomaly monitoring", describing the implementation of a whole monitoring system and some problems that may still need to be solved. If you haven't read it, I strongly recommend checking it out. Many companies have made attempts in this area, including open-source solutions such as Tencent's BadJS, Taobao's JSTracker, Alibaba's FdSafe, Alipay's SaiJS, and, abroad, Sentry with its front-end SDK RavenJS and the underlying TraceKit. By the time you actually start doing this yourself, plenty of solutions have already been proposed, implemented, and applied three or four years earlier.
But are they really a solution for everyone? Definitely not. Mainstream front-end monitoring systems, including paid services such as Fundebug in China, offer product features similar to Sentry's, except that Fundebug focuses only on the front end while Sentry covers all error-collection scenarios. The problems they solve are generic; whether they fit a specific project is up to the business side to decide.
In fact, it is not hard to see that building a front-end anomaly monitoring system is not complicated: there are many open-source implementations, the technical cost is low, and the pain points, while real, are not unsolvable. To sum up:
- A front-end SDK must be implemented, covering error interception, proxy monitoring, the reporting strategy, API design, and the log-upload interface.
- Reported logs must be queryable in real time.
- A visual log-management backend for the monitoring system must be developed.
- Errors in compressed, single-line files must be traced back to the source.
These points are the basics; we can even extend them with mature product features such as SMS and email alerts. Of course, there will be plenty of hidden pitfalls and business scenarios you will want to customize yourself, so this article gives a brief overview of our project experience.
Implementing the SDK
As an aside: what do you do first when writing a front-end SDK, whether or not it is an error-interception-and-reporting SDK?
One of the best ways to start is to write the tutorial and getting-started guide first. This is something I always try to do well, because only once the tutorial and quick start are settled do you code the corresponding API.
Second, if multiple people are working together, you must first establish the team's code conventions, writing style, project scaffolding, and test cases; a front-end project must also have cross-browser automated testing.
Here is a brief introduction to the preparatory work.
```json
"scripts": {
  "pretest": "jshint -c .jshintrc src",
  "test": "mocha --compilers js:babel-core/register -r jsdom-global/register",
  "easysauce": "easy-sauce",
  "devbuild": "rollup -c rollup.config.js --environment entry:src/index.js,dest:dist/ger.js",
  "testbuild": "rollup -c rollup.config.js --environment entry:test/index.js,dest:test/index.build.js",
  "build": "rollup -c rollup.config.js --environment entry:src/index.js,dest:dist/ger.min.js,uglify",
  "beautify": "node ./scripts/beatufyjs.js src test",
  "watch": "rollup -c rollup.config.js -w",
  "start": "http-server -a 0.0.0.0 -p 8080 -s",
  "dev": "npm-run-all --parallel start watch"
}
```
`pretest` ensures the code passes lint checks; jshint is used for validation.
`test` runs mocha locally. Because the project is written in ES6, the test files are compiled with Babel; because it targets browsers, the jsdom-global plugin is added so the command line supports browser globals such as `dom`, `window`, and `document`.
`easysauce` connects to Sauce Labs for compatibility tests.
`devbuild`, `testbuild`, and `build` use rollup to generate, respectively, the development build, the test bundle provided to Sauce Labs, and the minified production build.
`beautify` applies a unified code format; combined with pre-git it becomes even more convenient.
`watch`, `start`, and `dev` are all commands for the development environment; npm-run-all lets you start the other two with one command.
For the full configuration, see the package.json file in the open-source project. That covers the preparation. The directory structure is also designed simply; the folder names indicate their purpose. Finally, the GER constructor is exposed as a global via combination inheritance, and because rollup is configured to output UMD, it is compatible with the various module formats.
Now for the core parts:
Error interception
The common solution is to intercept `window.onerror`. A few points deserve attention. First, save the page's existing `onerror` and call it back after interception, so the page's own handler keeps working; the return value of the custom handler controls this. Second, when the error object has no stack, record the call stack yourself by walking `arguments.callee.caller` recursively. Finally, wrap the error message in a unified format.
The error message is then put into a queue for reporting.
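The save-and-chain part of this can be sketched as follows. This is a minimal sketch, not the article's SDK: `installOnError`, `normalizeError`, and `reportQueue` are illustrative names, and a plain object stands in for `window` so the example stays self-contained.

```javascript
// Queue of normalized errors waiting to be reported (illustrative name).
var reportQueue = [];

// Wrap the raw onerror arguments into one unified format.
function normalizeError(msg, url, line, col, error) {
  return {
    msg: String(msg),
    url: url || '',
    line: line || 0,
    col: col || 0,
    stack: (error && error.stack) || ''
  };
}

function installOnError(win) {
  var prevOnError = win.onerror; // keep the page's own handler alive
  win.onerror = function (msg, url, line, col, error) {
    reportQueue.push(normalizeError(msg, url, line, col, error));
    // Call back into the original handler so it still runs,
    // and let its return value decide default browser logging.
    if (typeof prevOnError === 'function') {
      return prevOnError.apply(this, arguments);
    }
    return false; // let the browser log the error as usual
  };
}
```

The key design point is that the SDK never silently replaces a handler the page already installed; it chains it.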
For scripts loaded from another domain, the browser only reports a bare `Script error.` with no details. The solution is to add the `crossorigin` attribute; most companies' static servers also support configuring `Access-Control-Allow-Origin`, so this problem is not serious. By default, cross-domain errors that yield no detailed information are ignored.
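A sketch of loading a script with that attribute set; `loadScript` is a hypothetical helper, and the CDN must also send `Access-Control-Allow-Origin` for full error details to come through.

```javascript
// Load a cross-origin script so window.onerror receives full error
// details instead of the opaque "Script error." message.
function loadScript(src, doc) {
  var s = doc.createElement('script');
  s.src = src;
  s.crossOrigin = 'anonymous'; // equivalent to <script crossorigin>
  doc.head.appendChild(s);
  return s;
}
```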
Proxy monitoring
Many tutorials cover `onerror`-based reporting, but there is in fact a proxy-monitoring approach that can replace error interception: proxy certain known system methods, or certain framework methods. Here we briefly describe several implementation schemes borrowed from BadJS.
First, all methods under the `console` object can be graded by level; methods such as `log` can then report directly, and error logs can be folded into the message.
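A minimal sketch of that console proxy; `proxyConsole` and the `report` callback are illustrative names, and a real implementation would also preserve the original formatting behavior.

```javascript
// Proxy console methods so each log level doubles as a report call.
function proxyConsole(con, report) {
  ['log', 'info', 'warn', 'error'].forEach(function (level) {
    var raw = con[level]; // keep the original method
    con[level] = function () {
      var msg = Array.prototype.slice.call(arguments).join(' ');
      report({ level: level, msg: msg }); // hand the message to the reporter
      if (raw) raw.apply(con, arguments); // keep normal console behaviour
    };
  });
}
```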
Then we can proxy `setTimeout` and `setInterval` to catch errors thrown inside timers.
If your site uses jQuery or Zepto, RequireJS or Sea.js, proxying the corresponding event-listener methods, or the `define` method, will also intercept most business errors.
Is that enough? Of course not. When proxying a function, we automatically inspect its arguments, such as callbacks, objects, or arrays, and wrap the corresponding functions in try/catch for reporting.
How do these proxies work? The code is simple: save the existing function, inspect its arguments, and recursively apply the interception; finally, when the function executes, wrap the call in try/catch and report automatically. One more thing to note: add a delay while reporting. To avoid triggering the page's `onerror` during the rethrow and causing a duplicate report, clear `onerror` first, then throw, then restore `onerror`.
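The try/catch wrapping and the onerror-clearing trick can be sketched like this; `makeCatcher` and `wrap` are illustrative names, and a plain object again stands in for `window`.

```javascript
// Build a wrapper that reports errors thrown by proxied functions,
// then rethrows with onerror temporarily cleared so the same error
// is not reported a second time through window.onerror.
function makeCatcher(win, report) {
  function wrap(fn) {
    if (typeof fn !== 'function') return fn; // only functions get wrapped
    return function () {
      try {
        return fn.apply(this, arguments);
      } catch (err) {
        report({ msg: err.message, stack: err.stack });
        var saved = win.onerror; // avoid a duplicate onerror report
        win.onerror = null;
        setTimeout(function () { win.onerror = saved; }, 0); // restore next tick
        throw err; // keep the page's normal error behaviour
      }
    };
  }
  return wrap;
}
```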
Reporting strategy
With error interception and monitoring worked out, some may think the reporting strategy is simple. We referred to the reporting strategies of BadJS and RavenJS and summarized some common configurations plus extras of our own.
For example: a switch for the function proxy, since some users may want only the default `onerror` interception, as try/catch does cost performance; merging or delaying reports when multiple errors fire at once (to avoid competing with user interactions for the thread); the URL of the error-log endpoint; filtering error types that cannot be sorted out, such as the cross-domain `Script error`; a sampling probability for reporting (a must for large sites); and a repeat threshold after which an identical error is no longer reported (spam protection).
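Put together as a default config, the options above might look roughly like this. Every field name and value here is an assumption for illustration, not the SDK's real option names.

```javascript
// Hypothetical defaults mirroring the options listed above.
var defaults = {
  proxyAll: false,     // off by default: try/catch wrapping costs performance
  mergeReport: true,   // combine errors fired at the same time
  delay: 1000,         // delay reporting so it never blocks user actions
  url: '//log.example.com/api/v1/report', // hypothetical endpoint
  ignore: [/^Script error\.?$/],          // unresolvable cross-domain noise
  random: 1,           // sampling rate, 0..1 (big sites sample below 1)
  repeat: 5            // drop an identical error after N occurrences
};

// Decide whether one error should actually be sent.
function shouldReport(err, seen, config) {
  if (Math.random() >= config.random) return false; // sampled out
  if (config.ignore.some(function (re) { return re.test(err.msg); })) {
    return false; // filtered type
  }
  var n = (seen[err.msg] || 0) + 1;
  seen[err.msg] = n;
  return n <= config.repeat; // spam guard
}
```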
We extended a few additional cases. At work we often meet bugs that occur in a user's environment but never in front of us; to put it bluntly, they are impossible to reproduce, and when you contact the user, the user may be equally baffled.
Bluntly, the essence of the problem is how to preserve the scene of the exception, so we added configuration for local storage: the client keeps your last N error records, so that errors are not lost to reporting failures or sampling.
That way, when a user complains, you can retrieve the earlier error logs from the client's local storage, instead of re-querying server logs that may already have expired.
We have all seen how some large vendors handle client-side anomaly complaints: they send the customer a URL whose page displays all client information, cookies, localStorage, and so on. This local error store serves the same purpose in locating errors.
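A sketch of keeping the last N errors on the client; the storage key, the cap, and `saveLocalError` are all illustrative, and a map-like object can stand in for `localStorage`.

```javascript
// Keep at most the last N errors locally, so a diagnostics page
// can read them back later when a user complains.
var MAX_LOCAL_ERRORS = 20; // illustrative cap

function saveLocalError(storage, err) {
  var key = '__ger_errors__'; // hypothetical storage key
  var list = [];
  try { list = JSON.parse(storage.getItem(key)) || []; } catch (e) {}
  list.push(err);
  if (list.length > MAX_LOCAL_ERRORS) {
    list = list.slice(-MAX_LOCAL_ERRORS); // drop the oldest entries
  }
  storage.setItem(key, JSON.stringify(list));
}
```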
Designing the log interface
The basic format of the log-upload interface is fixed.
It covers the user's client information, the error information, and the page information; for active reports it also carries extra context about the current environment that you want to attach to help debug.
You may not believe it, but during one big promotion every test machine in the company worked fine while many users complained of client-side errors; I located the problem by examining real user data in the real-time log system.
Values such as tokens, user IDs, and whether certain environment variables were initialized properly can all go into `ext` to help you locate bugs that cannot be reproduced.
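A payload along these lines might look as follows; the field names are assumptions based on the categories listed (client, error, page, `ext`), not the project's actual wire format. The `typeof` guards just keep the sketch runnable outside a browser.

```javascript
// Build one report payload from a normalized error plus caller context.
function buildPayload(err, ext) {
  return {
    client: {                       // user-agent / environment info
      ua: typeof navigator !== 'undefined' ? navigator.userAgent : 'server',
      time: Date.now()
    },
    error: err,                     // normalized: msg, file, line, col, stack
    page: {                         // where it happened
      url: typeof location !== 'undefined' ? location.href : '',
      referrer: typeof document !== 'undefined' ? document.referrer : ''
    },
    ext: ext || {}                  // caller-supplied context: token, userId...
  };
}
```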
API design
This part has less to do with error reporting and more with what API suits a reporting SDK's usage scenarios.
As I see it, the API should cover several aspects: reading the current property state, calling methods, and extension points that keep the interface decoupled.
The properties must include the current SDK configuration, the broadcast event queue, the error message stack, and the intercepted error queue.
After someone initializes the SDK, other people or departments may need to adjust the defaults, so `set` and `get` methods are definitely required; they let you change configuration items after initialization. Then come the reporting methods, mirroring all the console methods: `error`, `debug`, `info`, and so on.
There are also `send` and `catchError` methods.
Finally, there are general helpers for local storage and events, such as `on`, `off`, `trigger`, and `clear`.
For events, I designed four broadcasts: `beforeReport`, `afterReport`, `tryError`, and `error`. As the names suggest, they let you process the error before and after reporting, hook into the interception proxy, or build whatever other extensions you want.
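A minimal sketch of that API surface: `set`/`get` for config, `on`/`off`/`trigger` for the broadcasts. The method names follow the article; the implementation and `createSDK` are illustrative, not the published SDK.

```javascript
// Tiny config store plus event emitter, chainable like the SDK described.
function createSDK(config) {
  var events = {};
  return {
    set: function (k, v) { config[k] = v; return this; },
    get: function (k) { return config[k]; },
    on: function (name, fn) {
      (events[name] = events[name] || []).push(fn);
      return this;
    },
    off: function (name) { delete events[name]; return this; },
    trigger: function (name, data) {
      (events[name] || []).forEach(function (fn) { fn(data); });
      return this;
    }
  };
}
```

With this shape, another department can subscribe to `beforeReport` to enrich every outgoing log without touching the SDK's internals.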
Reported logs can be queried in real time
With the SDK design and implementation worked out, we can take the next step. On a high-PV site, a single front-page error produces an enormous volume of reports. Sampling and filtering happen at the SDK level, but even so, if the quality of the site is not high the log volume will still be very large, and the traditional approach of writing to a database or a cache will not hold up. The reporting service must be a log service.
A log service has one great defect: real-time processing is hard, because log storage and volume are both very large, so sorting, summarizing, aggregating, and querying logs all become time-consuming.
The architecture we chose: the SDK uploads error messages to the front-end machines (that is, the service layer, behind a load-balancing policy); the front-end machines, in PHP or Java for example, write them to local logs; a central log server then collects the logs gathered by the different front-end machines and writes them into ElasticSearch; finally, the ElasticSearch API performs the aggregation and query operations, so logs can be viewed and queried in real time, with statistics and summary analysis on top.
For ElasticSearch we chose the Node.js API; install the npm package, configure the address and port, and you can use it directly. The main methods used are `search` and `msearch`, for single and multiple queries. Aggregations, an important part of ElasticSearch's DSL syntax, provide the log-aggregation capability: for example, a total count of errors triggered over 7 days needs them, as does summarizing error counts per file or grouping by condition. This is also why we did not choose Kibana as the query backend and built our own instead.
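As a sketch, the kind of request we hand to the Node client's `search` method for "total errors per day over the last 7 days" looks like this; the index name and the `@timestamp` field are assumptions, and the `interval` key is the pre-7.x aggregation syntax (newer ElasticSearch uses `calendar_interval`).

```javascript
// Build a search request body with a daily date_histogram aggregation.
function buildDailyErrorAgg(index) {
  return {
    index: index,
    body: {
      query: { range: { '@timestamp': { gte: 'now-7d/d' } } }, // last 7 days
      size: 0, // we only want the buckets, not the raw hits
      aggs: {
        errors_per_day: {
          date_histogram: { field: '@timestamp', interval: 'day' }
        }
      }
    }
  };
}

// With the official client this would be used roughly as:
//   client.search(buildDailyErrorAgg('frontend-errors'))
//     .then(function (res) { /* res.aggregations.errors_per_day.buckets */ });
```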
Because the system focuses not only on error logging, but also on resolving, sorting, and observing errors.
The visual log-management backend
Friends at big companies may, besides daily business work, develop various management platforms for developers: operations machine configuration, release version management, automated tooling interfaces, machine performance monitoring and alerting.
So of course we can also develop our own error-log visualization platform for the front end!
This article is not a tutorial on developing web applications with Express and Vue; there is not enough space for that, so suffice it to say we used those two frameworks to build the system, with the ElasticSearch API as the data interface.
Here is the general routine for developing such a tool system:
- User management.
- Permission management.
- Error aggregation and presentation.
- Error details display.
The functions are simple; any back-end language can implement them quickly. To avoid applying for lots of company resources, you can log in via LDAP or even keep user information in plain text; it is internal with little traffic, so you do not need to operate a database at all, because every data interface comes from the log service.
Then there is the error display itself; let's think, from a product perspective, about what makes developers comfortable.
First, distinguish by business function. The most important distinction is by domain name, broken down by second-level domain; errors are then summarized per domain, such as error status and charts under one domain.
When viewing details, filter the list by query criteria such as time, error message, error page, and file name.
Finally, each error gets a detail page listing the error information, client information, and so on.
With that, a simple and practical small replacement for Sentry is done.
How to locate source errors in compressed single-line files
This may be the point most people care about. The Renren team's solution is either to generate separate files during the local build and pair them with the sourceMap when the error is reported, or to have the try/catch reporting code carry the source line number. Neither actually suits our scenario: both are too intrusive, and with many departments involved no one would adopt them; they seriously affect performance and efficiency and add a burden on code quality.
Function proxying may ease the problem, but once a business team does not enable it, the located error is at line 1, column 3000, and the message has been mangled by compression, for example `af is not defined`, which is quite embarrassing.
In fact, changing perspective solves it. Look at what we have in hand: the compressed JS file containing the error, the error line and column, and the error message.
So can we add a feature like this to the visualization platform's backend:
check whether the error came from a compressed file, and then whether it can be quickly located from the error information.
As the developer, you must have the source code of that compressed JS; the JS must have been compressed by you; and you must have its sourceMap file.
By uploading your sourceMap in the backend, or even uploading your source so the platform generates the corresponding sourceMap for you, the single line-and-column can be converted into the source line and column. All of this is automatic; you just drag the source into the web interface.
The key code:
```javascript
var fs = require('fs');
var sourcemap = require('source-map');

// Note: this synchronous constructor is the source-map 0.5/0.6 API;
// from 0.7 on, SourceMapConsumer is created asynchronously.
var smc = new sourcemap.SourceMapConsumer(
  fs.readFileSync('./test.js.map', 'utf8')
);

// Map the minified position (line 1, column 105) back to the source.
var ret = smc.originalPositionFor({ line: 1, column: 105 });
console.log(ret); // { source, line, column, name }
```
Simply pass the line and column from the log to the `originalPositionFor` method, and the source-code error position comes out. The remaining steps need no explanation.
Conclusion
This is basically the end of the article. After internal polishing, we will open source the SDK code, the visualization platform implementation, the front-end Vue implementation, and example query code including ElasticSearch. Although the code is open now, the documentation, installation examples, and report demos have not been verified; we expect to pass overall testing and go live before the end of April.
If you cannot wait to use it, or want to customize your own front-end JS error-monitoring system based on this article, you are welcome to join my chat.