Background

I previously wrote a year-end summary article, and some friends were interested in the monitoring we are doing, so I am writing this article to lay out our overall approach for your reference.

In fact, building a front-end anomaly monitoring system is not complicated. There are many open-source implementations, the technical cost is not high, and although there are pain points, none of them are unsolvable. Based on our situation, the work breaks down into:

  • Front-end SDK: mainly user behavior tracking, error interception, reporting strategy, and API design.
  • Real-time querying of reported logs.
  • Hierarchical alerting.
  • Log analysis strategy.

Front-end SDK

User behavior tracking

We capture the user's operation path and use it to reconstruct the scenario the user was in, which helps us locate problems quickly.

The operation path is made up of:

  • Triggered events. Based on our business scenarios, we capture only the user's click and change events and scroll-bar dragging.
  • Browsing path. There are two cases, SPAs and multi-page applications. In a multi-page application we can use the referrer to confirm the URL of the previous page. In an SPA we do this by listening on the routing functions.
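
The two kinds of path entries above can be sketched as a small bounded buffer. This is only a minimal illustration, not our SDK's actual code: the names TrailStep and BehaviorTrail, the field names, and the default limit of 20 steps are all assumptions made for the example.

```typescript
// Hypothetical shape of one step in the user's operation path.
type UserEventType = "click" | "change" | "scroll";

type TrailStep =
  | { kind: "event"; type: UserEventType; target: string; at: number }
  | { kind: "route"; from: string; to: string; at: number };

// Keeps only the most recent `limit` steps so a report stays small.
class BehaviorTrail {
  private steps: TrailStep[] = [];
  private limit: number;

  constructor(limit: number = 20) {
    this.limit = limit;
  }

  // Called from click/change/scroll listeners.
  recordEvent(type: UserEventType, target: string, at: number = Date.now()): void {
    this.push({ kind: "event", type, target, at });
  }

  // In a multi-page app `from` would come from the referrer;
  // in an SPA it would come from the router's navigation callback.
  recordRoute(from: string, to: string, at: number = Date.now()): void {
    this.push({ kind: "route", from, to, at });
  }

  // Returns a copy, so callers cannot mutate the internal buffer.
  snapshot(): TrailStep[] {
    return this.steps.slice();
  }

  private push(step: TrailStep): void {
    this.steps.push(step);
    if (this.steps.length > this.limit) {
      this.steps.shift(); // drop the oldest step
    }
  }
}
```

When an error is caught, the result of snapshot() could be attached to the report so the path leading up to the problem is available.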

Of course, we also use cookies and localStorage to link the collected data together.

Exceptions

Script errors are captured through window.onerror, and we also intercept the error handlers of Angular and Vue. In addition, there are ajax errors: for ajax, we intercept error responses according to business requirements.
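
A minimal sketch of the script and framework hooks might look like the following. The ErrorRecord shape and the function names are assumptions made for this example. The target object stands in for window so the code does not depend on a browser, and the framework hook is a plain callback that one would wire into the framework's own error handler.

```typescript
// Hypothetical normalized record; the field names are illustrative only.
interface ErrorRecord {
  source: "script" | "framework";
  message: string;
  file: string | null;
  line: number | null;
  column: number | null;
  stack: string | null;
}

type OnErrorHandler = (
  message: unknown,
  file?: string,
  line?: number,
  column?: number,
  error?: Error
) => unknown;

// Minimal view of the global object; in a browser, window plays this role.
interface ErrorTarget {
  onerror: OnErrorHandler | null;
}

function installScriptErrorHook(
  target: ErrorTarget,
  report: (record: ErrorRecord) => void
): void {
  const previous = target.onerror;
  target.onerror = (message, file, line, column, error) => {
    report({
      source: "script",
      message: error ? error.message : String(message),
      file: file ?? null,
      line: line ?? null,
      column: column ?? null,
      stack: error?.stack ?? null,
    });
    // Chain to any handler installed before ours; returning false
    // leaves the browser's default error handling in place.
    return previous ? previous(message, file, line, column, error) : false;
  };
}

// Returns a callback that can be registered as a framework-level error handler.
function frameworkErrorHook(
  report: (record: ErrorRecord) => void
): (err: unknown) => void {
  return (err) => {
    const error = err instanceof Error ? err : new Error(String(err));
    report({
      source: "framework",
      message: error.message,
      file: null,
      line: null,
      column: null,
      stack: error.stack ?? null,
    });
  };
}
```

Keeping the previous onerror handler and calling it afterwards avoids silently replacing a handler that the page or another library had already installed.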

In fact, there are quite a few pitfalls here, but the experts out there have already summarized them well enough, so you can refer to their write-ups.

Operating system

This covers overall system information, plus browser information, the UA string, and so on.

Summary

There are actually two difficulties in the SDK. One is how to define user behavior. The other is exception collection, where there are quite a few pitfalls to step into. Beyond that, there is the overall reporting strategy. At present we classify anomalies by level: low-level errors are delayed and reported in a batch, and errors of the same kind at the same point are treated as repeats of one another.
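
The reporting strategy could be sketched as a small queue like the one below. Several details here are assumptions for illustration rather than a description of our SDK: the two level names, sending high-level errors immediately, keying repeats by kind plus location, and counting repeats instead of sending each one.

```typescript
type Level = "high" | "low";

interface Report {
  level: Level;
  kind: string;     // e.g. the error type
  location: string; // e.g. "file:line", the "point" where it happened
  message: string;
}

interface BatchedReport extends Report {
  count: number; // how many times this kind/location pair occurred
}

class ReportQueue {
  private pending = new Map<string, BatchedReport>();
  private send: (batch: BatchedReport[]) => void;

  constructor(send: (batch: BatchedReport[]) => void) {
    this.send = send;
  }

  add(report: Report): void {
    if (report.level === "high") {
      // Assumption: high-level errors bypass the delay entirely.
      this.send([{ ...report, count: 1 }]);
      return;
    }
    const key = `${report.kind}@${report.location}`;
    const existing = this.pending.get(key);
    if (existing) {
      existing.count += 1; // same point, same kind: count it, do not queue again
    } else {
      this.pending.set(key, { ...report, count: 1 });
    }
  }

  // Meant to be called on a timer, or before the page unloads.
  flush(): void {
    if (this.pending.size === 0) return;
    const batch = Array.from(this.pending.values());
    this.pending.clear();
    this.send(batch);
  }
}
```

The send callback is injected so the transport (for example an image beacon or an ajax call) stays outside the queue, which also makes the batching logic easy to test.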

Log collection

All logs are sent to nginx, which backs up the logs and proxies the requests to the Node service. The Node service cleans the data and stores it. We also run a scheduled task that picks up, from the nginx backup logs, any requests that went unprocessed because of a service failure.
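
The recovery task can be sketched as a pure selection step. Everything about the data format here is hypothetical: the example assumes each backed-up request is one JSON line carrying an id and a body, and that the set of ids already stored by the Node service is available. Our real backup format is not shown in this article.

```typescript
// Hypothetical shape of one backed-up request.
interface BackedUpRequest {
  id: string;
  body: string;
}

// Parses one backup line; returns null for lines that are malformed.
function parseBackupLine(line: string): BackedUpRequest | null {
  try {
    const value: unknown = JSON.parse(line);
    if (typeof value === "object" && value !== null) {
      const fields = value as Record<string, unknown>;
      const id = fields["id"];
      const body = fields["body"];
      if (typeof id === "string" && typeof body === "string") {
        return { id, body };
      }
    }
  } catch {
    // Not valid JSON: fall through and skip the line.
  }
  return null;
}

// Selects the requests that were backed up but never stored.
function selectForReplay(
  backupLines: string[],
  storedIds: Set<string>
): BackedUpRequest[] {
  const replay: BackedUpRequest[] = [];
  const seen = new Set<string>();
  for (const line of backupLines) {
    const request = parseBackupLine(line);
    if (request === null) continue;
    if (storedIds.has(request.id) || seen.has(request.id)) continue;
    seen.add(request.id); // avoid replaying the same request twice
    replay.push(request);
  }
  return replay;
}
```

The scheduled task would then feed the selected requests back through the same cleaning and storage path as live traffic.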

Why not use the popular ELK stack? Answer: after looking into it, we found that our command of ELK is not yet good enough, and we prefer to keep things in a state we can control.

Alerting

The alerting service uses a hierarchical policy that follows the organizational structure: if a high-level exception goes unhandled for a period of time, the alerting system escalates it upward, level by level, until it reaches the department head.
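
As a rough sketch of the escalation rule, the function below works out who should be notified for an exception that has gone unhandled for a given time. The EscalationPolicy shape, the fixed step length, and the choice to keep lower levels on the recipient list are assumptions for the example.

```typescript
// Hypothetical policy: `chain` is ordered from the direct owner
// up to the department head.
interface EscalationPolicy {
  chain: string[];
  stepMinutes: number; // how long each level gets before escalating further
}

// Returns everyone who should have been notified after the exception
// has gone unhandled for `unhandledMinutes`.
function recipientsFor(
  policy: EscalationPolicy,
  unhandledMinutes: number
): string[] {
  if (policy.chain.length === 0 || policy.stepMinutes <= 0) {
    return [];
  }
  const elapsed = Math.max(0, unhandledMinutes);
  // One extra level per full step elapsed, capped at the top of the chain.
  const levels = Math.min(
    policy.chain.length,
    Math.floor(elapsed / policy.stepMinutes) + 1
  );
  return policy.chain.slice(0, levels);
}
```

With a chain of owner, team lead, and department head and a 30-minute step, an exception unhandled for 45 minutes would reach the owner and the team lead, and after 60 minutes it would also reach the department head.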

Display and analysis

At present this part is relatively weak: it basically only analyzes a project's situation over a single cycle. Our overall focus is still on solving problems.

Conclusion

The front-end SDK focuses on collecting front-end exceptions, but the back-end service as a whole is built to handle exceptions of every kind. We would like to provide the company with a complete logging system (PS: our team's back-end monitoring data is also gradually being uploaded to this system).

Finally, we hope that anyone interested will join our team, email: [email protected] (besides front-end, we are also hiring for Python, crawlers, and big data). We also welcome your comments and suggestions. After all, the people in the group gave up their spare time to refine the overall design and complete the development, and there is still a lot of room for improvement.