Why do front-end monitoring?

Why build a front-end monitoring system? As the table below shows, better front-end performance adds real value to a product. If we can also collect this information in real time, monitor it and raise alerts, we can keep the product running efficiently in production. That is our goal; front-end monitoring is just a means to that end.

Secondly, front-end monitoring lets us find problems (slow page loads, etc.) and errors (JS errors, resource loading failures, etc.) ourselves, instead of waiting for user feedback and complaints. And after we optimize front-end code or take other measures, we have clear before-and-after performance data, which makes it easier to write reports (KPI).

So I rolled up my sleeves and got to work. With reference to several front-end monitoring systems on the market, I built one that fits our company's needs and plugged it into an internal system for testing. I took part in the product design, the front-end and back-end development and the SDK development, and learned a lot along the way. Now I would like to share it with you.

Technology selection

  • Front end: React, ECharts, axios, webpack, antd, TypeScript, and so on;
  • Back end: egg, TypeScript, and so on;
  • Database: MySQL, OpenTSDB;
  • Message queue: Kafka;

The company originally used Vue everywhere, so why did I use React here? Firstly, I have always been interested in React, and secondly, I had already used Vue a great deal. React's JSX and render functions give a high degree of freedom when encapsulating components, whereas Vue takes more effort to encapsulate. React, on the other hand, takes more effort for state management: a careless state update inside render can trigger an infinite loop. Vue is simpler in that respect.

System introduction

What is monitored

By instrumenting pages with the SDK and reporting data, the following types of data are monitored:

1. Load performance data

Performance data is reported to OpenTSDB, a time-series database (time-series databases are well suited to monitoring data). First, take a look at the data that is actually reported, which is an array, as shown in the figure below:

Let’s see what each field means:

frontMonitor.perf.time_dns: front-end monitoring system – performance – time – DNS

We can extract the numeric performance indicators from the metric field:

Some string indicators are also recorded: operating system type, browser type, resolution, page path, domain name, SDK version, etc. These can be found in tags.
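A minimal sketch of how such an array could be assembled, assuming each point has the same shape as the reported data shown later in this article (metric, value, endpoint, timestamp, tags, counterType, step). The helper name `buildPerfPoints` and every metric suffix other than `time_dns` are hypothetical.

```javascript
// Hypothetical helper: turns numeric timings into OpenTSDB-style points.
// `timings` maps a metric suffix (e.g. 'time_dns') to a value in milliseconds.
function buildPerfPoints(timings, tags, endpoint, nowMs) {
  // Tags are sent as a single string of the form k1=v1,k2=v2.
  const tagString = Object.entries(tags)
    .map(([key, value]) => `${key}=${value}`)
    .join(',');
  const timestamp = Math.floor(nowMs / 1000); // seconds, as in the reported data

  return Object.entries(timings)
    // Skip anything that is not a finite number (e.g. a timing that was not available).
    .filter(([, value]) => Number.isFinite(value))
    .map(([name, value]) => ({
      metric: `frontMonitor.perf.${name}`,
      value,
      endpoint: String(endpoint),
      timestamp,
      tags: tagString,
      counterType: 'GAUGE',
      step: 1
    }));
}

// Example input. In a browser, the DNS time could be taken from Navigation
// Timing as domainLookupEnd - domainLookupStart; 'time_tcp' is a made-up name.
const perfPoints = buildPerfPoints(
  { time_dns: 12, time_tcp: 30, time_broken: NaN },
  { browser: 'Chrome69', os: 'mac' },
  3,
  1539068028000
);
console.log(perfPoints);
```

Keeping the numbers in value and only descriptive strings in tags is what makes the charts below queryable by browser, page and so on.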

According to the above indicators, the following pages can be made:

Performance Overview:

Page performance:

2. Load resource data

OpenTSDB is used here as well. To save space, I show only one element of the array, as shown below:

The metric shown here is just one of several: frontMonitor.perf.resource_size stands for front-end monitoring system – performance – resource – size.

Resource loading data can be obtained with performance.getEntriesByType('resource'):
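As a rough sketch, each entry returned by that call could be reduced to the fields reported in this article. The helper name `summarizeResource` and the choice of which timing properties feed `size`, `request` and `response` are my assumptions; `parseSize` is left out because the article does not say how it is measured.

```javascript
// Hypothetical mapping from a PerformanceResourceTiming-like entry to the
// reported fields. Which properties feed which field is an assumption.
function summarizeResource(entry) {
  const url = new URL(entry.name); // entry.name is the resource's full URL
  return {
    name: url.pathname.split('/').pop() || url.pathname, // e.g. 'app.js'
    type: entry.initiatorType,                            // e.g. 'script'
    origin: url.hostname,
    protocol: entry.nextHopProtocol,                      // e.g. 'h2'
    size: entry.transferSize,
    request: Math.round(entry.responseStart - entry.requestStart),
    response: Math.round(entry.responseEnd - entry.responseStart)
  };
}

// In a browser this would be:
//   performance.getEntriesByType('resource').map(summarizeResource)
// Here a hand-written entry stands in for a real one.
const sampleEntry = {
  name: 'https://huya.com/static/app.js',
  initiatorType: 'script',
  nextHopProtocol: 'h2',
  transferSize: 177062,
  requestStart: 100,
  responseStart: 300,
  responseEnd: 600
};
console.log(summarizeResource(sampleEntry));
```

Note that for cross-origin resources the browser reports zero for the size and detailed timing properties unless the server sends a Timing-Allow-Origin header.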

Similarly, we can extract the numeric performance indicators from the metric field:

Some string indicators are also recorded: resource name, resource type, domain name, protocol, etc. These can be found in tags.

The resource loading page can be made according to the above indicators:

3. Error data

Front-end errors fall into three categories:

3.1 Script Errors

import BaseError from './base'
import EventUtil from '../../utils/event'

export default class ScriptError extends BaseError {
  constructor () {
    super('script')
  }

  start () {
    this.attachEvent()
  }

  attachEvent () {
    // Synchronous script errors
    EventUtil.add(window, 'error', (e) => {
      this.handleError(e)
    }, false)
    // Promise rejections that nobody handled
    EventUtil.add(window, 'unhandledrejection', (e) => {
      this.handleError(e)
    }, false)
  }

  handleError (e) {
    const { message, filename, lineno, colno, reason, type, error } = e
    if (!message) {
      // unhandledrejection events have no message; the cause is in `reason`,
      // which is not guaranteed to be an Error object
      this.send({
        type,
        message: reason && reason.message,
        stack: reason && reason.stack
      })
    } else {
      const lowMsg = message.toLowerCase()
      if (lowMsg.includes('script error')) {
        // Cross-origin scripts only expose "Script error." without details
        this.send({ message })
      } else {
        this.send({ message, filename, lineno, colno, type, stack: error && error.stack })
      }
    }
  }
}

If the referenced script is cross-origin, two extra settings are needed:

  • Add crossorigin="anonymous" to the script tag: <script type="text/javascript" src="https://crossorigin.com/app.js" crossorigin="anonymous"></script>
  • The response headers returned by the server must include: Access-Control-Allow-Origin: *

3.2 Resource loading errors

This catches loading failures of resources such as img, script, style, and so on.

import BaseError from './base'
import EventUtil from '../../utils/event'
// The rest of this import path is cut off in the source (the lost comment mentioned IE8)
import DOMReady from '../..'

export default class DocumentError extends BaseError {
  constructor () {
    super('document')
  }

  start () {
    this.attachEvent()
  }

  attachEvent () {
    DOMReady(() => {
      // The final argument (true) registers the listener in the capture phase
      EventUtil.add(document, 'error', (e) => {
        const el = EventUtil.getTarget(e)
        const tag = el.tagName.toLowerCase()
        const src = el.src
        this.send({ el, tag, src })
      }, true)
    })
  }
}

For this type of error to be caught, the following two conditions must be met:

  • The listener must be registered in the capture phase, because error events fired on resources do not bubble
  • The resource must be in the DOM tree
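A small sketch of what could be extracted from the failing element once the event has been captured. The helper name `describeFailedResource` is hypothetical, and reading `href` for link elements is my addition (the class above only reads `src`).

```javascript
// Hypothetical helper: works out which resource failed from the event target.
function describeFailedResource(el) {
  const tag = String(el.tagName || '').toLowerCase();
  // <link> (stylesheets) carries its URL in href; img and script use src.
  const url = tag === 'link' ? el.href : el.src;
  return { tag, url };
}

// In a browser the listener would be registered in the capture phase
// (third argument true), for example:
//   document.addEventListener('error', (e) => {
//     report(describeFailedResource(e.target)); // `report` is a placeholder
//   }, true);

// Plain objects stand in for DOM elements here.
console.log(describeFailedResource({ tagName: 'IMG', src: 'https://example.com/logo.png' }));
console.log(describeFailedResource({ tagName: 'LINK', href: 'https://example.com/app.css' }));
```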

3.3 Ajax request errors

Here we need to patch the native XHR to intercept Ajax requests:

import BaseError from './base'

// Requests to the monitoring system's own endpoints are not reported
const urlWhiteList = [
  '//api.b1anker.com/msg',
  '//api.b1anker.com/d.gif/',
  '//api.b1anker.com/form/push'
]

export default class AjaxError extends BaseError {
  constructor () {
    super('ajax')
  }

  start () {
    this.patch()
  }

  patch () {
    if (!window.XMLHttpRequest && !window.ActiveXObject) {
      return
    }
    // patch
    const XHR = window.XMLHttpRequest || window.ActiveXObject
    const open = XHR.prototype.open
    try {
      XHR.prototype.open = function (method, url, ...rest) {
        // Save the request method and URL on the instance, so that
        // concurrent requests do not overwrite each other's values
        this._method = method
        this._url = url
        return open.call(this, method, url, ...rest)
      }
    } catch (err) {
      console.log(err)
    }
    const send = XHR.prototype.send
    const self = this
    XHR.prototype.send = function (data = null) {
      const CURRENT_URL = this._url || ''
      const METHOD = this._method || ''
      try {
        this.addEventListener('readystatechange', () => {
          if (this.readyState === 4) {
            // The second status code is garbled in the source; 304 is assumed here
            if (this.status !== 200 && this.status !== 304) {
              // Do not report the monitoring system's own errors
              if (urlWhiteList.some((url) => CURRENT_URL.includes(url))) {
                return
              }
              const name = this.statusText
              const reponse = this.responseText
              const url = this.responseURL
              const status = this.status
              const withCredentials = this.withCredentials
              self.send({ name, reponse, url, status, withCredentials, data, method: METHOD })
            }
          }
        }, false)
        send.call(this, data)
      } catch (err) {
        console.log(err)
      }
    }
  }
}

3.4 Fetch errors

Here we hook the native fetch as well:

import BaseError from './base'

export default class FetchError extends BaseError {
  constructor () {
    super('fetch')
  }

  start () {
    this.patch()
  }

  patch () {
    if (!window.fetch) {
      return null
    }
    const _fetch = window.fetch
    const self = this
    window.fetch = function () {
      const params = self.parseArgs(arguments)
      return _fetch
        .apply(this, arguments)
        .then(self.checkStatus)
        .catch(async (err) => {
          const { response } = err
          if (response) {
            // Read the body from a clone so the caller can still consume the response
            const data = await response.clone().text()
            self.send({
              name: response.statusText,
              type: response.type,
              data,
              status: response.status,
              url: response.url,
              redirected: response.redirected,
              method: params.method,
              credentials: params.credentials,
              mode: params.mode
            })
            // Native fetch resolves on HTTP error statuses, so hand the response back
            return response
          }
          // Network failure: no response object is available
          self.send({
            name: err.message,
            method: params.method,
            credentials: params.credentials,
            mode: params.mode,
            url: params.url
          })
          // Re-throw so that the caller's own error handling still runs
          throw err
        })
    }
  }

  checkStatus (response) {
    if (response.status >= 200 && response.status < 300) {
      return response
    } else {
      var error = new Error(response.statusText)
      error.response = response
      throw error
    }
  }

  parseArgs (args) {
    const parms = { method: 'GET', type: 'fetch', mode: 'cors', credentials: 'same-origin' }
    args = Array.prototype.slice.apply(args)
    if (!args || !args.length) {
      return parms
    }
    try {
      if (args.length === 1) {
        if (typeof args[0] === 'string') {
          parms.url = args[0]
        } else if (typeof args[0] === 'object') {
          this.setParams(parms, args[0])
        }
      } else {
        parms.url = args[0]
        this.setParams(parms, args[1])
      }
    } catch (err) {
      // Parsing is best effort; fall back to whatever was collected so far
    }
    return parms
  }

  setParams (params, newParams) {
    params.url = newParams.url || params.url
    params.method = newParams.method || params.method
    params.credentials = newParams.credentials || params.credentials
    params.mode = newParams.mode || params.mode
    return params
  }
}

4. Custom data reporting

Sometimes users need to monitor data specific to their own pages, such as the start-up time of the player in a live video, or the player's frame rate. Based on this requirement, we simply extended the SDK:

// customReport.js
import BaseReport from './baseReport'
import throttle from 'lodash/throttle'
import isEmpty from 'lodash/isEmpty'

// For now, only numeric values are supported for reporting
const defaultOptions = { type: 'number' }

export default class CustomReport extends BaseReport {
  constructor (options = { delay: 5000 }) {
    super('custom');
    this.skynetQuque = [];
    // The user may report several times in quick succession, so the data is
    // cached first and then sent in one batch by a throttled function
    this.sendToSkynetThrottled = throttle(this.sendToSkynet.bind(this), options.delay, {
      leading: false,
      trailing: true
    })
  }

  upload (options = defaultOptions, data) {
    const { type } = options;
    if (type === 'number') {
      // Numeric data goes through the queue below
      this.uploadToSkynet(data);
    }
  }

  uploadToSkynet (data) {
    this.skynetLoop(data);
  }

  // Push the formatted data onto the queue and trigger the throttled send
  skynetLoop (data) {
    this.skynetQuque.push(this.formatSkynetData(data));
    this.sendToSkynetThrottled(this.skynetQuque)
  }

  // Format the data into the OpenTSDB report format
  formatSkynetData (data) {
    const { module, metric, tags, value } = data;
    const result = {
      metric: `frontMonitor.custom.${module}_${metric}`,
      endpoint: `${window.__HBI.id}`,
      counterType: "GAUGE",
      step: 1,
      value,
      timestamp: parseInt((new Date()).getTime() / 1000)
    };
    if (!isEmpty(tags)) {
      // If tags are not empty, convert them to a string of the form k1=v1,k2=v2
      result.tags = Object.entries(tags).map(([key, value]) => `${key}=${value}`).join(',')
    }
    return result
  }

  // Report the data, then empty the queue
  sendToSkynet (data) {
    this.sender.doSendToSkynet(data)
    this.skynetQuque = []
  }
}

In this way, developers can use the following code to report:

if (window.__CUSTOM_REPORT__) {
  const data = {
    module: 'player',
    metric: 'openTime',
    value: 100,
    tags: {
      browser: 'Chrome69',
      op: 'mac'
    }
  }

  window.__CUSTOM_REPORT__.upload({
    type: 'number'
  }, data)
}

Problems encountered

1. Cross-origin reporting

Every website that includes the SDK reports to the same fixed address (a dedicated data-processing service on a different domain from the target website), so cross-origin problems arise. They can be solved by combining a form with an iframe:

class FormPost {
  postData (url, data) {
    let formId = this.getId('form');
    let iframeId = this.getId('iframe');
    let form = this.initForm(formId, iframeId, url, data);
    let ifr = this.initIframe(iframeId);
    return this.doPost(ifr, form);
  }

  doPost (ifr, form) {
    return new Promise(resolve => {
      let target = document.head || document.getElementsByTagName('head')[0];
      !target && (target = document.body);
      target.appendChild(form);
      target.appendChild(ifr);
      ifr.onload = () => {
        // Once the iframe has finished loading, remove the form and the iframe
        form.parentNode.removeChild(form);
        ifr.parentNode.removeChild(ifr);
        resolve();
      }
      form.submit();
    });
  }

  getId (prefix) {
    !prefix && (prefix = '');
    return `${prefix}${new Date().getTime()}${parseInt(Math.random() * 10000)}`;
  }

  initForm (id, ifrId, url, data) {
    let fo = document.createElement('form');
    fo.setAttribute('method', 'post');
    fo.setAttribute('action', url);
    fo.setAttribute('id', id);
    // Load the response in the iframe instead of navigating the page
    fo.setAttribute('target', ifrId);
    fo.style.display = 'none';
    for (let k in data) {
      let d = data[k];
      let inTag = document.createElement('input');
      inTag.setAttribute('name', k);
      inTag.setAttribute('value', d);
      fo.appendChild(inTag);
    }
    return fo;
  }

  initIframe (id) {
    // Old IE needs the name attribute to be set at creation time
    let ifr = (/MSIE (6|7|8)/).test(navigator.userAgent)
      ? document.createElement(`<iframe name="${id}">`)
      : document.createElement('iframe')
    ifr.setAttribute('id', id);
    ifr.setAttribute('name', id);
    ifr.style.display = 'none';
    return ifr;
  }
}

export default new FormPost();

2. Dimension explosion in the collected data

Since OpenTSDB is a time-series database, when I first designed the format for reporting resource loading data, I planned to use the resource name as the metric, put request, response, size, parseSize and other information in the tags, and fill value with an arbitrary number, so that each resource needed only one data point. Data could be reported this way, but because the tags held numeric values (which take far too many distinct values), the number of tag combinations exploded and the data could not be queried.

Report data format before optimization:

{
    "metric": "frontMonitor.perf.resource_app.js",
    "value": 0,
    "endpoint": "3",
    "timestamp": 1539068028,
    "tags": "size=177062,parseSize=300,request=200,response=300,type=script,origin=huya.com,protocol=h2",
    "counterType": "GAUGE",
    "step": 1
}

So we had to use request, response, size, parseSize, etc. as the metric and store the resource name in tags instead, which means that each resource reports several data points. While this increases the volume of reported data, it effectively reduces the number of dimensions, so the data can be queried quickly.

Optimized data format:

{
    "metric": "frontMonitor.perf.resource_size",
    "value": 177062,
    "endpoint": "3",
    "timestamp": 1539068028,
    "tags": "name=app.js,type=script,origin=huya.com,protocol=h2",
    "counterType": "GAUGE",
    "step": 1
}
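To make the difference concrete, here is a sketch of how one resource could be expanded into several data points in the optimized format. The helper name `toResourcePoints` is hypothetical, and the metric names other than `frontMonitor.perf.resource_size` are inferred from the description above rather than taken from the real system.

```javascript
// Hypothetical helper: one resource becomes one point per numeric measurement.
function toResourcePoints(resource, endpoint, timestamp) {
  // String fields with a bounded set of values go into tags;
  // everything else is treated as a numeric measurement.
  const { name, type, origin, protocol, ...measurements } = resource;
  const tags = `name=${name},type=${type},origin=${origin},protocol=${protocol}`;

  return Object.entries(measurements).map(([key, value]) => ({
    metric: `frontMonitor.perf.resource_${key}`,
    value,
    endpoint,
    timestamp,
    tags,
    counterType: 'GAUGE',
    step: 1
  }));
}

const resourcePoints = toResourcePoints(
  {
    name: 'app.js', type: 'script', origin: 'huya.com', protocol: 'h2',
    size: 177062, parseSize: 300, request: 200, response: 300
  },
  '3',
  1539068028
);
console.log(resourcePoints.map((p) => `${p.metric} = ${p.value}`));
```

Four points are reported instead of one, but the set of distinct tag strings now grows only with the number of resources, not with every distinct size or duration.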

3. High reporting concurrency

If the system is connected to websites with a large number of users, it will receive more than one data point per second. In this case OpenTSDB has an overwrite problem: if all fields other than value are the same, a later data point overwrites the earlier one within the same second. One solution was to add a unique field to tags and use a simple algorithm to generate a unique value for it, which solves the overwrite problem.

This is not perfect, for two main reasons. First, the chart will have multiple Y values at the same point on the X axis, so the chart has to be adapted and the data aggregated before display (which increases server-side pressure). Second, the amount of data is too large, which puts pressure on the server and slows down queries. Therefore, Kafka is used for queue processing: the data is merged to minute granularity and then reported to OpenTSDB. This solves the overwrite problem, reduces server pressure and improves query efficiency.
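The merge step might look roughly like the sketch below, which a queue consumer could run on each batch before writing to OpenTSDB. The function name `mergeByMinute` is hypothetical, and averaging the values is my assumption; the real system may use a different aggregate.

```javascript
// Hypothetical aggregation step: averages all points that share a metric,
// endpoint and tags within the same minute.
function mergeByMinute(points) {
  const buckets = new Map();
  for (const p of points) {
    const minute = p.timestamp - (p.timestamp % 60); // start of the minute, in seconds
    const key = [p.metric, p.endpoint, p.tags, minute].join('|');
    const bucket = buckets.get(key) || { ...p, timestamp: minute, sum: 0, count: 0 };
    bucket.sum += p.value;
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  // Replace the running totals with the averaged value.
  return Array.from(buckets.values(), ({ sum, count, ...rest }) => ({
    ...rest,
    value: sum / count
  }));
}

const base = {
  metric: 'frontMonitor.perf.resource_size',
  endpoint: '3',
  tags: 'name=app.js,type=script,origin=huya.com,protocol=h2',
  counterType: 'GAUGE',
  step: 1
};
const merged = mergeByMinute([
  { ...base, value: 100, timestamp: 1539068028 },
  { ...base, value: 300, timestamp: 1539068030 }, // same minute as the first point
  { ...base, value: 50, timestamp: 1539068045 }   // falls into the next minute
]);
console.log(merged);
```

After merging, every series has at most one point per minute, so nothing is overwritten and the chart has a single Y value per X position.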

4. Deployment pitfalls

4.1 Front-end build

Since the project is released through the company's unified publishing system and the back end uses the egg framework, the front-end project must first be built into the app/public folder of the back-end project:

That is, the output path of the front-end build needs to be changed to app/public in the back-end project:

4.2 Back-end build

As egg + TypeScript is used, an additional step is needed to compile the TypeScript code into JS with tsc; otherwise an error is reported. Here are the build script commands:

"scripts": {
  "start": "egg-scripts start --daemon --title=egg-server-monitor-backend --port=8088",
  "stop": "egg-scripts stop --title=egg-server-monitor-backend --port=8088",
  "dev": "egg-bin dev -r egg-ts-helper/register --port=8088",
  "debug": "egg-bin debug -r egg-ts-helper/register",
  "test-local": "egg-bin test -r egg-ts-helper/register",
  "test": "npm run lint -- --fix && npm run test-local",
  "cov": "egg-bin cov -r egg-ts-helper/register",
  "tsc": "ets && tsc -p tsconfig.json",
  "ci": "npm run lint && npm run cov && npm run tsc",
  "autod": "autod",
  "lint": "tslint --project . -c tslint.json",
  "clean": "ets clean",
  "pack": "npm run tsc && rm -rf ./node_modules && npm i --production && tar -zcvf ../ROOT.tgz ./ && npm run reDevEnv && npm run clean",
  "reDevEnv": "rm -rf ./node_modules && npm i",
  "zip": "node ./zip.js"
}

When building, we use the pack script, that is, npm run pack or yarn run pack, which runs npm run tsc && rm -rf ./node_modules && npm i --production && tar -zcvf ../ROOT.tgz ./ && npm run reDevEnv && npm run clean. This command performs the following steps:

  • First compile the code into JS with tsc;
  • Delete node_modules;
  • Install the production node_modules;
  • Compress the project into .tgz format;
  • Delete node_modules;
  • Reinstall the development node_modules;
  • Delete the JS code compiled by tsc;

4.3 Serving front-end static resources from the back end

Since the front end and back end are separated and the template functionality provided by egg is not used, a piece of middleware is needed to specify the page served when a route is accessed. Since egg is built on Koa, Koa middleware can be used as well:

// kstatic.ts
import * as KoaStatic from 'koa-static';
import * as path from 'path';

export default (options) => {
  // Use the koa-static middleware
  return KoaStatic(path.join(__dirname, '../public'), options);
};

Then add config.middleware = ['kstatic'] to config/config.default.ts.

4.4 Fixing route resolution

Because the front-end pages use react-router-dom in history mode, pages and JS files load normally when the root page is accessed. However, when we access a second- or third-level route, or refresh the page on one, such as xxx.huya.com/test/100, the JS may fail to load and the page fails to render.

So we need to fix the access path of these static resources so that they are always looked up from the root directory, which takes another piece of middleware:

// historyApiFallback.ts
import * as url from 'url';

export default (options) => {
  return function historyApiFallback(ctx, next) {
    const log = getLogger(options, ctx);
    log(ctx.url);
    // Skip requests that are not GET or that do not accept HTML
    if (ctx.method !== 'GET' || !ctx.accepts(options.accepts || 'html')) {
      return next();
    }
    const parsedUrl = url.parse(ctx.url);
    let rewriteTarget;
    options.rewrites = options.rewrites || [];
    for (let i = 0; i < options.rewrites.length; i++) {
      const rewrite = options.rewrites[i];
      let match;
      if (parsedUrl && parsedUrl.pathname) {
        match = parsedUrl.pathname.match(rewrite.from);
      } else {
        match = '';
      }
      if (match !== null) {
        rewriteTarget = evaluateRewriteRule(parsedUrl, match, rewrite.to, ctx);
        ctx.url = rewriteTarget;
        return next();
      }
    }
    const pathname = parsedUrl.pathname;
    // Paths whose last segment contains a dot are treated as files
    if (
      pathname &&
      pathname.lastIndexOf('.') > pathname.lastIndexOf('/') &&
      options.disableDotRule !== true
    ) {
      return next();
    }
    rewriteTarget = options.index || '/index.html';
    log('Rewriting', ctx.method, ctx.url, 'to', rewriteTarget);
    ctx.url = rewriteTarget;
    return next();
  };
};

function evaluateRewriteRule(parsedUrl, match, rule, ctx) {
  if (typeof rule === 'string') {
    return rule;
  } else if (typeof rule !== 'function') {
    throw new Error('Rewrite rule can only be of type string or function.');
  }
  return rule({ parsedUrl, match, ctx });
}

// Always returns a logging function: console.log in verbose mode, the
// request logger otherwise, and a no-op if neither is available
function getLogger(options, ctx) {
  if (options && options.verbose) {
    return console.log.bind(console);
  } else if (ctx && ctx.logger) {
    return ctx.logger.info.bind(ctx.logger);
  }
  return () => {};
}

Then update the middleware list in config/config.default.ts to config.middleware = ['historyApiFallback', 'kstatic']; the order matters.

And add the option code:

config.historyApiFallback = {
  ignore: [/.*\..+$/, /api.*/],
  rewrites: [{ from: /.*/, to: '/' }]
};
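The combined effect of these two options can be sketched as a plain function. This is only an illustration: as far as I understand egg, `ignore` is a general middleware option (requests that match it skip the middleware entirely), and the function name `resolveUrl` is hypothetical.

```javascript
// The same patterns as in the config above.
const ignore = [/.*\..+$/, /api.*/];
const rewrites = [{ from: /.*/, to: '/' }];

// Returns the URL the static middleware would finally see for a GET request.
function resolveUrl(pathname) {
  // Paths with a file extension, or containing "api", are left untouched.
  if (ignore.some((pattern) => pattern.test(pathname))) {
    return pathname;
  }
  // Everything else is rewritten by the first matching rule.
  const rule = rewrites.find(({ from }) => from.test(pathname));
  return rule ? rule.to : pathname;
}

console.log(resolveUrl('/test/100'));      // front-end route
console.log(resolveUrl('/static/app.js')); // static file
console.log(resolveUrl('/api/projects'));  // back-end API
```

One thing to watch: /api.*/ is not anchored, so it also matches any front-end route that merely contains "api" (for example /capital). If that matters, a pattern anchored at the start of the path, such as /^\/api/, would be stricter.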

5. SDK version release management

At first, for convenience, the compiled SDK was uploaded directly to the CDN and referenced directly by each system's script tag. This is risky for two main reasons. First, if the SDK is uploaded to the CDN without adequate testing, a bug in the SDK affects every system. Second, different systems need different SDK features, so a single shared SDK is hard to maintain. With these two points in mind, I built an SDK version release management feature. The process is as follows.

5.1 SDK compilation:

Get the current latest version number from the service and increment it. Build with multiple entries, cutting the SDK into several files by functional module, such as sdk.perf.js and sdk.error.js (performance monitoring and error monitoring respectively). Then merge these files into one, inserting a separator between modules so that the SDK can be split again later:

const axios = require('axios')
const webpack = require('webpack')
const webpackConfig = require('../webpack.config.prod.js')
const fs = require('fs')
const path = require('path')

const OUTPUT_DIR = '../dist/'
const resolve = (dir) => path.join(__dirname, OUTPUT_DIR, dir)

const combineFiles = (bases, error, target) => {
  let data = ''
  // Concatenate the base modules, deleting each file once it has been read
  bases.forEach((file) => {
    data += fs.readFileSync(resolve(file))
    fs.unlinkSync(resolve(file))
  })
  // Insert the separator, then append the error-monitoring module
  data += '/* hbi-sdK-error-monitor */'
  data += fs.readFileSync(resolve(error))
  fs.unlinkSync(resolve(error))
  fs.writeFileSync(resolve(target), data)
}

async function build () {
  // Get the latest version number
  const version = await axios.get('https://api.b1anker.com/api/v0/systemVariable/list?name=SDK_VERSION')
    .then(({ data: { data } }) => {
      return data[0].value;
    });
  webpack(webpackConfig({ version }), (err, stats) => {
    if (err || stats.hasErrors()) {
      console.error('Build failed')
      // err is null when only the compilation reported errors
      throw err || new Error(stats.toString())
    } else {
      // Merge the SDK modules
      combineFiles(['hbi.vendor.js', 'hbi.mons.js', 'hbi.performance.js'], 'hbi.error.js', 'hbi.js')
      console.log('Build successful: v' + version);
    }
  });
}

build()

5.2 SDK Upload:

Upload the SDK to the server's local storage and obtain the corresponding version for later operations. The upload is performed manually so that the relevant information is recorded and a rollback is possible when a problem or a need arises:

Multi-system release:

When publishing, the back end looks up the SDK of the requested version locally, together with the SDK configuration of each system, to decide which features that system's SDK should include, that is, how to cut the SDK. The generated SDK is named with the project's flag (for example b1anker.sdk.js), so that a release only affects the systems using that flag:

import { Service } from 'egg';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

export default class SDK extends Service {
  // Release the SDK
  public async pulishSDK (projects: string[], version: string) {
    const success: any[] = [];
    const error: any[] = [];
    for (let i = 0; i < projects.length; i++) {
      const id: number = Number(projects[i]);
      try {
        // Get the project's information
        const { flag } = await this.service.project.getProject(id);
        // Generate the SDK for this project flag and SDK version, and upload it to the CDN
        await this.uploadSDKToCDN(flag, version);
        // Record which SDK version the project now uses
        await this.service.sdk.updateSdkInfo(id, version);
        success.push(id);
      } catch (err) {
        error.push(id);
        this.logger.error(err);
      }
    }
    return { success, error };
  }

  public async uploadSDKToCDN (flag: string, version: string) {
    // Use a placeholder instead of string interpolation to avoid SQL injection
    const error = await this.app.mysql.query(
      'select error from project a inner join project_sdk_setting b on a.id = b.pid where a.flag = ?;',
      [flag]
    );
    // Error monitoring is disabled by default
    let enableError = false;
    if (JSON.parse(error[0].error).length) {
      enableError = true;
    }
    const sdkPath = path.join(os.homedir(), 'sdk', `b1anker-${version}.js`);
    const cdnPath = `b1anker/${flag}.sdk.js`;
    // Generate the final SDK according to the project's SDK configuration
    if (enableError) {
      // Error monitoring is enabled: upload the full SDK to the CDN under the new name
      await this.service.util.uploadFileToCdn(sdkPath, cdnPath);
    } else {
      const sdkData = fs.readFileSync(sdkPath).toString();
      // Split the SDK and keep only the part before the error-monitoring module
      const withoutErrorMonitor = sdkData.split('/* hbi-sdK-error-monitor */')[0];
      // Upload to the CDN
      await this.service.util.uploadBufferToCdn(cdnPath, Buffer.from(withoutErrorMonitor));
    }
  }
}
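The build-time merge and the publish-time cut are two halves of the same trick, which the following sketch shows on plain strings. The function names `combineModules` and `tailorSdk` are hypothetical; the separator is the one used in the code above.

```javascript
const SEPARATOR = '/* hbi-sdK-error-monitor */';

// Build step: base modules first, then the separator, then the error module.
function combineModules(baseSources, errorSource) {
  return baseSources.join('') + SEPARATOR + errorSource;
}

// Publish step: when error monitoring is off, keep only what precedes the separator.
function tailorSdk(combined, enableError) {
  return enableError ? combined : combined.split(SEPARATOR)[0];
}

const combined = combineModules(['var vendor=1;', 'var perf=2;'], 'var error=3;');
console.log(tailorSdk(combined, true));  // full SDK, separator comment included
console.log(tailorSdk(combined, false)); // SDK without the error-monitoring module
```

Because the separator is a JavaScript comment, the full file still runs unchanged when nothing is cut. The approach does rely on the error module being last and on the separator string never appearing inside any module.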

Conclusion

Through this project I was exposed to a lot of knowledge beyond the front end, such as system conception, prototype design, back-end logic, the MySQL relational database, the OpenTSDB time-series database and the Kafka message queue. This gave me a clear picture of a complete system and a better understanding of the bottlenecks of different technologies, especially on the front end and the back end. It also broadened my front-end stack and gave me some understanding of React.
