Front-end monitoring includes behavior monitoring, exception monitoring, performance monitoring, and so on. This article focuses on exception monitoring. Within a single monitoring system, the front end and the back end each have their own monitoring solution, but they cannot be treated in isolation: when a user hits a problem while operating the application, the cause may lie on either side, so there must be a mechanism that links the two ends together and unifies them within the monitoring system. Therefore, even though we only discuss front-end exception monitoring here, we cannot strictly separate the front end from the back end; the final reports should reflect, according to the actual system design, how monitoring helps both development and the business.
Generally speaking, a monitoring system can be roughly divided into four stages: log collection, log storage, statistics and analysis, report and warning.
Collection phase: Collects abnormal logs, handles them locally, and reports them to the server using a specific solution.
Storage phase: the back end receives the exception logs reported by the front end, processes them, and stores them according to a certain storage scheme.
Analysis stage: divided into automatic machine analysis and manual analysis. The machine automatically analyzes and filters the stored log information according to preset conditions and algorithms to discover problems and trigger an alarm. Manual analysis provides a visual data panel so that system users can see specific log data and find the root cause of abnormal problems based on the information.
Alarm stage: divided into alarms and early warnings. Alarms are generated automatically at defined levels, through specified channels, and according to certain triggering rules. Early warning means making a judgment before an exception actually occurs and issuing a warning in advance.
1 Front-end Exception
Front-end exceptions refer to situations in which users cannot quickly get the expected result while using a Web application. Different exceptions have different consequences, ranging from user annoyance to the product becoming unusable and users losing their trust in it.
1.1 Classification of Front-end Exceptions
According to the severity of the consequences caused by the offending code, front-end exceptions can be divided into the following categories.
A. Error
The content displayed on the interface does not match what the user expects, for example clicking a link leads to the wrong page, the data shown is inaccurate, an error message is incomprehensible, the layout is misaligned, or the user lands on the wrong page after submitting a form. When this type of exception occurs, the product is still functional, but the user cannot achieve their goal.
B. Unresponsive
The interface does not respond after an operation, for example nothing happens after a button is clicked and the flow cannot continue. When this type of exception occurs, the product is already partially unavailable at the interface level.
C. Broken
The purpose of the operation cannot be achieved, for example the target page cannot be opened or details cannot be viewed. When this type of exception occurs, some of the application's functions cannot be used normally.
D. Frozen
The interface freezes and no function can be used, for example the user cannot log in to use in-app functions, or cannot perform any further operation because a mask layer blocks the screen and cannot be closed. When such exceptions occur, the user is very likely to kill the application.
E. Crash
The application frequently exits on its own or cannot be operated at all, for example it crashes intermittently, a page fails to load, or nothing can be done after it loads. If this kind of exception keeps occurring, it leads directly to user churn and damages the vitality of the product.
1.2 Classification of Exception Causes
The causes of front-end anomalies are classified into five categories:
| Cause | Typical cases | Frequency |
| --- | --- | --- |
| Logic error | 1) Wrong business-logic conditions 2) Events bound in the wrong order 3) Call-stack timing errors 4) Incorrect manipulation of JS objects | Often |
| Data type error | 1) Reading a property of null as if it were an object 2) Traversing undefined as if it were an array 3) Using numeric strings directly in arithmetic 4) Function arguments not passed | Often |
| Syntax error | | Rarely |
| Network error | 1) Slow responses 2) The server returns no data but still responds 200, and the front end traverses the data as if nothing were wrong 3) The network drops while data is being submitted 4) The front end does no error handling when the server returns a 500 error | Occasionally |
| System error | 1) Out of memory 2) Disk full 3) The shell does not support required APIs 4) Incompatibilities | Rarely |
2 Exception Collection
2.1 Collection Content
When an exception occurs, we need to know the specific information of the exception and decide what solution to adopt according to the specific information of the exception. When collecting exception information, you can follow the 4W rule:
WHO did WHAT and get WHICH exception in WHICH environment?
A. User information
Information about the user at the time the exception occurred, for example the user's status and permissions at that moment and, when the user can log in from multiple terminals, which terminal the exception corresponds to.
B. Behavioral information
What the user was doing when the exception occurred: the page path; what operation was performed; what data was used in the operation; what data the API returned to the client; if it was a submit operation, what data was submitted; the previous path; the previous behavior log ID; and so on.
C. Exception information
The code that produced the exception: the DOM element node the user operated on, the exception level, the exception type, the exception description, the code stack information, and so on.
D. Environment information
Network environment; Device model and ID; Operating system version; Client version; API interface version.
| Field | Type | Description |
| --- | --- | --- |
| requestId | String | Generated per API request |
| traceId | String | Generated per phase; traces all log records related to one exception |
| hash | String | Unique identifier of this log (a logId), derived from the content of the record |
| time | Number | Time the log was generated (saved) |
| userId | String | |
| userStatus | Number | User status at the time (enabled/disabled) |
| userRoles | Array | The user's role list at the time |
| userGroups | Array | The groups the user was in at the time; group permissions may affect the result |
| userLicenses | Array | Licenses at the time; a license may have expired |
| path | String | Page path, URL |
| action | String | What operation was performed |
| referer | String | Previous path, source URL |
| prevAction | String | Previous operation |
| data | Object | State and data of the current interface |
| dataSources | Array&lt;Object&gt; | Data returned by upstream APIs |
| dataSend | Object | Data that was submitted |
| targetElement | HTMLElement | The DOM element the user operated on |
| targetDOMPath | Array&lt;HTMLElement&gt; | The node path of that DOM element |
| targetCSS | Object | Custom style sheet of the element |
| targetAttrs | Object | Current attributes and values of the element |
| errorType | String | Error type |
| errorLevel | String | Exception level |
| errorStack | String | Error stack information |
| errorFilename | String | File in which the error occurred |
| errorLineNo | Number | Line at which the error occurred |
| errorColNo | Number | Column at which the error occurred |
| errorMessage | String | Error description (developer-defined) |
| errorTimeStamp | Number | Timestamp |
| eventType | String | Event type |
| pageX | Number | X coordinate of the event on the page |
| pageY | Number | Y coordinate of the event on the page |
| screenX | Number | X coordinate of the event on the screen |
| screenY | Number | Y coordinate of the event on the screen |
| pageW | Number | Page width |
| pageH | Number | Page height |
| screenW | Number | Screen width |
| screenH | Number | Screen height |
| eventKey | String | Key that triggered the event |
| network | String | Network environment description |
| userAgent | String | Client description |
| device | String | Device description |
| system | String | Operating system description |
| appVersion | String | Application version |
| apiVersion | String | API version |
This is a very large set of log fields, covering almost all the information that can describe the circumstances around an exception when it occurs. These fields are not all collected in every case, and since we use a document database to store the logs, missing fields do not affect how the records are actually stored.
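For reference, a partial TypeScript shape of such a record might look like the following; this is an illustrative subset, not a fixed schema, and the field names simply mirror the table above.

```typescript
// Illustrative subset of the log record fields listed above; optional fields may be
// omitted depending on what was collectable when the log was created.
interface ExceptionLog {
  requestId?: string;
  traceId?: string;
  hash: string;                       // unique id derived from the record's content
  time: number;                       // when the log was saved
  userId?: string;
  path?: string;                      // page path / URL
  action?: string;                    // what the user did
  dataSend?: Record<string, unknown>; // data submitted, if any
  targetDOMPath?: string[];           // serialized node path rather than live HTMLElements
  errorType?: string;
  errorLevel?: 'A' | 'B' | 'C' | 'D';
  errorMessage?: string;
  errorStack?: string;
  network?: string;
  userAgent?: string;
  appVersion?: string;
  apiVersion?: string;
}
```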
2.2 Exception Catching
Front-end exception catching is divided into global catching and single-point catching. Global catching keeps the code centralized and easy to manage; single-point catching serves as a supplement, capturing special cases, but it is scattered and harder to manage.
A. Global capture
Catching code is written in one place through global interfaces. The available interfaces include the following (a minimal sketch follows this list):
- window.addEventListener("error") / window.addEventListener("unhandledrejection") / document.addEventListener("click"), etc.
- Framework-level global listeners, for example Axios interceptors for requests, and the error-collecting hooks that Vue and React provide
- Wrapping global functions so that exceptions are caught automatically whenever the function is called
- Instance method rewriting (patching): wrapping the original method, for example rewriting console.error, so that exceptions can be caught while the method is used in the usual way
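As a minimal sketch of global capture (reportLog, recordBehavior, and describeNode stand in for whatever collectors the monitoring SDK actually provides; they are assumptions, not part of any standard API):

```typescript
// Hypothetical collectors provided elsewhere by the monitoring SDK.
declare function reportLog(entry: Record<string, unknown>): void;
declare function recordBehavior(entry: Record<string, unknown>): void;
declare function describeNode(el: Element): string[]; // serializes a node path; details omitted

// Runtime errors and resource loading errors (capture phase is needed for the latter).
window.addEventListener('error', (event: ErrorEvent) => {
  reportLog({
    errorType: 'runtime',
    errorMessage: event.message,
    errorFilename: event.filename,
    errorLineNo: event.lineno,
    errorColNo: event.colno,
    errorStack: event.error?.stack,
  });
}, true);

// Promise rejections that no one handled.
window.addEventListener('unhandledrejection', (event: PromiseRejectionEvent) => {
  reportLog({ errorType: 'promise', errorMessage: String(event.reason) });
});

// Behavior trail: every click becomes a behavior log for later scene restoration.
document.addEventListener('click', (event: MouseEvent) => {
  recordBehavior({ eventType: 'click', targetDOMPath: describeNode(event.target as Element) });
});
```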
B. Single point capture
Wrap a single code block, or instrument a point in a logical flow, inside the business code for targeted exception catching (a sketch follows this list):
- try…catch
- Write a dedicated function to collect exception information and call it where an exception occurs
- Write a dedicated function to wrap other functions, producing a new function that behaves exactly like the original except that it catches exceptions when they occur
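A minimal sketch of the third approach (the wrapper); captureException and submitOrder are hypothetical names used only for illustration:

```typescript
// Hypothetical single-point collector and business function.
declare function captureException(err: unknown, context: Record<string, unknown>): void;
declare function submitOrder(orderId: string): void;

// Wrap a function so exceptions are captured without changing its observable behavior.
function withCapture<T extends (...args: any[]) => any>(fn: T): T {
  return function (this: unknown, ...args: Parameters<T>) {
    try {
      return fn.apply(this, args);
    } catch (err) {
      captureException(err, { action: fn.name });
      throw err; // rethrow so callers still see the original failure
    }
  } as T;
}

// Usage: the wrapped function behaves exactly like the original, except errors are logged.
const safeSubmitOrder = withCapture(submitOrder);
```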
2.3 Cross-domain Script Exception
Because of the browser's security policy, when an error is thrown in a cross-domain script, the details of the error cannot be obtained directly; all that is reported is a generic "Script error.". This happens, for example, when we introduce third-party dependencies or host our own scripts on a CDN.
There are two ways to handle "Script error.".
Scheme 1:
- Inline JS into HTML
- Put the JS file and HTML in the same domain
Scheme 2:
- Add the crossorigin attribute to the script tag on the page
- Add Access-Control-Allow-Origin to the response headers of the server hosting the script to enable cross-origin resource sharing (see the sketch below)
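A minimal sketch of Scheme 2 (the CDN URL is a placeholder; the response header shown must be configured on the server that serves the script):

```typescript
// Load a third-party / CDN script with CORS enabled so window.onerror can see real errors.
const script = document.createElement('script');
script.src = 'https://cdn.example.com/vendor.js'; // placeholder URL
script.crossOrigin = 'anonymous';                 // renders as crossorigin="anonymous"
document.head.appendChild(script);

// The server hosting vendor.js must also respond with:
//   Access-Control-Allow-Origin: *
// Otherwise the browser still reports only "Script error." for exceptions thrown in that file.
```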
2.4 Exception Recording
For an exception, having only the exception information itself is often not enough to fully understand the problem, because the place where an exception surfaces is not necessarily the place where it originates. We need to restore the scene of the exception to recover the full picture of the problem, and even prevent similar problems in other interfaces. The concept to introduce here is "exception recording": recording captures, in both time and space, the whole process around the occurrence of an exception, which makes it much easier to find its root cause.
The figure above shows that when an exception occurs, its source may be far away from where it is observed; we need to go back to the scene of the exception to track down the source. Just as in real life, a crime is easier to solve when there is surveillance footage of it. Finding the root cause of an exception takes luck if you only look at the exception itself, but with the help of exception recording it becomes much easier.
In practice, "exception recording" means using technical means to collect the user's operation process: a record is kept of every operation the user performs, and when an exception occurs, the operations within a certain time interval can be replayed like a film, so there is no need to ask the user how to reproduce the problem; the user's operation process at the time can simply be watched.
The figure above is a schematic of an exception recording and replay scheme from Alibaba. Events and mutations generated by user operations on the interface are collected by the product, uploaded to the server, queued, and stored in the database in order. When an exception needs to be reproduced, these records are fetched from the database and replayed in sequence with an appropriate technical scheme, restoring the scene of the exception.
2.5 Exception Level
In general, we divide the levels of information collected into INFO, WARN, ERROR, and so on, and expand from there.
When we monitor exceptions, we can classify them into levels A, B, C, and D according to an "urgency-importance" model. Some exceptions occur but do not affect normal use, and users are not even aware of them; although in theory they should be fixed, in practice they can be handled later than other exceptions.
The alarm policies are described in a later section. In general, the closer an exception sits to the upper-right corner of that model (urgent and important), the sooner it is notified, so that the relevant people receive the information and handle it as quickly as possible. Class A exceptions require a fast response and may even need to be escalated to the person in charge.
In the exception collection phase, determine the severity of an exception according to the consequences of the exception defined in Section 1, and select a corresponding report scheme when an exception occurs.
3 Log Collation and Reporting Schemes
As mentioned above, besides the error information itself, we also need to record user operation logs so that the scene can be restored. This raises the question of how much to report and how often: if every log were reported immediately, it would be no different from a self-inflicted DDoS attack. We therefore need a reasonable reporting plan. Four reporting schemes are described below; in practice, different schemes are used for logs of different levels.
3.1 Front-end Log Storage
As mentioned earlier, we collect not only the exception logs themselves but also the user behavior logs related to them. A single exception log alone cannot help us quickly locate the root cause of a problem and find a solution. Collecting behavior logs, however, requires some technique: behavior logs must not be uploaded to the server immediately after every operation; for an application with many users online, uploading on every action would amount to a DDoS attack on the log server. Instead, we store these logs locally on the user's client and, when certain conditions are met, package a batch of them and upload it in one go.
So how is front-end log storage done? We cannot simply keep these logs in a variable: that would exhaust memory, and the logs would be lost as soon as the user refreshes the page. The natural choice is front-end data persistence.
Currently, there are a lot of persistence options available, including Cookie, localStorage, sessionStorage, IndexedDB, webSQL, FileSystem, and so on. So how to choose? Let’s use a table for comparison:
| Storage | cookie | localStorage | sessionStorage | IndexedDB | webSQL | FileSystem |
| --- | --- | --- | --- | --- | --- | --- |
| Type | | key-value | key-value | NoSQL | SQL | |
| Data format | string | string | string | object | | |
| Capacity | 4 KB | 5 MB | 5 MB | 500 MB | 60 MB | |
| Process | synchronous | synchronous | synchronous | asynchronous | asynchronous | |
| Retrieval | | key | key | key, index | field | |
| Performance | | reads fast, writes slowly | | reads slowly, writes fast | | |
In the end, IndexedDB is the obvious choice: it has large capacity and is asynchronous, so it does not block interface rendering. IndexedDB is a proper database, each database divided into stores, with queries by index, and it embodies a complete database-management mindset, which makes it better suited to structured data management than localStorage. Its drawback is that the API is quite complex, not as simple and direct as localStorage. For this we can use the hello-indexeddb tool, which wraps the complex API with Promises and simplifies operations, making IndexedDB as easy to use as localStorage. In addition, IndexedDB is a widely supported HTML5 standard compatible with most browsers, so there is no need to worry about its future.
Next, how can we use IndexedDB properly to ensure that our front-end storage is sound?
The figure above shows the front-end log storage process and database layout. When an event, change, or exception is captured, an initial log is created and immediately put into the staging area (a store in IndexedDB); at that point the main program has finished its part of the collection, and everything else happens in a WebWorker. In the WebWorker, a cyclic task continuously takes logs out of the staging area, classifies them, stores the classification results in the index area, and enriches the log records before moving them into the archive area. Once a log has sat in the archive area for more than a certain number of days, it no longer has direct value, but to cover special cases it is moved to the recovery area, and after a further period of time it is removed from the recovery area.
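A minimal sketch of the staging step using the raw IndexedDB API (the article itself suggests the hello-indexeddb wrapper; the database and store names here are illustrative):

```typescript
interface StagedLog { hash: string; [key: string]: unknown }

// Open (or create) the log database with one store per area described above.
function openLogDB(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('monitor-logs', 1);
    req.onupgradeneeded = () => {
      const db = req.result;
      for (const name of ['staging', 'index', 'archive', 'recovery']) {
        if (!db.objectStoreNames.contains(name)) db.createObjectStore(name, { keyPath: 'hash' });
      }
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Called by the main thread: write the raw log and return immediately.
// The WebWorker loop later moves records from 'staging' into 'index'/'archive'.
async function stageLog(log: StagedLog): Promise<void> {
  const db = await openLogDB();
  db.transaction('staging', 'readwrite').objectStore('staging').put(log);
}
```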
3.2 Front-end Log Collating
As mentioned above, what does the process of collating logs in the WebWorker and storing them in the index area and archive area look like?
Reporting, which is covered below, is done by index. The main job of front-end log collation is therefore to build different indexes according to the characteristics of the logs. When collecting a log, we mark it with a type for classification and index creation, and we compute a hash of the log object with object-hashcode to serve as its unique identifier.
- Keep all log records in the archive area in order, and build indexes for newly added logs
- BatchIndexes: indexes for batch reporting (including performance logs); up to 100 records are reported at a time
- MomentIndexes: indexes for immediate reporting, reported all at once
- FeedbackIndexes: indexes for user feedback, reported one at a time
- BlockIndexes: block indexes, grouped per exception or error (traceId + requestId), reported one block at a time
- After a log is reported, its corresponding index is deleted
- Logs older than 3 days are moved to the recovery area
- Logs older than 7 days are cleared from the recovery area
requestId: links front-end and back-end logs. The back end also records its own logs, so the front end adds a requestId by default when calling an API; this lets the back end's log records be matched with the front end's.
traceId: links the logs before and after an exception occurs. A traceId is created when the application starts and is refreshed whenever an exception occurs. By collecting the requestIds related to a traceId and combining all the logs tied to those requestIds, we get the full set of records needed to restore the exception scene.
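One way the two IDs might be maintained, sketched with a hypothetical uuid() generator and header names chosen only for illustration:

```typescript
declare function uuid(): string;        // e.g. from the uuid package (assumption)

let traceId = uuid();                   // created once when the application starts

// Every API call gets its own requestId; both IDs travel to the back end so its
// logs can later be matched against the front-end logs.
function apiRequest(url: string, body?: unknown): Promise<Response> {
  const requestId = uuid();
  return fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Request-Id': requestId,        // illustrative header names
      'X-Trace-Id': traceId,
    },
    body: JSON.stringify(body ?? {}),
  });
}

// After an exception's block has been collected, refresh the traceId.
function refreshTraceAfterException(): void {
  traceId = uuid();
}
```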
The figure above illustrates how traceId and requestId are used to find all logs associated with an exception. In the log list, two records carry traceId2, but hash3 is not the start of an action, because the requestId of hash3 is reqId2, and reqId2 actually starts at hash2; so hash2 also has to be included in the block of records for this exception. In short, we need to find all the log records of every requestId associated with the same traceId. It sounds a little convoluted, but it becomes clear once you trace it through.
We gather all the logs related to one exception into what we call a block, use the hashes of its member logs to compute the block's hash, and build an index for the block in the index area, ready for reporting.
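A sketch of block assembly: collect every log whose requestId belongs to the exception's traceId, then derive the block hash from the member hashes (blockHashOf stands in for the object-hashcode call mentioned above):

```typescript
interface TracedLog { hash: string; time: number; traceId?: string; requestId?: string }

declare function blockHashOf(hashes: string[]): string; // hypothetical hash helper

function buildBlock(allLogs: TracedLog[], exceptionTraceId: string) {
  // 1) every requestId that appears together with this traceId
  const requestIds = new Set(
    allLogs
      .filter(log => log.traceId === exceptionTraceId && log.requestId)
      .map(log => log.requestId as string),
  );
  // 2) every log belonging to any of those requestIds (this is what pulls in hash2
  //    even though it carries an earlier traceId)
  const members = allLogs
    .filter(log => log.requestId !== undefined && requestIds.has(log.requestId))
    .sort((a, b) => a.time - b.time);
  // 3) the block hash is derived from the member hashes and indexed for reporting
  return { traceId: exceptionTraceId, blockHash: blockHashOf(members.map(m => m.hash)), logs: members };
}
```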
3.3 Reporting Logs
Log reporting is also carried out in a WebWorker; to keep it separate from collation, it can run in a second worker. The reporting flow is as follows: in each cycle, take the corresponding indexes from the index area, retrieve the complete log records from the archive area using the hashes in those indexes, and upload them to the server.
Reporting can be divided into four types according to timeliness (that is, urgency):
A. Immediate reporting
Triggered immediately after the log is collected, and used only for class A exceptions. Because the network is unreliable, class A log reporting also needs an acknowledgement mechanism: the report counts as complete only after the server confirms it has received the information; otherwise a retry loop is needed to make sure the report eventually succeeds.
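A sketch of immediate reporting with an acknowledgement loop; the endpoint and back-off policy are assumptions, not the exact implementation:

```typescript
async function reportImmediately(log: Record<string, unknown>, maxRetries = 5): Promise<boolean> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await fetch('/monitor/report', {              // assumed endpoint
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ logs: [log] }),
      });
      if (res.ok) return true;                                   // server acknowledged receipt
    } catch {
      // network failure: fall through and retry
    }
    await new Promise(r => setTimeout(r, 2 ** attempt * 1000));  // back off before retrying
  }
  return false; // still unreported: leave it in the local archive for a later cycle
}
```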
B. Batch report
The collected logs are stored locally. After a certain number of logs are collected, the logs are packaged for a one-time report or uploaded at a certain frequency (interval). This is equivalent to merging multiple reports into a single report to reduce the strain on the server.
C. Block reporting
An exception scene is packaged into a block and reported. Unlike batch reporting, which guarantees that the logs are complete and comprehensive but also carries useless information, block reporting targets the exception itself and ensures that every log related to a single exception is reported.
D. Users voluntarily submit
Provide a button in the interface so that users can report bugs on their own initiative; this also helps strengthen interaction with users.
Alternatively, when an exception occurs that users do not notice but the application detects, a dialog box can ask the user whether to upload the logs. This scheme is suitable when the logs involve the user's private data.
| | Immediate reporting | Batch reporting | Block reporting | User feedback |
| --- | --- | --- | --- | --- |
| Timeliness | immediate | scheduled | slightly delayed | delayed |
| Amount per report | everything at once | 100 records at a time | all related records at once | 1 record at a time |
| Size | small | medium | - | - |
| Urgency | urgent and important | not urgent | not urgent but important | not urgent |
Although immediate reporting is called real-time, it is actually implemented with a queue-like cyclic task. Its point is to get important exceptions to the monitoring system as quickly as possible so that operations staff can discover the problem, which is why it carries the highest urgency.
The difference between batch reporting and block reporting is this: batch reporting uploads a fixed number of records at a time, for example 1000 records every two minutes, until everything has been uploaded; block reporting collects all the logs related to an exception as soon as it occurs, checks which of them have already been sent by batch reporting, and uploads the rest. These exception-related logs are relatively more important, because they help restore the exception scene as quickly as possible and find the root cause.
Feedback from users can be reported slowly.
To ensure that reporting succeeds, an acknowledgement mechanism is needed: the server does not write reported logs to the database immediately but puts them into a queue first, so the front end and back end need some extra handling to confirm that the logs really have been recorded in the database.
The following figure shows the general flow of log reporting. Before reporting, the client first queries by hash to check whether the logs it is about to report have already been saved by the server; if they have, those logs are dropped, which avoids duplicate reporting and wasted traffic.
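A sketch of that de-duplication step: ask the server which hashes it already holds, then upload only the remainder (both endpoints are assumptions):

```typescript
async function reportWithDedup(logs: Array<{ hash: string } & Record<string, unknown>>): Promise<void> {
  // 1) ask the server which of these logs are already persisted
  const checkRes = await fetch('/monitor/check', {             // assumed endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ hashes: logs.map(l => l.hash) }),
  });
  const { stored } = (await checkRes.json()) as { stored: string[] };

  // 2) upload only the logs the server has not seen yet
  const pending = logs.filter(l => !stored.includes(l.hash));
  if (pending.length === 0) return;
  await fetch('/monitor/report', {                             // assumed endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ logs: pending }),
  });
}
```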
3.4 Compressing reported Data
When a batch of data is uploaded at once, the payload is large, traffic is wasted, transmission is slow, and on a poor network the report may fail. Compressing the data before reporting is therefore also worth doing.
For a combined batch the payload can easily exceed ten kilobytes, and for a site with a large daily PV the resulting traffic is considerable, so compressing the data before reporting is necessary. lz-string is a very good string compression library: broad compatibility, little code, a high compression ratio, short compression time, and compression rates reaching an impressive 60%. However, it is based on LZ78 compression; if the back end cannot decompress it, gzip can be chosen instead, and most back ends support gzip by default.
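A sketch of compression before upload: compressToBase64/decompressFromBase64 are part of the lz-string API, while the endpoint and custom header are assumptions:

```typescript
import * as LZString from 'lz-string';

async function reportCompressed(logs: unknown[]): Promise<void> {
  const payload = LZString.compressToBase64(JSON.stringify(logs));
  await fetch('/monitor/report', {                     // assumed endpoint
    method: 'POST',
    headers: {
      'Content-Type': 'text/plain',
      'X-Content-Encoding': 'lz-string',               // illustrative custom header
    },
    body: payload,
  });
  // The server side would restore the original JSON with
  // LZString.decompressFromBase64(payload).
}
```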
4 Receiving and storing logs
4.1 Access Layer and Message Queue
Generally, an independent log server receives client logs. While receiving them, it checks the legitimacy and safety of the client logs to guard against attacks, and because log submission is frequent, concurrent submissions from many clients are common. Log records are therefore processed one by one through a message queue before being written to the database for storage.
The "access layer" and "push center" in the BetterJS architecture correspond to the access layer and message queue described here. BetterJS splits the front-end monitoring into separate modules: the push center pushes logs both to the storage center for persistence and to other systems (such as the alarm system). Alternatively, the queue that receives logs can be split out on its own, sitting between the access layer and the storage layer as a transition.
4.2 Log Storage System
Storing logs is unglamorous work, but it has to be done. For a small application, a single database and single table plus some optimization is enough. A large application that wants to provide a standard, efficient log monitoring service usually needs real effort on its log storage architecture. There are mature storage solutions in the industry, such as HBase, Dremel, and Lucene. In general, log storage has to cope with large data volumes, loosely structured data, high write concurrency, and heavy query demands. A log storage system therefore needs to solve problems such as buffering writes, choosing storage media according to log age, and designing a sensible index system for fast reads.
Since the log storage system solution is relatively mature, we will not discuss it further here.
4.3 Search
The ultimate purpose of logging is to use the logs, and since the volume is usually huge, finding the records you need in that mass of data requires a good search engine. Splunk is a mature log storage and search product, but it is paid; ELK is its open-source counterpart. ELK is the combination of ElasticSearch, Logstash, and Kibana: ES is a Lucene-based storage and indexing search engine, Logstash is a log standardization pipeline that provides input, output, and transformation plug-ins, and Kibana provides a user interface for visualization and query statistics.
5 Log statistics and analysis
A complete log statistics and analysis tool needs to provide convenient panels that give visual feedback to log administrators and developers from every dimension.
5.1 User Dimension
Different requests from the same user actually form different story lines, so it is necessary to design a unique request ID for a series of user actions. Operations performed by the same user on different terminals can also be distinguished. The status and permissions of a user during an operation also need to be reflected in the log system.
5.2 Time Dimension
How an exception occurs is observed by connecting the story lines before and after the exception operation. It does not involve a single action by a user, or even a single page, but is the end result of a series of events.
5.3 Performance Dimensions
The performance of the application while it runs, for example page load time, API request duration statistics, the cost of individual computations, and how long users are left waiting.
5.4 Operating Environment Dimensions
The environment the application and services run in, such as the application's network environment, the operating system, device hardware information, server CPU and memory status, and network and bandwidth usage.
5.5 Fine-grained Code Tracing
Exception code stack information locates the position of the offending code and the exception stack.
5.6 Scene Backtracking
By linking together the user logs related to an exception, the process by which the exception occurred can be replayed dynamically.
6 Monitoring and Notification
Statistics and analysis of exceptions are only the foundation. Being able to push notifications and raise alarms when an exception is found, and even handle it automatically, is what an exception monitoring system should be capable of.
6.1 Customizing The Alarm Triggering Conditions
A. Monitoring implementation
When logs enter the access layer, the monitoring logic is triggered. If the log information contains a high-level exception, an alarm can be raised immediately. The alarm message queue and the log-ingestion queue can be managed separately and in parallel.
Statistics are gathered over the incoming logs, and alarms are raised for abnormal patterns; this is about responding to anomalies in the monitoring data itself. Regular, expected exceptions are generally less worrying; the troublesome ones are sudden bursts, for example a sudden flood of level-D exceptions within a certain period. Level-D exceptions are not urgent in themselves, but when the monitoring data itself becomes abnormal, it is time to be vigilant.
B. Customize trigger conditions
In addition to the default alarm conditions configured when the system is built, the log administrator should also be able to configure custom trigger conditions, for example (one possible rule shape is sketched after this list):
- What content a log must contain
- What count or degree the log statistics must reach, and within what time
- What conditions an alarm must satisfy before it is sent
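One possible shape for an administrator-defined rule is sketched below; the field names are illustrative, not a fixed schema:

```typescript
interface AlertRule {
  name: string;
  match: { errorLevel?: string; messageContains?: string };   // "what the log contains"
  threshold: { count: number; withinMinutes: number };        // "what amount within what time"
  channels: Array<'email' | 'sms' | 'wechat' | 'phone'>;      // where to push when triggered
}

// Example rule: a sudden burst of D-level exceptions should still raise an alarm.
const rules: AlertRule[] = [
  {
    name: 'Spike of D-level exceptions',
    match: { errorLevel: 'D' },
    threshold: { count: 100, withinMinutes: 10 },
    channels: ['email', 'wechat'],
  },
];
```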
6.2 Push Channel
There are many channels to choose from, such as email, SMS, WeChat, and phone calls.
6.3 Push Frequency
Different push frequencies can also be set for different severities. Low-risk alarms can be reported once a day as a summary; high-risk alarms can be pushed on a 10-minute loop until the handler manually turns the alarm switch off.
6.4 Automatic Reports
To push log statistics, you can automatically generate daily, weekly, monthly, and annual reports and send them to related groups by email.
6.5 Generate bug work orders automatically
When an exception occurs, the system can call the work-order system's API to create a bug ticket automatically. When the ticket is closed, the result is fed back to the monitoring system, which records the handling history of the exception and shows it in its reports.
7 Troubleshooting Exceptions
7.1 Sourcemap
Front-end code is almost always minified before release, so the reported stack information has to be restored to the original source before the offending source code can be located and fixed quickly.
At release time, only the JS scripts are deployed to the server, while the sourcemap files are uploaded to the monitoring system. When stack information is displayed in the monitoring system, the sourcemap files are used to decode it and obtain the exact position in the source code.
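A sketch of restoring one stack frame on the monitoring system's Node side using the source-map library (SourceMapConsumer and originalPositionFor are part of that library's API; the file handling is simplified):

```typescript
import { readFile } from 'fs/promises';
import { SourceMapConsumer } from 'source-map';

// Map a (line, column) position from the minified bundle back to the original source.
async function restoreFrame(mapPath: string, line: number, column: number) {
  const rawMap = JSON.parse(await readFile(mapPath, 'utf8'));
  const consumer = await new SourceMapConsumer(rawMap); // source-map 0.7.x returns a promise
  try {
    return consumer.originalPositionFor({ line, column }); // { source, line, column, name }
  } finally {
    consumer.destroy();
  }
}
```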
The catch is that the sourcemap must correspond to the version running in production, and also to a commit node in Git, so that the stack information can be matched to the right version of the offending code when an exception is investigated. This can be achieved by setting up CI tasks and adding a step to the integrated deployment process.
7.2 From Alarm to Warning
The essence of early warning is to predefine conditions under which an exception is likely to occur; when such a condition is triggered, the exception has not actually happened yet, so user behavior can be examined before it does, and fixes can be applied in time to prevent the exception or stop it from spreading.
How is this done? It is essentially a process of statistics and clustering: historical exceptions are analyzed to find patterns along dimensions such as time, region, and user, and an algorithm automatically adds these patterns to the warning conditions, so that a warning is raised in time the next time they are triggered.
7.3 Intelligent Recovery
Errors are fixed automatically. For example, if the front end expects an API to return a number but it returns a numeric string, there could be a mechanism by which the monitoring system sends the expected data-type model to the back end, and the back end then uses that model to control the type of each field when returning data.
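The idea could look roughly like this: declare the expected type of each field, detect mismatches in API responses, and report the expected model so the back end can coerce its output; the model format here is an assumption for illustration only:

```typescript
// Expected field types for one API's response (illustrative model).
const orderModel: Record<string, string> = { id: 'string', amount: 'number', createdAt: 'number' };

// Compare an actual response against the model and list the mismatched fields.
function findTypeMismatches(data: Record<string, unknown>, model: Record<string, string>) {
  return Object.entries(model)
    .filter(([field, expected]) => typeof data[field] !== expected)
    .map(([field, expected]) => ({ field, expected, actual: typeof data[field] }));
}

// Any mismatches found here would be reported to the monitoring system together with the
// model, so the back end can use the model to fix the field types it returns.
```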
8 Exception Test
8.1 Active exception test
Write exception test cases and add them to the automated testing system. Whenever a new exception is found during testing or operation, it is added to the original list of exception test cases.
8.2 Random exception test
Simulate the real environment: drive random operations of real users in a simulator, using automated scripts to generate random operation code and execute it.
Define what counts as an exception, for example a dialog box containing specific content. Recording these test results and running clustering analysis over them also helps prevent exceptions.
9 Deployment
9.1 Multiple Clients
The same user may log in from different terminals, and a user's state differs before and after logging in. A requestId generated by a specific algorithm makes it possible to identify the series of operations a user performed on a single client, and following the log sequence reveals the exact path the user took before the exception.
9.2 Integration convenience
The front end is packaged as a library; a global reference covers most of the logging, storage, and reporting, and specific methods can be called in special logic to record logs.
The back end is decoupled from the business code of the application itself and can run as an independent service that interacts with third-party applications through interfaces. With integrated deployment, the system can be scaled out and migrated at any time.
9.3 Scalability of the Management System
The whole system can be extended, not only to serve a single application, but also to support multiple applications running at the same time. All applications on the same team can be managed on the same platform.
9.4 Log System Permission
Different people have different permissions when accessing the log system; a visitor can only view their own applications. If some statistics are sensitive, separate permissions can be set for them and the sensitive data can be desensitized.
10 Others
10.1 Performance Monitoring
Exception monitoring focuses on code level errors, but you should also focus on performance exceptions. Performance monitoring includes:
- Runtime performance: file level, module level, function level, algorithm level
- Network request rate
- The system performance
10.2 API Monitoring
The back-end API also has a great impact on the front end. Although the front-end code controls logic as well, the data returned by the back end is its foundation, so API monitoring can be divided into:
- Stability monitoring
- Data format and type
- Error monitoring
- Data accuracy monitoring
10.3 Data desensitization
Sensitive data should not be collected by the log system. The data in a log system is important, but most log systems are not treated as confidential, so if the application handles sensitive data it is best to do the following:
- It is deployed independently and does not share the monitoring system with other applications
- Only user operation data is collected instead of specific data. When the data is reproduced, data API results can be retrieved through log information for display
- Encrypt the logs, with protection at both the software and hardware level
- When necessary, ids of specific data can be collected for debugging and replaced with mock data when the scenario is reproduced, which can be generated at the back end using a fake data source
- Obfuscation of sensitive data
Conclusion
This article has mainly examined the overall architecture of front-end exception monitoring without going into concrete technical implementations. It touches on knowledge of the front end, the back end, and the project as a whole, with the focus on the front-end part; some of it overlaps with back-end monitoring and some of it diverges. In a real project, continuous practice is needed to work out the monitoring requirements and strategy that suit the project itself.