Preface

In day-to-day development, online problems bring two common pain points. First, the user hits the error on their own machine, and the developer cannot reproduce the scene that triggered the exception. Second, because production code is obfuscated and minified, the stack trace printed to the console cannot be mapped back to the source code, so the error cannot be located. This article discusses a solution to the first problem: recording the page and replaying it when an exception occurs.

The core: rrweb, a snapshot generation and playback library

Today's main subject is the rrweb framework, whose name stands for "record and replay the web". It consists of three libraries:

rrweb-snapshot

Turn the DOM in the page into a serializable data structure

rrweb

Provides recording and playback API

rrweb-player

A playback UI for rrweb, with video-player controls such as pause, fast forward, and dragging the progress bar to play from any point in time.

When recording starts (on page load), rrweb converts every DOM node in the page into serializable data and assigns each node a unique ID. After that, only the changed nodes are serialized as the page changes. During replay, the data is deserialized and inserted into the page. Incremental changes, such as attribute or text modifications, are applied to the node found by its ID, and added or removed child nodes are located through the ID of their parent element.

A brief look at how rrweb generates snapshots

If you only need to record and replay changes locally in the same browser, you can save the current view by deep-copying the DOM. For example, the following code does this (using jQuery to simplify the example and saving only the body):

// record
const snapshot = $('body').clone();
// replay
$('body').replaceWith(snapshot);

Serialization

The approach above implements a snapshot by keeping the whole DOM object in memory.

However, that object is not serializable, so it cannot be saved in a text format (such as JSON) and transmitted. This rules out remote recording, so we first need a way to serialize the DOM and its view state. We do not use an open-source solution such as parse5 here, for two reasons:

  1. We need a "non-standard" serialization method, described in more detail below.
  2. This code runs inside the recorded page, so it should be as small as possible and keep only the necessary functionality.

Special handling in serialization

Our serialization method is non-standard because it needs to do the following:

  1. De-scripting. No JavaScript in the recorded page should run during replay. For example, if script tags are turned into noscript tags when the snapshot is reconstructed, the script contents no longer matter. When recording, we can simply record a marker value instead of the potentially huge script source.
  2. Record view state that is not reflected in HTML. For example, the value typed into an input element is not reflected in its HTML but is held in its value property. We read it during serialization and replay it as an attribute.
  3. Convert relative paths to absolute paths. During replay, the recorded page is placed in an iframe whose URL is the address of the replay page. Any relative paths in the recorded page would therefore resolve incorrectly and cause errors, so we convert relative paths while recording.
  4. Record the contents of CSS stylesheets as far as possible. If the recorded page loads same-origin stylesheets, we can read the parsed CSS rules and record the styles inline. Recordings made in internal environments (such as localhost) then also replay well.
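The URL and de-scripting rules in points 1 and 3 can be sketched as follows. This is a minimal illustration under our own assumptions, not rrweb-snapshot's code. `absolutifyUrl`, `absolutifyCssUrls`, `recordTextContent` and the `SCRIPT_PLACEHOLDER` marker are hypothetical names. URL resolution relies on the standard `URL` constructor.

```typescript
// Hypothetical marker; the article only says a marker value replaces the script body.
const SCRIPT_PLACEHOLDER = "SCRIPT_PLACEHOLDER";

// Resolve a possibly relative URL against the page (or stylesheet) it came from.
function absolutifyUrl(value: string, baseUrl: string): string {
  try {
    return new URL(value, baseUrl).href;
  } catch {
    return value; // leave values that are not valid URLs untouched
  }
}

// Rewrite url(...) references inside recorded CSS text so they survive replay elsewhere.
function absolutifyCssUrls(cssText: string, sheetUrl: string): string {
  return cssText.replace(
    /url\(\s*(['"]?)([^'")]+)\1\s*\)/g,
    (match: string, quote: string, path: string) => {
      // data: URIs and fragment references do not depend on the page location.
      if (/^(data:|#)/i.test(path)) return match;
      return `url(${quote}${absolutifyUrl(path, sheetUrl)}${quote})`;
    },
  );
}

// De-scripting: record a marker instead of the (possibly huge) script source.
function recordTextContent(parentTagName: string, text: string): string {
  return parentTagName.toLowerCase() === "script" ? SCRIPT_PLACEHOLDER : text;
}
```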

A unique identifier

Our serialization also comes in two kinds, full and incremental. Full serialization converts a DOM tree into a corresponding tree data structure.

For example, the following DOM tree:

<html>
  <body>
    <header>
    </header>
  </body>
</html>

is serialized into a data structure like this:

{
  "type": "Document",
  "childNodes": [{
    "type": "Element", "tagName": "html", "attributes": {},
    "childNodes": [
      { "type": "Element", "tagName": "head", "attributes": {}, "childNodes": [], "id": 3 },
      {
        "type": "Element", "tagName": "body", "attributes": {},
        "childNodes": [
          { "type": "Text", "textContent": "\n ", "id": 5 },
          {
            "type": "Element", "tagName": "header", "attributes": {},
            "childNodes": [{ "type": "Text", "textContent": "\n ", "id": 7 }],
            "id": 6
          }
        ],
        "id": 4
      }
    ],
    "id": 2
  }],
  "id": 1
}

There are two things to note about this serialized result:

  1. The DOM tree is traversed node by node. Besides element nodes, the result therefore also contains records for every other kind of node, such as text nodes and comment nodes.
  2. We add a unique id to every node in preparation for incremental snapshots later.
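The following is a minimal sketch of full serialization that reproduces the ids in the example above. To stay runnable outside a browser, it walks a hypothetical `SourceNode` stand-in instead of real DOM nodes. It also ignores view state, URL rewriting and de-scripting.

```typescript
// Minimal stand-in for DOM nodes so the sketch runs outside a browser (hypothetical shape).
interface SourceNode {
  nodeType: "Document" | "Element" | "Text" | "Comment";
  tagName?: string;
  attributes?: Record<string, string>;
  textContent?: string;
  childNodes: SourceNode[];
}

type SerializedNode =
  | { type: "Document"; childNodes: SerializedNode[]; id: number }
  | {
      type: "Element";
      tagName: string;
      attributes: Record<string, string>;
      childNodes: SerializedNode[];
      id: number;
    }
  | { type: "Text" | "Comment"; textContent: string; id: number };

// Full serialization: walk every node (not only elements) and give each a unique id.
function serializeTree(root: SourceNode): SerializedNode {
  let nextId = 1;
  const walk = (node: SourceNode): SerializedNode => {
    const id = nextId++; // taken before the children, so ids follow pre-order
    const kind = node.nodeType;
    switch (kind) {
      case "Document":
        return { type: kind, childNodes: node.childNodes.map(walk), id };
      case "Element":
        return {
          type: kind,
          tagName: node.tagName ?? "",
          attributes: { ...(node.attributes ?? {}) },
          childNodes: node.childNodes.map(walk),
          id,
        };
      default:
        // Text and comment nodes are recorded too, so they can be targeted later.
        return { type: kind, textContent: node.textContent ?? "", id };
    }
  };
  return walk(root);
}
```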

Imagine we record a button click on the same page and want to replay it. We could record that action (what we call an incremental snapshot) in the following format:

type clickSnapshot = {
  source: 'MouseInteraction';
  type: 'Click';
  node: HTMLButtonElement;
}

The action could then be replayed with snapshot.node.click().

However, in a real scenario, even though we have reconstructed the entire DOM, we cannot associate the node referenced in the incremental snapshot with a node in the reconstructed DOM.

This is where the unique id comes in. We maintain the same id -> Node mapping on both the recording and the replay side. We keep it updated as DOM nodes are created and destroyed. An incremental snapshot then only needs to record the id to find the corresponding DOM node during replay.

The corresponding data structure in the above example becomes:

type clickSnapshot = {
  source: 'MouseInteraction';
  type: 'Click';
  id: number;
}
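The id -> Node mapping can be sketched as a small two-way map. `NodeMirror` and `replayClick` are hypothetical names used for illustration. The class is generic, so the same code could hold real DOM nodes on the replay side.

```typescript
// Sketch of an id <-> node mirror kept on both the recording and the replay side.
class NodeMirror<N extends object> {
  private idToNode = new Map<number, N>();
  private nodeToId = new WeakMap<N, number>();

  add(id: number, node: N): void {
    this.idToNode.set(id, node);
    this.nodeToId.set(node, id);
  }

  getNode(id: number): N | undefined {
    return this.idToNode.get(id);
  }

  // -1 signals "this node has not been serialized yet".
  getId(node: N): number {
    return this.nodeToId.get(node) ?? -1;
  }

  // Called when a node is destroyed, so stale ids never resolve.
  remove(id: number): void {
    const node = this.idToNode.get(id);
    if (node === undefined) return;
    this.idToNode.delete(id);
    this.nodeToId.delete(node);
  }
}

type ClickSnapshot = { source: "MouseInteraction"; type: "Click"; id: number };

// Replay side: resolve the recorded id to the rebuilt node and act on it.
function replayClick<N extends object>(
  snapshot: ClickSnapshot,
  mirror: NodeMirror<N>,
  onClick: (node: N) => void,
): boolean {
  const node = mirror.getNode(snapshot.id);
  if (node === undefined) return false; // node missing on the replay side
  onClick(node);
  return true;
}
```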

Incremental snapshots

After a full snapshot is taken, we need to observe every event that could change the view relative to the current view state. rrweb currently observes the following events (and the list keeps growing):

  • DOM changes
    • Node creation or destruction
    • Node attribute changes
    • Text changes
  • Mouse movement
  • Mouse interactions
    • Mouse up, mouse down
    • Click, double click, context menu
    • Focus, blur
    • Touch start, touch move, touch end
  • Page or element scrolling
  • Window resizing
  • Input

Playback

rrweb is designed to do as little processing as possible on the recording side, which minimizes the impact on the recorded page. Some special handling is therefore required on the replay side.

High precision timer

During replay we receive the complete snapshot chain at once. Executing all snapshots synchronously in order would take us straight to the final state of the recorded page. What we need instead is to initialize the first full snapshot synchronously and then replay each incremental snapshot asynchronously at the correct time interval. That requires a high-precision timer.

The emphasis is on high precision because the native setTimeout does not guarantee that the callback runs exactly after the given delay. For example, it fires late when the main thread is blocked.

This kind of unpredictable delay is unacceptable for replay and can lead to all sorts of odd behavior. We therefore implemented a continuously calibrated timer based on requestAnimationFrame, which in most cases replays each incremental snapshot no more than one frame late.

The custom timer is also the basis for fast forward.
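Here is a rough sketch of such a calibrated timer, assuming each action carries a `delay` relative to replay start. The frame scheduler is injected (in a browser, `(cb) => requestAnimationFrame(cb)` together with `start(performance.now())`). `ReplayTimer` is a hypothetical illustration rather than rrweb's actual timer. The `speed` factor shows why the timer is the basis for fast forward.

```typescript
// A scheduled replay action: `delay` is ms since replay start.
interface TimerAction {
  delay: number;
  doAction: () => void;
}

// Frame hook; in a browser pass (cb) => requestAnimationFrame(cb).
type FrameScheduler = (callback: (now: number) => void) => void;

class ReplayTimer {
  private actions: TimerAction[] = [];
  private startTime = 0;
  private running = false;
  private readonly schedule: FrameScheduler;
  private readonly speed: number;

  // speed > 1 fast-forwards: replay time advances faster than wall-clock time.
  constructor(schedule: FrameScheduler, speed = 1) {
    this.schedule = schedule;
    this.speed = speed;
  }

  addActions(actions: TimerAction[]): void {
    this.actions.push(...actions);
    this.actions.sort((a, b) => a.delay - b.delay);
  }

  start(now: number): void {
    this.startTime = now;
    this.running = true;
    this.schedule(this.tick);
  }

  stop(): void {
    this.running = false;
  }

  // Each frame recomputes elapsed time from the clock instead of trusting
  // individual timeouts, so delays do not accumulate across actions.
  private tick = (now: number): void => {
    if (!this.running) return;
    const elapsed = (now - this.startTime) * this.speed;
    while (this.actions.length > 0 && this.actions[0].delay <= elapsed) {
      this.actions.shift()!.doAction();
    }
    if (this.actions.length > 0) {
      this.schedule(this.tick);
    } else {
      this.running = false;
    }
  };
}
```

Because elapsed time is recomputed from the frame timestamp on every tick, a slow frame delays the pending actions by at most that frame. It does not compound the way chained setTimeout calls can.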

Completing missing nodes

As described in the incremental snapshot section, rrweb serializes lazily when using MutationObserver. In scenarios like the following, this can leave us unable to record a complete incremental snapshot:

parent
  child2
  child1
  1. child1 is inserted into parent.
  2. child2 is inserted into parent before child1.

rrweb serializes child1 first. When serializing a new node, we record its neighboring node as well as its parent, so that the node is placed in the correct position during replay. However, child1's neighbor child2 already exists but has not been serialized yet, so we record it as id: -1 (the id is null if there is no neighbor).

When we process the incremental snapshot that adds child1, the neighbor id -1 tells us that the node needed to position it has not been created yet. We therefore temporarily put child1 into a "missing node pool" instead of inserting it into the DOM tree.

Then, when processing the incremental snapshot that adds child2, we insert child2 as usual. After inserting it, we check whether child2's neighbor id points to a node waiting in the missing node pool. If it does, we take that node out of the pool and insert it at the corresponding position.
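The missing node pool logic can be sketched on a simplified tree of child-id lists. This is a hypothetical model for illustration (`ReplayTree`, `Addition`, `UNSERIALIZED`) and it only handles additions. It follows the rule in the text: a node whose neighbor id is -1 waits in the pool until a node that names it as a neighbor is inserted.

```typescript
type NodeId = number;

// -1: the neighbour exists in the page but was not serialized yet; null: no neighbour.
const UNSERIALIZED = -1;

interface Addition {
  id: NodeId;
  parentId: NodeId;
  previousId: NodeId | null;
  nextId: NodeId | null;
}

// Simplified replay-side tree: child id lists keyed by parent id.
class ReplayTree {
  private children = new Map<NodeId, NodeId[]>();
  private parentOf = new Map<NodeId, NodeId>();
  private missingPool = new Map<NodeId, Addition>();

  constructor(rootId: NodeId) {
    this.children.set(rootId, []);
  }

  childrenOf(parentId: NodeId): NodeId[] {
    return this.children.get(parentId) ?? [];
  }

  pooledIds(): NodeId[] {
    return [...this.missingPool.keys()];
  }

  apply(add: Addition): void {
    if (add.previousId === UNSERIALIZED || add.nextId === UNSERIALIZED) {
      this.missingPool.set(add.id, add); // cannot be positioned yet
      return;
    }
    this.place(add.id, add.parentId, this.indexFor(add));
    this.flushNeighbours(add);
  }

  private indexFor(add: Addition): number {
    const siblings = this.childrenOf(add.parentId);
    if (add.previousId !== null && this.parentOf.get(add.previousId) === add.parentId) {
      return siblings.indexOf(add.previousId) + 1;
    }
    if (add.nextId !== null && this.parentOf.get(add.nextId) === add.parentId) {
      return siblings.indexOf(add.nextId);
    }
    return add.previousId === null ? 0 : siblings.length; // first child, or append
  }

  // If a neighbour of the node just inserted waits in the pool, insert it beside that node.
  private flushNeighbours(add: Addition): void {
    for (const neighbourId of [add.previousId, add.nextId]) {
      if (neighbourId === null) continue;
      const waiting = this.missingPool.get(neighbourId);
      if (waiting === undefined) continue;
      this.missingPool.delete(neighbourId);
      const anchor = this.childrenOf(add.parentId).indexOf(add.id);
      this.place(waiting.id, add.parentId, neighbourId === add.nextId ? anchor + 1 : anchor);
      this.flushNeighbours(waiting);
    }
  }

  private place(id: NodeId, parentId: NodeId, index: number): void {
    if (!this.children.has(parentId)) this.children.set(parentId, []);
    this.children.get(parentId)!.splice(index, 0, id);
    this.parentOf.set(id, parentId);
    if (!this.children.has(id)) this.children.set(id, []);
  }
}
```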

Simulating hover

Many front-end pages have CSS rules with the :hover selector, but the hover state cannot be triggered from JavaScript. We therefore need to emulate hover during replay so that these styles display correctly.

The method has two parts:

  1. Iterate over the CSS stylesheets. For every rule whose selector involves :hover, add an identical rule whose selector uses a special class instead, for example the class :hover (written .\:hover in a selector).
  2. When replaying a mouse up interaction event, add the :hover class to the event target and all of its ancestor nodes, and remove it again on mouse down.
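Both steps can be sketched as pure functions. The regular expression copies each :hover selector onto the escaped class selector. `toSimulatedHoverSelector`, `simulatedHoverRule`, `setSimulatedHover` and the `TreeNode` shape are hypothetical. In a browser, the class would be toggled through `element.classList` and the extra rules added with `CSSStyleSheet.insertRule`.

```typescript
// Replace every unescaped :hover pseudo-class with the class selector .\:hover
function toSimulatedHoverSelector(selector: string): string {
  return selector.replace(/(^|[^\\]):hover/g, "$1.\\:hover");
}

// Step 1: for a rule that uses :hover, build the extra rule with identical declarations.
function simulatedHoverRule(selectorText: string, declarations: string): string | null {
  if (!selectorText.includes(":hover")) return null;
  return `${toSimulatedHoverSelector(selectorText)} { ${declarations} }`;
}

// Minimal stand-in for an element with a parent link and a class list.
interface TreeNode {
  parent: TreeNode | null;
  classes: Set<string>;
}

// Step 2: add (or remove) the ":hover" class on the target and all of its ancestors.
function setSimulatedHover(target: TreeNode, on: boolean): void {
  for (let node: TreeNode | null = target; node !== null; node = node.parent) {
    if (on) node.classes.add(":hover");
    else node.classes.delete(":hover");
  }
}
```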

Start from any point in time

Beyond basic replay, we want players such as rrweb-player to offer video-player features, such as dragging the progress bar to play from any point in time.

In practice, the snapshot chain is split at the given start time into two parts: the snapshots before that point and those after it. Executing the earlier part synchronously and then the later part asynchronously as usual lets playback start from any point in time.
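Here is a minimal sketch of seeking, assuming the events are sorted by their `timestamp` (in milliseconds, as rrweb events are). `splitAt` and `playFrom` are hypothetical names. `applySync` and `schedule` stand in for the replayer's synchronous apply step and its timer.

```typescript
interface TimedEvent {
  timestamp: number; // ms
}

// Split a chronologically sorted chain at `offsetMs` after its first event.
function splitAt<E extends TimedEvent>(events: E[], offsetMs: number): { before: E[]; after: E[] } {
  const before: E[] = [];
  const after: E[] = [];
  if (events.length === 0) return { before, after };
  const target = events[0].timestamp + offsetMs;
  for (const event of events) {
    (event.timestamp <= target ? before : after).push(event);
  }
  return { before, after };
}

// Apply everything up to the target instantly, then schedule the rest relative to it.
function playFrom<E extends TimedEvent>(
  events: E[],
  offsetMs: number,
  applySync: (event: E) => void,
  schedule: (event: E, delayMs: number) => void,
): void {
  const { before, after } = splitAt(events, offsetMs);
  before.forEach((event) => applySync(event));
  if (events.length === 0) return;
  const target = events[0].timestamp + offsetMs;
  for (const event of after) schedule(event, event.timestamp - target);
}
```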

Sandbox

In the serialization design we talked about "de-scripting": the JavaScript in the recorded page must not be executed during replay. Rewriting all script tags to noscript tags during snapshot reconstruction solves part of the problem. However, some scripted behaviors do not live in script tags, such as inline scripts in HTML, form submission, and so on.

Scripts can behave in many ways. Filtering only known cases will inevitably miss some, and once a script runs it may cause irreversible, unintended effects. We therefore use the HTML iframe sandbox feature to impose restrictions at the browser level.

iframe sandbox

When we rebuild the snapshot, we rebuild the recorded DOM inside an iframe element. By setting its sandbox attribute, we can disable the following behaviors:

  • Form submission
  • Pop-ups such as window.open
  • JS scripts (including inline event handlers and javascript: URLs)

This is exactly what we need. For JS scripts in particular, it is safer and more reliable than a home-grown implementation.
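One way to express this configuration, as a sketch, is to grant only `allow-same-origin` so the replayer can still reach the iframe's `contentDocument`. The tokens `allow-scripts`, `allow-forms` and `allow-popups` are left out. The helper name and the `AttributeTarget` interface are hypothetical. In a browser you would pass the iframe element itself.

```typescript
// Anything with setAttribute works, e.g. an HTMLIFrameElement in the browser.
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
}

// Only allow-same-origin is granted, so the replayer can read the frame's
// contentDocument to rebuild the DOM. allow-scripts, allow-forms and
// allow-popups are deliberately omitted: scripts (including inline handlers),
// form submission and window.open stay blocked inside the frame.
const REPLAY_SANDBOX = "allow-same-origin";

function sandboxReplayFrame(frame: AttributeTarget): void {
  frame.setAttribute("sandbox", REPLAY_SANDBOX);
}
```

Keeping `allow-scripts` off matters for another reason too. Combining `allow-scripts` with `allow-same-origin` would let a same-origin document inside the frame remove its own sandbox.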

Preventing link navigation

By default, clicking a link navigates to the URL in its href attribute. During replay, we rebuild the DOM of the page after the navigation to keep the visual replay correct, so the original navigation must be prevented.

Normally we would capture all click events on a elements through event delegation and cancel the default action with event.preventDefault(). But once the replay page is sandboxed, no event handlers run, so event delegation is impossible.

Looking back at how we replay incremental snapshots of interaction events, the click event actually does not need to be replayed at all. With JS disabled, a click has no visual effect, and nothing needs to observe it.

However, to improve the replay experience, we can animate the simulated mouse cursor at click time to show the viewer that a click happened.

Iframe style settings

Since the DOM is rebuilt inside an iframe, the parent page's CSS stylesheets cannot affect elements inside it. However, when JS is not allowed to run, noscript tags are displayed. We want to hide them, so we need to inject styles into the iframe dynamically. Example code:

const injectStyleRules: string[] = [
  'iframe { background: #f1f3f5 }',
  'noscript { display: none !important; }',
];
const styleEl = document.createElement('style');
const { documentElement, head } = this.iframe.contentDocument!; // this.iframe: the replay iframe
documentElement!.insertBefore(styleEl, head);
for (let idx = 0; idx < injectStyleRules.length; idx++) {
  (styleEl.sheet! as CSSStyleSheet).insertRule(injectStyleRules[idx], idx);
}

Note that the inserted style element does not exist in the recorded page, so it must not be serialized. Otherwise the id -> Node mapping would break.

How the probe integrates rrweb to record exceptions

Loading the rrweb recording library asynchronously

For performance reasons, the probe should stay small. The exception-recording code is not packed into the minified probe file, so projects that do not enable exception recording never load it.

After the probe initializes, it inserts into the current project the rrweb-record plugin whose version matches the probe version. It then starts recording changes to the DOM on the page.
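The "load only when enabled" behavior can be sketched as follows. Everything here is an assumption for illustration. The article does not give the `ProbeConfig` fields, the CDN base URL or the `rrweb-record@<version>.js` file name pattern. The script loader is injected; in a browser it would append a `<script>` tag and resolve on its load event.

```typescript
// Hypothetical config shape; the article does not give the probe's option names.
interface ProbeConfig {
  enableReplay: boolean;
  probeVersion: string;
  cdnBase: string; // hypothetical location of the recorder bundle
}

// Hypothetical bundle name pattern, pinned to the probe version.
function recorderUrlFor(config: ProbeConfig): string | null {
  if (!config.enableReplay) return null;
  return `${config.cdnBase}/rrweb-record@${config.probeVersion}.js`;
}

// Fetch the recorder only when exception recording is enabled, then start it.
async function initRecorder(
  config: ProbeConfig,
  loadScript: (url: string) => Promise<void>,
  startRecording: () => void,
): Promise<boolean> {
  const url = recorderUrlFor(config);
  if (url === null) return false; // recording disabled: nothing is downloaded
  await loadScript(url);
  startRecording(); // e.g. call rrweb's record() once the bundle is available
  return true;
}
```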

Collection strategy

Snapshot file cache

For performance reasons, the probe keeps two snapshot cache arrays internally, oldSnap and newSnap. Snapshot objects generated by rrweb are placed in the newSnap array first.

rrweb generates snapshot objects on initialization, on page DOM changes and on user interactions, and they are pushed into newSnap. Apart from the full snapshot taken at initialization, rrweb takes another full snapshot after every 200 incremental snapshots. One full snapshot corresponds to two snapshot objects: {type=4 event} and {type=2 event}.

This is similar to V8's generational garbage collection. Each time a full snapshot is generated, the probe moves the contents of newSnap into oldSnap, overwriting the previous oldSnap data. It then empties newSnap and fills it with the new snapshot objects. This keeps the cached snapshot data within a bounded range. There is always enough data for a recording, and the snapshots never take up too much memory.
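The rotation can be sketched as follows, assuming the cache is fed every event rrweb emits. With rrweb, that would presumably be the `emit` callback of `record()`, with `checkoutEveryNth: 200` producing the periodic full snapshots. `SnapshotCache` is a hypothetical name. Keying the rotation on the Meta event is our assumption about how the probe detects a new full snapshot.

```typescript
// rrweb event types used here: 4 = Meta, 2 = FullSnapshot, 3 = IncrementalSnapshot.
const EVENT_TYPE_META = 4;

interface RecordedEvent {
  type: number;
  timestamp: number;
}

// Two generations of cached events, rotated on every full snapshot.
class SnapshotCache {
  oldSnap: RecordedEvent[] = [];
  newSnap: RecordedEvent[] = [];

  push(event: RecordedEvent): void {
    // Each full snapshot starts with a Meta event followed by a FullSnapshot event,
    // so a Meta event marks the start of a new segment.
    if (event.type === EVENT_TYPE_META && this.newSnap.length > 0) {
      this.oldSnap = this.newSnap; // the previous oldSnap is overwritten
      this.newSnap = [];
    }
    this.newSnap.push(event);
  }
}
```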

Snapshot file generation

When the probe detects an exception (E1 in the example below), it reports the full snapshot file fs1 and the exception file E1. E1 records the name of the full snapshot file:

① Full snapshot file 1: fs1: [...full snapshot]
② E1: {host: 'file domain name', full: '/{product_code}/{app_code}/error_fullSnapshot_{session_id}_{uuid}.XXX', history: [], incre: [...incremental snapshots for the current replay]}

When the probe detects another exception (E2), it checks whether the full snapshot file for the current segment has already been uploaded. If so, it reports only the snapshot data between E1 and E2 and saves the file name of E1 in history. This reduces the amount of data reported:

③ E2: {full: fs1, history: [E1], incre: [{incremental snapshots between E1 and E2}]}

If several exceptions occur within 500 ms of one another, those exceptions are not added to history. Fewer history files then need to be read when tracing back an exception:

④ E3: {full: fs1, history: [E1], incre: [{E1 ~ E3}]}
⑤ E4: {full: fs1, history: [E1], incre: [{incremental snapshots between E1 and E4}]}

To ensure there is enough recorded data, when the newSnap array holds fewer than 100 snapshots, the report references the full snapshot file of the oldSnap array:

⑥ E5: {full: fs1, history: [E1, E4], incre: [{incremental snapshots between E4 and E5}]}
⑦ Full snapshot file 2: fs2
⑧ E6: {full: fs2, history: [], incre: [...]}
⑨ E7: {full: fs2, history: [E6], incre: [{incremental snapshots between E6 and E7}]}
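The history rule can be checked against the sequence above with a small sketch. `ReplayHistory` and `ExceptionReport` are hypothetical names. `increFrom` names the file after which a report's incremental data starts. The newSnap < 100 fallback to oldSnap is not modeled. This is our reading of the example, not the probe's code.

```typescript
interface ExceptionReport {
  full: string; // full snapshot file this report builds on
  history: string[]; // earlier exception files that must be read first
  increFrom: string | null; // incre starts after this file; null = from the full snapshot
}

const MERGE_WINDOW_MS = 500;

class ReplayHistory {
  private full: string;
  private history: string[] = [];
  private last: { fileName: string; time: number } | null = null;

  constructor(fullFileName: string) {
    this.full = fullFileName;
  }

  // A new full snapshot file (fs2, ...) starts a new segment with empty history.
  startSegment(fullFileName: string): void {
    this.full = fullFileName;
    this.history = [];
    this.last = null;
  }

  onException(fileName: string, time: number): ExceptionReport {
    // Keep the previous exception file only if this one came more than 500 ms later;
    // otherwise this report's incre already covers the same span.
    if (this.last !== null && time - this.last.time > MERGE_WINDOW_MS) {
      this.history.push(this.last.fileName);
    }
    this.last = { fileName, time };
    const increFrom = this.history.length > 0 ? this.history[this.history.length - 1] : null;
    return { full: this.full, history: [...this.history], increFrom };
  }
}
```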

Data reporting

The snapshot files for exception replay are stored in Alibaba Cloud OSS (ali-oss). To avoid losing recording data while upload permissions are being obtained, the probe maintains two upload queues. One holds the full snapshot files (FS) and the other holds the incremental snapshot files (E{N}).

Both queues have a limited capacity. Files to be reported enter a queue first and wait to be consumed. In extreme cases where consumption is delayed, the oldest files are discarded first (first in, first out) to prevent memory overflow.
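A bounded first-in, first-out queue is enough to express the discard rule. `BoundedUploadQueue` and the capacities below are hypothetical. The article does not state the real limits.

```typescript
// Bounded FIFO: when full, the oldest pending file is dropped to cap memory use.
class BoundedUploadQueue<T> {
  private items: T[] = [];
  private readonly capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  enqueue(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // first in, first out: discard the oldest file
      this.dropped++;
    }
    this.items.push(item);
  }

  // The uploader consumes files once upload permission is available.
  dequeue(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

// One queue per file kind, as described above; the capacities are hypothetical.
const fullSnapshotQueue = new BoundedUploadQueue<string>(2);
const incrementalQueue = new BoundedUploadQueue<string>(10);
```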

Exception replay on the platform

Once a project integrates the probe and an online error occurs for a user, the reported snapshot files can be fetched from ali-oss using the replay_id in the reported log. The recording can then be replayed with rrweb-player.
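On the platform side, the history design means a recording can be rebuilt by concatenation: the full snapshot, then the incremental data of every file in history, then the current file. The sketch below assumes the files have already been downloaded and parsed; the ali-oss fetch is omitted. `assembleReplay` and `ExceptionFile` are hypothetical names. The resulting array is what would be handed to rrweb-player as its events.

```typescript
interface ReplayEvent {
  timestamp: number;
}

// Shape of an exception file as listed above (full / history / incre).
interface ExceptionFile<E extends ReplayEvent> {
  full: string;
  history: string[];
  incre: E[];
}

// Rebuild one replayable chain: full snapshot, then each history file's
// increments in order, then the current file's increments.
function assembleReplay<E extends ReplayEvent>(
  fullEvents: E[],
  historyFiles: ExceptionFile<E>[],
  current: ExceptionFile<E>,
): E[] {
  const chain: E[] = [...fullEvents];
  for (const file of [...historyFiles, current]) chain.push(...file.incre);
  // Files are consecutive, so this is already ordered; sort (stable) as a safeguard.
  return chain.sort((a, b) => a.timestamp - b.timestamp);
}
```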

Optimizations

After launch, the overall experience was good, but some problems surfaced that remain to be fixed:

  1. During replay, the original page's stylesheets may have been removed or changed as the project evolves, which causes misaligned styles. (This may require a scheduled back-end task that downloads the stylesheets referenced in reported snapshots and stores them in OSS. The task would then rewrite the corresponding stylesheet URLs so that recordings remain complete.)
  2. Improve the player experience, for example by linking the event list on the left with the replay timeline.
  3. Enrich recordings by aligning network requests (with their durations) and console output with the replay.
