Author: Cui Jiawei (Qu Ji) [email protected]

This article records some debugging techniques the author accumulated while reworking a permission system that manages millions of employees at the company. They let you jump straight to the code you are looking for without reading through the whole codebase (:-)). Some of these techniques may not be very common, but they share one benefit: once you have written the code for the first time, you can CTRL+C/CTRL+V it into any project for the same scenario (feel free to copy the code in this article directly). Although the title only says “complex encapsulation”, the article also applies to code that is hard to read and locate because of missing documentation, poor variable naming, a lack of searchable strings, improper design patterns, and so on. It is especially useful for inherited legacy code that does not behave properly.

After reading this article, you will know at least three common ways to locate source code:

Method 1. Use browser events together with the call stack

  • The main content is in [I.1 Use the call stack to locate code that affects the page].

Method 2. Use function replacement

  • The replacement technique is described in [I.2 Use function replacement to locate the code that produces or consumes a particular network call].
  • How to make the replacement take effect is described in [I.3 Run your own code before the page code executes].

Method 3. Use CDP dynamic injection (the running example in this article: click a DOM element on the page and automatically print where it is defined in the JSX/TSX source code)

  • The basic principle and the resulting effect are in [I.4 Use regular expressions to locate the JSX source code that generates any DOM element].
  • The injection methods are described in [II.2 Comparison of three methods of efficient dynamic modification of JS code using CDP].
  • The injection points are described in [II.3 Use CDP to locate the characteristic behavior of JS language] and [II.4 Use CDP to lock one webpack function instance].
  • The injected content is described in [III.3 Learn how to call these libraries and examples].

Chapter 4 (the appendix) also introduces some ways to extend debugging capabilities, such as enabling Node debugging globally and enhancing JS scripting capabilities in Chrome, along with some background material.

I. Basic locating techniques

1. Use the call stack to locate code that affects the page

I once ran into a page that redirected on its own, with no identifiable features or documentation to search for, and it took a long time to locate the key function. In hindsight, using the call stack to locate the code is a good idea in this kind of case.

The call stack records the chain of calls that led to the current execution context. In JS, you can obtain it with the console.trace function: when the statement executes, it prints its arguments together with the stack of the context it runs in, so the same console.trace statement produces different output in different execution contexts.
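To make the idea concrete outside the browser console, here is a minimal sketch. It uses the stack property of an Error object, which V8 (Chrome and Node) fills with the same frames console.trace would print; the three function names are invented for this illustration.

```javascript
// Sketch: capture the call stack as text instead of printing it.
// describeCaller, findUser, and handleClick are illustrative names.
function describeCaller(label) {
  // In V8, Error objects carry the stack of the place where they were created.
  return new Error(label).stack;
}

function findUser() {
  return describeCaller('trace');
}

function handleClick() {
  return findUser();
}

// The innermost frame comes first, then each caller in turn.
const stackText = handleClick();
console.log(stackText);
```

Reading the output from top to bottom answers "who called this?" one level at a time, which is exactly how the console.trace output in the following sections is used.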

Using my project as an example, you can call console.trace in the window's onbeforeunload event handler:

// Here is the code to intercept the page jump
window.onbeforeunload = function () {
  console.trace('here jumps');
  debugger;
};

// Here is a page jump in simulated business code
setTimeout(() => { window.location = 'http://www.bing.com'; }, 1000);

The actual operation is as follows:

Note that the core of the code above is the console.trace('here jumps') statement. It prints the execution stack, as shown in part A of the screenshot. Chrome correctly recognizes that the page jump came from: anonymous function (typed manually into the Console) => setTimeout statement => anonymous function (the first argument of setTimeout). The test code here is simple, but the approach is exactly the same for more complex code in a real business scenario: look through the call stack for the topmost frame that comes from your src directory, and it will usually be the location you want. Similarly, once the debugger statement fires, you can switch between frames in part B of the screenshot (Call Stack) to inspect each execution context and find the source of the jump. Both approaches rest on the same principle: use native browser events and the call stack to intercept a specific behavior.

Similar scenarios include pages that navigate by switching the hash rather than the whole pathname; these can be intercepted with the onhashchange event in the same way.

2. Use function replacement to locate the code that produces or consumes a particular network call

The above describes how to locate source code by intercepting specific behavior through events. Some scenarios, however, do not trigger any particular browser event, or the event does not make it easy to locate the specific code. In those cases, consider intercepting by function replacement.

Take network access as an example. There are two common ways to make a network request: the fetch function and the XMLHttpRequest (XHR) object. In most cases, intercepting these two native APIs is enough to intercept network requests (there are special cases, such as JSONP and long-lived connections). Considering how weak the search in Chrome’s native network panel is, it is nice to be able to intercept certain kinds of traffic the moment they appear :-P

The code:

// Back up the original fetch function
window.____fetch = window.fetch;

// Replace the original fetch function to add whatever behavior you like
window.fetch = (...args) => {
    console.trace("fetch log", JSON.stringify(args));
    return window.____fetch.apply(window, args);
};


Note that the sample code only prints the request parameters. If you also need to print the returned data, you have to take over fetch's return path yourself. That involves special handling of fetch's return types (Response, ReadableStream) and error handling that stays fully compatible with fetch's behavior. This article focuses on debugging methods; readers who want to know more about the standard behavior of fetch can consult other articles 🙂
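As a rough illustration of taking over the return path (a minimal sketch, not a complete fetch-compatible implementation): read a clone of the response, so the page can still consume the original body. The names wrapFetch and log are made up for this example; Response.clone() and Response.text() are standard Fetch API methods.

```javascript
// Sketch: wrap a fetch-like function so that both the request arguments and
// the response body are logged. wrapFetch and log are illustrative names.
function wrapFetch(originalFetch, log) {
  return function patchedFetch(...args) {
    log('fetch request', JSON.stringify(args));
    return originalFetch(...args).then(
      (response) => {
        // Read a clone: a body can only be consumed once, and the page
        // still needs the original response untouched.
        response.clone().text()
          .then((body) => log('fetch response', body))
          .catch((err) => log('fetch response unreadable', String(err)));
        return response;
      },
      (error) => {
        log('fetch failed', String(error));
        throw error; // keep the rejection visible to the caller
      }
    );
  };
}

// In a page, install it with console.trace as the logger so every entry
// carries its call stack. The guard keeps the sketch harmless elsewhere.
if (typeof window !== 'undefined' && typeof window.fetch === 'function') {
  window.fetch = wrapFetch(window.fetch.bind(window), console.trace.bind(console));
}
```

Logging large or streaming bodies this way can be expensive, so in practice you would probably filter by URL before reading the clone.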

The approach for the XMLHttpRequest object is similar: if you only need to intercept the request parameters, replacing the XMLHttpRequest.prototype.open method is enough. As you can see, the code example above also uses console.trace, taking full advantage of the context provided by the call stack to locate exactly where in the business code the call was made.
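The XHR variant can be sketched along the same lines. This is one possible shape, assuming you only care about the method and URL; patchXhrOpen and log are names invented for the example.

```javascript
// Sketch: replace open() on an XHR-like prototype so every request reports
// its method and URL. patchXhrOpen and log are illustrative names.
function patchXhrOpen(proto, log) {
  const originalOpen = proto.open;
  proto.open = function patchedOpen(method, url, ...rest) {
    log('xhr open', method, String(url));
    // Forward exactly the arguments we received: open() behaves differently
    // depending on how many arguments are passed, so do not pad with undefined.
    return originalOpen.call(this, method, url, ...rest);
  };
  // Return an undo function so the patch can be removed again.
  return () => { proto.open = originalOpen; };
}

// In a page, use console.trace as the logger to get the call stack as well.
if (typeof XMLHttpRequest !== 'undefined') {
  patchXhrOpen(XMLHttpRequest.prototype, console.trace.bind(console));
}
```

Patching the prototype rather than individual instances means requests created later, anywhere in the page, are covered too.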

3. Run your own code before the page code executes

Continuing from the previous section: to replace JavaScript functions, our code must execute before the page’s own code. Otherwise the page will already have captured the unmodified functions/objects for its network calls, and it is too late to replace the native objects.

Depending on whether you are willing to write code yourself, you can use either browser extensions or the debugging protocol (and the libraries built on it).

To use an extension, search the Chrome Web Store for inject, injector, or injection. At the time of writing, the extensions that offer this capability are not reliable enough: none of them can fully guarantee that the injected code runs before the page’s scripts. The one that works relatively well is neocotic.com/injector, which in my experience succeeds about 70% of the time. That is not satisfactory, but it is good enough for quick prototyping, and if it fails you can simply refresh the page.

If you are willing to write a little code, I think the best tool is Microsoft’s Playwright library, which has a well-tuned ability to inject scripts into pages. Its source code shows that the library combines several CDP functions to make code injection stable and general. Usage is simple: once the page instance has been initialized (for details, see the official documentation: Playwright Library | Playwright), it takes just one more line:

import { chromium } from 'playwright'; // ES module, so top-level await is available

// Initialize the page
const browser = await chromium.launch(); // Or 'firefox' or 'webkit'.
const page = await browser.newPage();
// Insert the code into it
await page.addInitScript({ path: './preload.js' });

Excellent reliability, and the whole process is clean and tidy 🙂

4. Use regular expressions to locate the JSX source code that generates any DOM element

This section starts with a simple, crude way to make a DOM element on the page report its own position in the source code when clicked, leaving room for the improvements in the next two chapters.

Run the following command in the terminal to add onClick={()=>{console.trace("imhere")}} to the DOM elements in the JSX files under ./src:

find ./src -type f -iname "*.js*" |xargs sed -i "" 's/\(<[^ />?:+;,=]\{1,\}\)\( \{1,\}[^,:;|&)}\d]\)/\1 onClick={()=>{console.trace("imhere")}} \2/g'

This command uses a regular expression to find the code that describes DOM nodes in JSX files and injects the onClick={()=>{console.trace("imhere")}} attribute into each of them. Thanks to the behavior of console.trace described in Section 1, clicking a DOM element on the page then prints console output that points to the source code rendering that component.

Once you have confirmed the mapping between the page DOM and the JSX source files, restore the files with Git (discard the uncommitted changes), or run the following command in the terminal:

find ./src -type f -iname "*.js*" |xargs sed -i "" 's/ onClick={()=>{console.trace("imhere")}} //g'

This command is used to undo the changes made to the file by the previous command.
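If you would rather try the inject/undo pair on a string before touching any files, it can be sketched in JavaScript. The regular expression below is an adaptation of the sed pattern rather than a character-for-character port (sed and JavaScript treat \d inside a bracket expression differently), and the function names are invented for this example.

```javascript
// The attribute that gets injected, with the same surrounding spaces the
// sed replacement uses.
const MARKER = ' onClick={()=>{console.trace("imhere")}} ';

// Group 1: "<" plus a tag name. Group 2: the spaces and first character of
// the attribute list. Tags written without any attributes do not match.
const TAG_WITH_ATTRS = /(<[^ \/>?:+;,=]+)( +[^,:;|&)}\d])/g;

function injectTrace(source) {
  return source.replace(TAG_WITH_ATTRS, (match, tag, rest) => tag + MARKER + rest);
}

function removeTrace(source) {
  // Plain string removal, mirroring the undo command.
  return source.split(MARKER).join('');
}

const jsx = '<div className="box"><span>hi</span></div>';
const patched = injectTrace(jsx);
console.log(patched);
console.log(removeTrace(patched) === jsx);
```

Note that, like the sed version, this only touches opening tags that already have at least one attribute, which is one of the limitations that motivates the AST-based approach later in the article.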

Of course, this approach is intrusive and inelegant, and modifying files with regular expressions is unreliable. But it does suggest an idea: if we combine CDP dynamic injection (Chapter 2) with AST-based modification (Chapter 3), we get a reliable, non-invasive workflow for mapping the DOM rendered on the page back to the JSX/TSX source code (it can be used for other things too; once you are working on the AST, you can write whatever you want).

When you click on the DOM, you’ll see the following effect:

A click prints, for each component from the child up to its parents, the associated JSX/TSX file, the position in that file, and the source content itself. The implementation is described in detail starting from the next chapter 🙂

5. Special use of conditional breakpoints

Here is an approach to code injection using conditional breakpoints. Conditional breakpoints can be set in the Sources panel of Chrome DevTools, as shown below:

The usual way to use a conditional breakpoint is to supply an expression: Chrome pauses if its value is truthy and does not pause otherwise.

Conditional breakpoints can also change the original running state of JS. Here is a simple example:

var a = 1;
console.log(a); // Set a conditional breakpoint on this line with the condition: a = 2; false. The output changes to 2, and because the condition evaluates to false, execution does not pause. Disable the breakpoint and the output goes back to 1

With this feature in mind, plus the CDP conditional breakpoint function described below, you can change the execution flow of the relevant JS code 🙂

II. CDP

1. What is CDP

CDP is the Chrome DevTools Protocol, a debugging protocol provided by Google. Through CDP, external tools can make full use of Chrome’s debugging capabilities. The website chromedevtools.github.io/devtools-pr… has a detailed introduction and recommended client libraries.

Note that the available functions depend on what you are debugging. If you are debugging scripts in a Chrome page, you can use almost all of them (see "latest" in the sidebar of that site). If you are debugging Node.js scripts, you can only use the v8-inspector (node) subset.

The following chapters assume that you already have a good knowledge of CDP. For basic information about the CDP, see Section 4, 5, and 6 of the Appendix.

2. Comparison of three methods of efficient dynamic modification of JS code using CDP

For JS code that has already been written, there are at least three stages at which it can be modified dynamically with reasonable efficiency: the network stage, the code parsing stage, and the code execution stage.

Dynamically modify the code during the network connection phase

This method mainly depends on the Network.setRequestInterception function in CDP. It takes effect when Chrome initiates a network request or receives a network response. After the page requests a JavaScript file, you can make some “revisions” to it and hand it on before Chrome’s V8 engine gets hold of the content.

This approach is the most efficient at run time, because Chrome gets the JS content we want the first time it parses the file, and everything we added executes as ordinary code. The patching cost is high, though. Code executed through eval (as produced by some npm build setups) has to be handled by parsing the eval string argument separately, and the same goes for functions created with the Function constructor. Another problem is sourcemaps: if the original JS file has one, our revised file no longer lines up with it, so Chrome resolves the sourcemap incorrectly. To solve this, we must first parse the sourcemap ourselves, record the correspondence between it and the JS statements, and then update the sourcemap in step with our changes to the JS file so the two stay aligned.
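A minimal sketch of the "revise, then hand over" step follows. One substitution to be aware of: current protocol documentation marks Network.setRequestInterception as deprecated in favor of the Fetch domain, so this sketch builds a Fetch.fulfillRequest message instead of using the function named above. The helper name, the request id, and the sample sources are all made up for illustration.

```javascript
// Sketch: build the CDP message that answers a paused request with a patched
// JS body. buildPatchedResponse is an illustrative name, not a CDP function.
function buildPatchedResponse(requestId, originalSource, prelude) {
  // Put the prelude on the first line without adding a newline, so every
  // later line keeps its original line number.
  const patched = prelude + originalSource;
  return {
    method: 'Fetch.fulfillRequest',
    params: {
      requestId,
      responseCode: 200,
      responseHeaders: [{ name: 'Content-Type', value: 'application/javascript' }],
      // CDP expects the body as a base64 string.
      body: Buffer.from(patched, 'utf8').toString('base64'),
    },
  };
}

const message = buildPatchedResponse(
  'interception-1', // hypothetical id, as delivered by a Fetch.requestPaused event
  'console.log("page code");\n',
  'console.log("injected first");'
);
console.log(JSON.stringify(message, null, 2));
```

Keeping the prelude on the first line only preserves line numbers; columns on that line still shift, so for a minified single-line bundle the sourcemap work described above is still needed.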

Replace the code during the code parsing phase

This method mainly relies on the Debugger.scriptParsed event in CDP, which notifies us when a script has finished parsing. Depending on the scenario, you may also want to intercept first with Debugger.setInstrumentationBreakpoint, which pauses when a new script (not necessarily a new file) is loaded; the pause can be handled in the Debugger.paused event. Then use Debugger.setScriptSource to modify the script’s code.

Since what ends up executing is ordinary code, this approach is also efficient at run time. Note that the timing of the change is not arbitrary: in some cases setScriptSource reports success when the change did not actually take effect, as described in the appendix of this article.

This approach removes the need to parse eval and Function arguments by hand, but the sourcemap alignment problem remains, so the extra cost is still high.
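The event-then-replace flow can be sketched as follows. The send(method, params) function stands for whatever CDP client you use and is an assumption of this sketch, as are the helper name and its callback parameters; Debugger.getScriptSource and Debugger.setScriptSource are the CDP methods being called.

```javascript
// Sketch: react to a Debugger.scriptParsed event by reading the script text,
// patching it, and writing it back. patchParsedScript, shouldPatch, and patch
// are illustrative names; `send` is whatever your CDP client provides.
async function patchParsedScript(send, event, shouldPatch, patch) {
  const { scriptId, url } = event;
  if (!shouldPatch(url)) return false;

  const { scriptSource } = await send('Debugger.getScriptSource', { scriptId });
  const patched = patch(scriptSource);
  if (patched === scriptSource) return false; // nothing to change

  // As the appendix explains, a successful reply here does not always mean
  // the new code is really in effect (for example for functions on the stack).
  await send('Debugger.setScriptSource', { scriptId, scriptSource: patched });
  return true;
}
```

With a real client you would call this from the handler registered for Debugger.scriptParsed, passing the event parameters straight through.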

Dynamically modify code during code execution

This mainly uses CDP’s Debugger.setBreakpoint function and its conditional breakpoint parameter (condition). As mentioned in the previous chapter, a conditional breakpoint can change what the program does, so we can put our patch into the breakpoint condition and have it run along with the JS script.

This is the cheapest approach to write. It avoids both the manual parsing of eval and Function string arguments and the sourcemap alignment problem (the extra code lives inside the conditional breakpoint and does not disturb the alignment between the original code and the sourcemap). We can also switch the injection on and off by enabling and disabling breakpoints, which suits prototyping and quick experiments.

It has a corresponding drawback: a conditional breakpoint does not execute on quite the same footing as the original code, so there are communication and context-switching costs. That is usually acceptable for a prototype. Once you have confirmed that the injected code does its job, consider reimplementing it with the network-phase or parsing-phase approach above to improve run-time efficiency.
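As a sketch of how a patch becomes a breakpoint condition: the helper below builds a message for Debugger.setBreakpointByUrl, a sibling of Debugger.setBreakpoint that also accepts a condition and is addressed by URL rather than script id. The helper name, the URL, and the patch text are illustrative, and the eval call is only a local stand-in for what the debugger does with the condition, not a claim about V8 internals.

```javascript
// Sketch: express a patch as a conditional breakpoint that never pauses.
// buildPatchBreakpoint is an illustrative helper, not a CDP function.
function buildPatchBreakpoint(url, line, patchCode) {
  return {
    method: 'Debugger.setBreakpointByUrl',
    params: {
      url,
      lineNumber: line - 1, // CDP line numbers are 0-based
      // Run the patch, then evaluate to false so the debugger does not pause.
      condition: patchCode + '; false',
    },
  };
}

// Hypothetical target: line 2 of a script served from a local dev server.
const message = buildPatchBreakpoint('http://localhost:3000/app.js', 2, 'a = 2');

// Local stand-in for the breakpoint: the condition runs in the scope of the
// line it is attached to, changes the variable, and yields false.
let a = 1;
const shouldPause = eval(message.params.condition);
console.log(a, shouldPause);
```

Because the patch lives entirely in the condition string, removing the breakpoint removes the patch, which is the on/off flexibility described above.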

Other possibilities

Besides the three methods above, there are other ways to modify code dynamically. For example, the setVariableValue function in CDP’s Debugger domain can change an individual value in running JS code, such as what a function is about to return. But calling CDP functions this frequently slows execution down substantially and does not scale to larger scenarios, so setVariableValue and similar methods are not covered here.

3. Use CDP to locate the characteristic behavior of JS language

Chapter 1 showed how to locate source code using browser events and function replacement. The method in this section can be used when neither of those works.

The more basic and cumbersome method goes like this: first pick a characteristic value for the behavior you want to locate, then automatically inject a check into the main (or every) assignment statement. Whenever a variable’s value at run time equals or contains the characteristic value, record it. In the end you get the full life of the characteristic value from “birth” to “death”, with each step of the flow pinned down to the file, function, and line where the value was handled.

Interested readers can try using the CDP methods above to find the core search function of a search engine. The steps are:

  1. Run the browser in debug mode
  2. Make up an arbitrary characteristic search value, e.g. 123
  3. Using your preferred CDP injection method, check at each assignment statement whether the assigned value contains 123; if it does, record it, or simply break there
  4. Enter the characteristic value from step 2 (here, 123) in the search box
  5. Use the breakpoint locations or the recorded information to find the search-related functions

The whole process is very close to how you debug your own business code. The only thing to note is that what you find this way in a business scenario is compiled code, which has to be mapped back to the source through the sourcemap.
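The check injected at each assignment in step 3 could look something like the sketch below. Everything here (recordIfMarked, sightings, the sample buildQuery function) is invented to illustrate the idea; how the check gets injected is up to whichever CDP method you picked.

```javascript
// Sketch of the check injected at each assignment.
// recordIfMarked and sightings are illustrative names.
const sightings = [];

function recordIfMarked(value, marker, label) {
  let text;
  try {
    text = typeof value === 'string' ? value : JSON.stringify(value);
  } catch (err) {
    return value; // e.g. circular structures: skip the check, keep the value
  }
  if (typeof text === 'string' && text.includes(marker)) {
    // Remember where the characteristic value was seen, with its call stack.
    sightings.push({ label, stack: new Error('sighting').stack });
  }
  return value; // pass through, so `x = recordIfMarked(expr, ...)` acts like `x = expr`
}

// Stand-in for business code after injection.
function buildQuery(input) {
  const q = recordIfMarked(input.trim(), '123', 'buildQuery:q');
  return recordIfMarked({ q, page: 1 }, '123', 'buildQuery:payload');
}

buildQuery(' 123 ');
console.log(sightings.map((s) => s.label));
```

Returning the value unchanged is the important design choice: the injected call can wrap any right-hand side without altering what the program computes.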

Of course, there are more elegant methods, such as injecting code dynamically during the npm build. Recall that Chapter 1 used regular expressions to modify the source code so that any DOM element could lead you to the JSX file that generated it. That method has many limitations, so here is a better version:

  1. Run the Node execution environment in debug mode
  2. Using your preferred CDP method, inject code into the webpack function that reads the business code. After the file is read and before the subsequent steps, call an AST parsing library to do whatever you like to the source code, such as adding extra attributes to every DOM element
  3. Start your build and enjoy 🙂

The steps above have two core parts: locating the key function that webpack uses to read business code, and using the AST to make the changes. Locating the key webpack function is covered in the next section, and the AST operations are covered in Chapter 3.

4. Use CDP to lock one webpack function instance

Continuing from above: locating a function in a Node environment is often harder than in a browser, because with a large dependency tree the amount of code is easily one or two orders of magnitude greater. A more efficient method is needed (the “more basic and cumbersome method” from the previous section is not recommended unless nothing else works).

For example, suppose the function we are looking for is known to operate on files. The main file library in Node is fs (there are several variants, but start with the main one). We can inject code into all the synchronous and asynchronous file-opening functions in fs, which keeps the number of injection points under 10. Then find the fs calls that read the business code (source files under the project’s src directory) and look at their call stacks to find the key function in webpack. The function identified this way is processResource in LoaderRunner.js.
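The injection into the fs functions can be sketched as a generic wrapper. The helper name traceMethods and the list of method names in the usage comment are this sketch's own choices, not a list taken from webpack; adjust them to the functions you actually want to watch.

```javascript
// Sketch: wrap selected methods of a module object so each call reports its
// first argument and call stack. traceMethods is an illustrative helper.
function traceMethods(target, names, onCall) {
  const originals = {};
  for (const name of names) {
    if (typeof target[name] !== 'function') continue;
    originals[name] = target[name];
    target[name] = function traced(...args) {
      onCall(name, args[0], new Error('trace').stack);
      return originals[name].apply(this, args);
    };
  }
  // Return an undo function that restores the original methods.
  return () => Object.assign(target, originals);
}

// Hypothetical usage inside the build process (for example from a script
// injected over CDP), reporting only reads under the project's src directory:
//   const fs = require('fs');
//   traceMethods(fs, ['readFile', 'readFileSync', 'open', 'openSync'],
//     (name, file, stack) => {
//       if (String(file).includes('/src/')) console.log(name, file, stack);
//     });
```

Filtering on the path before printing keeps the output small enough that the webpack frames in the stack are easy to spot.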

By dynamically injecting code into the processResource function, together with the AST operations described in the next chapter, we can make a lot of interesting changes to the business code after webpack has read it and before it is actually built.

A component may not respond to the onClick attribute if it is a custom React component rather than a native DOM element. In that case, modify its render function to print debugging information before the function returns; the procedure is the same. And once you have decided to modify render functions, you can do more interesting things, such as generating the call chain between components: who calls whom, which parameters are passed, and what internal state each component holds (whether it is class state or function-component hooks). This brings Redux-style backtracking to all kinds of legacy projects, which matters a lot when debugging old code.

The same method also applies to Vue components; the procedure is exactly the same.

III. AST

1. What is AST

AST (Abstract Syntax tree) is a standardized and structured syntax representation method. Using the existing AST generation and editing libraries, we can make various checks and modifications to the code. In fact, the main JS syntax checking and compilation libraries currently rely on mature AST capabilities, including the TSLint, ESLint, Babel, and typescript families.

2. Quickly compare the ability of several AST parsing libraries

The AST Explorer is recommended. It displays AST parsing and editing results in a user-friendly way and integrates almost all of the AST parsing and editing libraries available today. Best of all, it is cross-language: it can be used to evaluate AST libraries not only for JS but also for PHP, C, Java, and other languages.

For the typescript language (TSX) mixed with DOM manipulation, the libraries that currently have the most complete parsing capabilities are probably the official typescript library and the third-party typescript-ESLint library.

The goal of the typescript-ESLint library is to make ESLint syntax hints perfectly compatible with the typescript language specification. For this project, see the official TypeScript ESLint website (typescript-eslint.io). Examples of invocation will also be highlighted below using this library as an example.

3. Learn how to call these libraries and examples

For many libraries with AST parsing capabilities, there is no separate documentation on how to invoke this capability; It often takes a lot of time to explore on your own. At this point, you can refer to the official source code for the ASTExplorer project. With good directory design, we can quickly find the method to call the AST library we want to use.

Just look in the directory astexplorer/website/src/parsers/js at master · fkling/astexplorer · GitHub.

Take the typescript-ESLint library for example. If I want to invoke the AST parsing capabilities it provides, but the instructions are not readily available in the official documentation, I can find the file named after the parser library in the source directory:

And find the core of it:

// Excerpt from https://github.com/fkling/astexplorer/blob/master/website/src/parsers/js/typescript-eslint-parser.js

  loadParser(callback) {
    require(['@typescript-eslint/parser'], callback);
  },

  parse(parser, code, options) {
    return parser.parse(code, options);
  },

  getDefaultOptions() {
    return {
      range: true,
      loc: false,
      tokens: false,
      comment: false,
      useJSXTextNode: false,
      ecmaVersion: 6,
      sourceType: 'module',
      ecmaFeatures: {
        jsx: true,
      },
    };
  },

With some modifications to the above code, we can write our own code that uses the library for AST parsing:

var parser = require('@typescript-eslint/parser'); // Resolves to wherever the typescript-eslint parser is installed
var code = 'console.log("123")'; // The code we want to parse
var option = {
    range: true, // Note: without this option an error is reported saying that range 0 is invalid
    loc: true,
    tokens: true,
    comment: true,
    ecmaVersion: 'latest',
    sourceType: 'module',
    ecmaFeatures: {
      jsx: true,
    },
};
var result = parser.parse(code, option); // After this statement runs, result holds the parsed AST

Then feel free to make whatever modifications you like to the parsed result ~
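As one small example of what "working on the result" can mean, here is a sketch that walks an ESTree-style tree and lists every JSX opening element with its source line. To stay self-contained it runs on a tiny hand-written tree instead of real parser output; the node type names (JSXElement, JSXOpeningElement, JSXIdentifier) follow the ESTree JSX conventions, and the function name is invented for this example.

```javascript
// Sketch: walk an ESTree-style AST and collect every JSX opening element.
// collectJsxElements is an illustrative name.
function collectJsxElements(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectJsxElements(child, found));
    return found;
  }
  if (!node || typeof node.type !== 'string') return found; // not an AST node

  if (node.type === 'JSXOpeningElement') {
    found.push({
      name: node.name && node.name.name,
      line: node.loc ? node.loc.start.line : null,
    });
  }
  for (const key of Object.keys(node)) {
    if (key === 'parent') continue; // avoid cycles if parent links are present
    const value = node[key];
    if (value && typeof value === 'object') collectJsxElements(value, found);
  }
  return found;
}

// Hand-written stand-in for the AST of: <div>\n  <span/>\n</div>
const sampleAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'JSXElement',
      openingElement: {
        type: 'JSXOpeningElement',
        name: { type: 'JSXIdentifier', name: 'div' },
        attributes: [],
        loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 5 } },
      },
      children: [{
        type: 'JSXElement',
        openingElement: {
          type: 'JSXOpeningElement',
          name: { type: 'JSXIdentifier', name: 'span' },
          attributes: [],
          loc: { start: { line: 2, column: 2 }, end: { line: 2, column: 9 } },
        },
        children: [],
      }],
    },
  }],
};

console.log(collectJsxElements(sampleAst));
```

With the file name and line in hand, adding an attribute that reports them on click is the AST-based counterpart of the regular-expression trick from Chapter 1.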

IV. Appendix

1. Enable Node debugging by default in new processes

Many debugging methods using CDP are described above, but in practice some libraries start a new process with spawn.sync, and the debugging state is lost. There are currently three ways to deal with this:

1. Use CDP to dynamically patch the spawn.sync method. The procedure is the same as patching the fs library with CDP and is not repeated here.

2. Create a Node wrapper and put it in the node directory in place of the original Node binary. When an external program invokes Node from the default path, it actually invokes the wrapper, which then passes the debugging options on to the original Node binary. This approach has to keep the parent and child process contexts aligned and behave consistently when scripts are run through a shebang line; in practice it requires a fair amount of knowledge about system behavior and is not discussed here.

3. Recompile Node so that debugging is enabled by default with a randomly generated debug port. This is simpler than option 2. Modify the constructor of the Environment class in env.cc under the src directory of the Node source (download: Download | Node.js Chinese website (nodejs.cn); at the time of writing, the Node version was 16.13):

// Modify this function
Environment::Environment(IsolateData* isolate_data,
                         Isolate* isolate,
                         const std::vector<std::string>& args,
                         const std::vector<std::string>& exec_args,
                         const EnvSerializeInfo* env_info,
                         EnvironmentFlags::Flags flags,
                         ThreadId thread_id)

// Omit some code in between, up to the following line
  options_ = std::make_shared<EnvironmentOptions>(
      *isolate_data->options()->per_env);

// Manually add the following content
  if((getenv("NODEINSPECTEVERYWHERE"))) {
    // The debugging state is enabled by default
    options_->debug_options().inspector_enabled = true;
    // Randomly set a port
    struct timeval tvStart;
    gettimeofday(&tvStart, NULL); // Call the high-precision time function to generate random number seeds
    srand((int)tvStart.tv_usec);
    int port = rand() % 900 + 100 + 9000; // Use random numbers to specify debugging ports to avoid port conflicts
    options_->debug_options().host_port.set_port(port);
  }
  // The above content is manually added
  
  inspector_host_port_ = std::make_shared<ExclusiveAccess<HostPort>>(
      options_->debug_options().host_port);


getenv("NODEINSPECTEVERYWHERE") reads the environment variable named NODEINSPECTEVERYWHERE and uses it as a switch: if the variable is set, debugging is enabled everywhere, and if it is not set, debugging is not enabled automatically. In practice you can replace this with another condition, such as a flag written to a file, or, if you manage Node versions with nvm, drop the condition altogether.

The statement options_->debug_options().inspector_enabled = true; enables the debugging state, and int port = rand() % 900 + 100 + 9000; picks a random debug port (in the range 9100 to 9999). Note that many articles circulating online seed the random number generator with the time function. On the author’s macOS machine, for example, that is not precise enough: several runs within the same second produce the same “random” number. That is why a high-precision time function, gettimeofday, is used here.

Then run ./configure and make. For more build instructions, see node/BUILDING.md at master · nodejs/node · GitHub.

2. Brief analysis of why the setScriptSource function in CDP may not really succeed

The setScriptSource function has a behavior that is not described in the documentation (that I have read): the call returns and the return value indicates success, but the change has not actually taken effect.

Traces of this behavior can be found in the source code for two files involved in the Chromium project:

// chromium\src\v8\src\debug\liveedit.cc file

bool CanPatchScript(const LiteralMap& changed, Handle<Script> script,
                    Handle<Script> new_script,
                    FunctionDataMap& function_data_map,
                    debug::LiveEditResult* result) {
  for (const auto& mapping : changed) {
    FunctionData* data = nullptr;
    function_data_map.Lookup(script, mapping.first, &data);
    FunctionData* new_data = nullptr;
    function_data_map.Lookup(new_script, mapping.second, &new_data);
    Handle<SharedFunctionInfo> sfi;
    if (!data->shared.ToHandle(&sfi)) {
      continue;
    } else if (data->stack_position == FunctionData::ON_STACK) {
      result->status = debug::LiveEditResult::BLOCKED_BY_ACTIVE_FUNCTION; // This BLOCKED_BY_ACTIVE_FUNCTION status is what causes the behavior described above
      return false;
    } else if (!data->running_generators.empty()) {
      result->status = debug::LiveEditResult::BLOCKED_BY_RUNNING_GENERATOR;
      return false;
    }
  }
  return true;
}

// chromium\src\v8\src\runtime\runtime-debug.cc file

RUNTIME_FUNCTION(Runtime_LiveEditPatchScript) {
  HandleScope scope(isolate);
  DCHECK_EQ(2, args.length());
  CONVERT_ARG_HANDLE_CHECKED(JSFunction, script_function, 0);
  CONVERT_ARG_HANDLE_CHECKED(String, new_source, 1);

  Handle<Script> script(Script::cast(script_function->shared().script()), isolate);
  v8::debug::LiveEditResult result;
  LiveEdit::PatchScript(isolate, script, new_source, false, &result);
  switch (result.status) {
    case v8::debug::LiveEditResult::COMPILE_ERROR:
      return isolate->Throw(*isolate->factory()->NewStringFromAsciiChecked(
          "LiveEdit failed: COMPILE_ERROR"));
    case v8::debug::LiveEditResult::BLOCKED_BY_RUNNING_GENERATOR:
      return isolate->Throw(*isolate->factory()->NewStringFromAsciiChecked(
          "LiveEdit failed: BLOCKED_BY_RUNNING_GENERATOR"));
    case v8::debug::LiveEditResult::BLOCKED_BY_ACTIVE_FUNCTION: // Check the result returned by the function above
      return isolate->Throw(*isolate->factory()->NewStringFromAsciiChecked(
          "LiveEdit failed: BLOCKED_BY_ACTIVE_FUNCTION"));
    case v8::debug::LiveEditResult::OK:
      return ReadOnlyRoots(isolate).undefined_value();
  }
  return ReadOnlyRoots(isolate).undefined_value();
}

It is this BLOCKED_BY_ACTIVE_FUNCTION state that causes a CDP setScriptSource call to return success without actually succeeding. setScriptSource first compiles the new source code and only then checks whether this state arises, so by the time Chrome returns BLOCKED_BY_ACTIVE_FUNCTION internally, it has already generated a code cache and done some pre-computation.

The focus of this article is not the details of the Chromium implementation, but if you are interested in the exact behavior of this function, trace the two functions above in debug mode and you will see more interesting details 🙂

In addition, see chromium/get_the_code.md at master · chromium/chromium · GitHub for more information about obtaining and compiling the source code of the Chromium project.

3. Enhanced JS scripting in Chrome (persistence and call stack awareness)

This section discusses how to extend Chrome’s native capabilities. When debugging, we sometimes want to write a JS variable straight to disk, which makes it easy to share with other programs or keep as a record. JavaScript in the browser does not natively support this, but we can add the ability ourselves. For example, we can make console.log write the string directly to a file whenever its first argument starts with “write ”.

Direct modification based on source code

Since Chrome is open source software, the most straightforward idea is to change the source code directly, which also keeps runtime efficiency relatively high. This approach is not the focus of this article, so let’s go straight to the code:

// Modify the void MainThreadDebugger::consoleAPIMessage function in main_thread_debugger.cc of the Chromium project.
// This is the Mac version; the Win version needs a different way of resolving the path, and its file-writing rules also differ.
// Remember to add the fstream, sstream, and string headers.

// This is only one example. In fact, there are at least three places where a similar change can add the ability to write to disk.

// Add the following before frame->Console().ReportMessageToClient(...
char targetpath[512];
std::string homedir(getenv("HOME")); // Base path for the output file; other rules are possible
String msg_str = ToCoreString(message);
String filetag = "kerome.txt"; // The name of the file to record
if (msg_str.Substring(0, 6) == "write ") { // Condition for writing; replace it with whatever you like
  filetag = msg_str.Substring(2, 14); // File naming rule; change it to whatever you like
}

// Convert the path
realpath((homedir + "/Desktop/" + filetag.Utf8() + ".txt").c_str(), targetpath);

std::ofstream fout(targetpath, std::ios::app);
// You can change the record format to whatever you prefer
fout << line_number << "," << column_number << "," << ToCoreString(url) << ","
     << ToCoreString(message).Utf8() << "," << location->ToString() << std::endl;
fout.close();

Modify the source code, rebuild it, and run the compiled Chromium with the --no-sandbox parameter to see the results. Kudos to Chromium for its sandbox: it really does prevent this kind of code injection.

CDP-based enhancement

This article has spent a lot of time introducing the various uses of CDP, so CDP naturally has a way to enhance JS scripting capabilities as well. CDP is less efficient at runtime than modifying the source code directly, but development efficiency is much higher. Nobody building a product prototype wants to wait 30+ minutes for a compile after every small change, so seeing results immediately is a big advantage. Again, straight to the code:

// To highlight the point, only the core code is shown here.
// Once this script is running, JS code in the page can call: _writeFile('yourContent');

const puppeteer = require('puppeteer-core');
const fs = require('fs');
// Obtain this URL from http://localhost:9222/json/version
const webSocketDebuggerUrl = "ws://localhost:9222/devtools/browser/93972e12-e794-492c-a240-fb872fb936b8";
let browser, page, client;

puppeteer.connect({
  browserWSEndpoint: webSocketDebuggerUrl,
  defaultViewport: null, // args: ['--no-sandbox', '--disable-setuid-sandbox'],
})
  .then(v => { browser = v; return browser.newPage(); })
  .then(v => {
    page = v;
    // Expose a Node-side function to the page as window._writeFile
    return page.exposeFunction('_writeFile', (content) => {
      fs.writeFile('/YourPath/test.txt', content, () => {});
      return ' ';
    });
  })
  .then(() => page.target().createCDPSession())
  .then(v => { client = v; });

Note that this approach can be used for many other things. For example, the call stack printed by console.trace cannot be retrieved by the JS code itself. Can we use this approach to let JS code retrieve its own call stack? Absolutely (the call stack can be read from the Runtime.consoleAPICalled event).
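The idea in the paragraph above can be sketched roughly as follows. This is a minimal illustration rather than the author's code: the function names recordFromConsoleEvent and persistConsoleWrites are hypothetical, client is assumed to be a CDP session such as the one created by page.target().createCDPSession(), and the “write ” prefix rule is borrowed from the example earlier in this section.

```javascript
const fs = require('fs');

// Convert a Runtime.consoleAPICalled event into a record, or return null when
// the first console argument is not a string starting with "write ".
function recordFromConsoleEvent(event) {
  const first = event.args && event.args[0];
  if (!first || first.type !== 'string' || !first.value.startsWith('write ')) {
    return null;
  }
  const frames = (event.stackTrace && event.stackTrace.callFrames) || [];
  return {
    message: first.value.slice('write '.length),
    // CDP line and column numbers are 0-based; add 1 for display.
    stack: frames.map(f =>
      `${f.functionName || '(anonymous)'} (${f.url}:${f.lineNumber + 1}:${f.columnNumber + 1})`),
  };
}

// Hypothetical wiring: append every matching console call, with its call
// stack, to filePath as one JSON line.
function persistConsoleWrites(client, filePath) {
  client.on('Runtime.consoleAPICalled', (event) => {
    const record = recordFromConsoleEvent(event);
    if (record) fs.appendFileSync(filePath, JSON.stringify(record) + '\n');
  });
  // Console events are only delivered once the Runtime domain is enabled.
  return client.send('Runtime.enable');
}
```

With this in place, a page-side call such as console.log('write some text') would be appended to the file together with the stack of the caller. Sending the stack back into the page (for example through an exposed function) would be a further step that is not shown here.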

4. Some recommended CDP functions

What follows is an informal tour of some CDP functions that the author finds especially useful. If the sample code in this article does not cover the scenario you need, these functions can be combined to get the desired result.

Debugger namespace

Debugger.evaluateOnCallFrame executes JS source code within a specified call frame. It is more useful than the similar function in the Runtime namespace, mainly because you can specify the execution context and therefore access all kinds of local variables. Code executed through the Runtime namespace often cannot see those internal variables (strictly speaking, Runtime.callFunctionOn provides similar functionality, but it is more complex to call). The function in the Debugger namespace behaves almost exactly like evaluation in the developer tools interface.
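As a minimal sketch of the call described above, assuming client is a CDP session exposing send(method, params) and pausedEvent is the payload of a Debugger.paused event (the helper name evalInTopFrame is hypothetical):

```javascript
// Evaluate `expression` in the top call frame of a Debugger.paused event,
// so that the expression can read the local variables of that frame.
async function evalInTopFrame(client, pausedEvent, expression) {
  const topFrame = pausedEvent.callFrames[0];
  const { result, exceptionDetails } = await client.send('Debugger.evaluateOnCallFrame', {
    callFrameId: topFrame.callFrameId,
    expression,
    returnByValue: true, // ask for a JSON value instead of a remote object handle
  });
  if (exceptionDetails) throw new Error(exceptionDetails.text);
  return result.value;
}
```

Typical use would be inside a Debugger.paused handler, for example evalInTopFrame(client, event, 'JSON.stringify(props)'), where props stands for whatever local variable exists at the pause location.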

Debugger.setInstrumentationBreakpoint sets an instrumentation breakpoint. But what is an instrumentation breakpoint? As the call parameters of this function show, it basically refers to the moment just before a new script is executed for the first time. Note that a new script is not just an externally loaded script; it also includes code created through eval, Function, and so on.
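A rough sketch of how this could be wired up, assuming client is a CDP session with on(event, handler) and send(method, params); the helper name pauseBeforeEveryScript and the onScript callback are hypothetical:

```javascript
// Pause just before each new script runs, hand the pause event to `onScript`,
// then resume. Pauses with any other reason are left untouched.
function pauseBeforeEveryScript(client, onScript) {
  client.on('Debugger.paused', (event) => {
    if (event.reason !== 'instrumentation') return;
    onScript(event);
    client.send('Debugger.resume').catch(console.error);
  });
  return client.send('Debugger.enable').then(() =>
    client.send('Debugger.setInstrumentationBreakpoint', {
      instrumentation: 'beforeScriptExecution',
    }));
}
```

The callback receives the whole Debugger.paused payload, so it can inspect the call frames or set further breakpoints before the script's first statement executes.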

Debugger.getPossibleBreakpoints can be viewed as a simplified version of the AST: it provides a list of available breakpoint positions in a limited set of categories. Note that if you drive CDP from JS, it is best not to cast too wide a net with this list, because querying large ranges indiscriminately has a performance cost. Note also that breakpoints are not limited to the locations returned here. For example, depending on the code, a breakpoint between two returned locations may also be possible; the positions this function returns are simply the reliable ones. By the way, I am looking forward to CDP exposing an official syntax tree function. Otherwise, code that the V8 engine has already parsed has to be parsed again by a third-party library, which feels wasteful.
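A small sketch of querying a deliberately narrow range, in line with the advice above. It assumes client is a CDP session and scriptId comes from a Debugger.scriptParsed event; both helper names are hypothetical.

```javascript
// Ask for the possible break locations between two lines of one script.
// Keeping the range narrow avoids the performance cost of a wide query.
async function possibleBreakpoints(client, scriptId, startLine, endLine) {
  const { locations } = await client.send('Debugger.getPossibleBreakpoints', {
    start: { scriptId, lineNumber: startLine },
    end: { scriptId, lineNumber: endLine },
    restrictToFunction: false,
  });
  return locations;
}

// Count the returned locations by their optional `type` field.
function countByType(locations) {
  const counts = {};
  for (const loc of locations) {
    const type = loc.type || 'other'; // `type` may be absent
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}
```

countByType is only there to show that each location carries a position and, optionally, a category such as call or return.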

Debugger.setBreakpoint: note that its conditional breakpoint parameter is very useful. An earlier section was devoted to the “special use of conditional breakpoints”.
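One widely known use of the condition parameter, which may or may not be the one that earlier section covers, is a "logpoint": a condition that logs some expressions and then evaluates to false, so execution never actually pauses. A minimal sketch, with hypothetical helper names and client assumed to be a CDP session:

```javascript
// Build a breakpoint condition that logs the given expressions and then
// evaluates to false, so the breakpoint never pauses execution.
function logpointCondition(expressions) {
  const args = expressions
    .map(e => `${JSON.stringify(e + ' =')}, ${e}`)
    .join(', ');
  return `console.log(${args}), false`;
}

// Set the logpoint at a 0-based line of an already parsed script.
function setLogpoint(client, scriptId, lineNumber, expressions) {
  return client.send('Debugger.setBreakpoint', {
    location: { scriptId, lineNumber },
    condition: logpointCondition(expressions),
  });
}
```

Because the condition runs in the scope of the breakpoint location, the logged expressions can refer to local variables there.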

Page namespace

Page.getCookies: many pages now use cookies for authentication. This function can obtain cookie information across URLs, which is very useful for transplanting login state between production pages and test pages.

Page.addScriptToEvaluateOnNewDocument is essential: it injects debugging code into the page. Among the libraries that wrap it, Playwright’s implementation is much more elegant than Puppeteer’s (the former does a lot of work in its code to achieve “inject once, run anywhere”, while the latter really just calls the Page.addScriptToEvaluateOnNewDocument function). See the page.add_init_script implementation in the Playwright source code for details.
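A minimal sketch of calling the raw CDP function, combined with the function-substitution idea from earlier in the article. The names hookFetch and injectBeforePageScripts are hypothetical, and client is assumed to be a CDP session attached to the page:

```javascript
// Runs inside the page before any page script. It wraps window.fetch so that
// every call prints its first argument together with the call stack.
function hookFetch() {
  const original = window.fetch;
  window.fetch = function (...args) {
    console.trace('fetch called with', args[0]);
    return original.apply(this, args);
  };
}

// Serialize `fn` and register it to run in every new document.
function injectBeforePageScripts(client, fn) {
  return client.send('Page.addScriptToEvaluateOnNewDocument', {
    source: `(${fn.toString()})();`,
  });
}
```

Since the function is sent as source text, it must not depend on any variable from the Node side; everything it needs has to live inside its own body.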

As another example from third-party libraries, Playwright and Puppeteer both provide a page.on method with which you can listen for request events. I find this operationally the easiest and most straightforward way to replace page traffic.
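A sketch of the Puppeteer flavour of this technique. It assumes page is a Puppeteer Page; the rules object, which maps a URL substring to a replacement JSON body, and both helper names are hypothetical.

```javascript
// Build a 'request' handler that answers matching requests with a canned
// body and lets every other request through unchanged.
function makeRequestHandler(rules) {
  return (request) => {
    const hit = Object.keys(rules).find(key => request.url().includes(key));
    if (hit) {
      request.respond({ status: 200, contentType: 'application/json', body: rules[hit] });
    } else {
      request.continue();
    }
  };
}

// Interception has to be switched on before 'request' events can be answered.
async function replaceTraffic(page, rules) {
  await page.setRequestInterception(true);
  page.on('request', makeRequestHandler(rules));
}
```

Every intercepted request must be either answered or continued; a handler that does neither leaves the request hanging.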

Runtime namespace

Runtime.runIfWaitingForDebugger: an execution environment that is waiting for a debugger needs this function to be called. Otherwise, no matter how many times you call resume, continue, and so on, nothing happens. It is a bit of a pitfall.

Runtime.consoleAPICalled is an event that is triggered when JS code calls console functions such as console.log. This event is used again where we enhance the persistence capabilities of the JS language. Page.javascriptDialogOpening is a similar event, answered with Page.handleJavaScriptDialog.

Runtime.addBinding injects functions into the JS execution environment. It is a very useful function; used well, it can do many things that plain JS syntax cannot. This function is highlighted in the section “Making JS scripts more reflective in Chrome”.
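A minimal sketch of the binding mechanism, assuming client is a CDP session with on(event, handler) and send(method, params); the helper name addBinding and the binding name saveToDisk are hypothetical:

```javascript
// Expose a global function `name` in the page. Whenever page code calls it
// with a string, `handler` runs on the controlling side with that string.
function addBinding(client, name, handler) {
  client.on('Runtime.bindingCalled', (event) => {
    if (event.name === name) handler(event.payload);
  });
  return client.send('Runtime.addBinding', { name });
}
```

After addBinding(client, 'saveToDisk', handler), page code can call window.saveToDisk(JSON.stringify(someObject)). The payload is always a string, so structured data has to be serialized on the page side.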

5. Several recommended CDP operation libraries

The official libraries may not have the most elegant syntax, but they are certainly the most comprehensive and the least likely to lead you into pitfalls.

For readers willing to explore unofficial options, there are two libraries that I think are very well implemented:

One is the PyChrome library for Python. Both its function call syntax and its event response syntax are very elegant: to call a function you write that function with named parameters, and to respond to an event you assign a handler to that event, so the writing process is very intuitive. The only pity is that it does not seem to be able to operate on browser objects.

The other is the chrome-remote-interface library for JavaScript, whose syntax is similar to PyChrome’s (the main difference is that, when responding to events, the handler function is passed as an argument). The two work in much the same way.

There is also a less common option: calling CDP from within a Chrome extension, for which you only need to add the debugger permission when registering permissions. The call syntax is also fairly elegant; the only drawback is that the callback functions do not follow the mainstream asynchronous style. Interested readers can refer to the official example at github.com/GoogleChrom… and try it out.

6. Starting JS debugging on Windows and Mac

To use the CDP function, you need to enter the debugging mode.

For the Node execution environment

Putting the Node execution environment into debug mode depends primarily on the --inspect or --inspect-brk argument. The difference is that the latter makes the execution environment break at the first line of the script, while the former does not. Both parameters can also take a value, such as --inspect=9000 to listen on local port 9000.

For example, to run the test.js script in the current directory with listening enabled on the default port, execute node --inspect ./test.js (the flag must come before the script name; otherwise it is passed to the script instead of to Node). If you want to debug npm scripts, find the corresponding file in the current project’s npm directory, add the debugging parameter, and run it with Node.

Of course, you can also use VS Code’s debugging function, as shown in the following figure. Open package.json and find the scripts section, where a debugging prompt appears.

There is a small catch here. A library script may call spawn.sync internally to start a new process and run the code we want to debug in that new process. But the debug state specified by the --inspect parameter is not normally passed on to the newly started process, so the code we really want to debug ends up outside the debug state.

For a solution to this situation, and how to implement it if we want to automatically enable debugging for all NPM packages instead of manually adding debugging parameters, see the section “Enable Node debugging by Default in new processes”.
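One commonly used option, which is not necessarily the approach taken in that section, is the NODE_OPTIONS environment variable: Node reads it at startup, and child processes inherit it through the environment. The sketch below assumes that port 0 makes each process pick a free port, so that parent and children do not compete for the same one; the helper names are hypothetical.

```javascript
const { spawn } = require('child_process');

// Return a copy of `env` whose NODE_OPTIONS also carries an inspect flag,
// keeping whatever options were already set.
function withInspect(env, flag = '--inspect=0') {
  const existing = env.NODE_OPTIONS ? env.NODE_OPTIONS + ' ' : '';
  return { ...env, NODE_OPTIONS: existing + flag };
}

// Hypothetical launcher: run `npm run <script>` so that every Node process
// started underneath it inherits the inspect flag.
function runNpmScriptWithInspect(script) {
  return spawn('npm', ['run', script], {
    env: withInspect(process.env),
    stdio: 'inherit',
    shell: process.platform === 'win32', // npm is a .cmd shim on Windows
  });
}
```

Each process then prints its own "Debugger listening on ws://..." line, which tells you which port to attach to.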

For the browser execution environment

To enable debugging in the browser environment, you only need to add the --remote-debugging-port parameter.

On Windows, you can create a shortcut and add this parameter to it. For example, if Chrome is installed at D:\Application\chrome.exe, set the shortcut target to D:\Application\chrome.exe --remote-debugging-port=5003 to enable debugging on port 5003.

On the Mac, you can do this with a command such as open /Users/YourAccountName/Desktop/codes/chrome/chromium/src/out/stable/Chromium.app --args --remote-debugging-port=9222, which enables debugging on port 9222.
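Once the browser is listening, the WebSocket address needed by libraries such as Puppeteer can be read from the /json/version endpoint on that port. A minimal sketch using only Node's built-in http module (the helper names are hypothetical, and the default port 9222 is just the one used in the Mac example):

```javascript
const http = require('http');

// Extract the browser-level WebSocket URL from a /json/version response body.
function parseBrowserEndpoint(body) {
  const info = JSON.parse(body);
  if (!info.webSocketDebuggerUrl) {
    throw new Error('webSocketDebuggerUrl not found in /json/version response');
  }
  return info.webSocketDebuggerUrl;
}

// Ask a browser started with --remote-debugging-port for its endpoint.
function fetchBrowserEndpoint(port = 9222, host = 'localhost') {
  return new Promise((resolve, reject) => {
    http.get({ host, port, path: '/json/version' }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try { resolve(parseBrowserEndpoint(body)); } catch (err) { reject(err); }
      });
    }).on('error', reject);
  });
}
```

The resolved URL can be passed as browserWSEndpoint to puppeteer.connect, which avoids copying the address by hand each time the browser restarts.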

Connecting the Debugging environment

Enter chrome://inspect/ in the address bar and click the “Open dedicated DevTools for Node” link to open the global debug window. Click “Add connection” and enter the listening port number to connect to the debug environment.

Conclusion

This article summarizes some of the non-invasive methods I have used, while maintaining an “ancestral” project, to locate program source code directly from its characteristic behavior and presentation (DOM components are used as the example in this article). The concrete implementation involves basic debugging methods, the use of CDP, and the use of the AST. It also introduces some ways of rebuilding programs to solve problems that existing programs cannot.

If your own work happens to overlap with the above, I hope these principles and methods can help you. If there is little overlap, I still hope the ideas and methods in this article can open up new ways of thinking about practical problems 🙂

I wish every reader a happy day, every day ^_^