Make writing a habit together! This is the second day of my participation in the “Gold Digging Day New Plan · April More text challenge”.
Hello everyone, I am [Dream Eraser]. I have 10 years of research experience and am dedicated to spreading the Python technology stack.
Last updated: March 30, 2022. This is Eraser’s 605th original blog post.
⛳️ Creative background
This article continues the series on Python data collection and focuses solely on the developer tools built into the browser.
We will lay out the browser developer tools for you as thoroughly and clearly as possible.
If you use the developer tools well, your data collection code will generally turn out well too.
The browser used here is Google Chrome version 100+ (it seems to have just been released); you can upgrade to stay consistent with me.
Press F12 or Ctrl+Shift+I to open the developer tools. You can also click the three dots in the upper right corner of the browser, or right-click a blank area of the page and select Inspect.
The default is as follows:
⛳️ Basic configuration
To make the developer tools better suited to data collection, we can change a few global settings. I will go through the specific settings in detail.
Click the small gear in the upper right corner, or press F1 (click inside the developer tools first, otherwise you will open the browser's help page).
In the first section, Preferences, select the language. If your developer tools already display in Chinese by default, skip this step.
By default, the Show What’s…. option appears below the language setting.
Further down in Preferences, you will see that the sections correspond to the commonly used panels of the developer tools, as shown below:
For example, when the ruler is turned on under Elements and we then select a DOM node with the developer tools, its coordinates are displayed, which is very useful for front-end work.
For data collection, the cache is normally disabled so that every request is fetched fresh each time, so enable the following option in the Network settings.
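When the same pages are later fetched from Python, a rough counterpart to disabling the cache is to send request headers that ask caches not to serve a stored copy. The following is a minimal sketch using only the standard library. The helper name `build_uncached_request` and the example URL are hypothetical, and the assumption that `Cache-Control: no-cache` plus `Pragma: no-cache` is enough for your target site should be checked against what the Network panel actually shows.

```python
from typing import Dict, Optional
from urllib.request import Request

# Headers that ask browsers and proxies not to answer from a stored copy.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
}


def build_uncached_request(url: str, extra_headers: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a request that carries no-cache headers."""
    headers = dict(NO_CACHE_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers)


if __name__ == "__main__":
    # Hypothetical URL and User-Agent, for illustration only.
    req = build_uncached_request("https://example.com/", {"User-Agent": "demo-crawler"})
    print(req.header_items())
```

The returned object can be passed to `urllib.request.urlopen` when you are ready to send it.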
You will also find an interesting option called Force ad blocking on this site. The name suggests it blocks ads, although at first glance it does not seem to block much.
Sure enough, it turns on its own side and blocks all of Google's own ads. The downside is that it cannot be configured globally; it only applies to sites where the developer tools are currently open.
The other settings generally need no special attention. Continue down to find other configurable items:
Workspaces, experiments, and ignore lists are not often used, so keep the defaults.
Under Devices you can add emulated mobile devices. Front-end engineers generally use this for compatibility testing, and you can also define a custom resolution, which is very convenient.
Under Throttling you can define speed-limit profiles in advance, which saves time later.
After adding a throttling profile, you can apply it in the Network panel, such as the AAA profile below, which was set up in advance.
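To get a feel for what a throttling profile means for a crawler, the arithmetic can be sketched in Python. The fields follow what a custom profile typically asks for (download throughput, upload throughput, and latency). The numbers assigned to AAA here are made up for illustration, since the actual values are not stated, and the formula is a deliberately rough estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleProfile:
    name: str
    download_kbps: float  # kilobits per second
    upload_kbps: float    # kilobits per second
    latency_ms: float     # added delay per request, in milliseconds

    def estimate_download_seconds(self, size_bytes: int) -> float:
        """Rough time to fetch one response: latency + size / throughput."""
        bits = size_bytes * 8
        return self.latency_ms / 1000 + bits / (self.download_kbps * 1000)


# Hypothetical values; the article does not state what "AAA" was set to.
AAA = ThrottleProfile(name="AAA", download_kbps=1000, upload_kbps=500, latency_ms=100)

if __name__ == "__main__":
    seconds = AAA.estimate_download_seconds(125_000)
    print(f"{seconds:.2f} s for a 125 kB response")
```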
There is also an interesting option here: the built-in shortcut presets, which, unexpectedly, include a VS Code preset. You generally do not need to change it, so keep the default. This page is also a good place to look up shortcuts.
For example, the shortcuts I use most often are:
Ctrl+L: clear the console;
F9: step (single-step debugging);
F10: step over the next function call;
F11: step into the next function call.
⛳️ The three-dot menu on the right
Let's continue with the features found in the three-dot menu on the right side of the developer tools.
Dock side: this is easy to understand. It sets where the developer tools are displayed: in a separate window, on the left, at the bottom, or on the right. Choose whichever suits your layout.
Show console drawer: when clicked, the console is shown in a drawer at the bottom of the developer tools for easy access.
Search: this looks for the target content across all static resources of the current page, and it is used very often when writing crawlers. You can also press Ctrl+Shift+F to open it.
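The same kind of search can be reproduced offline once the static resources have been saved. The sketch below assumes the resources are already available as text in a dictionary. The function name `search_resources` and the sample file names and contents are hypothetical, and case-insensitive matching is a simplification chosen for this example.

```python
from typing import Dict, List, Tuple


def search_resources(resources: Dict[str, str], keyword: str) -> List[Tuple[str, int, str]]:
    """Return (resource name, line number, line) for every line containing keyword."""
    needle = keyword.lower()
    hits = []
    for name, text in resources.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():  # case-insensitive match
                hits.append((name, line_no, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical saved resources, for illustration only.
    saved = {
        "index.html": "<html>\n<script src='app.js'></script>\n</html>",
        "app.js": "const api = '/api/list';\nfetch(api);",
    }
    for hit in search_resources(saved, "api"):
        print(hit)
```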
Run command: this works much like a shortcut launcher and can open many additional tools.
Its benefit is that when you cannot remember a feature's full name, or do not know where it lives, you can find it with a fuzzy query.
More tools
There are many tools here worth trying. Media, for example, can be very valuable when you are collecting audio and video, as shown below.
After opening Layers, you can see the rendering layers of the page directly, which is very valuable for front-end engineers.
After opening the Performance monitor, you can also watch page performance in a window like the one below; it is a great tool for page optimization.
Shortcuts and Help: these two are basic features and are not covered in detail here. Click through to explore them, although the Help pages may not be easy to access.
⛳️ Elements: web page elements
With the basic settings covered, we start with the Elements panel.
Elements is the default tab of the developer tools, and its default view is shown below.
The main area of the panel shows the DOM, that is, the HTML structure of the page. As you move the mouse over the code, the corresponding element is highlighted on the page.
With the select tool in the upper left corner of the developer tools, you can pick elements directly from the web page.
You can also search for tags quickly. Press Ctrl+F to open the search box, which appears at the bottom of the panel.
You can search by string, selector, or XPath. The following two pictures show searches by selector and by XPath respectively.
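The three search modes have rough counterparts in Python. The sketch below uses only the standard library, so it assumes a small well-formed snippet; real pages usually need a more tolerant parser such as lxml or BeautifulSoup. The markup and helper names are hypothetical, and the selector mode handles only the `#id` form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
PAGE = """<html><body>
<form><span><input type="text" id="kw" /></span>
<span><input type="submit" id="su" value="Google it." /></span></form>
</body></html>"""

root = ET.fromstring(PAGE)


def find_by_string(text: str) -> int:
    """String mode: count plain-text occurrences in the source."""
    return PAGE.count(text)


def find_by_id_selector(selector: str):
    """Selector mode, limited in this sketch to '#id' selectors."""
    if not selector.startswith("#"):
        raise ValueError("this sketch only supports '#id' selectors")
    return root.findall(f".//*[@id='{selector[1:]}']")


def find_by_xpath(xpath: str):
    """XPath mode; ElementTree supports a subset and needs a relative path."""
    return root.findall("." + xpath if xpath.startswith("//") else xpath)


if __name__ == "__main__":
    print(find_by_string("input"))
    print([el.get("id") for el in find_by_id_selector("#su")])
    print([el.get("id") for el in find_by_xpath("//input[@type='submit']")])
```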
After an element is selected, a set of panes for that element appears on the right, as follows:
We start with the Styles pane, which contains the features listed below:
Filter: filters the styles shown for the current page;
:hov: forces an element state, such as :active or :hover, as shown in the figure above. When writing Python crawlers you sometimes need to force the :hover state so that you can inspect data that is otherwise not displayed;
.cls: lets you add an existing class to the element;
Plus sign: adds a new style rule;
Computed styles: this button matches the content of the second tab and shows the computed styles.
Computed: this pane shows the final styles after the CSS has been computed. It matters more for front-end work and is rarely needed for data collection programs.
Event Listeners: this is used very often when writing crawlers. It lets you quickly find the events bound to a button and jump to the corresponding JS. In addition, some sites put their data request logic inside the bound event handler, so it is easy to find the corresponding API endpoint directly.
DOM Breakpoints: these also come up when writing crawlers, since sometimes we track down an interface based on changes to an element's attributes or state. A DOM breakpoint must be set on an element before anything is listed here. To set one, as shown in the figure below, right-click the target node and choose Break on. The sub-options cover three operations: subtree modifications, attribute modifications, and node removal.
Properties and Accessibility: these panes are commonly used, so you can focus on them.
Right-click menu: select any element and right-click it to operate on it. For example, you can duplicate the element.
We can also edit page elements directly. For example, take this button element:
<input type="submit" id="su" value="Google it." class="bg s_btn" />
A lot of content can be copied from this element.
Copy the element
<input type="submit" id="su" value="Google it." class="bg s_btn" />
Copy the outerHTML
<input type="submit" id="su" value="Google it." class="bg s_btn" />
Copy the selector
#su
Copy the JS path
document.querySelector("#su")
Copy the styles
list-style: none; margin: 0; font-family: Arial,sans-serif; -webkit-appearance: none; ...
Copy the XPath
//*[@id="su"]
Copy the full XPath
/html/body/div[1]/div[1]/div[5]/div/div/form/span[2]/input[2]
All of these are used every day when writing crawlers.
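As one illustration of putting these copied values to work, the sketch below looks up the `su` button by its id in raw HTML using only Python's standard library. The class name `IdFinder`, the helper `attributes_by_id`, and the surrounding markup are hypothetical. In a real crawler you would more likely hand the copied selector or XPath to a parser such as lxml or BeautifulSoup.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class IdFinder(HTMLParser):
    """Collect the attributes of the first tag whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        attr_map = dict(attrs)
        if self.found is None and attr_map.get("id") == self.target_id:
            self.found = attr_map


def attributes_by_id(html: str, element_id: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the attribute dict of the element with this id, or None."""
    finder = IdFinder(element_id)
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    # Hypothetical page fragment wrapped around the button from the article.
    page = '<form><input type="submit" id="su" value="Google it." class="bg s_btn" /></form>'
    print(attributes_by_id(page, "su"))
```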
You can try the remaining items on your own; capturing a node screenshot and focusing an element, for example, are very useful.
If an element will be referenced several times in the console, it can be stored as a global variable.
That’s enough for today. See you next time. If you find any errors in this article, please point them out in the comments section.
First published on Nuggets; released on other platforms 3 days later.