“This article has participated in the good article call order activity, click to see: back end, big front end double track submission, 20,000 yuan prize pool for you to challenge!”
Preface — a few days ago, I that go up junior high school’s younger sister sends VX to ask me suddenly to say she wants to copy a few friends circle copy that search to arrive on the net to take send friend circle, but the problem is to copy not!
I chuckled at the question (thinking: is there any data on the Internet that my crawler can’t crawl? Does the younger sister have not heard a legend that the river’s lake spreads — can be seen to climb? I swished out of bed, sat in front of my computer, opened my Google browser and typed in my sister’s url — and sure enough:
This is the familiar popover, and this is the damn VIP privilege to enjoy, but — these are all small problems for us crawlers, I open my PyCharm, bang bang, a little while a little crawler wrote for my sister’s website, input the url, download OK:
Later, I will download and organize the TXT text directly to the sister, sister get a good brother brother kua – the body bones are to crisp! However, IT occurred to me that the next time she encountered a similar problem, she would have to ask me to solve it for her! “No, no, no,” I told myself — IT’s a no-no! Give a man a fish, give him a fish – this is the way!! But what is this “fishing”?
I won’t keep you in suspense! I’m going to share with you a little bit of an operation — just use a Google browser (whether you’re a kid or an uncle or an aunt) and follow the simple steps below to make sure you’re free to copy whatever you want!
Step 1: Click the right mouse button in the blank area of the page -> then click “Check”;
Step 2: Click the gear icon in the upper right corner of the page.
Step 3: Scroll down to see Disable JavaScript. Click on the blank box in front and select it.
End, now – you have unsealed the page and can copy whatever you want!
However, if you’re a programmer, or if you want to be one in the future, it’s not enough to be able to do just that one browser. I stayed up all night to compile the following information about how to use the Google Chrome debug panel and the common keyboard shortcuts. If you can master them all, then congratulations: you are a very powerful programmer.
1.Chrome debugging panel
(1) Common panel (crawler positioning elements must be used!)
-
Position the small arrow button (first on the left) :
Select the Elements panel and launch the button to locate the source location of the corresponding element on the page, or select the source location to locate the corresponding element on the page.
-
Mobile-pc View toggle button (second from the left) :
When this button is activated, the web page can be converted between the PC web page and the mobile web page. Because in the process of crawler, it is relatively easy to crawl mobile web site, so you can switch the web page to mobile web page through this button to achieve faster crawling operation.
-
Elements Panel
This panel displays all the rendered HTML source code, which can be used to find the location, properties, and other characteristics of each tag when you use Selenium to crawl a web page. More importantly, double-clicking the HTML source code or the CSS on the right can change the look of the web page, which means you can debug static web pages.
-
Console Panel shortcut keys: CTRL+~
This panel displays log information during the page loading process, including print, warning, error and other displayable information. It is also a JS interactive console.
-
Sources Panel (Source panel)
This panel is grouped by site and holds all the requested resources (HTML, CSS, JPG, GIF, JS, etc.). Because this panel holds all the resources, this is where the object code will be found when debugging JS. The panel also provides a debugging button tool.
-
Network Panel
The Network panel records the detailed information of the Network request, including request header, response header, form data, parameter information, etc.
-
Shortcut key small learning (to check the page input oh!) : CTRL + SHIFT + P
Input javascript (you can directly select the Disabled javascript option) : you can block this site JS code, after the refresh of this site JS code will not execute! Type full: you can take a screenshot (it will take the whole page)
(2) Network panel (it is required to filter requests and data types in crawlers — such as filtering requests loaded asynchronously!)
-
ALL: ALL requests
-
XHR(XmlHttpRequest object generated by JS) : JS dynamically loads the request
-
JS: JS code
-
Style of Css:
-
Image: image
-
Media: audio, video
-
The Font, Font
-
DOC: the home page
-
WS: WebSocket
-
Hide data URLs: You can filter responses to data
-
Note:
(1) If the Preserve log option is selected in the upper left corner, the data requested on the last page will not be cleared. For example: in a web page login, if you do not check this option, because click login before belong to a request; After clicking login, it belongs to another request. So click after is not your login information!
(2) The Disable cache option in the upper left corner of the page indicates to clear the cache, which should be checked to prevent the presence of the local cache during web page operation, resulting in some unexpected errors!
(3) The box Filter in the upper left corner.
Usage:
① To filter the response from domain name baidu.com, it is easy for you to find cookies. ②set-cookie-name: indicates the key in a cookie. You can also filter responses that contain this key so you can find cookies. ③set-cookie-value: indicates the value in a cookie. You can also filter responses that contain this value so you can find cookies. (4) cookie-name: indicates the key in a cookie. Requests that contain the key for this cookie can be filtered.Copy the code
(3) Set breakpoints (crawler advanced JS penetration must use the operation!)
Part ONE: How to Use it!
Purpose: Find the place where the target data is generated by debugging (must be used for JS penetration!)
Use breakpoints to pause JavaScript code and review the values of variables and the stack that was called at a particular point in time.
The most basic way to set a breakpoint is to manually add a breakpoint on a particular line of code. You can also configure these breakpoints to fire only when certain conditions are met.
On the left side of the source code, you can see the line number. This area is called the Line number gutter. Clicking a line number in the line number slot adds a breakpoint on that line of code. For example, events, DOM changes.
Part two: Step by step Debugging!
Part three: Scope!
When the script breaks, the Scope pane displays all currently defined properties at the current time.
Part four: the call stack!
- Near the top of the sidebar is the Call Stack pane. When the code is paused at the breakpoint, the CallStack pane shows the execution path, in reverse chronological order, bringing the code to the breakpoint. This helps to understand where the execution is now and how it got there, and is an important factor in debugging.
- To call the function chain, let’s call the above function
2.Chrome shortcuts
(1) Tabs and window shortcuts (key: common!)
operation | shortcuts |
---|---|
Open a new window | Ctrl + n |
Open a new window in non-trace mode | Ctrl + Shift + n |
Open a new TAB and jump to that TAB | Ctrl + t |
Reopen the last closed TAB and jump to that TAB | Ctrl + Shift + t |
Skip to the next open TAB | Ctrl + Tab or Ctrl + PgDn |
Skip to the last open TAB | Ctrl + Shift + Tab or Ctrl + PgUp |
Jump to a specific TAB | Ctrl + 1 to Ctrl + 8 |
Skip to the last TAB | Ctrl + 9 |
Open the home page in the current TAB | Alt + Home |
Opens the previous page recorded in the current TAB browsing history | Alt + left arrow key |
Opens the next page recorded in the current TAB browsing history | Alt + the right arrow key |
Close the current TAB | Ctrl + W or Ctrl + F4 |
Close all open tabs and browsers | Ctrl + Shift + w |
Minimize the current window | Alt + space bar + N |
Maximize the current window | Alt + space bar + x |
Close the current window | Alt + F4 |
Exit the Google Chrome | Ctrl + Shift + q |
(2) Google Chrome shortcuts
(3) Web shortcut keys
3.In The End
The above knowledge points are mostly some simple operation commands, the typical kind of content to read and forget the type. So take some advice from some programming gurus: read it and forget it, forget it, use it and forget it — keep watching!!
About the author: Loner, devoted to python crawler and Web (love Django, Flask, Tornado) for 30 years! Welcome to follow the same name CSDN blogger name [Lonely], as well as the newly created wechat public account [Lonely], adhere to the output of dry goods, if learning questions welcome to exchange!