- 👻👻 After the barrage of crawler posts I have published recently, I believe many of you can now build your own crawler projects independently! The road to web crawling is open! 👻👻
- 😬😬 But a few days ago a follower messaged me on WeChat with this question: “The source code I see in the browser's developer tools is completely different from the source code I fetched with the Requests library! What's going on here? The methods you taught can't solve it!” 😬😬
In fact, this comes down to front-end knowledge. My time and energy are limited, so for now I have only published one article on essential HTML: “Front-End Knowledge Every Crawler Developer Must Know, Part 1: HTML Explained”. Follow this blog and I will work hard to keep publishing articles on CSS and JavaScript! 💦 They will be out soon!
- ⏰⏰ We need to understand why this happens before we can do something about it. The key point is that Requests retrieves the raw HTML document, whereas the page you see in the browser is the result of JavaScript processing data. That data can come from several sources: it may be loaded via Ajax, it may be included in the HTML document, or it may be generated by JavaScript using a specific algorithm. ⏰⏰
In the first case, the data is loaded asynchronously: the original page does not initially contain some of the data. After the original page has loaded, it requests an interface on the server to fetch the data, which is then processed and rendered onto the page. This is simply an Ajax request (one way JavaScript renders a page).
In the third case, the data is computed by JavaScript using a specific algorithm. It is not present in the raw HTML, and no Ajax request is involved.
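As a rough offline illustration of the first and third cases, the sketch below uses made-up strings in place of real responses. The names `RAW_HTML_AJAX`, `AJAX_BODY`, `RAW_HTML_COMPUTED`, the `/static/app.js` path, and the `names_from_ajax` helper are all hypothetical and not taken from any real site. It only shows that the text visible in the browser is missing from the raw HTML that a library like Requests would receive.

```python
import json

# Case 1 (Ajax): hypothetical raw HTML, as an HTTP client would receive it.
# The list is an empty shell that a script fills in after the page loads.
RAW_HTML_AJAX = '<ul id="movies"></ul><script src="/static/app.js"></script>'

# Hypothetical body returned by the interface the page requests after loading.
AJAX_BODY = '{"results": [{"name": "Movie A"}, {"name": "Movie B"}]}'

# Case 3 (computed): the value shown in the browser is produced by a script,
# so the rendered text "42" never appears literally in the raw HTML.
RAW_HTML_COMPUTED = (
    '<span id="answer"></span>'
    '<script>document.getElementById("answer").textContent = String(6 * 7);</script>'
)


def names_from_ajax(body):
    """Extract the names from the JSON returned by the (assumed) interface."""
    return [item["name"] for item in json.loads(body)["results"]]


# The data is not in the raw HTML, but it is in the Ajax response.
ajax_data_in_raw_html = "Movie A" in RAW_HTML_AJAX
ajax_names = names_from_ajax(AJAX_BODY)

# The computed value only exists after the script has run in a browser.
computed_value_in_raw_html = "42" in RAW_HTML_COMPUTED

print(ajax_data_in_raw_html)       # False
print(ajax_names)                  # ['Movie A', 'Movie B']
print(computed_value_in_raw_html)  # False
```

In both cases a plain string search of the raw HTML comes up empty, which matches the symptom described in the reader's question.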
- 📻📻 Now that we know the principle, the next question is: how do we actually solve it? 📻📻