Python Crawler (new)

Environment: Python 3.x
External dependency: Requests
GitHub project address: github.com/ladingwu/py…

Main issue: simulated login

Zhihu now serves its pages over HTTPS, so the traffic is encrypted, but that is not a big problem. What matters is that the page data has changed and that the backend now checks each request for signs of a crawler. Every request therefore needs request headers that resemble a real browser request as closely as possible.
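As a minimal sketch of the browser-like headers idea, the snippet below keeps one header dictionary and attaches it to a Requests session so that every request carries it. The header values and the names BROWSER_HEADERS and make_session are assumptions for illustration and are not taken from the project; copy the real values from your own browser's Developer Tools.

```python
# Hypothetical header values: the post does not list the exact headers,
# so these are placeholders modelled on a desktop Chrome request.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/58.0.3029.110 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.6",
    "Referer": "https://www.zhihu.com/",
}


def make_session():
    """Return a requests.Session that sends BROWSER_HEADERS on every request."""
    # Requests is third-party; it is imported here so the header dictionary
    # above can be reused even where Requests is not installed.
    import requests

    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session
```

Setting the headers once on the session avoids repeating them on each get or post call.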

Get to the point

The login form data is still the same as before:

a.png

You can check it in Chrome Developer Tools.

Sometimes a captcha is required, though, so my crawler simply downloads the captcha image to the local disk so you can read it yourself. If you want to try automatic captcha recognition, see my other article on simple verification code recognition. That article only covers digit captchas, but the basic idea is the same; recognizing Zhihu's captcha is still somewhat difficult at the moment.

The _xsrf value can be found on the login page. Its location has changed, but it can still be extracted with the re module.
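Downloading the captcha to a local file could look roughly like the sketch below. The function name save_captcha, the default file name, and the captcha_url parameter are assumptions: the post does not give the real captcha URL, so the caller has to supply it.

```python
def save_captcha(session, captcha_url, path="captcha.gif"):
    """Download the captcha image and save it locally; return the file path.

    `session` is any object with a Requests-style get() method, for example
    a requests.Session. `captcha_url` is passed in by the caller because the
    real Zhihu captcha URL is not given in the post.
    """
    response = session.get(captcha_url)
    response.raise_for_status()  # stop early if the download failed
    with open(path, "wb") as image_file:
        image_file.write(response.content)
    return path


def ask_captcha(session, captcha_url, path="captcha.gif"):
    """Save the captcha, then ask the user to type what the image shows."""
    saved = save_captcha(session, captcha_url, path)
    return input("Open {} and type the captcha: ".format(saved)).strip()
```

Using the same session for the captcha and for the login request matters, because the server normally ties the captcha to the session cookies.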
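A minimal sketch of pulling _xsrf out of the page with re is shown below. It assumes the token sits in a hidden input named "_xsrf"; the post does not show the actual markup, so the pattern and the name extract_xsrf are illustrative and may need adjusting.

```python
import re

# Assumed markup: <input type="hidden" name="_xsrf" value="..."/>.
# Adjust the pattern to whatever the login page really contains.
XSRF_PATTERN = re.compile(r'name="_xsrf"\s+value="([^"]*)"')


def extract_xsrf(html):
    """Return the _xsrf token found in the login page HTML, or None."""
    match = XSRF_PATTERN.search(html)
    return match.group(1) if match else None


sample = '<input type="hidden" name="_xsrf" value="abc123"/>'
print(extract_xsrf(sample))  # abc123
```

Returning None when nothing matches lets the caller notice a changed page layout instead of sending an empty token.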

The login function looks something like this:

Paste_Image.png

The first time you log in, the function needs your account and password. After that, the program automatically records a cookie file in the current folder, so the next run does not need the account and password again. The code that reads and writes the cookie file looks roughly like this:

Paste_Image.png
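Since the original code is only available as an image, here is a hedged sketch of one way to read and write such a cookie file using the standard library's LWPCookieJar. The file name cookies.txt and the helper names are assumptions and may differ from what the project actually does.

```python
import os
from http.cookiejar import LWPCookieJar

COOKIE_FILE = "cookies.txt"  # hypothetical file name in the current folder


def load_cookies(path=COOKIE_FILE):
    """Return a cookie jar bound to `path`, filled from disk if the file exists."""
    jar = LWPCookieJar(path)
    if os.path.exists(path):
        # ignore_discard=True also restores session cookies that carry
        # no expiry date, which login cookies often are.
        jar.load(ignore_discard=True)
    return jar


def save_cookies(jar):
    """Write the jar back to the file it was created with."""
    jar.save(ignore_discard=True)
```

With Requests, the jar can be assigned to a session (session.cookies = load_cookies()) before any request is made. If the jar is empty, run the login with account and password and then call save_cookies(session.cookies); otherwise the saved cookies are sent automatically.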

from python_zhihu import ZhiHu

zh = ZhiHu()

# Download the highly upvoted text answers to a question and store them in a TXT file
zh.get_answer_text('url for a question')

# Download all the images in the answers to a question and sort them by the answerer's nickname
zh.get_answer_img('url for a question')

The current update only makes sure that crawling Zhihu works normally again. More features may be added later, so stay tuned.

If you find the project useful, please give it a star.

Source: github.com/ladingwu/py…
