Why write this article? Mainly because some people in the QQ group have been trying to simulate a Zhihu login without success. So I captured the traffic and found that Zhihu's login page has been redesigned and is now considerably harder to handle.
Start capturing packets
First, open the Zhihu home page and log in with your account and password (remember to enter a wrong password on purpose).
This lets us see the request headers (below).
We notice several request headers that differ from the usual ones (red boxes):
authorization
At first glance this looks like it is generated by JS; more on that later.
Content-Type
It carries an extra boundary=xxx part.
cookie
Note that the cookie is already non-empty before login, so a Set-Cookie must have happened before logging in.
x-udid, x-sxrftoken
Both of these are validation parameters, which can be found in the page source.
Now look at the request parameters.
You can see that the parameters are sent as a request payload.
This is the first time I have seen a login done this way.
It has to be read together with the Content-Type request header:
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary2KNsyxgtG28t93VF
multipart/form-data is a way of submitting a form, and boundary=xxx sets the delimiter that separates the form fields. A simple example shows why.
------WebKitFormBoundary2KNsyxgtG28t93VF is the delimiter between parameters, so you can simply strip it out (it is determined by the boundary value in the Content-Type above and can be changed freely).
With the delimiter lines removed, the above is equivalent to client_id=c3cef7c66a1843f8b3a9e6a1e3160e20, grant_type=password, and so on.
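To make the delimiter idea concrete, here is a small sketch using only the standard library. It builds a multipart/form-data body from two of the fields seen in the capture and then strips the delimiter lines again to get back plain name=value pairs. The function names build_multipart and parse_multipart are my own, and the sketch only handles plain text fields, not file uploads.

```python
BOUNDARY = "----WebKitFormBoundary2KNsyxgtG28t93VF"


def build_multipart(fields, boundary):
    """Encode text fields as a multipart/form-data body (no file parts)."""
    lines = []
    for name, value in fields.items():
        # Each part starts with "--" followed by the boundary from Content-Type.
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    # The closing delimiter carries an extra trailing "--".
    lines.append("--" + boundary + "--")
    return "\r\n".join(lines) + "\r\n"


def parse_multipart(body, boundary):
    """Strip the delimiter lines and recover the name=value pairs."""
    fields = {}
    for part in body.split("--" + boundary):
        part = part.strip("\r\n")
        if not part or part == "--":
            continue  # text before the first delimiter, or the closing marker
        headers, _, value = part.partition("\r\n\r\n")
        name = headers.split('name="', 1)[1].split('"', 1)[0]
        fields[name] = value
    return fields


body = build_multipart(
    {"client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20", "grant_type": "password"},
    BOUNDARY,
)
print(body)
print(parse_multipart(body, BOUNDARY))
```

Because the boundary is just a separator chosen by the client, any string that does not occur in the field values would work equally well.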
So this payload is actually easy to understand.
Now let's see what the parameters are.
There are a lot of parameters, but many of them are fixed or trivial to fill in, such as the account, password and timestamp.
Two need tracking down: client_id and signature.
Start looking for parameters
Authorization:
In Chrome, press Ctrl+Shift+F to search across everything the page has loaded (JS, CSS and so on). The search finds the value, written directly in a JS file. If we switch to a different account and capture the request again, the authorization value stays the same. So authorization is hard-coded in the JS rather than generated dynamically, and we now have its value.
Cookies:
Before login the cookie is already non-empty, so a Set-Cookie must happen when the page is opened. To verify this, open an incognito window (so that no earlier cookies interfere) and then open zhihu.com. We find that it performs several Set-Cookie operations.
So if we want to emulate this, the easy way is to use requests.Session directly.
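A minimal sketch of that idea, assuming the requests library is installed. The function names and the default User-Agent string are my own choices, not something taken from Zhihu. A Session stores every cookie it receives and sends it back on later requests, which is exactly the behaviour the browser showed.

```python
import requests


def new_session(user_agent="Mozilla/5.0"):
    """Create a Session that keeps cookies between requests."""
    session = requests.Session()
    # Assumption: a browser-like User-Agent; the exact value is arbitrary here.
    session.headers.update({"User-Agent": user_agent})
    return session


def warm_up(session, url="https://www.zhihu.com"):
    """GET the home page so any Set-Cookie responses land in session.cookies."""
    response = session.get(url, timeout=10)
    return response.text, session.cookies.get_dict()
```

Calling warm_up(new_session()) before the login request gives back the page source plus whatever cookies the site set, and the same session object can then be reused for the login POST.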
x-udid, x-sxrftoken:
Validation parameters like these are usually embedded in the page source, so look there directly.
You can see that they are indeed there. What remains is extracting them: a regex works, and so does locating them with XPath.
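Here is a rough sketch of the regex route. Everything about the page layout is an assumption: the sample HTML below is made up, and the key names "xsrf" and "xUDID" are placeholders that need to be replaced with whatever actually appears in the page source you captured.

```python
import html
import re

# Hypothetical page fragment, invented for this example only.
SAMPLE_PAGE = (
    '<div id="data" data-state="{&quot;token&quot;:{&quot;xsrf&quot;:'
    '&quot;abc-123&quot;,&quot;xUDID&quot;:&quot;XYZ=&quot;}}"></div>'
)


def extract_tokens(page_source, xsrf_key="xsrf", udid_key="xUDID"):
    """Pull two quoted values out of JSON-like text embedded in the page."""
    # Embedded JSON is often HTML-escaped (&quot;), so unescape it first.
    text = html.unescape(page_source)
    found = {}
    for label, key in (("xsrf", xsrf_key), ("udid", udid_key)):
        match = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(key), text)
        found[label] = match.group(1) if match else None
    return found


print(extract_tokens(SAMPLE_PAGE))
```

A missing key comes back as None rather than raising, which makes it easier to notice when the page layout has changed again.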
client_id:
You will notice that client_id is exactly the same as authorization above.
signature:
Again, use Ctrl+Shift+F to search globally.
The search finds it, but this parameter is generated dynamically by JS...
The basic approach is to work out how it is encrypted and then reproduce that in Python.
Step 1: Download the JS and format it (to make the code readable)
Step 2: Replace the original JS that the page loads with the copy you just formatted
Step 3: Debug slowly... until we work out how the value is generated...
That is the general procedure.
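To show what "reproduce it in Python" can look like once the debugging is done, here is a purely illustrative sketch. It assumes the JS turns out to compute an HMAC-SHA1 over some concatenated fields plus the timestamp. That scheme, the secret, and the field order are all hypothetical; the real algorithm has to come out of the three steps above.

```python
import hashlib
import hmac
import time


def make_signature(secret, fields, timestamp_ms):
    """Hypothetical scheme: HMAC-SHA1 over the joined fields plus a timestamp."""
    message = "".join(fields) + str(timestamp_ms)
    digest = hmac.new(secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha1)
    return digest.hexdigest()


# Placeholder secret; the field values are the two seen in the capture.
now_ms = int(time.time() * 1000)
signature = make_signature(
    "placeholder-secret",
    ["password", "c3cef7c66a1843f8b3a9e6a1e3160e20"],
    now_ms,
)
print(now_ms, signature)
```

Whatever the real scheme is, a quick check is to feed the Python version the same inputs as one captured request and compare the output with the signature that the browser sent.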
But if your JS is as weak as mine, you can simply locate the JS that does the encryption and have Python execute it...
At this point we have found every parameter we need; all that remains is to simulate sending the request.
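As a sketch of that last step, again assuming the requests library: passing each field to the files argument as a (None, value) tuple makes requests encode the body as multipart/form-data and pick a boundary itself. The URL, header values and field values below are placeholders, not the real login endpoint, and build_login_request is my own name. The request is only prepared here, not sent, so the result can be inspected first.

```python
import requests


def build_login_request(session, url, fields, headers):
    """Prepare (but do not send) a multipart/form-data POST."""
    # (None, value) means "plain form field, no filename".
    multipart = {name: (None, value) for name, value in fields.items()}
    request = requests.Request("POST", url, files=multipart, headers=headers)
    # prepare_request merges in the session's own headers and cookies.
    return session.prepare_request(request)


session = requests.Session()
prepared = build_login_request(
    session,
    "https://example.com/sign_in",  # placeholder URL
    {"client_id": "c3cef7c66a1843f8b3a9e6a1e3160e20", "grant_type": "password"},
    {"authorization": "placeholder-value"},
)
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))
```

Once the prepared headers and body look like the captured request, session.send(prepared) performs the actual POST.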
For the full code, follow the WeChat public account [Python crawler share] and send "Zhihu login code" to get it.