Python crawler JS decryption in detail: learn this and you can crack 80% of websites directly!!
29 Crawler project treasure Tutorials that you deserve!
Preface
This ==Glidedsky== level uses a kind of ==JS decryption== that is different from what I have covered before. I hope you read it carefully and learn from it!
==Warm tip==: protect your hair!
1. Web page view
2. JS decryption process (read this carefully)
Since the data is encrypted by JS, it cannot be static, as shown below.
If you request the page directly, or paste the HTML source into an HTML file and open it, the numbers are missing.
Open the console and look at the XHR requests.
There is a problem here. I could see the data earlier, but now I can't, and I don't know why. If you know, please tell me in the comment section, thanks.
According to what I found online, ==a web page can detect that the user has opened the console==. I didn't know that, and I didn't dare ask. What a trick.
If you still don't understand, you can refer to my other article, Python crawler JS decryption details, which is written in great detail. The more of these you do, the more experience you gain.
All right, let’s move on
Scroll down and you can see that the request takes three parameters:
- page: the current page number
- t: looks like a timestamp
- sign: data that has been encrypted by some method
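To make the three parameters concrete, here is a minimal sketch that splits a copied request URL into its query parameters using only the standard library. The `t` and `sign` values in `sample_url` are made-up placeholders, not values captured from the site, and the helper name `split_params` is just for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL as copied from the XHR panel; t and sign are placeholder values.
sample_url = (
    "http://glidedsky.com/api/level/web/crawler-javascript-obfuscation-1/items"
    "?page=1&t=1600000000&sign=0123456789abcdef0123456789abcdef01234567"
)


def split_params(url):
    """Return the query parameters of url as a plain dict of strings."""
    query = urlsplit(url).query
    # parse_qs maps each key to a list of values; keep the first of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = split_params(sample_url)
print(params["page"], params["t"], len(params["sign"]))
```

A 40-character hexadecimal `sign` is already a hint that a 160-bit hash may be involved.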
Press Ctrl+Shift+F to search and type ==sign==; there are 6 matches.
But none of them looks like ==sign = ...==. From my previous JS decryption experience, I expected a direct match followed by a call to an encryption function o(╥﹏╥)o
Having come this far, giving up is not my style, so I patiently kept digging…
Then I found a new method, which I will show you now: the XHR breakpoint, as follows.
Just copy part of the URL, not all of it.
==Now comes the most critical step: using Python code to reproduce the data above==
==Get the t value==
==Get the sign value==
The Secure Hash Algorithm (SHA) is used with the Digital Signature Algorithm (DSA) defined in the Digital Signature Standard (DSS). SHA1 is more secure than MD5. For messages less than 2^64 bits in length, SHA1 produces a 160-bit message digest.
Don't panic, Python provides the hashlib library to solve this problem.
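A minimal sketch of both steps, assuming (as the complete code in section 3 does) that the string being hashed is the fixed prefix `Xr0Z-javascript-obfuscation-1` followed by the timestamp. The helper name `make_sign` is my own and only for illustration.

```python
import hashlib
import time

# Fixed prefix taken from the complete code in section 3.
PREFIX = "Xr0Z-javascript-obfuscation-1"


def make_sign(t):
    """Return the SHA-1 hex digest of PREFIX followed by the timestamp t."""
    data = PREFIX + str(t)
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


t = int(time.time())   # current Unix time in whole seconds
sign = make_sign(t)
print(t, sign)         # sign is 40 hex characters, i.e. 160 bits
```

Because `t` changes every second, the `sign` has to be recomputed for every request rather than copied from the browser.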
==Success! Friends, feel free to leave a like!== (*^▽^*)
==Concatenate the request URL. Note that the returned data is in JSON format==
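The concatenation can also be left to `urllib.parse.urlencode`, as in the sketch below. The JSON string is a made-up stand-in for a response body; I only assume it contains an `items` list, which is what the complete code reads. The names `build_url` and `page_total` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode

BASE = "http://glidedsky.com/api/level/web/crawler-javascript-obfuscation-1/items"


def build_url(page, t, sign):
    """Join the base URL with the page, t and sign query parameters."""
    return BASE + "?" + urlencode({"page": page, "t": t, "sign": sign})


def page_total(body):
    """Parse a JSON response body and add up the numbers in its items list."""
    return sum(int(item) for item in json.loads(body)["items"])


url = build_url(1, 1600000000, "abc123")   # placeholder t and sign values
fake_body = '{"items": [12, "30", 7]}'     # hypothetical response body
print(url)
print(page_total(fake_body))
```

Converting each item with `int()` keeps the sum working whether the numbers arrive as JSON numbers or as strings.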
Perfect!
3. The solution (complete code)
import hashlib
import math
import time

import requests

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36",
    # Fill in your own Cookie here
    "Cookie": "",
}

# Running total of all numbers seen so far
total = 0


def get(response):
    """Add every number in the page's items list to the running total."""
    global total
    for i in response['items']:
        total += int(i)


if __name__ == '__main__':
    # 1000 pages
    for i in range(1000):
        # Get t: the current Unix timestamp in whole seconds
        t = math.floor(time.time())
        # Get the sign value: SHA-1 of the fixed prefix followed by t
        sha1 = hashlib.sha1()
        data = 'Xr0Z-javascript-obfuscation-1' + str(t)
        sha1.update(data.encode('utf-8'))
        sign = sha1.hexdigest()
        print("Page " + str(i + 1))
        # Build the URL
        url = "http://glidedsky.com/api/level/web/crawler-javascript-obfuscation-1/items?page=" + str(i + 1) + "&t=" + str(t) + "&sign=" + str(sign)
        response = requests.get(url=url, headers=headers).json()
        get(response)
    # Print the final number
    print(total)
Level passed, decryption successful!!
==Note that you must fill in the Cookie. The code I provided leaves the Cookie value empty.==
I will keep updating. If you are interested, please ==like==, ==follow== and ==bookmark==. Your support is the biggest motivation for my writing!