Preface
The text and pictures come from the internet and are for learning and exchange only, with no commercial purpose. Copyright belongs to the original author; if you have any concerns, please contact us and we will deal with them.
Author: Python community
After many attempts, I finally managed to simulate a Taobao login. It was not easy: Taobao's login encryption and verification are very complex. I have written the process up here to share with you, and I hope you find it useful.
A note before you start
Taobao has since switched to slider verification, which is much harder to handle, so the code below no longer works. It is provided for study and reference only.
What this article covers
- Simulating a login to the Taobao website with Python
- Fetching all order details for the logged-in user
- Learning how to handle captchas
- Seeing how complex a simulated login mechanism can be
What I worked out
- Taobao encrypts the password with the AES algorithm, and the encrypted password sent in the POST request is a 256-character string.
- Taobao may require a verification code at login. After several failed attempts, I finally managed to extract the captcha image so that the user can type the code in manually.
- Taobao also generates a UA string with a complex algorithm that changes every day. The program needs a valid UA string, obtained in advance, to simulate the login.
- Getting the final login ST code takes several requests plus regular-expression extraction, and each ST code can be used only once.
Overall approach
- Manually obtain the UA string and the encrypted password from the browser. This only needs to be done once.
- Send a login request to the login endpoint, POSTing a series of parameters that include the UA string and the password, then extract the captcha image from the response.
- Have the user type in the captcha, add it to the parameters, POST the request again, and extract J_HToken from the response.
- Use J_HToken to send a request to Alipay and extract the ST code from the response.
- Use the ST code and the username to send the login request again, extract the redirect URL from the response, and store the cookies.
- Use the cookies to request other personal pages, such as the order page, and extract the order details from the responses.
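The last two steps depend on two things: one cookie-aware opener that carries cookies from request to request, and saving those cookies for later use. Below is a minimal sketch of that plumbing in Python 3. The article's code is Python 2, where `cookielib` and `urllib2` correspond to `http.cookiejar` and `urllib.request`. The sketch builds a form-encoded POST request without sending it and round-trips a cookie through an LWP cookie file. The helper names, the `session_id` cookie and the form values are hypothetical placeholders, not Taobao's real data.

```python
import http.cookiejar
import os
import tempfile
import urllib.parse
import urllib.request

# Login endpoint taken from the article's code
LOGIN_URL = "https://login.taobao.com/member/login.jhtml"


def build_opener(jar):
    """Return an opener that stores response cookies in `jar` and sends them on later requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(form, headers):
    """URL-encode the form fields and wrap them in a POST request; nothing is sent."""
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(LOGIN_URL, data=data, headers=headers)


def make_cookie(name, value, domain=".taobao.com"):
    """Build a cookie by hand (hypothetical values) so the example runs offline."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def save_and_reload(jar, path):
    """Save the jar in LWP format, then load the file into a fresh jar."""
    # ignore_discard keeps session cookies, which would otherwise be skipped
    jar.save(path, ignore_discard=True, ignore_expires=True)
    fresh = http.cookiejar.LWPCookieJar()
    fresh.load(path, ignore_discard=True, ignore_expires=True)
    return fresh


if __name__ == "__main__":
    jar = http.cookiejar.LWPCookieJar()
    opener = build_opener(jar)  # real code would call opener.open(request)
    request = build_login_request({"TPL_username": "demo_user"}, {"User-Agent": "Mozilla/5.0"})
    print(request.get_method(), request.data)
    jar.set_cookie(make_cookie("session_id", "abc123"))
    with tempfile.TemporaryDirectory() as tmp:
        reloaded = save_and_reload(jar, os.path.join(tmp, "cookies.txt"))
    print([(c.name, c.value) for c in reloaded])
```

`LWPCookieJar` stores cookies in a human-readable text file, which makes it easy to inspect what the login actually set before reusing the file for the order pages.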
Still unclear? Don't worry, I will walk through the simulated login process step by step, and I hope it will make sense.
Preparation
Taobao's UA algorithm and AES password encryption are too complex to reproduce, and the UA algorithm changes every day. However, once you have captured a UA string and the encrypted password, you can keep reusing them. In my tests this worked without problems, so you only need to do it once.
How do you obtain the UA string and the AES-encrypted password?
We take them straight from the browser. Open the Taobao login page, then press F12 (or right-click the page and choose Inspect Element) to open the developer tools.
I'm using Firefox here. Be sure to enable persistent logs in the developer tools; otherwise the captured requests are cleared when the page redirects after login, and you will not see them. Now let's get the UA string and the encrypted password from the browser.
Click the Network tab. It is empty at first because nothing has been captured yet. Log in on the page as usual: enter your username and password (and the verification code if asked), then click Login. After the redirect succeeds you will see many log entries. Click the entry for login.taobao.com and look at its parameters. The form data includes ua and password2 further down: these are the UA string and the AES-encrypted password we need.
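Instead of copying each value by hand, you can paste the raw, URL-encoded form body from the developer tools into a few lines of Python 3. This is a sketch under two assumptions. First, the code later in this article posts the encrypted password as `TPL_password_2`, so that is the field name used here; adjust it if your capture labels it differently. Second, the sample body is a shortened, made-up placeholder.

```python
from urllib.parse import parse_qs


def extract_login_fields(raw_body, fields=("ua", "TPL_password_2")):
    """Return the selected fields from a URL-encoded form body copied from the browser."""
    # parse_qs undoes the percent-encoding (%2B -> +, %3D -> =, %7C -> |)
    parsed = parse_qs(raw_body)
    result = {}
    for name in fields:
        values = parsed.get(name)
        if not values:
            raise KeyError("field %r not found in the captured form body" % name)
        result[name] = values[0]
    return result


# Hypothetical captured body with shortened placeholder values
sample = "TPL_username=demo_user&ua=191UW5Tcy%2BAbc%3D%7CUm5O&TPL_password_2=7511aa68&TPL_checkcode="
print(extract_login_fields(sample))
# {'ua': '191UW5Tcy+Abc=|Um5O', 'TPL_password_2': '7511aa68'}
```

The UA string contains characters such as `+`, `/`, `=` and `|`, which appear percent-encoded in the raw body. Using `parse_qs` rather than splitting on `&` and `=` by hand decodes them correctly.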
Enter the verification code and obtain the J_HToken
The code is as follows:
import urllib
import urllib2
import cookielib
import re
import webbrowser
# Class that simulates a Taobao login
class Taobao:

    # Initialization method
    def __init__(self):
        # Login URL
        self.loginURL = "https://login.taobao.com/member/login.jhtml"
        # Proxy address, used to keep your own IP from being blocked
        self.proxyURL = 'http://120.193.146.97:843'
        # Headers sent with the login POST request
        self.loginHeaders = {
            'Host': 'login.taobao.com',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
            'Referer': 'https://login.taobao.com/member/login.jhtml',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Connection': 'Keep-Alive'
        }
        # Username
        self.username = 'cqcre'
        # UA string computed by Taobao's UA algorithm; it encodes a timestamp, the browser, screen resolution,
        # a random number, and records of mouse movements, mouse clicks and keyboard input
        self.ua = '191UW5TcyMNYQwiAiwTR3tCf0J/QnhEcUpkMmQ=|Um5Ockt0TXdPc011TXVKdyE=|U2xMHDJ+H2QJZwBxX39Rb1d5WXcrSixAJ1kjDVsN|VGhXd1llXGNaYFhkWmJaYl1gV2pIdUtyTXRKfkN4Qn1FeEF6R31TBQ==|VWldfS0TMw8xDjYWKhAwHiUdOA9wCDEVaxgkATdcNU8iDFoM|VmNDbUMV|V2NDbUMV|WGRYeCgGZhtmH2VScVI2UT5fORtmD2gCawwuRSJHZAFsCWMOdVYyVTpbPR99HWAFYVMpUDUFORshHiQdJR0jAT0JPQc/BDoFPgooFDZtVBR5Fn9VOwt2EWhCOVQ4WSJPJFkHXhgoSDVIMRgnHyFqQ3xEezceIRkmahRqFDZLIkUvRiEDaA9qQ3xEezcZORc5bzk=|WWdHFy0TMw8vEy0UIQE0ADgYJBohGjoAOw4uEiwXLAw2DThu9a==|WmBAED5+KnIbdRh1GXgFQSZbGFdrUm1UblZqVGxQa1ZiTGxQcEp1I3U=|W2NDEz19KXENZwJjHkY7Ui9OJQsre09zSWlXY1oMLBExHzERLxsuE0UT|XGZGFjh4LHQdcx5zH34DRyBdHlFtVGtSaFBsUmpWbVBkSmpXd05zTnMlcw==|XWdHFzl5LXUJYwZnGkI/VitKIQ8vEzMKNws3YTc=|XmdaZ0d6WmVFeUB8XGJaYEB4TGxWbk5yTndXa0tyT29Ta0t1QGBeZDI='
        # Encrypted password: the plain-text password cannot be used because Taobao encrypts it,
        # so this is the 256-character encrypted value
        self.password2 = '7511aa68sx629e45de220d29174f1066537a73420ef6dbb5b46f202396703a2d56b0312df8769d886e6ca63d587fdbb99ee73927e8c07d9c88cd02182e1a21edc13fb8e140a4a2a4b5c253bf38484bd0e08199e03eb9bf7b365a5c673c03407d812b91394f0d3c7564042e3f2b11d156aeea37ad6460118914125ab8f8ac466f'
        # Form fields POSTed to the login URL
        self.post = {
            'ua': self.ua,
            'TPL_checkcode': '', 'CtrlVersion': '1,0,0,7', 'TPL_password': '',
            'TPL_redirect_url': 'http://i.taobao.com/my_taobao.htm?nekot=udm8087E1424147022443',
            'TPL_username': self.username,
            'loginsite': '0', 'newlogin': '0', 'from': 'tb', 'fc': 'default', 'style': 'default',
            'css_style': '', 'tid': 'XOR_1_000000000000000000000000000000_625C4720470A0A050976770A',
            'support': '000001', 'loginType': '4', 'minititle': '', 'minipara': '', 'umto': 'NaN',
            'pstrong': '3', 'llnick': '', 'sign': '', 'need_sign': '', 'isIgnore': '', 'full_redirect': '',
            'popid': '', 'callback': '', 'guf': '', 'not_duplite_str': '', 'need_user_id': '', 'poy': '',
            'gvfdcname': '10', 'gvfdcre': '', 'from_encoding': '', 'sub': '',
            'TPL_password_2': self.password2,
            'loginASR': '1', 'loginASRSuc': '1', 'allp': '', 'oslanguage': 'zh-CN', 'sr': '1366*768',
            'osVer': 'Windows|6.1', 'naviVer': 'firefox|35'
        }
        # URL-encode the POST data
        self.postData = urllib.urlencode(self.post)
        # Set up the proxy (only http:// URLs go through it; the https login URL is requested directly)
        self.proxy = urllib2.ProxyHandler({'http': self.proxyURL})
        # Cookie jar that stores the session cookies
        self.cookie = cookielib.LWPCookieJar()
        # Cookie handler
        self.cookieHandler = urllib2.HTTPCookieProcessor(self.cookie)
        # Build an opener that uses the cookie handler and the proxy
        self.opener = urllib2.build_opener(self.cookieHandler, self.proxy, urllib2.HTTPHandler)
    # The response to this request varies: sometimes a captcha is required, sometimes not
    def needIdenCode(self):
        # First login attempt, used to find out whether a captcha is required
        request = urllib2.Request(self.loginURL, self.postData, self.loginHeaders)
        # Response to the first login attempt
        response = self.opener.open(request)
        # Read the page and decode it from GBK
        content = response.read().decode('gbk')
        # HTTP status code
        status = response.getcode()
        # Status 200 means the request succeeded
        if status == 200:
            print u"Request received successfully"
            # \u8bf7\u8f93\u5165\u9a8c\u8bc1\u7801 is the Unicode escape of the Chinese prompt
            # "please enter the verification code"
            pattern = re.compile(u'\u8bf7\u8f93\u5165\u9a8c\u8bc1\u7801', re.S)
            result = re.search(pattern, content)
            # If the prompt is found, a captcha is required
            if result:
                print u"Security check triggered: you need to enter the verification code."
                return content
            # Otherwise no captcha is needed
            else:
                print u"Security check passed: no verification code is needed this time."
                return False
        else:
            print u"Request failed"
            return False
    # Get the captcha image link
    def getIdenCode(self, page):
        # Regex matching the captcha <img> tag (pattern elided here); group(1) must capture the image URL
        pattern = re.compile('...', re.S)
        # Search the page for the captcha image
        matchResult = re.search(pattern, page)
        # A match was found and the captcha image link is not empty
        if matchResult and matchResult.group(1):
            print matchResult.group(1)
            return matchResult.group(1)
        else:
            print u"No captcha content found"
            return False
    # Main entry point
    def main(self):
        # Returns the page content if a captcha is required, otherwise False
        needResult = self.needIdenCode()
        if needResult:
            print u"You need to enter the verification code manually."
            idenCode = self.getIdenCode(needResult)
            # A captcha link was found
            if idenCode:
                print u"Verification code obtained successfully"
                print u"Please enter the verification code you see in your browser."
                webbrowser.open_new_tab(idenCode)
            # The captcha link is empty, so the captcha is invalid
            else:
                print u"Failed to obtain captcha. Please try again."
        else:
            print u"No captcha required."

taobao = Taobao()
taobao.main()
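For readers on Python 3, the captcha check in `needIdenCode` and the link extraction in `getIdenCode` can be sketched as two small functions. They work on an already downloaded page, so they run offline. The prompt text and the GBK decoding come from the code above. The image regex is a hypothetical stand-in, since the original pattern is not shown in full above. It assumes the captcha URL sits in a `data-src` attribute of an `<img>` tag, which you would need to check against the real page.

```python
import re

# Unicode escapes for the Chinese prompt "please enter the verification code" (请输入验证码)
CAPTCHA_PROMPT = "\u8bf7\u8f93\u5165\u9a8c\u8bc1\u7801"
# Hypothetical pattern: assumes the image URL is in the data-src attribute of an <img> tag
CAPTCHA_IMG = re.compile(r'<img[^>]*?data-src="([^"]+)"', re.S)


def needs_captcha(raw_page, encoding="gbk"):
    """Decode the login response; return the text if it asks for a captcha, else None."""
    content = raw_page.decode(encoding)
    return content if CAPTCHA_PROMPT in content else None


def captcha_image_url(page):
    """Return the first captcha image URL found in the page, or None."""
    match = CAPTCHA_IMG.search(page)
    return match.group(1) if match else None


# Made-up page fragment, encoded as GBK like the real login page
page = ('<p>' + CAPTCHA_PROMPT + '</p>'
        '<img id="code" data-src="https://example.com/captcha.png">').encode("gbk")
text = needs_captcha(page)
print(captcha_image_url(text) if text else "no captcha required")
```

Returning `None` instead of `False` lets callers write `if text:` without mixing page content and boolean flags in one return type.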
Obtaining the ST code with J_HToken
    # Get the ST code via the token
    def getSTbyToken(self, token):
        tokenURL = 'https://passport.alipay.com/mini_apply_st.js?site=0&token=%s&callback=stCallback6' % token
        request = urllib2.Request(tokenURL)
        response = urllib2.urlopen(request)
        # Extract the ST code, which leads to the login address of the user's Taobao home page
        pattern = re.compile('{"st":"(.*?)"}', re.S)
        result = re.search(pattern, response.read())
        # If the match succeeds
        if result:
            print u"ST code obtained successfully"
            # Value of the ST code
            st = result.group(1)
            return st
        else:
            print u"No match for ST"
            return False
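The ST step can likewise be exercised offline in Python 3. This sketch builds the same token URL and applies the same `{"st":"..."}` regex to a response body. One difference: it lets `urlencode` escape the token instead of inserting it with `%s`. The sample token and the `stCallback6(...)` response text are made up. The real response is only assumed to contain that JSON fragment, as the regex above expects.

```python
import re
from urllib.parse import urlencode

ST_URL = "https://passport.alipay.com/mini_apply_st.js"
ST_PATTERN = re.compile(r'\{"st":"(.*?)"\}', re.S)


def st_request_url(token):
    """Build the token URL; urlencode escapes characters in the token that are not URL-safe."""
    query = urlencode({"site": "0", "token": token, "callback": "stCallback6"})
    return ST_URL + "?" + query


def extract_st(body):
    """Return the ST code from the response body, or None if it is absent."""
    match = ST_PATTERN.search(body)
    return match.group(1) if match else None


print(st_request_url("abc123"))
print(extract_st('stCallback6({"st":"1a2b3c"})'))  # prints 1a2b3c
```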
Final run result