Author: xiaoyu
Python Data Science
Python data analyst
WeChat has somehow become an indispensable part of our lives. Our social circles, news feeds, official accounts, and personal information are all tied to it. Since it matters this much, if we can simulate a login with a crawler, we can retrieve this information and even view and manage it as needed. This post shares how to simulate logging in to the web version of WeChat and shows the friends list obtained after the simulated login.
The WeChat login process is complicated, but the approach stays the same as always: we use the Fiddler packet capture tool to observe the login process and then reproduce it. Let's walk through it step by step.
Simulate the login request with Fiddler
First, with Fiddler already running, we open the web version of WeChat in the browser and are presented with a QR code page.
Then we scan the code with WeChat on the phone and confirm, at which point the web version logs in.
Now let's see which packets Fiddler captured. Because the process involves many requests, we break the captured packets down and analyze them one by one.
1. Open the WeChat page
In the capture for this step, the two links to login.wx.qq.com are the ones we need.
Let's click on them and analyze them in detail.
The first link is a GET request, and the URI carries the parameters appid, redirect_uri, fun, lang, and _.
GET /jslogin?appid=wx782c26e4c19acffb&redirect_uri=https%3A%2F%2Fwx.qq.com%2Fcgi-bin%2Fmmwebwx-bin%2Fwebwxnewloginpage&fun=new&lang=zh_CN&_=1520350213674 HTTP/1.1
After several captures, appid, redirect_uri, fun, and lang turn out to be fixed, while _ is a changing string of digits. As mentioned in the earlier article on simulating a JD.com login, it is a timestamp. If this is unclear, review [Python Crawler (4) | Simulating a Login to JD.com in Practice][1]
Knowing these parameters, we can simulate the GET request and send it. So why simulate this step?
Because requesting this link returns the following response, which contains the uuid we need in the following steps.
window.QRLogin.code = 200; window.QRLogin.uuid = "Idf_QdW1OQ==";
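As a minimal sketch of this step, the query string can be assembled with the standard library and the uuid pulled out of the response with a regular expression. The helper names build_jslogin_url and parse_jslogin_response are hypothetical, and the host login.weixin.qq.com is an assumption taken from the code later in this post; the parameter values and the sample response are the ones captured above.

```python
import re
import time
from urllib.parse import urlencode


def build_jslogin_url(now_ms=None):
    """Build the jslogin URL; the `_` parameter is a millisecond timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    params = {
        'appid': 'wx782c26e4c19acffb',
        'redirect_uri': 'https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage',
        'fun': 'new',
        'lang': 'zh_CN',
        '_': now_ms,
    }
    # Assumed host, see the note above.
    return 'https://login.weixin.qq.com/jslogin?' + urlencode(params)


def parse_jslogin_response(text):
    """Return (code, uuid), or None when the response has an unexpected shape."""
    match = re.search(
        r'window\.QRLogin\.code = (\d+); window\.QRLogin\.uuid = "(\S+?)"', text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


sample = 'window.QRLogin.code = 200; window.QRLogin.uuid = "Idf_QdW1OQ==";'
print(build_jslogin_url())
print(parse_jslogin_response(sample))
```

Returning None for an unexpected response keeps the caller from crashing on a failed match, which is a design choice of this sketch.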
2. Get the QR code
The web version of WeChat only offers QR code login, and the simulation cannot bypass it, so we still have to scan the code. Back in the browser, the link to the QR code image is easy to find with the developer tools.
https://login.weixin.qq.com/qrcode/AdgAWNry-w==
The last segment of the link keeps changing, and it has the same form as the uuid. It is in fact the uuid, which makes each QR code unique.
So we can get the QR code image by appending the uuid extracted above to the link, and then scan the code to confirm.
3. Identify the login status
In order to identify whether the scan was successful, we need to use the second link mentioned above for this step.
GET /cgi-bin/mmwebwx-bin/login?loginicon=true&uuid=Idf_QdW8OQ==&tip=1&r=68288473&_=1520050213675 HTTP/1.1
This link is also a GET request, which also carries some parameters.
In fact, in the process of packet capture, as long as we do not scan the QR code, this link will be repeatedly sent until the QR code is scanned or times out.
So how do we judge whether the QR code has been scanned or logged in?
It’s still based on the data of the response. After analysis, it is found that if the QR code has not been scanned, the response is like this:
window.code=408;
But if the QR code is scanned, the response looks like this:
window.code=201; window.userAvatar = .....
window.code=200; window.redirect_uri="https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage?ticket=AaL_Xd5muLPKNVY_Hzt_uoBs@qrticket_0&uuid=gbJqPdkNSQ==&lang=zh_CN&scan=1520353803";
code=201 means the QR code has been scanned; code=200 means the login has been confirmed.
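The three kinds of response can be told apart with a small parser. This is only a sketch: parse_login_status is a hypothetical helper name, and the regular expressions assume the response format shown above.

```python
import re


def parse_login_status(text):
    """Return (code, redirect_uri); redirect_uri is None unless code is 200."""
    code_match = re.search(r'window\.code=(\d+);', text)
    if code_match is None:
        raise ValueError('no window.code found in response')
    code = int(code_match.group(1))
    redirect_uri = None
    if code == 200:
        # Only the "logged in" response carries the redirect link.
        uri_match = re.search(r'window\.redirect_uri="(\S+?)";', text)
        if uri_match is not None:
            redirect_uri = uri_match.group(1)
    return code, redirect_uri


not_scanned = 'window.code=408;'
logged_in = ('window.code=200; window.redirect_uri="https://wx.qq.com/cgi-bin/'
             'mmwebwx-bin/webwxnewloginpage?ticket=AaL_Xd5muLPKNVY_Hzt_uoBs'
             '@qrticket_0&uuid=gbJqPdkNSQ==&lang=zh_CN&scan=1520353803";')
print(parse_login_status(not_scanned))
print(parse_login_status(logged_in))
```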
4. Login
After the QR code is scanned, Fiddler captures several new requests.
As you may have noticed in the previous step, code=200 is followed by a redirect_uri. This URI is the login link to jump to in this step.
GET https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage?ticket=AaL_Xd5muLPKNVY_Hzt_udBs@qrticket_0&uuid=gbJqPdfNSQ==&lang=zh_CN&scan=1520353803&fun=new&version=v2 HTTP/1.1
The response that signals a successful login in the previous step gives us all the parameters we need. These parameters are used in the request for the formal login (the redirect link), so we send another GET request with them. It carries the parameters ticket, uuid, lang, scan, fun, and version.
The login request in turn returns a response, which looks like this:
<error>
<ret>0</ret>
<message>OK</message>
<skey>xxx</skey>
<wxsid>xxx</wxsid>
<wxuin>xxx</wxuin>
<pass_ticket>xxx</pass_ticket>
<isgrayscale>1</isgrayscale>
</error>
Yet another batch of parameters. Don't worry, we're almost there. From this response we again need to extract the parameters for the next requests.
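One way to extract these fields is sketched below. It uses xml.etree.ElementTree from the standard library, which is a choice made for this example; parse_login_xml is a hypothetical helper name, and treating ret equal to 0 as success is an assumption based on the sample response above.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<error>
<ret>0</ret>
<message>OK</message>
<skey>xxx</skey>
<wxsid>xxx</wxsid>
<wxuin>xxx</wxuin>
<pass_ticket>xxx</pass_ticket>
<isgrayscale>1</isgrayscale>
</error>"""

WANTED = ('skey', 'wxsid', 'wxuin', 'pass_ticket')


def parse_login_xml(text):
    """Return the four login fields as a dict, or None if any is missing."""
    root = ET.fromstring(text)
    fields = {child.tag: (child.text or '') for child in root}
    # Assumption: ret == 0 signals success, as in the sample response.
    if fields.get('ret') != '0':
        return None
    if not all(fields.get(name) for name in WANTED):
        return None
    return {name: fields[name] for name in WANTED}


print(parse_login_xml(SAMPLE_XML))
```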
5. Initialize synchronization
Finally, the last step: WeChat's initialization and synchronization requests. The initialization link is as follows:
POST https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxinit?r=64629109&pass_ticket=4dU5IS9EqtXt5cIV2Gni1tKG7m2V56PXk5XI%252BdjdrIk%253D HTTP/1.1
The contact list link is as follows:
GET https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxgetcontact?pass_ticket=4dU5IS9EqtXt5cIV2Gni1tKG7m2V56PXk5XI%252BdjdrIk%253D&r=1520353806102&seq=0&skey=@crypt_a82dd73a_3885c878ae2f4590f7b2b5ee949dd1bd HTTP/1.1
The pass_ticket and skey parameters in the URIs were obtained from the previous response, so the requests can be sent directly. The responses to these two links contain the information we actually want.
There is also a sync check link, whose parameters can be extracted from the responses to the two links above. Since those two links already give us the information we want, we do not need to request it.
GET https://webpush.wx.qq.com/cgi-bin/mmwebwx-bin/synccheck?r=1520353806125&skey=%40crypt_a82dd73a_3885c878ae2f4590f7b2b5ee949dd1bd&sid=O2Se5s2LJzPebME2&uin=254891255&deviceid=e289448639092966&synckey=1_694936977%7C2_694936979%7C3_694936982%7C1000_1520324882&_=1520353793581 HTTP/1.1
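Although the sync check request is not needed here, the sketch below shows how its query string could be assembled from values collected in the earlier steps. build_synccheck_url is a hypothetical helper; the assumption is that synckey is the Key_Val pairs of the initialization response joined with |, and that r and _ can share one millisecond timestamp (in the capture they differ slightly).

```python
import time
from urllib.parse import urlencode


def build_synccheck_url(skey, sid, uin, deviceid, sync_key_list, now_ms=None):
    """Assemble the synccheck URL from previously extracted login values."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Join each Key/Val pair as "Key_Val", separated by "|".
    synckey = '|'.join('%s_%s' % (item['Key'], item['Val']) for item in sync_key_list)
    params = {
        'r': now_ms,
        'skey': skey,
        'sid': sid,
        'uin': uin,
        'deviceid': deviceid,
        'synckey': synckey,
        '_': now_ms,  # assumption: same timestamp as r
    }
    # urlencode percent-encodes "@" as %40 and "|" as %7C, as in the capture.
    return 'https://webpush.wx.qq.com/cgi-bin/mmwebwx-bin/synccheck?' + urlencode(params)


example = build_synccheck_url(
    skey='@crypt_a82dd73a_3885c878ae2f4590f7b2b5ee949dd1bd',
    sid='O2Se5s2LJzPebME2',
    uin=254891255,
    deviceid='e289448639092966',
    sync_key_list=[{'Key': 1, 'Val': 694936977}, {'Key': 2, 'Val': 694936979}],
    now_ms=1520353806125,
)
print(example)
```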
That is the basic login process. It is a little complicated, so the blogger summarized it in a flow chart for reference.
Code implementation
Requests are sent with the requests module, and responses are parsed with re. For the HTTPS requests, SSL certificate verification can be skipped by passing verify=False.
1. Initialize parameters
# Modules used by the methods in this section
import json
import os
import re
import subprocess
import sys
import time
import xml.dom.minidom

import requests

def __init__(self):
    self.session = requests.Session()
    self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0'}
    # The QR code image is saved next to this script
    self.QRImgPath = os.path.split(os.path.realpath(__file__))[0] + os.sep + 'webWeixinQr.jpg'
    self.uuid = ''
    self.tip = 0
    self.base_uri = ''
    self.redirect_uri = ''
    self.skey = ''
    self.wxsid = ''
    self.wxuin = ''
    self.pass_ticket = ''
    self.deviceId = 'e000000000000000'
    self.BaseRequest = {}
    self.ContactList = []
    self.My = []
    self.SyncKey = ''
Define a class whose constructor initializes all the request parameters of the instance and sets the path where the QR code image is saved.
2. Request a uuid
def getUUID(self):
    url = 'https://login.weixin.qq.com/jslogin'
    params = {
        'appid': 'wx782c26e4c19acffb',
        'redirect_uri': 'https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage',
        'fun': 'new',
        'lang': 'zh_CN',
        '_': int(time.time() * 1000),  # millisecond timestamp
    }
    response = self.session.get(url, params=params)
    target = response.content.decode('utf-8')
    pattern = r'window.QRLogin.code = (\d+); window.QRLogin.uuid = "(\S+?)"'
    ob = re.search(pattern, target)  # extract the status code and the uuid
    if ob is None:  # unexpected response format
        return False
    code = ob.group(1)
    self.uuid = ob.group(2)
    if code == '200':
        return True
    return False
The uuid is extracted with a regular expression, and code tells us whether the request succeeded. The response is as follows:
window.QRLogin.code = 200; window.QRLogin.uuid = "Idf_QdW1OQ==";
3. Get the QR code
def showQRImage(self):
    url = 'https://login.weixin.qq.com/qrcode/' + self.uuid
    response = self.session.get(url)
    self.tip = 1
    # The with block closes the file, so no explicit close() is needed
    with open(self.QRImgPath, 'wb') as f:
        f.write(response.content)
    # Open the QR code image with the platform's default viewer
    if sys.platform.find('darwin') >= 0:
        subprocess.call(['open', self.QRImgPath])  # macOS
    elif sys.platform.find('linux') >= 0:
        subprocess.call(['xdg-open', self.QRImgPath])  # Linux
    else:
        os.startfile(self.QRImgPath)  # Windows
    print('Please scan the QR code with WeChat')
Request the QR code image with the uuid and open it automatically in a way that depends on the operating system.
4. Identify the login status
def checkLogin(self):
    url = 'https://login.weixin.qq.com/cgi-bin/mmwebwx-bin/login?tip=%s&uuid=%s&_=%s' % (
        self.tip, self.uuid, int(time.time() * 1000))
    response = self.session.get(url)
    target = response.content.decode('utf-8')
    pattern = r'window.code=(\d+);'
    ob = re.search(pattern, target)
    code = ob.group(1)
    if code == '201':  # scanned, waiting for confirmation on the phone
        self.tip = 0
    elif code == '200':  # logged in
        regx = r'window.redirect_uri="(\S+?)";'
        ob = re.search(regx, target)
        self.redirect_uri = ob.group(1) + '&fun=new'
        self.base_uri = self.redirect_uri[:self.redirect_uri.rfind('/')]
    elif code == '408':  # timed out
        pass
    return code
The response is as follows:
window.code=200;
window.redirect_uri="https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxnewloginpage?ticket=AaL_Xd5muLPKNVY_Hzt_uoBs@qrticket_0&
The login status is identified by the code in the response: 408 means timed out, 201 means scanned, and 200 means logged in.
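Since the status has to be checked repeatedly until the code becomes 200, a polling loop with an overall time limit and a pause between attempts is one reasonable way to drive it. The sketch below is not the author's implementation: wait_for_login and its timeout and interval defaults are assumptions, and check stands for any callable that returns the status code as a string.

```python
import time


def wait_for_login(check, timeout=120.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns '200' or the timeout elapses."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check() == '200':
            return True
        # Pause between polls instead of spinning in a tight loop.
        sleep(interval)
    return False


# Usage with a stand-in for the real status check:
codes = iter(['408', '201', '200'])
print(wait_for_login(lambda: next(codes), interval=0.0))
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.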
5. Login
def login(self):
    response = self.session.get(self.redirect_uri, verify=False)
    data = response.content.decode('utf-8')
    doc = xml.dom.minidom.parseString(data)
    root = doc.documentElement
    # Collect the four fields needed by the following requests
    for node in root.childNodes:
        if node.nodeName == 'skey':
            self.skey = node.childNodes[0].data
        elif node.nodeName == 'wxsid':
            self.wxsid = node.childNodes[0].data
        elif node.nodeName == 'wxuin':
            self.wxuin = node.childNodes[0].data
        elif node.nodeName == 'pass_ticket':
            self.pass_ticket = node.childNodes[0].data
    if not all((self.skey, self.wxsid, self.wxuin, self.pass_ticket)):
        return False
    self.BaseRequest = {
        'Uin': int(self.wxuin),
        'Sid': self.wxsid,
        'Skey': self.skey,
        'DeviceID': self.deviceId,
    }
    return True
Request the redirected login link and extract the parameters from the response, which looks like this:
<error>
<ret>0</ret>
<message>OK</message>
<skey>xxx</skey>
<wxsid>xxx</wxsid>
<wxuin>xxx</wxuin>
<pass_ticket>xxx</pass_ticket>
<isgrayscale>1</isgrayscale>
</error>
6. Initialize the information
def webwxinit(self):
    url = self.base_uri + \
        '/webwxinit?pass_ticket=%s&skey=%s&r=%s' % (
            self.pass_ticket, self.skey, int(time.time() * 1000))
    params = {
        'BaseRequest': self.BaseRequest
    }
    h = dict(self.headers)  # copy so the shared headers are not modified
    h['Content-Type'] = 'application/json; charset=UTF-8'
    response = self.session.post(url, data=json.dumps(params), headers=h, verify=False)
    data = response.content.decode('utf-8')
    print(data)
    dic = json.loads(data)
    self.ContactList = dic['ContactList']
    self.My = dic['User']
    # Build the SyncKey string as "Key_Val" pairs joined with "|"
    SyncKeyList = []
    for item in dic['SyncKey']['List']:
        SyncKeyList.append('%s_%s' % (item['Key'], item['Val']))
    self.SyncKey = '|'.join(SyncKeyList)
    ErrMsg = dic['BaseResponse']['ErrMsg']
    Ret = dic['BaseResponse']['Ret']
    if Ret != 0:
        return False
    return True
Request the initialization link to get the initialization response data.
def webwxgetcontact(self):
    url = self.base_uri + \
        '/webwxgetcontact?pass_ticket=%s&skey=%s&r=%s' % (
            self.pass_ticket, self.skey, int(time.time()))
    h = dict(self.headers)  # copy so the shared headers are not modified
    h['Content-Type'] = 'application/json; charset=UTF-8'
    response = self.session.get(url, headers=h, verify=False)
    data = response.content.decode('utf-8')
    # print(data)
    dic = json.loads(data)
    MemberList = dic['MemberList']
    SpecialUsers = ["newsapp", "fmessage", "filehelper", "weibo", "qqmail",
                    "tmessage", "qmessage", "qqsync", "floatbottle", "lbsapp",
                    "shakeapp", "medianote", "qqfriend", "readerapp", "blogapp",
                    "facebookapp", "masssendapp", "meishiapp", "feedsapp", "voip",
                    "blogappweixin", "weixin", "brandsessionholder", "weixinreminder",
                    "wxid_novlwrv3lqwv11", "gh_22b87fa7cb3c", "officialaccounts",
                    "notification_messages", "wxitil", "userexperience_alarm"]
    # Iterate in reverse so that removing items does not skip elements
    for i in range(len(MemberList) - 1, -1, -1):
        Member = MemberList[i]
        if Member['VerifyFlag'] & 8 != 0:  # official account
            MemberList.remove(Member)
        elif Member['UserName'] in SpecialUsers:  # special account
            MemberList.remove(Member)
        elif Member['UserName'].find('@@') != -1:  # group chat
            MemberList.remove(Member)
        elif Member['UserName'] == self.My['UserName']:  # the user's own account
            MemberList.remove(Member)
    return MemberList
Request the contact link to get friends, official accounts, group chats, and the user's own information. The response is in JSON format, as follows:
{
  "BaseResponse": { "Ret": 0, "ErrMsg": "" },
  "Count": 11,
  "ContactList": [{
    "Uin": 0,
    "UserName": "filehelper",
    "NickName": "File Transfer Helper",
    "HeadImgUrl": "/cgi-bin/mmwebwx-bin/webwxgeticon?seq=621637626&username=filehelper&skey=@crypt_a82dd73a_7e8e1054c011e8d71d0b542f39c7db85",
    "ContactFlag": 3, "MemberCount": 0, "MemberList": [],
    "RemarkName": "", "HideInputBarFlag": 0, "Sex": 0, "Signature": "",
    "VerifyFlag": 0, "OwnerUin": 0,
    "PYInitial": "WJCSZS", "PYQuanPin": "wenjianchuanshuzhushou",
    "RemarkPYInitial": "", "RemarkPYQuanPin": "",
    "StarFriend": 0, "AppAccountFlag": 0, "Statues": 0, "AttrStatus": 0,
    "Province": "", "City": "", "Alias": "", "SnsFlag": 0, "UniFriend": 0,
    "DisplayName": "", "ChatRoomId": 0, "KeyWord": "fil",
    "EncryChatRoomId": "", "IsOwner": 0
  }, {...} ...
The fields in the response determine how each entry is handled. Since the goal here is the friends list, entries for official accounts, special accounts, group chats, and the user's own account are removed, and only friends are kept.
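The filtering rules can also be written as a standalone function that builds a new list instead of removing entries in place. This is a sketch for illustration: filter_friends is a hypothetical name, the sample entries are made up, and the special account list is shortened to three names.

```python
def filter_friends(members, my_username, special_users):
    """Return only ordinary friends from a contact list."""
    friends = []
    for member in members:
        name = member['UserName']
        if member['VerifyFlag'] & 8 != 0:  # official account
            continue
        if name in special_users:          # built-in special account
            continue
        if '@@' in name:                   # group chat
            continue
        if name == my_username:            # the logged-in user
            continue
        friends.append(member)
    return friends


# Made-up sample data, one entry per category.
sample_members = [
    {'UserName': '@alice', 'VerifyFlag': 0},
    {'UserName': 'gh_news', 'VerifyFlag': 8},
    {'UserName': 'filehelper', 'VerifyFlag': 0},
    {'UserName': '@@group1', 'VerifyFlag': 0},
    {'UserName': '@me', 'VerifyFlag': 0},
]
special = {'filehelper', 'weixin', 'newsapp'}
print(filter_friends(sample_members, '@me', special))
```

Building a new list avoids modifying a list while iterating over it, which is why no reverse iteration is needed here.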
7. Run the main function
def main(self):
    if not self.getUUID():
        print('Failed to get uuid')
        return
    self.showQRImage()
    time.sleep(1)
    # Keep checking until the login is confirmed
    while self.checkLogin() != '200':
        pass
    os.remove(self.QRImgPath)
    if not self.login():
        print('Login failed')
        return
    if not self.webwxinit():
        print('Initialization failed')
        return
    MemberList = self.webwxgetcontact()
    print('%s friends in the contact list' % len(MemberList))
    for x in MemberList:
        sex = 'unknown' if x['Sex'] == 0 else 'male' if x['Sex'] == 1 else 'female'
        print('Nickname: %s, gender: %s, remark: %s, signature: %s' % (
            x['NickName'], sex, x['RemarkName'], x['Signature']))
Simulated login results
The friends list is as follows:
The friends list is just one example; other information can be viewed, managed, or analyzed in the same way.
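As one small example of such analysis, the friends can be counted by gender. This is a sketch under assumptions: count_by_sex is a hypothetical helper, the sample entries are made up, and the Sex values are mapped the same way as in the main function (0 is unknown, 1 is male, anything else is female).

```python
from collections import Counter


def sex_label(value):
    """Map the Sex field to a label, mirroring the mapping in main()."""
    if value == 0:
        return 'unknown'
    if value == 1:
        return 'male'
    return 'female'


def count_by_sex(members):
    """Count contact entries per gender label."""
    return Counter(sex_label(member['Sex']) for member in members)


# Made-up sample entries with only the fields used here.
sample_friends = [
    {'NickName': 'A', 'Sex': 1},
    {'NickName': 'B', 'Sex': 2},
    {'NickName': 'C', 'Sex': 1},
    {'NickName': 'D', 'Sex': 0},
]
print(count_by_sex(sample_friends))
```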
Conclusion
This article walked through the simulated login process for the web version of WeChat. Although the requests involved are a bit complicated, the process can be implemented step by step with careful analysis. I hope it helps. The code has been uploaded to GitHub: link
The end.