In the coming era of artificial intelligence (AI), no one may be able to tell whether a computer is being operated by a person or by another program. After reading this article you will see that, as long as your QQ client is online, a program can automatically fetch a great deal of information tied to it without opening a single page. Call it silent access.

A junior PHP engineer in action: fetching member information for thousands of QQ groups

In the previous article, the login was completed manually with mouse clicks, and the resulting bkn and Cookie were copied into the program so that it could go straight to the post-login page and fetch the information of all group members. But those values have to be set by hand every time, which leaves the program only semi-automatic.

[Get request header information – GIF]

The beginning of the story

Getting the data at all is cool enough for a beginner, but an engineer cannot stop there. I do not even want to click the mouse, so I set out for full automation.

Looking for abuse? Who says…

While repeating the manual login and cookie copying, engineer Niuniu had a flash of inspiration and began to think: if a person can click to log in, a program can too. The server cannot tell whether an operation was performed by a human or by a machine. So does HTTPS stop a program from obtaining identity authorization? The answer is no. HTTPS only keeps the data secure in transit; a program can behave exactly like a client and mimic the source of the request, which in principle has nothing to do with HTTPS. The arrow was already on the string.

To train the newcomer, Lao Niu still wanted to ask what solution Xiaobai had in mind. Xiaobai thought for a few minutes, then beamed and nearly jumped up.

Xiaobai: We can use cURL to fetch the login page, then write some JavaScript so that the front end simulates the mouse click and captures the redirect.

The kid is underestimating the enemy again; it sounds simple until you feel the pain of implementing it yourself. Still, this industry needs some confidence. After all, it is said that 90% of programmers in China are under 30, and every now and then a product launch works a programmer or two to death.

Lao Niu: Your idea is quite creative, but there are two likely difficulties. First, cURL does not build a complete DOM for the document. Second, even if JS can simulate the redirect, the content generated when the page refreshes itself may run into cross-domain problems, which can be very tricky.

Xiaobai: Then I will still give this approach a try.

Lao Niu: Try it, by all means. But first, log in to QQ on my computer.

What happened next left Xiaobai open-mouthed: Lao Niu had already achieved fully automatic data retrieval. It looked like a single click, but only Lao Niu knew the twists and turns behind it. Xiaobai obediently pulled up a small stool and sat quietly to listen. Lao Niu expected the explanation to be long and tiring, so he had sent Xiaobai to buy a can of Red Bull in advance. The man had planned it all along.

ⅰ. The fog clears

Why can a web page show the profile picture of the QQ account currently running on the desktop? Pressing F5 reproduces the first visit. Watching the browser's F12 developer tools, we looked for clues among the mixed HTTP requests and got close to the truth: the desktop QQ application is itself a local server.

ⅱ. Preparatory knowledge

1. The HTTP protocol: requests, responses, the difference between Cookie and Set-Cookie, and status codes such as 200 and 302.

2. Fiddler: an HTTP proxy tool used to capture the requests and responses of the whole login process for analysis.

3. The browser's F12 developer tools.
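To make the Cookie versus Set-Cookie distinction from item 1 concrete, here is a minimal PHP sketch. It assumes the raw response headers are available as one string. The helper names `parseSetCookies` and `buildCookieHeader` are hypothetical and are not taken from the author's source. The server hands out values with Set-Cookie in a response, and the client sends them back with Cookie in later requests.

```php
<?php
// Collect name => value pairs from every Set-Cookie line of a raw header block.
// Attributes such as PATH, DOMAIN and EXPIRES are deliberately ignored here.
function parseSetCookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $cookies[$m[1]] = trim($m[2]);
        }
    }
    return $cookies;
}

// Turn the stored pairs back into the single Cookie request header.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Sample headers shortened from the getToken response shown in this article.
$sample = "HTTP/1.1 200 OK\r\n"
    . "Content-Type: text/html\r\n"
    . "Set-Cookie: pt_local_token=1798081340; PATH=/; DOMAIN=ptlogin2.qq.com;\r\n"
    . "Set-Cookie: pt_clientip=1be91b13d8b8cf14; PATH=/; DOMAIN=ptlogin2.qq.com;\r\n";

$cookies = parseSetCookies($sample);
echo buildCookieHeader($cookies), PHP_EOL;
```

The sketch ignores the domain and path rules a real browser applies, which is acceptable here only because each step of the flow picks out the one or two values it needs.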

[Fiddler view of the full login process – GIF]

ⅲ. Final Fantasy

    // ① pt_local_tk
    $token = getToken();
    // ②
    $uin = pt_get_uins($token);
    // ③
    $clientkey = pt_get_st($uin, $token);
    // ④ richURL
    $arr = jump($token, $uin, $clientkey);
    $skey = $arr['skey'];
    $richURL = $arr['url'];
    // ⑤
    $p_skey = getPSkey($richURL, $skey);
    // ⑥
    $bkn = getBkn($skey);
    $Cookie = "Cookie: p_skey={$p_skey}; uin=o{$uin} ; skey={$skey}";
    $json = getGroupList($bkn, $Cookie);

[Source download] A single PHP file that runs directly without any configuration: https://github.com/nasaplayer/getCurrentQQGroupList

The code above contains our entire main flow in just a few lines. Pay attention to how the parameters are passed along, because that is where the core logic lives; a kindred spirit should get it at a glance. Steps ① to ⑥ can roughly be read as one HTTP request each (the same thing that happens when you open a web page in a browser); only the parameters differ.

[Program running effect – GIF]

The application prints every HTTP response on the page. A flurry of operations as fierce as a tiger does not reveal the underlying principle, so let us go through it step by step.

① Request a pt_local_token from the server

    $token = getToken();

getToken sends a request to the target page:

    https://xui.ptlogin2.qq.com/cgi-bin/xlogin?appid=715030901&daid=73&hide_close_icon=1&pt_no_auth=1&s_url=https%3A%2F%2Fqun.qq.com%2Fmember.html%23

The parameter s_url=https://qun.qq.com/member.html# tells the server which page to go to after a successful login. The server thinks: "Who is this stranger? Give it a pt_local_token as a label, and tell the visitor (the client) to save it by returning a bunch of Set-Cookie entries in the response header, pt_local_token among them. To get past me it will still need the uin and the clientkey." (How would a stranger know all this? From Fiddler, of course: only by looking at the whole picture can you see the inside.)

getToken response header:

    HTTP/1.1 200 OK
    Date: Sun, 15 Apr 2018 12:50:46 GMT
    Content-Type: text/html
    Content-Length: 34759
    Connection: keep-alive
    Server: qzhttp-2.38.41
    P3P: CP="CAO PSA OUR"
    Cache-Control: max-age=86400
    Set-Cookie: pt_user_id=5197250212595915679; EXPIRES=Wed, 12-Apr-2028 12:50:47 GMT; PATH=/; DOMAIN=ui.ptlogin2.qq.com;
    Set-Cookie: pt_login_sig=2vLVdRAlGxcNvBYEjB5E*JjIE0u0-n21s0ouAQOeQ*bgo7Fkd6Cw3O9DrNS9l7C-; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: pt_clientip=1be91b13d8b8cf14; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: pt_serverip=39490abf0e2ff9a8; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: pt_local_token=1798081340; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: uikey=b9f012f44bdf628d965a537cd1049fc75f538501135f7f2356b48c0a1e0e8be3; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: pt_guid_sig=3239abfc37c8eea7b67a560e64664e9929011985be910feb6c9836bcac5c177a; EXPIRES=Tue, 15-May-2018 12:50:47 GMT; PATH=/; DOMAIN=ptlogin2.qq.com;
    Set-Cookie: ptui_identifier=000D9533813A7A9BEAE8DDB4A01B1C9FA96BB5F524F1F52858C67059; PATH=/; DOMAIN=ui.ptlogin2.qq.com;
    Last-Modified: Thu, 08 Mar 2012 02:04:00 GMT
    Strict-Transport-Security: max-age=31536000

Did you spot Set-Cookie: pt_local_token above?

② Ask the little penguin about its owner

    $uin = pt_get_uins($token);

This statement sends a request to

    https://localhost.ptlogin2.qq.com:4301/pt_get_uins?callback=ptui_getuins_CB&r=0.0760575656488639&pt_local_tk={$token}

The request goes to the desktop QQ application (a server listening on port 4301): "Little penguin, my master sent me to qun.qq.com, but the damn server stopped me and treated me as a stranger. I need my master's QQ number and other basic information. Here is the token the server gave me."

Little penguin (checking carefully): "Hand me the token and I will confirm the logged-in number. OK, all taken care of." The penguin leaves a bunch of strings behind; the response body contains the basic information of the current QQ account.

pt_get_uins response header:

    HTTP/1.1 200 OK
    Content-Type: application/javascript
    Content-Length: 198

pt_get_uins response body:

    var var_sso_uin_list=[{"account":"2919386060","client_type":65793,"face_index":735,"gender":1,"nickname":"millions of arm","uin":"2919386060","uin_flag":58720768}];ptui_getuins_CB(var_sso_uin_l

The response body is a string in which the basic QQ information is assigned in JSON format. PHP uses a regular expression to extract the needed fields.

    preg_match("/var_sso_uin_list=(.*?);ptui_getuins_CB/", $body, $matches);
    $json = $matches[1];
    $jsonObj = json_decode($json, false);
    $user = $jsonObj[0];
    $uin = $user->uin;

Steps ① and ② give you the token and the QQ number currently logged in on the desktop. What we have done so far is equivalent to opening the address from ① directly in a browser, and the same applies anywhere that offers QQ quick login. On screen the page merely refreshes, yet this many requests happen behind it. So far so good. Next.
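The request URL for the local QQ client can be assembled from its parts instead of being concatenated by hand. The following is a minimal sketch, not the author's code: the helper names `buildLocalUrl` and `randomFraction` are hypothetical, and treating `r` as nothing more than a random fraction is an assumption based on how the captured URLs look.

```php
<?php
// Build a request URL for the local QQ client, which listens on port 4301.
// $path is the interface name, e.g. 'pt_get_uins'; $params are query parameters.
function buildLocalUrl(string $path, array $params): string
{
    return 'https://localhost.ptlogin2.qq.com:4301/' . $path
        . '?' . http_build_query($params);
}

// Assumption: "r" is only a random fraction between 0 and 1.
function randomFraction(): string
{
    return (string) (mt_rand() / mt_getrandmax());
}

$token = '1798081340'; // the pt_local_token value from the sample response in step ①
$url = buildLocalUrl('pt_get_uins', [
    'callback'    => 'ptui_getuins_CB',
    'r'           => randomFraction(),
    'pt_local_tk' => $token,
]);
echo $url, PHP_EOL;
```

The same helper would cover step ③ by swapping the path for `pt_get_st` and adding `clientuin` to the parameters.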

[Actual effect after ① and ② – GIF]

③ Ask again, this time for the clientkey

    $clientkey = pt_get_st($uin, $token);

This statement sends a request to

    https://localhost.ptlogin2.qq.com:4301/pt_get_st?clientuin={$uin}&callback=ptui_getst_CB&r=0.4266647630782271&pt_local_tk={$token}

The request again goes to the desktop QQ application; note that the interface path is different. "Brother little penguin, even with the token and the uin I still cannot prove to the server that my master authorized this visit. I need you to agree on the key rules with the server and generate a clientkey from the token and the uin. Just tell me quietly."

Little penguin: "OK, OK. Token plus uin, fetching the clientkey now. Here you go: Set-Cookie: clientkey. Remember it yourself."

pt_get_st response header:

    HTTP/1.1 200 OK
    Content-Type: application/javascript
    Content-Length: 76
    Set-Cookie: clientuin=2919386060; path=/; domain=.ptlogin2.qq.com
    Set-Cookie: clientkey=00015AD353D5006867DD30727AD69245B5A090104A9B93A6D807D31DF79CE0E27049042682F8A107BD2E2BD40E5CCA6234F4C81FC1376F1FAD7CBC6209A45DE359150100E066229559C8058877F25827ECC4F43DB486DCDED8C7679C47944A6FA363E6B9612F1FA65812D7DEE8C86013; path=/; domain=.ptlogin2.qq.com
    P3P: CP="CAO PSA OUR"

pt_get_st response body:

    var var_sso_get_st_uin={uin:"2919386060"};ptui_getst_CB(var_sso_get_st_uin);

④ This time the server lets us through

    $arr = jump($token, $uin, $clientkey);
    $skey = $arr['skey'];
    $richURL = $arr['url'];

This sends a request to

    https://ssl.ptlogin2.qq.com/jump?clientuin={$uin}&keyindex=9&pt_aid=715030901&daid=73&u1=https%3A%2F%2Fqun.qq.com%2Fmember.html%23&pt_local_tk={$token}&pt_3rd_aid=0&ptopt=1&style=40

Carrying the token, uin and clientkey, we now count as one of its own, so the server enthusiastically lets us through and presses a dozen Set-Cookie entries on us. Make sure you keep the skey; it is your pass for the next step.

jump response header:

    HTTP/1.1 200 OK
    Date: Sun, 15 Apr 2018 14:36:44 GMT
    Content-Type: application/javascript
    Content-Length: 453
    Connection: keep-alive
    Cache-Control: no-cache, no-store, must-revalidate
    Expires: -1
    P3P: CP=CAO PSA OUR
    Pragma: no-cache
    Server: Tencent Login Server/2.0.0
    Strict-Transport-Security: max-age=31536000
    Set-Cookie: ...;
    Set-Cookie: skey=@wkzS7Ao3s; Path=/; Domain=qq.com;
    Set-Cookie: ...;

jump response body:

    ptui_qlogin_CB('0', 'https://ptlogin2.qun.qq.com/check_sig?pttype=2&uin=2919386060&service=jump&nodirect=0&ptsigx=103d55c16e79adbdc924a24a7213a596273f36caa9564bd5466a4a4f6d3335aa84a27b285624906610d659180e0eaf850c622541f67b286babab32424274f7fb&s_url=https%3A%2F%2Fqun.qq.com%2Fmember.html&f_url=&ptlang=2052&ptredirect=100&aid=1000101&daid=73&j_later=0&low_login_hour=0&regmaster=0&pt_login_type=2&pt_aid=715030901&pt_aaid=0&pt_light=0&pt_3rd_aid=0', '')

The other Set-Cookie entries are omitted from the response header. The skey is the login authorization key for the qq.com domain. The 0 in the response body means the login succeeded, and the long URL after it is the address to jump to, carrying the signature and other parameters. The technical term is a fat URL (called richURL from here on). When the login fails, the response body is:

    ptui_qlogin_CB('1', 'https://qun.qq.com/member.html', 'login failed, please try again later. *')
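Both the success and the failure body have the same three-argument shape, so the status and the richURL can be pulled out in one pass. A minimal sketch, assuming the arguments never contain a single quote; the function name `parseQloginCallback` is hypothetical and the sample URL is shortened for readability:

```php
<?php
// Split a ptui_qlogin_CB('code', 'url', 'message') body into its three arguments.
// Returns null when the body does not have that shape.
function parseQloginCallback(string $body): ?array
{
    $pattern = "/ptui_qlogin_CB\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";
    if (!preg_match($pattern, $body, $m)) {
        return null;
    }
    return [
        'ok'      => $m[1] === '0', // '0' means the login succeeded
        'url'     => $m[2],
        'message' => $m[3],
    ];
}

// Shortened sample bodies modeled on the two responses shown above.
$success = "ptui_qlogin_CB('0', 'https://ptlogin2.qun.qq.com/check_sig?pttype=2&uin=2919386060', '')";
$failure = "ptui_qlogin_CB('1', 'https://qun.qq.com/member.html', 'login failed, please try again later.')";

$okResult = parseQloginCallback($success);
$badResult = parseQloginCallback($failure);
echo $okResult['ok'] ? 'login ok: ' . $okResult['url'] : 'login failed', PHP_EOL;
echo $badResult['ok'] ? 'login ok' : 'login failed: ' . $badResult['message'], PHP_EOL;
```

Checking the first argument before using the URL avoids sending step ⑤ to the plain member page that a failed login returns.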

Picture an "imperial city" with two gates. We have now passed the first one and obtained the skey, the login authorization for the qq.com domain. The second gate is the login authorization for a QQ sub-application: the QQ group management page also requires a p_skey.

⑤ Ask the server for the p_skey

    $p_skey = getPSkey($richURL, $skey);

Send a request to the URL obtained in step ④; the response is:

    HTTP/1.1 302 Found
    Date: Mon, 16 Apr 2018 01:54:23 GMT
    Content-Length: 0
    Connection: keep-alive
    Cache-Control: no-cache, no-store, must-revalidate
    Expires: -1
    Location: https://qun.qq.com/member.html
    P3P: CP=CAO PSA OUR
    Pragma: no-cache
    Server: Tencent Login Server/2.0.0
    Strict-Transport-Security: max-age=31536000
    Set-Cookie: ...;
    Set-Cookie: p_skey=2GbDJky6IQHGAaks4oNWU5D*uWaDNzoubXh9-hBRC8A_; Path=/; Domain=qun.qq.com;
    Set-Cookie: ...;

The server tells us Set-Cookie: p_skey, and we pull the p_skey out of the header with a regular expression:

    $rule = "/p_skey=(.*?);/";
    preg_match($rule, $header, $matches);
    $p_skey = $matches[1];
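The response in this step is a 302, so besides the p_skey it also carries the page to visit next in its Location header. A minimal sketch for reading both the status code and the Location from a raw header block; the function name `parseRedirect` is hypothetical and the sample header is shortened from the response above:

```php
<?php
// Read the status code and the Location header from a raw header block.
// Either value is null when the corresponding line is missing.
function parseRedirect(string $rawHeaders): array
{
    $status = null;
    $location = null;
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#m', $rawHeaders, $m)) {
        $status = (int) $m[1];
    }
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        $location = $m[1];
    }
    return ['status' => $status, 'location' => $location];
}

$header = "HTTP/1.1 302 Found\r\n"
    . "Content-Length: 0\r\n"
    . "Location: https://qun.qq.com/member.html\r\n"
    . "Set-Cookie: p_skey=2GbDJky6IQHGAaks4oNWU5D*uWaDNzoubXh9-hBRC8A_; Path=/; Domain=qun.qq.com;\r\n";

$redirect = parseRedirect($header);
echo $redirect['status'], ' -> ', $redirect['location'], PHP_EOL;
```

A status other than 302 at this point is a hint that the richURL was rejected, which is worth checking before trusting the extracted p_skey.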

With that we are authorized for the QQ group management page, and we now hold every identification key. Guess where our program is: it has successfully arrived at the QQ group management page. Next, call the interfaces with the keys to get the data we want.

⑥ Use the data interfaces freely

    $bkn = getBkn($skey);
    $Cookie = "Cookie: p_skey={$p_skey}; uin=o{$uin} ; skey={$skey}";
    $json = getGroupList($bkn, $Cookie);

The function getGroupList sends a request to

    https://qun.qq.com/cgi-bin/qun_mgr/get_group_list

and the response contains the group list of the current QQ account. Wait: besides the Cookie, a bkn parameter (10 digits) is also sent. The server stops talking after handing over the skey and p_skey, so the bkn must be computed on the client. Reference material shows that it is produced on the client by a hash algorithm whose input is the skey:

    // JavaScript
    QZONE.FrontPage.getACSRFToken = function () {
        var skey = QZFL.cookie.get("p_skey") || QZFL.cookie.get("skey") || QZFL.cookie.get("rv2");
        return arguments.callee._DJB(skey)
    };
    QZONE.FrontPage.getACSRFToken._DJB = function (skey) {
        var hash = 5381;
        for (var i = 0, len = skey.length; i < len; ++i)
            hash += (hash << 5) + skey.charAt(i).charCodeAt();
        return hash & 2147483647
    };
    // 2013.09.13 Description: the latest encryption algorithm on the QQ website (_tk, bkn algorithm)
    // Reference source: https://blog.csdn.net/default7/article/details/11632239

The bkn algorithm walks through the skey: for each character, it adds to the hash the hash shifted left by 5 bits plus the character's Unicode code point. Finally the hash is ANDed with 2147483647 to give the bkn. The JS version cannot be used directly, so here it is rewritten in PHP:

    <?php
    // Equivalent of JavaScript's charCodeAt: code point of the character at $index.
    function charCodeAt($str, $index)
    {
        $char = mb_substr($str, $index, 1, 'UTF-8');
        $value = null;
        if (mb_check_encoding($char, 'UTF-8')) {
            $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
            $value = hexdec(bin2hex($ret));
        }
        return $value;
    }

    // Compute bkn from the skey: hash = hash + (hash << 5) + code, seeded with 5381.
    function getBkn($skey)
    {
        $hash = 5381;
        for ($i = 0, $len = strlen($skey); $i < $len; ++$i) {
            $hash += ($hash << 5) + charCodeAt($skey, $i);
        }
        // Keep only the low 31 bits.
        return $hash & 2147483647;
    }

So for get_group_list, the interface that returns the QQ group list, having the skey is enough to compute the bkn parameter; combined with the p_skey, the job is done. The same holds for search_group_members, the interface that returns the detailed member information of each group.
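One portability detail of the PHP port is worth a note. In JavaScript the `<<` operator works on 32-bit integers, while on a 64-bit PHP build `<<` and `+` work on 64-bit integers, and an addition that overflows silently becomes a float. For a 10-character skey the intermediate values should still fit, but a longer input such as a p_skey might not. Below is a minimal sketch that keeps the hash within 32 bits on every round, assuming a 64-bit PHP build and an ASCII-only key; the name `getBknMasked` is hypothetical and is not part of the author's source.

```php
<?php
// bkn hash with the running value masked to 32 bits on every round, so the
// result does not depend on how large the intermediate integers may grow.
// Assumes a 64-bit PHP build (0xFFFFFFFF must be an int) and an ASCII key.
function getBknMasked(string $skey): int
{
    $hash = 5381;
    $len = strlen($skey);
    for ($i = 0; $i < $len; $i++) {
        $shifted = ($hash << 5) & 0xFFFFFFFF;            // emulate a 32-bit shift
        $hash = ($hash + $shifted + ord($skey[$i])) & 0xFFFFFFFF;
    }
    return $hash & 0x7FFFFFFF;                           // same final mask as 2147483647
}

echo getBknMasked('a'), PHP_EOL; // 5381 * 33 + 97 = 177670
```

Only the low 31 bits survive the final mask, so discarding the higher bits on every round leaves the result unchanged while keeping every intermediate value a plain integer.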

So we’re done here.

ⅳ. Review and summary

The above 6 steps can be summarized as follows:

Access → Obtain authorization 1 → Obtain authorization 2 → Obtain data

Visiting any page that offers QQ quick login yields the server token (pt_local_token); with it we get the uin of the currently logged-in QQ account and then the clientkey. In the browser, the clientkey is obtained after you click the profile picture and before the login itself. With the token, uin and clientkey the login succeeds and we obtain the qq.com domain authorization (skey); the URL returned with it, together with the skey, then yields the p_skey, the login authorization of the sub-application. At this point any open interface could be called, except that the QQ group management page also requires a bkn (base_key) parameter, which we found is generated from the skey by bit operations. Finally, we get the data we want.

Many parameters are passed repeatedly along this tortuous path. I personally read it as a progressively strengthened check of the visitor's identity, done for security reasons. Picture an "imperial city" with an outer and an inner wall, shaped like the character 回: the first layer of authorization lets you into the city, the sub-application interfaces sit inside the second layer, and the final call needs authorization that runs through the whole city. Holding only the inner p_skey is not enough; you also need the outer skey and must pass the bkn parameter check. Is that how it actually works? There is always some suspense, because everything we did is based on black-box testing, though the guess is probably right nine times out of ten.

ⅴ. Download the source code

https://github.com/nasaplayer/getCurrentQQGroupList

My first GitHub project

ⅵ. Reference Materials

[2017.3.21] "Analysis of the QQ quick login protocol and implementation of "CSRF"" https://www.52pojie.cn/thread-591949-1-1.html

[2017.5.12] "(VC) QQ automatic login, QQ group authorization verification, source code" https://www.52pojie.cn/forum.php?mod=viewthread&tid=607525&page=1

While writing this, I read two books. Illustrated HTTP (250 pages) took about 2 hours and quickly gave me the basics of HTTP; I would rate it 60 out of 100, and it improved my understanding. HTTP: The Definitive Guide (694 pages) deserves full marks. Its chapter 11, Client Identification and Cookies, discusses cookie-based authorization on the Amazon shopping site and was very helpful for this article. So far I have only read chapters 1 and 11 of its 21 chapters. Knowledge found on the Internet is too fragmentary; books are more dependable.

ⅶ. Extended material

The diagram charts the Set-Cookie settings in the server's response at each address of the login process; the numbers correspond exactly to the step numbers in the program above. A green background marks the domain a cookie belongs to, and red highlights mark the core parameters. Other QQ sub-applications very likely work in a similar way.

ⅷ. Put the data into Excel

For engineers, seeing the data displayed is as good as having it in Excel. The latest wheel for creating Excel files directly from PHP is PhpSpreadsheet, which its author describes on GitHub as the next version of PHPExcel. The old library is no longer updated, so why not try the one that uses the latest PHP features? Quite right: PHP developers need to keep up with the times.

https://github.com/PHPOffice/PhpSpreadsheet

The end.