Disclaimer: this article is for learning purposes only and must not be used for anything illegal; otherwise you bear the consequences yourself. If anything here infringes your rights, please let me know and I will remove it. Thank you for your cooperation!
Opening remarks
This article tackles a problem that self-developed ticket-grabbing scripts run into all the time: requests rejected as invalid. We briefly analyze the front-end encryption algorithm of the 12306 website and, more precisely, explore how RAIL_DEVICEID is generated.
This cookie value is the cornerstone of a ticket-grabbing request. Without it, requests cannot be sent correctly; even with it, the value expires after a while and must be fetched again, and access can still be restricted after the browser's navigator.userAgent identity has been changed…
That is because RAIL_DEVICEID is not the real client identifier, only a decoy. The browser's true unique identifier is RAIL_OkLJUJ, which the designers of 12306.cn deliberately keep out of the cookie. It is a very effective deception; programming really is an art!
You may think your crawler mimics the browser perfectly, but if you don't know what the real browser identifier is, no amount of disguise will fool the server.
The figure above shows where RAIL_OkLJUJ is stored. Whether to stay compatible with most browsers on the market or to combine several front-end caching technologies into a signature, RAIL_OkLJUJ is always present, apart from cookies, in Local Storage, Session Storage, IndexedDB and Web SQL.
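To check this in your own console, here is a minimal sketch of a helper that looks a key up in any storage-like objects you pass it, plus the `document.cookie` string. The name `locateKey` is hypothetical, and only the synchronous stores (Local Storage, Session Storage, cookies) are covered; IndexedDB and Web SQL are asynchronous and are easier to inspect in the Application panel.

```javascript
// Hypothetical helper: report where a key such as "RAIL_OkLJUJ" is stored.
// `stores` maps a label to any object with a getItem(key) method
// (e.g. localStorage, sessionStorage); `cookieString` is document.cookie.
function locateKey(key, stores, cookieString) {
  const found = {};
  for (const [label, store] of Object.entries(stores)) {
    const value = store.getItem(key);
    if (value !== null) found[label] = value; // getItem returns null when the key is absent
  }
  // document.cookie is a "name=value; name2=value2" string
  for (const pair of cookieString.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    if (pair.slice(0, eq).trim() === key) {
      found.cookie = pair.slice(eq + 1).trim();
    }
  }
  return found;
}
```

In the console you would call it as `locateKey("RAIL_OkLJUJ", { localStorage, sessionStorage }, document.cookie)`; if the description above holds, the result has storage entries but no `cookie` entry.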
Note that RAIL_OkLJUJ is intentionally not set in the cookie. If you refresh the page again after clearing all the cache, you will see that RAIL_DEVICEID has changed and RAIL_OkLJUJ remains the same!
Here is a simple experiment that shows which value is the browser's true unique identifier:
- Step 1: Copy the current values of RAIL_DEVICEID and RAIL_OkLJUJ
Open the Console and read both values from localStorage with JS:
localStorage.getItem("RAIL_DEVICEID");
localStorage.getItem("RAIL_OkLJUJ");
The console returns the values immediately; copy them somewhere else by hand so you can compare them with the second result later.
But programmers are lazy by nature; who wants to copy by hand?
Let JS do the copying too, of course!
copy('the snow dream technical post welcome your visit, https://snowdreams1006.cn');
This copies the text passed to copy() ('the snow dream technical post welcome your visit, https://snowdreams1006.cn') to the clipboard; paste it into any text editor to see the result. Note that copy() is a DevTools console utility, so it works in the Console but not in page scripts.
So tweak the code to copy the RAIL_DEVICEID and RAIL_OkLJUJ values from the first visit to 12306.cn:
copy("RAIL_DEVICEID:::"+localStorage.getItem("RAIL_DEVICEID"));
// RAIL_DEVICEID:::E5BDkKrPkZ6nuZruqUj9-3lUG1LBM7t9aTDbZwFSdrboaFG6odrWZ9yuphnas4Jwq5E_FXIwwqlRoSXFbJULUiBNwNGt61Ow6Zv0GFXRABipaeDJJ0Ub7G2g_B_aGwMF5DNZ5KJR4eWVl-P3zSHGKbczLB3WN0z-
copy("RAIL_OkLJUJ:::"+localStorage.getItem("RAIL_OkLJUJ"));
// RAIL_OkLJUJ:::FGFOJ75VdD8dQc2yh3yTJf2RBWES6uGI
- Step 2: Wait 5 minutes and read the values of RAIL_DEVICEID and RAIL_OkLJUJ again
copy("RAIL_DEVICEID:::"+localStorage.getItem("RAIL_DEVICEID"));
// RAIL_DEVICEID:::VUye37EEUdGHgrpJGo9J95hWMNSIUFPeYBjabDgCiYJbQIr53iVzIPQJwcLhbijL4OyPVGmzolsVEK8Pw7_DG_oPrUDpfbnRe7HvMWMJvU2MAbk-7EwNEePAlpnVb9QVZz4dtOUSCRVbS2zlwgS0xe2BOThpR9oy
copy("RAIL_OkLJUJ:::"+localStorage.getItem("RAIL_OkLJUJ"));
// RAIL_OkLJUJ:::FGFOJ75VdD8dQc2yh3yTJf2RBWES6uGI
Alternatively, clear the cookies and refresh the page, or otherwise make the browser run the generation logic again to regenerate RAIL_DEVICEID and RAIL_OkLJUJ.
- Step 3: Compare the values of RAIL_DEVICEID and RAIL_OkLJUJ from the first and second reads
RAIL_DEVICEID:::E5BDkKrPkZ6nuZruqUj9-3lUG1LBM7t9aTDbZwFSdrboaFG6odrWZ9yuphnas4Jwq5E_FXIwwqlRoSXFbJULUiBNwNGt61Ow6Zv0GFXRABipaeDJJ0Ub7G2g_B_aGwMF5DNZ5KJR4eWVl-P3zSHGKbczLB3WN0z-
RAIL_OkLJUJ:::FGFOJ75VdD8dQc2yh3yTJf2RBWES6uGI

RAIL_DEVICEID:::VUye37EEUdGHgrpJGo9J95hWMNSIUFPeYBjabDgCiYJbQIr53iVzIPQJwcLhbijL4OyPVGmzolsVEK8Pw7_DG_oPrUDpfbnRe7HvMWMJvU2MAbk-7EwNEePAlpnVb9QVZz4dtOUSCRVbS2zlwgS0xe2BOThpR9oy
RAIL_OkLJUJ:::FGFOJ75VdD8dQc2yh3yTJf2RBWES6uGI
It is obvious at a glance that RAIL_OkLJUJ stays the same between the two reads, while RAIL_DEVICEID can change (and did here).
Therefore, RAIL_DEVICEID is not the browser's unique identifier; RAIL_OkLJUJ is!
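Comparing long strings by eye is error-prone, so the comparison can also be scripted. The sketch below is a hypothetical helper (`compareSnapshots` is not part of 12306's code) that parses the `NAME:::value` lines produced by the copy() calls above, assuming values contain no whitespace, and reports which keys kept or changed their value.

```javascript
// Hypothetical helper: parse the "NAME:::value" strings produced by the copy() calls.
// Assumes values contain no whitespace, so tokens can be split on whitespace.
function parseSnapshot(text) {
  const values = {};
  for (const token of text.split(/\s+/)) {
    const sep = token.indexOf(":::");
    if (sep > 0) values[token.slice(0, sep)] = token.slice(sep + 3);
  }
  return values;
}

// Compare two snapshots and list which keys kept or changed their values.
function compareSnapshots(first, second) {
  const a = parseSnapshot(first);
  const b = parseSnapshot(second);
  const result = { unchanged: [], changed: [] };
  for (const key of Object.keys(a)) {
    if (!(key in b)) continue; // only compare keys present in both reads
    (a[key] === b[key] ? result.unchanged : result.changed).push(key);
  }
  return result;
}
```

Pasting the first and second pair of lines as two strings should report RAIL_DEVICEID under `changed` and RAIL_OkLJUJ under `unchanged`.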
This article is not for everyone. If you fit one of the descriptions below, it should help you; if not, reading it may be a waste of your time.
- Wanderers far from home who need to grab tickets, by hand or with a script
- Developers with some knowledge of web front-end development
- Those who can endure solitude and study encryption algorithms on their own
Last but not least, you need a network connection, preferably WiFi, or the data usage will be hard to bear!
Background
Alone in a strange land, a stranger far from home; every festival, grabbing tickets by hand and by script.
Dropped again and again, out of touch, my heart aches; hands-on practice reveals the truth.
The original identity is unique, and they want to block you, no discussion.
So start disguising: encrypt the requests on the front end, take control back from the back end's responses.
Restore the algorithm, change identity, grab tickets steadily without worry; many ways onto the train, and tickets soon appear.
I don't know whether you have ever been stuck unable to get a ticket. Although the online rumors about third-party "acceleration packages" have been debunked, the ticket problem comes back every holiday, and most people still choose to buy one for psychological comfort!
So far, the only official online ticketing channels of 12306 are the 12306 website and its mobile app, so the popular third-party ticketing apps are unofficial channels, and the simplest way to build such channels is crawler technology.
Web and mobile alike are clients, and a client is only a mouthpiece; the server is what actually executes the commands.
When you submit a ticket request, the client packages the ticket information and sends it to the server. If tickets are available, the server may return a success message to the client: congratulations, your booking succeeded.
But even when tickets are available, they may not go to you; the only certainty is that with no tickets, the request will fail. In short, whatever the result, the server and the client always communicate silently according to an agreed protocol…
Official channels are the most reliable and accurate, yet they still may not get you a ticket.
So if you want a ticket, you have to act yourself rather than rely entirely on official channels. This is where crawlers come in: they pretend to be a client. To fool the server, you must first understand the characteristics of a real client.
Space is limited, so this article sets ticket grabbing itself aside and goes straight to the point: it explains the RAIL_DEVICEID request flow and walks you step by step through reconstructing the logic of the 12306 website's front-end encryption algorithm!
Preview of the results
Run chromeHelper.prototype.encryptedFingerPrintInfo() in the browser console to compute the real browser information. If the value field of the result matches the hashCode of the real request to kyfw.12306.cn/otn/HttpZF/…, congratulations: the related algorithm has not been updated. If they differ, the algorithm has probably been adjusted slightly!
Experience shows that although the 12306 algorithm keeps changing, the changes are minor tweaks that do no real harm; adjust the code yourself and you are back at full health!
{
  "key": "&FMQw=0&q4f3=zh-CN&VPIf=1&custID=133&VEek=unknown&dzuS=0&yD16=0&EOQP=c227b88b01f5c513710d4b9f16a5ce52&jp76=52d67b2a5aa5e031084733d5006cc664&hAqN=MacIntel&platform=WEB&ks0Q=d22ca0b81584fbea62237b14bd04c866&TeRS=777x1280&tOHY=24xx800x1280&FvJe=i1l1o1s1&q5aJ=8&wNLf=99115dfb07133750ba677d055874de87&0aew=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36&E3gR=9f7fa43e794048f6193187756181b3b9",
  "value": "owRJc8M4EkFMvcTkzibRFJoDSkUKCx6N9ictZIJLIeY"
}
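The key is just a run of `&name=value` pairs, so a small helper makes it easier to read field by field. This is a sketch with a hypothetical name; what each field means (for example, hAqN looks like navigator.platform and q4f3 like the browser language) is only inferred from the values, not confirmed by the source.

```javascript
// Hypothetical helper: split the "&name=value&..." key string into an object.
// Only the first "=" of each field separates name from value, so values that
// contain spaces, parentheses or semicolons (like the user agent) stay intact.
function parseFingerprintKey(key) {
  const fields = {};
  for (const part of key.split("&")) {
    if (part === "") continue; // the key string starts with "&"
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    fields[part.slice(0, eq)] = part.slice(eq + 1);
  }
  return fields;
}
```

For example, `parseFingerprintKey(result.key).TeRS` would give the screen-size-like value `777x1280` for the result shown above.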
- Step 1: Use Chrome to open 12306 and clear all cached data.
Ensure that you are using Google Chrome. Internet Explorer and Firefox have not been tested.
- Step 2: Manually clear the window.name property so the browser is in the same state as when it opens the 12306 website for the first time.
Any load after the first carries information from the previous request, which gets in the way of learning and verification. Analysis and testing show that this historical state is kept in the name property of the window object, so clearing the cache alone is not enough; the value of the name property must be cleared by hand as well.
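A minimal console sketch of this reset, assuming you are on the 12306 page: the hypothetical `resetClientState` helper clears window.name together with the origin's Local and Session Storage. It cannot reach cookies set as HttpOnly, IndexedDB or Web SQL, which is why Step 1 still relies on clearing site data in DevTools.

```javascript
// Hypothetical console helper: reset the state that page JS can reach.
// It clears window.name plus Local/Session Storage of the current origin.
// Cookies (HttpOnly ones are invisible to JS), IndexedDB and Web SQL still
// need "Clear site data" in the Application panel.
function resetClientState(win) {
  win.name = "";
  win.localStorage.clear();
  win.sessionStorage.clear();
  // report whether everything this helper touches is now empty
  return win.name === "" &&
    win.localStorage.length === 0 &&
    win.sessionStorage.length === 0;
}
```

In the console, run `resetClientState(window)` and then reload; it should return `true`.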
- Step 3: Force-refresh the page with the request log preserved, filter by the JS request type, and find the /otn/HttpZF/logdevice request.
In that request, note the query parameter hashCode: owRJc8M4EkFMvcTkzibRFJoDSkUKCx6N9ictZIJLIeY, so it can be compared with the computed result later.
The response matters even more than the query parameters: the first /otn/HttpZF/logdevice request returns not only the expiration time exp and the device information dfp, but also cookieCode, the device's unique identifier.
When the logic in the /otn/HttpZF/GetJS script issues another /otn/HttpZF/logdevice request, after expiry or after the site cache has been cleared manually, the response no longer contains the cookieCode parameter.
Let’s take a closer look at the response information for the initial request.
callbackFunction('{"exp":"1581948102442","cookieCode":"FGHcXsVmjf3oV0zm5qTDPFt-VcNhuDA-","dfp":"QNCYH1J5E9M7rl97uo_PUR1OSwRTcCe1xdnbX7h2V6Ewcq6kML0qzXD5y11rLv3FPX1ndOnhL_bjVkwwgtWTsHMFums60_4H9Lr-vJzJGq4tkaUEGfRNXN9IJlvptReSBa5PP7N5gxpSOBo-YlF5Ac98f-YlNlxi"}')
Strip off the callbackFunction() wrapper and the returned data turns out to be JSON; formatted, the response looks like this:
{
  "exp": "1581948102442",
  "cookieCode": "FGHcXsVmjf3oV0zm5qTDPFt-VcNhuDA-",
  "dfp": "QNCYH1J5E9M7rl97uo_PUR1OSwRTcCe1xdnbX7h2V6Ewcq6kML0qzXD5y11rLv3FPX1ndOnhL_bjVkwwgtWTsHMFums60_4H9Lr-vJzJGq4tkaUEGfRNXN9IJlvptReSBa5PP7N5gxpSOBo-YlF5Ac98f-YlNlxi"
}
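The unwrapping step can be done programmatically too. The sketch below (hypothetical name `unwrapJsonp`) assumes the body has exactly the `callbackFunction('…')` shape seen above and that the JSON inside contains no escaped single quotes; anything else is rejected rather than guessed at.

```javascript
// Hypothetical helper: strip the callbackFunction('...') JSONP wrapper
// and parse the JSON string inside it.
function unwrapJsonp(body, callbackName = "callbackFunction") {
  const prefix = callbackName + "('";
  const suffix = "')";
  const text = body.trim();
  if (!text.startsWith(prefix) || !text.endsWith(suffix)) {
    throw new Error("unexpected response format");
  }
  // everything between the quotes is the JSON payload
  return JSON.parse(text.slice(prefix.length, text.length - suffix.length));
}
```

Passing the raw response body from the Network tab returns an object with the exp, cookieCode and dfp fields.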
Look closely: dfp is what updates the RAIL_DEVICEID value, whereas cookieCode is the real unique identifier. It is not set as a cookie; it is only kept in local caches (as RAIL_OkLJUJ) for use in subsequent RAIL_DEVICEID requests.
- Step 4: Paste the reconstructed source code into the console and run chromeHelper.prototype.encryptedFingerPrintInfo() to obtain the query parameter value for the /otn/HttpZF/logdevice request, then compare that value with the parameter in the actual request.
Assuming the real request's hashCode has been stored in a variable hashcode, if chromeHelper.prototype.encryptedFingerPrintInfo().value === hashcode returns true, the reconstructed algorithm still works; otherwise the related algorithm has most likely been updated again!
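Instead of copying hashCode into a variable by hand, you can read it straight from the captured request URL. This is a sketch with a hypothetical name; it only assumes the result object has a `value` field, as in the preview above, and that hashCode contains no `+` (URLSearchParams would decode it as a space).

```javascript
// Hypothetical helper: compare a computed { key, value } result with the
// hashCode query parameter of a captured logdevice request URL.
function matchesCapturedHashCode(result, requestUrl) {
  const hashCode = new URL(requestUrl).searchParams.get("hashCode");
  return hashCode !== null && result.value === hashCode;
}
```

Usage: `matchesCapturedHashCode(chromeHelper.prototype.encryptedFingerPrintInfo(), "<URL copied from the Network tab>")`.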
Key points
If you are learning about automatic ticket grabbing or plan to study it, I can tell you with confidence that RAIL_DEVICEID is 12306's ultimate anti-crawler measure!
Now that the target is locked, let's explore together how 12306 handles RAIL_DEVICEID.
Visit the site in incognito mode
As we all know, Google Chrome is the programmer's browser of choice because of its powerful development and debugging tools; for simple website requests, you can analyze the whole flow on your own without any third-party packet-capture tool.
If you haven't heard of Chrome or use another browser, download the latest version of Chrome first. Using the same tools as this article will help you reproduce the steps smoothly; otherwise you will have to investigate any odd problems on your own.
First, open an incognito window in Chrome. Its key feature is that it does not keep cookies: it starts without your existing cookies and discards its own when closed, so to some extent the target website sees you as a new user (mainly, a new client).
After opening 12306.cn, open the developer tools (F12, or right-click and choose Inspect), select the Network tab, and make sure network requests are always being recorded.
Specifically, the leftmost record button should be red, and the Preserve log and Disable cache check boxes should be checked; these settings are the basis for analyzing all network requests.
Once everything is ready, go through a complete ticket purchase: from the home page to the login page, then log in, buy tickets and so on. The more complete the steps, the more data there is to analyze, the less likely an important step is missed, and the closer you get to the truth.
It is recommended that you deliberately enter a wrong account name or password the first time you log in, because a successful login triggers many redirects that make the earlier login request hard to find. Without Preserve log enabled in the Network tab, this gets even worse!
Then log in with the correct credentials and go on to buy tickets. There is no need to pay: proceeding normally until the order is placed counts as the whole ticket purchase process.
Once the order is placed, the ticket purchase process is essentially complete. Next, search globally for the keyword RAIL_DEVICEID to see where it is generated and where it is used.
Global keyword search
Now that the entire ticket purchase process is basically complete, let’s start searching globally for RAIL_DEVICEID in all requests.
First open the Network tab: the fourth icon from the left, a magnifying glass, is the search function. Entering the keyword RAIL_DEVICEID filters the network requests that match.
The search turns out to be of little help: going through the results one by one, you only see that most requests automatically carry the cookie. Buried among them, which request actually generates it?
So we need a more precise search that filters out the request that creates the cookie. The next question becomes: if RAIL_DEVICEID were set directly by the back end, what would that network request look like?
The best way to learn is to imitate. We don't know how this cookie is actually set, but we can look at how other cookies are set.
Likewise, in the Network tab, the third icon, a funnel, is the filter; request types are roughly divided into All | XHR | JS | CSS | Img | Media | Font | Doc | WS | Manifest | Other.
Here is a short table of what each request type means, to give a direct feel for them:
Type | Name | Description | Identifying header |
---|---|---|---|
XHR | XHR and Fetch | Ajax asynchronous requests | X-Requested-With: XMLHttpRequest |
JS | Scripts | JavaScript files | Sec-Fetch-Dest: script |
CSS | Stylesheets | CSS styles | Sec-Fetch-Dest: style |
Img | Images | Images | Sec-Fetch-Dest: image |
Media | Media | Audio and video | Sec-Fetch-Dest: audio |
Font | Fonts | Fonts | Sec-Fetch-Dest: font |
Doc | Documents | HTML documents | Sec-Fetch-Dest: document |
WS | WebSockets | Long-lived connections | none |
Manifest | Manifest | Manifest files | none |
12306 issues no requests of the last two types, so their identifying headers could not be determined. Apart from Ajax requests (whose X-Requested-With header is usually added by libraries such as jQuery rather than by the browser), request types are identified by the Sec-Fetch-Dest request header. Setting browser cookies is no exception to this header-driven behavior, except that most cookies are set on the server side, where the Set-Cookie response header tells the browser how to store them.
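The table above can be turned into a rough classifier. This sketch (hypothetical names) expects header names in lower case and adds one mapping not in the table: browsers send `Sec-Fetch-Dest: empty` for fetch() and XHR, so `empty` is treated as XHR. It only approximates how DevTools groups requests.

```javascript
// Hypothetical helper: guess the DevTools filter type of a request from its
// request headers (header names lower-cased), following the table above.
const DEST_TO_TYPE = {
  empty: "XHR", // fetch() and XMLHttpRequest send Sec-Fetch-Dest: empty
  script: "JS",
  style: "CSS",
  image: "Img",
  audio: "Media",
  video: "Media",
  font: "Font",
  document: "Doc",
};

function classifyRequest(headers) {
  // X-Requested-With is added by libraries such as jQuery, not by the browser
  if (headers["x-requested-with"] === "XMLHttpRequest") return "XHR";
  const dest = headers["sec-fetch-dest"];
  return Object.prototype.hasOwnProperty.call(DEST_TO_TYPE, dest)
    ? DEST_TO_TYPE[dest]
    : "Other";
}
```

For instance, a request carrying `sec-fetch-dest: document` is classified as Doc, which is exactly the kind of request examined next.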
In the early days of web development, the front end and back end were not separated; for a long time, front-end pages were also written by back-end developers, so many websites still carry traces of the transition from old to new.
Among the request types above, XHR and Doc best represent this change. The most common use of XHR is the popular Ajax asynchronous request, which updates part of a page without a refresh. Doc is the document type: whether the server outputs native HTML directly or renders pages dynamically with a template engine, the final result shown is an HTML document. This kind of request is the easiest place to set cookies and similar state, reflecting the previous generation's style of having one person do all the work at once!
As technology has advanced, however, more and more problems of the old approach have surfaced and drawn the attention of the industry, developers included. Companies have gradually changed course so that each side plays its own role and does what it does best.
In summary, XHR and Doc requests have the following characteristics:
- Most XHR data handling happens on the front end: the back end returns data to the front-end caller, which assembles and presents it according to the business logic.
- Most XHR requests are asynchronous Ajax requests. The upside is that the page does not need refreshing to show the latest content; the downside is the nightmare of waiting on requests when business steps depend on one another.
- XHR requests are initiated by the front end: the browser sends them to the back end and the back-end server returns the data. Such requests are reportedly few in number but important.
- Doc requests are mostly controlled by the back end, which makes it easy to set up how page elements are presented, though a front-end template engine combined with XHR data may also be used to generate the document.
- Doc requests set up a range of network behavior, including cookies; forwarding and redirection are common ways to enforce access control.
Below, let's look at network requests that contain the RAIL_DEVICEID keyword to get a feel for the difference between the two.
For XHR requests, the focus is on how data is sent and received, as reflected in Request Data and Response Data; the request headers are generally left at their defaults.
Doc requests have a different focus: most are URLs the user enters or is redirected to automatically, so the request and response headers deserve the attention. Cookie values are sent to the back-end server in the request header, and when the back end wants to add or modify a cookie, it does so through the response header.
At this point we cannot tell whether RAIL_DEVICEID is set directly by the server or by the client itself, and client-side behavior is less visible, so let's start with the easier target: the server!
Looking back at the Doc requests, the response header that sets a cookie looks something like this:
Set-Cookie: JSESSIONID=D4CE095F5A21B38DF3389070F1E01FE6; Path=/otn
With this as a model, search for similar requests using the keyword: set-cookie: RAIL_DEVICEID=
No results!
In general, the absence of results is likely due to one of the following reasons:
- Good news: there really are no results, so it is time to change approach and keep exploring.
- Bad news: the captured network requests are not enough to contain a match.
- Bad news: the keyword was entered wrong, with extra symbols, or regular-expression matching was switched on for the search.
To rule these out one by one, first check whether the keyword set-cookie: JSESSIONID finds its known result.
It does. Facts speak louder than words: the search itself works, and there really is no Set-Cookie for RAIL_DEVICEID. With both failure causes ruled out, the generation logic is very likely on the front end rather than the back end.
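The same check can be repeated offline on a HAR file exported from the Network tab. This sketch (hypothetical name) relies on the HAR 1.2 layout, `log.entries[].request.url` and `log.entries[].response.headers[]` with `name`/`value`; make sure the export keeps cookie headers, since recent Chrome versions offer a sanitized export that drops them.

```javascript
// Hypothetical helper: search an exported HAR log for responses whose
// Set-Cookie header sets `cookieName`; returns the matching request URLs.
function findSetCookie(har, cookieName) {
  const hits = [];
  for (const entry of har.log.entries) {
    for (const header of entry.response.headers) {
      if (header.name.toLowerCase() !== "set-cookie") continue;
      // A Set-Cookie value starts with "name=value", followed by "; attributes"
      const name = header.value.split("=")[0].trim();
      if (name === cookieName) hits.push(entry.request.url);
    }
  }
  return hits;
}
```

Running it with "JSESSIONID" should list the Doc requests seen above, while "RAIL_DEVICEID" is expected to return an empty list.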
The next step is to find where, among all these requests, the front end generates RAIL_DEVICEID; with so many requests, we need a bit more strategy.
Since the front end controls how the cookie is generated, a JS file is most likely responsible, although other file types capable of manipulating browser behavior cannot be ruled out.
So the scope can be narrowed further: among requests of type JS, look for those that contain the keyword RAIL_DEVICEID.
The plan looks great on paper, but reality disappoints: I expected the JS type filter and the keyword search to work together, but they don't. The type filter has no effect on the search results, and a large number of requests still match!
Moreover, opening any matching request shows that the Network tab search mostly matches basic information such as request headers, not the source code of JS files or documents. Going the wrong way, however far, only wastes time.
I don't know where you came from or what you have been through, but I know you will end up in the page's file system.
Chrome shows not only the network requests but also the final files presented to the user. Since I can't catch you along the way, I'll search for you at the destination!
Open the Sources tab. The panel is roughly divided into three parts: the file tree on the left, the file area in the middle and the debugging area on the right.
The file tree on the left shows the current hierarchy clearly, which helps you quickly grasp the outline of the project. The debugging area on the right is a great tool for anyone who has an idea in mind but is not sure it is right: debugging someone else's code is a cycle of predicting and verifying, just as in your own development.
The file area in the middle is the largest and, naturally, the most capable. Select a file on the left to display its source code, read it, and then verify your ideas with the debugger.
So the question is: where do you search for files that contain the keyword RAIL_DEVICEID?
Generally speaking, a good user experience doesn't need a manual; even given plenty of detailed documentation, most people would rather try things first!
Life is too short to waste on boring things, and the short three-line hint in the empty file area is quite philosophical: the first line tells you how to open a file, the second how to run a command, and the third how to add a folder to the workspace.
If three lines of hints aren't enough to solve your problem, read more and work through the documentation yourself.
Here, the hint closest to our goal of searching the files for the keyword RAIL_DEVICEID is the second one, Run command, so let's try it!
Type search after the command prompt and the related search commands pop up; click the one in the DevTools category and the search panel appears. Naturally, enter the keyword RAIL_DEVICEID and search!
Found you at last! Good thing I didn't give up. All the logic related to RAIL_DEVICEID appears to live in this single JS file; now where can you run?
Nowhere left to run
Click the file to view it: a dense wall of red, blue and black JS code, written for machines rather than people, so it has to be beautified before anyone can read it. The source is deliberately minified and obfuscated into hard-to-read code to keep others from peeking at and copying the work, and also to shrink the file so it transfers faster and the site loads more quickly.
Click the format icon in the lower-left corner of the middle area to beautify the code, then search the file for RAIL_DEVICEID to locate the specific code.
Conveniently, in-file search uses the universal shortcut Ctrl + F. Here is the relevant code; now comes the moment that really tests your skill!
$a.getJSON("https://kyfw.12306.cn/otn/HttpZF/logdevice" + ("?algID\x3drblubbXDx3\x26hashCode\x3d" + e + a), null, function(a) {
    var b = JSON.parse(a);
    void 0 != lb && lb.postMessage(a, r.parent);
    for (var d in b)
        "dfp" == d ? F("RAIL_DEVICEID") != b[d] && (W("RAIL_DEVICEID", b[d], 1E3), // dfp -> RAIL_DEVICEID, only when it changed
        c.deviceEc.set("RAIL_DEVICEID", b[d])) : "exp" == d ? W("RAIL_EXPIRATION", b[d], 1E3) : "cookieCode" == d && (c.ec.set("RAIL_OkLJUJ", b[d]), // cookieCode -> RAIL_OkLJUJ in c.ec
        W("RAIL_OkLJUJ", "", 0)) // the RAIL_OkLJUJ cookie itself is blanked (expiry 0)
})
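To make the minified callback easier to follow, here is a hypothetical, readable restatement of its dispatch logic. Instead of touching real storage it returns the writes the callback would perform. The meaning of W (apparently a cookie writer), F (apparently a cookie reader) and c.ec / c.deviceEc (caches) is inferred from how they are called, not confirmed, and the unit of the 1E3 argument is not visible in this snippet.

```javascript
// Hypothetical, readable restatement of the minified logdevice callback.
// It returns the writes the callback would perform instead of performing them.
// target "cookie" stands for W(...); "deviceEc" / "ec" stand for c.deviceEc.set / c.ec.set.
function planLogdeviceWrites(response, currentDeviceId) {
  const writes = [];
  for (const field of Object.keys(response)) {
    const value = response[field];
    if (field === "dfp") {
      // RAIL_DEVICEID is only rewritten when it differs from the current value
      if (currentDeviceId !== value) {
        writes.push({ target: "cookie", name: "RAIL_DEVICEID", value, expiry: 1000 });
        writes.push({ target: "deviceEc", name: "RAIL_DEVICEID", value });
      }
    } else if (field === "exp") {
      writes.push({ target: "cookie", name: "RAIL_EXPIRATION", value, expiry: 1000 });
    } else if (field === "cookieCode") {
      // the real identifier goes to the ec cache, while its cookie is blanked
      writes.push({ target: "ec", name: "RAIL_OkLJUJ", value });
      writes.push({ target: "cookie", name: "RAIL_OkLJUJ", value: "", expiry: 0 });
    }
  }
  return writes;
}
```

Feeding it the first-visit response shown earlier yields writes for RAIL_EXPIRATION, RAIL_OkLJUJ and RAIL_DEVICEID, which matches the observation that RAIL_OkLJUJ ends up in local caches but not in the cookie.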
Back up the JS locally for easy reproduction
Now that the key file has been found, save a snapshot for the archive. Otherwise, when the file is updated one day, you won't know what changed and will have to analyze everything from scratch again. I prefer incremental updates to starting over!
Right-click the source file and use whichever option in the pop-up menu you like to save a local copy for study and backup; there is a lot of work ahead.
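For the incremental approach, a quick way to see where a newly downloaded copy differs from the archived one is a line-by-line comparison. This is a deliberately naive sketch with a hypothetical name: it compares lines by position, so an inserted line shifts everything after it, and both copies should be pretty-printed first because the minified file is essentially one long line.

```javascript
// Hypothetical helper: compare an archived copy of the script with a freshly
// downloaded one, line by line, and report which line numbers differ.
// Naive positional comparison: fine for spotting small edits, but an inserted
// or deleted line makes every following line count as changed.
function changedLines(oldSource, newSource) {
  const a = oldSource.split("\n");
  const b = newSource.split("\n");
  const changed = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) changed.push(i + 1); // 1-based line numbers
  }
  return changed;
}
```

For anything beyond small tweaks, a proper diff tool gives more readable results.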
To be continued
Due to limited space, this post had to be split into three parts; please forgive the inconvenience. To read the rest, follow Snow Dream Technology Station so you don't miss it.
If you found this article helpful, please like it or leave a message; your encouragement is my motivation to keep writing. You are also welcome to follow my WeChat public account "Snow Dream Technology Station" for regularly updated quality articles!