Article author: “NightTeam” – Dai Huangjin
Polish, proofread: “NightTeam” – Loco
What is implicit Style CSS?
In CSS, ::before creates a pseudo-element that is the first child of the selected element. It is often used together with the content property to add cosmetic content to an element.
From: developer.mozilla.org/zh-CN/docs/…
If you have never done front-end development, that definition may be hard to follow. That is fine; a short example makes it concrete.
Let’s create a new HTML file and enter something like this:
<q>Hello everyone, I am salty fish</q>.<q>I'm part of the NightTeam</q>
And reference the following style file in the HTML:
q::before {
  content: "«";
  color: blue;
}
q::after {
  content: "»";
  color: red;
}
The final display in the browser looks like this:
As the example shows, the symbols before and after the text are not in the HTML source at all, yet they appear around the text once the browser renders the page, because the stylesheet supplies them through ::before and ::after. Isn't that neat?
Many online fiction sites currently use this technique as an anti-crawler measure to keep their content away from scrapers.
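To make the gap between source and rendering concrete, here is a minimal sketch in JavaScript. The helpers renderWithPseudo and stripTags are hypothetical names introduced for illustration, and the regex-based handling only suits flat markup like the demo above; it is not a general HTML parser.

```javascript
// Hypothetical helper: mimic what the browser displays for a tag once
// its ::before / ::after content is applied. Illustration only.
function renderWithPseudo(html, tag, before, after) {
  const pattern = new RegExp(`<${tag}>(.*?)</${tag}>`, 'gs');
  // A replacer function avoids special handling of "$" in the inserted text.
  return html.replace(pattern, (_, inner) => before + inner + after);
}

// What a naive crawler gets if it only strips tags from the source.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, '');
}

const html = "<q>Hello everyone, I am salty fish</q> <q>I'm part of the NightTeam</q>";
const sourceText = stripTags(html);
const renderedText = renderWithPseudo(html, 'q', '«', '»');

console.log(sourceText);
console.log(renderedText);
```

The first line printed lacks the « » symbols, while the second includes them, which is the difference a crawler has to recover.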
Worked example
So how do you deal with an anti-crawler technique like this? Salted Fish has prepared a simple practical example to walk through the approach.
Send the message [implicit style-css] to our WeChat official account [NightTeam] to get the sample address ~
Because the example is fairly simple, I will skip the request analysis and go straight to comparing what the browser renders with what the source code contains, looking for a way in.
Here’s what the browser looks like:
Here is the source code:
Page analysis
Open the browser's developer tools and inspect the hidden text:
As the figure above shows, the content in the Styles pane (marker 2, the element's CSS information) is exactly the text that is missing from the HTML (marker 1).
This matches the "implicit Style-CSS" behaviour described in the first part.
All we need to do is replace each span tag with the content that the CSS sets for it.
In a normal page, CSS sits either directly in the HTML source or in a .css file, and clicking the file name on the right side of the Styles pane jumps straight to that rule, as shown below:
On this page, however, the Styles pane offers no such link. That means the CSS for these elements does not live in a file but is added dynamically, so we have to look at the pattern in the span tags to find a way in.
The HTML source shows that every span tag has a class made of context_kw followed by a number, so we can search for context_kw.
The search for context_kw leads to the following JS:
Let's take a quick look at the whole JS file. By function, it splits into two parts:
Part 1: the CryptoJS encryption and decryption logic, which can be ignored.
Part 2: the obfuscated code. It decrypts the ciphertext stored in an array, then manipulates the DOM to inject the result as CSS. This is the core of the anti-crawler logic.
JS code analysis
Working back from the DOM-manipulating code in Part 2, we find the key variable, words.
for (var i = 0x0; i < words[_0xea12('0x18')]; i++) {
    try {
        document[_0xea12('0x2a')][0x0][_0xea12('0x2b')]('.context_kw' + i + _0xea12('0x2c'), 'content:\x20\x22' + words[i] + '\x22');
    } catch (_0x527f83) {
        document['styleSheets'][0x0]['insertRule'](_0xea12('0x2d') + i + _0xea12('0x2e') + words[i] + '\x22}', document[_0xea12('0x2a')][0x0][_0xea12('0x2f')][_0xea12('0x18')]);
    }
}
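Read without the obfuscation, the loop appears to add one CSS rule per entry of words to the first stylesheet. The sketch below is a readable guess at that behaviour, not the site's actual code. The ::before selector and the exact rule text are assumptions, because the article does not decode the obfuscated strings, and FakeSheet is a hypothetical stand-in for a browser stylesheet so the sketch runs in Node.

```javascript
// Hypothetical stand-in for a CSSStyleSheet, just enough for this sketch.
class FakeSheet {
  constructor() {
    this.cssRules = [];
  }
  insertRule(rule, index) {
    this.cssRules.splice(index, 0, rule);
    return index;
  }
}

// Readable guess at the loop: one rule per word, appended at the end of the sheet.
// ASSUMPTION: the rule targets ::before; the real selector suffix is hidden
// behind the undecoded _0xea12(...) lookups.
function injectWords(sheet, words) {
  for (let i = 0; i < words.length; i++) {
    const rule = `.context_kw${i}::before{content: "${words[i]}"}`;
    sheet.insertRule(rule, sheet.cssRules.length);
  }
  return sheet.cssRules;
}

const demoRules = injectWords(new FakeSheet(), ['moon', 'river']);
console.log(demoRules);
```

Each span with class context_kwN then displays words[N], even though the span itself is empty in the HTML source.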
Next, find where the words variable is declared.
var secWords = decrypted[_0xea12('0x16')](CryptoJS['enc']['Utf8'])[_0xea12('0x17')](',');
var words = new Array(secWords[_0xea12('0x18')]);
This shows that the CSS content comes from an encrypted element of the array _0xa12e, which is AES-decrypted and then processed further.
With this overall logic clear, we can work out which parts of the JS code we need.
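The decrypt-then-split pattern can be sketched with Node's built-in crypto module instead of CryptoJS. Everything specific here is an assumption for illustration: the article does not show the sample's key, IV, or cipher mode, so the sketch uses AES-128-CBC with placeholder values and encrypts its own demo data first so that it is self-contained.

```javascript
const crypto = require('crypto');

// Demo-only encryption so the sketch has ciphertext to work with.
function encryptWords(words, key, iv) {
  const cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
  const encrypted = Buffer.concat([cipher.update(words.join(','), 'utf8'), cipher.final()]);
  return encrypted.toString('base64');
}

// Mirrors the shape of the page script: AES-decrypt, read as UTF-8, split on commas.
function decryptWords(ciphertextB64, key, iv) {
  const decipher = crypto.createDecipheriv('aes-128-cbc', key, iv);
  const plain = Buffer.concat([decipher.update(ciphertextB64, 'base64'), decipher.final()]);
  return plain.toString('utf8').split(',');
}

// ASSUMPTION: placeholder 16-byte key and IV, not the values used by the sample site.
const key = Buffer.from('0123456789abcdef', 'utf8');
const iv = Buffer.from('fedcba9876543210', 'utf8');

const ciphertext = encryptWords(['moon', 'river', 'lamp'], key, iv);
const words = decryptWords(ciphertext, key, iv);
console.log(words);
```

In practice it is usually simpler to run the site's own CryptoJS code, as the article does, than to reimplement the decryption.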
JS code adjustment
Although the code is obfuscated, it is still fairly simple, so I will not demonstrate how to extract it step by step. Two places need to be rewritten after extraction.
The first is the try/catch block in the figure below, which checks whether the current URL belongs to the original site. We run the code in Node, where window does not exist, so the unmodified code throws an exception; the if statement there needs to be commented out.
The second is the conditional in the figure below, which also tests a property that does not exist in Node and needs to be modified accordingly.
The second change can be made like this:
_0x1532b6[_0xea12('0x26')](_0x490c80, 0x3 * +!('object' === _0xea12('0x27')))
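A different approach from editing the script, shown here only as a sketch, is to run it in a Node vm context that supplies fake window and document objects, so the browser-only checks find something to read. The function runWithFakeDom, the stub shapes, and the demo script are all hypothetical; the real script may touch other properties that would need stubbing too.

```javascript
const vm = require('vm');

// Run a script with minimal browser stubs and capture the CSS rules it inserts.
function runWithFakeDom(pageScript, href) {
  const rules = [];
  const sheet = {
    cssRules: rules,
    insertRule(rule, index) {
      rules.splice(index, 0, rule);
      return index;
    },
  };
  // ASSUMPTION: the script only needs window.location.href and document.styleSheets[0].
  const sandbox = {
    window: { location: { href } },
    document: { styleSheets: [sheet] },
  };
  vm.createContext(sandbox);
  vm.runInContext(pageScript, sandbox);
  return rules;
}

// Stand-in for the extracted page script (not the real obfuscated code).
const demoScript = `
  var words = ['moon', 'river'];
  if (window.location.href.indexOf('example.com') !== -1) {
    for (var i = 0; i < words.length; i++) {
      document.styleSheets[0].insertRule(
        '.context_kw' + i + '::before{content: "' + words[i] + '"}',
        document.styleSheets[0].cssRules.length
      );
    }
  }
`;

const captured = runWithFakeDom(demoScript, 'https://example.com/chapter/1');
console.log(captured);
```

The trade-off is that stubs leave the original code untouched but must cover every browser property it reads, whereas commenting out the checks is quicker for a small script like this one.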
With these two changes in place, the code yields all the replacement characters. All that remains is to substitute them back into the HTML to restore the normal page. The substitution is straightforward, so I will not demonstrate it here.
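For readers who want a starting point, the substitution might look like the sketch below. It assumes each hidden word is an empty span whose only attribute is class="context_kwN", which is a guess at the markup since the article shows it only in screenshots. The function restoreText and the demo HTML are hypothetical.

```javascript
// Replace each empty <span class="context_kwN"></span> with words[N].
// ASSUMPTION: the span carries only the class attribute and has no children.
function restoreText(html, words) {
  const pattern = /<span\s+class=["']context_kw(\d+)["']\s*>\s*<\/span>/g;
  return html.replace(pattern, (match, index) => {
    const word = words[Number(index)];
    // Leave the span untouched if no word is known for that index.
    return word === undefined ? match : word;
  });
}

const pageHtml = '<p>The <span class="context_kw0"></span> rose over the <span class="context_kw1"></span>.</p>';
const restored = restoreText(pageHtml, ['moon', 'river']);
console.log(restored);
```

For real pages with more varied markup, an HTML parser would be a safer choice than a regular expression.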
Conclusion
This article briefly introduced how implicit Style-CSS is used against crawlers and worked through a simple example of how to handle it. After trying it yourself, you should know how to approach this kind of anti-crawler measure the next time you meet it.
Of course, one example cannot cover every way implicit Style-CSS is applied against crawlers. If you are interested, find a few more examples to try on your own, and feel free to discuss them with me in the comments.
Founded in 2019, NightTeam includes Cui Qingcai, Zhou Ziqi, Chen Xiangan, Tang Yifei, Feng Wei, Cai Jin, Dai Huangjin, Zhang Yeqing and Wei Shidong.
Our programming languages include but are not limited to Python, Rust, C++ and Go, and our fields include crawlers, deep learning, service development and object storage. The team is neither good nor evil. We only do what we think is right. Please be careful.