
What is SRI

Subresource Integrity (SRI) lets you record a unique hash of a file and have the browser check the file's content against it. If a static resource returned by the server does not match the declared hash, the browser reports an error and refuses to load the file. It works much like the MD5 checksum published alongside a zip package on a download site.

CDN hijacking

SRI, Subresource Integrity, refers to a security feature that browsers use to verify the Integrity of a resource (usually obtained from a CDN) to determine whether it has been tampered with. SRI can be enabled by adding integrity to the link tag or script tag, for example:

<script type="text/javascript" src="//s.url.cn/xxxx/xxx.js?_offline=1" integrity="sha256-mY9nzNMPPf8oL3CJss7THIEoXAC2ToW1tEX0NBhMvuw= sha384-ncIKElSEk2OR3YfjNLRSY35mzt0CUwrpNDVS//iD3dF9vxrWeZ7WPlAPJTqGkSai" crossorigin="anonymous"></script>

The integrity value has two parts separated by a hyphen (-): the first names the hash algorithm (sha256, sha384, or sha512), and the second is the actual hash, encoded in Base64. An integrity value may contain multiple space-separated hashes; the resource is verified and loaded as long as the file matches any one of them. The example above uses both the sha256 and sha384 schemes.
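Splitting the attribute into its parts is straightforward. The following parseIntegrity is a hypothetical helper that mirrors how a browser tokenizes the value, not a real browser API:

```javascript
// Sketch: split an integrity attribute into { alg, digest } pairs.
function parseIntegrity(value) {
  // Tokens are whitespace-separated, each of the form "<alg>-<base64 digest>".
  return value.trim().split(/\s+/).map(function (token) {
    var dash = token.indexOf('-');
    return { alg: token.slice(0, dash), digest: token.slice(dash + 1) };
  });
}
```

For the tag above, parseIntegrity would yield one entry for the sha256 hash and one for the sha384 hash.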

Remark: crossorigin="anonymous" matters for cross-origin scripts. HTML5 provides a way to get detailed error messages from cross-origin scripts, but two conditions must hold: the server hosting the script must allow the current origin via the Access-Control-Allow-Origin response header, and the tag on the current page must declare cross-origin support with the crossorigin attribute. Tags such as link and img support the attribute as well. If either condition is not met, a try/catch scheme can be used instead.

Why use SRI

In Web development, using CDN resources effectively reduces network request time, but it also introduces a problem: the resources live on a third-party server, so their security is not fully under your control.

CDN hijacking is a notoriously hard problem to pin down. Hijackers often use algorithms or randomization to decide when to strike, so the problem is hard to reproduce; for many users it disappears after a page refresh. A colleague of mine hit this while building a game downloader: users would download and unpack the game but could not play it. Only after comparing the files one by one did he find that CDN hijacking was the cause. How was it solved? Reportedly by paying a protection fee to XX, plus a file-hash check whose principle is presumably the same as SRI's. Fortunately, most current CDN hijacking is merely a nuisance, such as patch ads injected through an iframe. But if the hijacker has darker motives, such as XSS injection, it becomes very dangerous.

Enabling SRI can effectively ensure the integrity of page reference resources and avoid malicious code execution.

How do browsers handle SRI

When the browser encounters an integrity attribute in a script or link tag, it compares the hash value of the loaded file to the expected hash value before executing the script or applying the stylesheet. When the hash value of a script or stylesheet does not match the expected value, the browser must refuse to execute the script or apply the stylesheet, and must return a network error saying that obtaining the script or stylesheet failed.

Using SRI

Script tags containing integrity attributes can be generated by using webpack’s html-webpack-plugin and webpack-subresource-integrity.

import SriPlugin from 'webpack-subresource-integrity';

const compiler = webpack({
    output: {
        crossOriginLoading: 'anonymous',
    },
    plugins: [
        new SriPlugin({
            hashFuncNames: ['sha256', 'sha384'],
            enabled: process.env.NODE_ENV === 'production',
        }),
    ],
});

What should happen when SRI verification fails for a script or link resource? A good approach is to reload the resource from a static file server when the script's onerror handler fires:

<script type="text/javascript" src="//11.url.cn/aaa.js"
        integrity="sha256-xxx sha384-yyy"
        crossorigin="anonymous" onerror="loadjs.call(this, event)"></script>

// loadjs:
function loadjs (event) {
  // report the failure here...
  var src = this.src // `this` is the failed <script> element (via loadjs.call(this, event))
  // then reload the js
  return new Promise(function (resolve, reject) {
    var script = document.createElement('script')
    // Replace the CDN address with the static file server address
    script.src = src.replace(/\/\/11\.url\.cn/, 'https://x.y.z')
    script.onload = resolve
    script.onerror = reject
    document.getElementsByTagName('head')[0].appendChild(script)
  })
}

The drawback of this approach is that the event parameter in onerror cannot distinguish the cause of the error: it might be a missing resource or an SRI verification failure. For now, unless you need that distinction for statistics, not telling them apart is fine. We also need script-ext-html-webpack-plugin to inject the onerror handler:

const ScriptExtHtmlWebpackPlugin = require('script-ext-html-webpack-plugin');

module.exports = {
  // ...
  plugins: [
    new HtmlWebpackPlugin(),
    new SriPlugin({
      hashFuncNames: ['sha256', 'sha384']
    }),
    new ScriptExtHtmlWebpackPlugin({
      custom: {
        test: /\.js$/,
        attribute: 'onerror="loadjs.call(this, event)" onsuccess="loadSuccess.call(this)"'
      }
    })
  ]
};

The loadjs and loadSuccess methods are then injected into the HTML inline.

A case of solving CDN hijacking with SRI

CDN hijacking can also be countered with a JSONP-style trick. The approach below handles CDN hijacking well because carriers hijack by matching file names: we detect and intercept failures through onerror, and we strip the .js suffix from resource files so the hijacker's filename match misses them.

The webpack-subresource-integrity plugin only generates the hashes; you still have to wire up templates and so on, but that is not the core issue. The real question is how to use this feature to fight JS file hijacking!

Every published JS file gets an SRI hash. If a file is hijacked, the script tag's onerror callback fires (verified locally on several mobile browsers), so we can use that callback to do some work.

1. Reconstruct the scene.

When onerror fires, we fetch the same script resource ourselves, this time without the SRI check, so we obtain the actual content of the file that failed verification, and then report that JS content through a logging interface.

2. Multiple comparisons.

The first fetch usually hits the user's local browser cache, but what we really want to know is whether the real file on the remote CDN has changed. How do we bypass the cache? Simple: issue a second fetch with a timestamp appended, which guarantees an uncached JS file. Then we compare the two locally; the comparison is as crude as a diff of their sizes.

3. Collect client information.

We can collect the page where the error occurred, the JS URL, and the JS file content. Since we issue the fetch ourselves, we can also read some response headers, though access is restricted: many headers are unavailable (see the relevant documentation). Custom headers can be read cross-origin if the CDN origin server exposes them, so with that configured on the origin we can obtain the address of the CDN edge node that served the bad JS. That is key for reporting faults, troubleshooting, and even purging the CDN cache.

4. Apply big-data thinking.

Store everything the onerror handler collects, then analyze which carriers produce errors, how many distinct variants of bad code there are, how the affected users are distributed, and so on. Combined with the page's PV, you can see the traffic trend of the hijacking.

var SriPlugin = require('webpack-subresource-integrity');
var HtmlWebpackPlugin = require('html-webpack-plugin');
var ScriptExtInlineHtmlWebpackPlugin = require('script-ext-inline-html-webpack-plugin');
var ScriptExtHtmlWebpackPlugin = require('script-ext-html-webpack-plugin');
var path = require('path');
var WebpackAssetsManifest = require('webpack-assets-manifest');
var writeJson = require('write-json');

var attackCatch = `
(function () {
  function log(url, ret) {
    return fetch(url, {
      method: 'post',
      body: encodeURIComponent(JSON.stringify({
        sizes: ret.sizes,
        diff: ret.diff,
        jscontent: ret.context,
        cdn: ret.cdn,
        edge: ret.edge,
        url: ret.url,
        protocol: ret.protocol
      })),
      headers: { "Content-type": "application/x-www-form-urlencoded" }
    });
  }
  function fetchError(res) {
    return Promise.resolve({
      text: function () { return res.status; },
      headers: res.headers || {},
      status: res.status
    });
  }
  function loadscript(url) {
    return fetch(url).then(function (res) {
      if (res.ok) { return res; }
      return fetchError(res);
    }).catch(function (err) {
      return fetchError({ status: err });
    });
  }
  function getHeader(res1, res2, key) {
    if (res1.headers.get) {
      return res1.headers.get(key);
    } else if (res2.headers.get) {
      return res2.headers.get(key);
    } else {
      return '';
    }
  }
  window.attackCatch = function (ele) {
    var src = ele.src;
    var protocol = location.protocol;
    function getSourceData(res1, res2, len1, len2, context1) {
      return Promise.resolve({
        diff: (len1 === len2) ? 0 : 1,
        sizes: [len1, len2].join(','),
        cdn: getHeader(res1, res2, 'X-Via-CDN'),
        edge: getHeader(res1, res2, 'X-Via-Edge'),
        context: context1 ? context1 : res1.status + ',' + res2.status,
        url: src,
        protocol: protocol
      });
    }
    // If fetch is not supported we do nothing; the failure may also be a 404 or a CDN timeout.
    if (window.fetch) {
      // Load the script twice: once normally (may hit the cache) and once
      // with a timestamp to bypass the cache.
      Promise.all([loadscript(src), loadscript(src + '?vt=' + (new Date().valueOf()))]).then(function (values) {
        var res1 = values[0], res2 = values[1];
        // Only responses with http status 200 carry usable file content.
        if (res1.status == '200' && res2.status == '200') {
          return Promise.all([res1.text(), res2.text()]).then(function (contexts) {
            var context1 = contexts[0];
            var len1 = context1.length, len2 = contexts[1].length;
            return getSourceData(res1, res2, len1, len2, context1);
          });
        } else if (res1.status == '200') {
          return res1.text().then(function (context) {
            return getSourceData(res1, res2, context.length, -1);
          });
        } else if (res2.status == '200') {
          return res2.text().then(function (context) {
            return getSourceData(res1, res2, -1, context.length);
          });
        } else {
          return getSourceData(res1, res2, -1, -1);
        }
      }).then(function (ret) {
        if (ret && ret.context) log('log service interface', ret);
      });
    }
  };
})();
`;

module.exports = {
  entry: {
    index: './index.js'
  },
  output: {
    path: __dirname + '/dist',
    filename: '[name].js',
    crossOriginLoading: 'anonymous'
  },
  plugins: [
    new HtmlWebpackPlugin(),
    new SriPlugin({
      hashFuncNames: ['sha256', 'sha384'],
      enabled: true
    }),
    new WebpackAssetsManifest({
      done: function(manifest, stats) {
        var mainAssetNames = stats.toJson().assetsByChunkName;
        var json = {};
        for (var name in mainAssetNames) {
          if (mainAssetNames.hasOwnProperty(name)) {
            var integrity = stats.compilation.assets[mainAssetNames[name]].integrity;
            // Regenerate an integrity JSON file here; whether webpack 4 can emit it directly depends on versioning issues.
            json[mainAssetNames[name]] = integrity;
          }
        }
        writeJson.sync(__dirname + '/dist/integrity.json', json)
      }
    }),
    new ScriptExtHtmlWebpackPlugin({
      custom: {
        test: /\.js$/,
        attribute: 'onerror="attackCatch(this)"'
      }
    }),
    new ScriptExtInlineHtmlWebpackPlugin({
      prepend: attackCatch
    }),
  ]
};

Since the report already includes the diff, roughly 90% of the logs with diff=1 are genuine hijacks; the other 10% are incomplete files (for large files, CDNs sometimes return truncated content, a known CDN problem). The hijacking methods are ugly: some wrap your JS in an iframe outright, others append their own JS after yours. We analyzed these payloads, and there is a bit of everything; they basically match the scenes users had complained about earlier. Finally, armed with both the hijacking traffic trend and detailed logs of the hijacked code, we moved on to actually solving the problem:

1. Make sure the CDN link uses HTTPS (the back-to-origin request must use HTTPS too; because back-to-origin traffic hits the site origin, HTTPS adds load there, so many people skip it on that link, but that link is often exactly where hijacking happens).
2. Change the file name and watch the hijacking traffic. (This matters: people used to rename files to quiet complaints without knowing the real effect. With monitoring in place we saw that after a rename, traffic dropped straight down but... slowly recovered the next day. Renaming treats the symptom, not the cause.)
3. Use the JSONP principle. As a quick refresher, JSONP fetches data across domains: it requests a server endpoint, and the server returns code that invokes a named callback. Since a JSONP response executes as script without a .js extension, we can do the same: serve the JS file without the .js suffix and load it with a plain script tag (just add a type attribute); the browser still parses it as JS code.

Data monitoring showed this change eliminated the hijacking almost 100%, so our guess is that the carrier hijacks indiscriminately by recognizing well-known download-file name suffixes, even over HTTPS. On the webpack side, the work is mainly the onerror code; the only thing you need to replace is the log-reporting endpoint, which must support POST, because both the hijacked and the normal JS file contents are reported, so the reporting payloads are relatively large.