How to use a Chrome plugin to block ads

Project address: Chrome_plugin_ZHIHU_adblock

This article assumes some basic knowledge of Chrome plugins. What can you learn from it?

  1. The thinking behind, and the general principles of, blocking ads with a Chrome plugin
  2. How to intercept fetch and XHR requests in the browser

Train of thought

Advertisements on web pages can be divided into the following three situations:

  1. Some ads are part of the page's HTML, so we can simply use the Chrome plugin to delete that HTML.
  2. Some ads are loaded dynamically after the page has loaded, mixed in with regular content. Here we can intercept the HTTP requests that fetch this content and, after a request succeeds, remove the ads from the HTML.
  3. Some ads are loaded through dedicated HTTP requests. Here we block those requests so that they are never sent.

Delete the HTML

We just need to know the selector of the ad's HTML and delete the matching elements. Here we'll write a generic function:

// fuckAd
const createFuckAd = (adSelector, textSelector) => () => {
    // Ad elements
    const ads = document.querySelectorAll(adSelector);

    if (ads.length > 0) {
        // textSelector is optional, so only look up the brand text when it is given
        const cardBrand = textSelector ? document.querySelector(textSelector) : null;

        if (cardBrand) {
            console.log(`Advertising has been blocked: ${cardBrand.innerText}`);
        }

        // Delete the ads
        [...ads].forEach(item => item.parentNode.removeChild(item));
    }
};

Then we just need to delete the ads in the onload event. Taking Zhihu as an example, the code is as follows:

window.onload = () => {
    createFuckAd('.Pc-card')();
    createFuckAd('.Pc-feedAd-container', '.Pc-feedAd-card-brand--bold')();
    createFuckAd('.Pc-word-card', '.Pc-word-card-brand-wrapper > span')();
};
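The onload handler runs only once, so ads inserted later (for example while scrolling a feed) are not caught by it. One possible complement, which is not what the original project does, is a MutationObserver that repeats the removal whenever new nodes appear. The helper names removeAds and watchAds below are hypothetical, and this is only a minimal sketch:

```javascript
// Sketch: remove ad nodes now and again whenever the page adds new nodes.
// `removeAds` and `watchAds` are hypothetical helpers, not part of the original project.
const removeAds = (root, selectors) => {
    let removed = 0;
    for (const selector of selectors) {
        // querySelectorAll returns a static list, so removing while iterating is safe
        for (const node of root.querySelectorAll(selector)) {
            node.remove();
            removed += 1;
        }
    }
    return removed; // number of nodes removed in this pass
};

const watchAds = (root, selectors, Observer = MutationObserver) => {
    removeAds(root, selectors); // initial pass over what is already rendered
    const observer = new Observer(() => removeAds(root, selectors));
    // childList + subtree: react to nodes inserted anywhere below `root`
    observer.observe(root, { childList: true, subtree: true });
    return observer; // call observer.disconnect() to stop watching
};
```

On the page this could be started with watchAds(document.documentElement, ['.Pc-card', '.Pc-feedAd-container', '.Pc-word-card']). The trade-off is that the callback fires on every DOM change, so the selector list should stay short.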

Intercept requests

For dynamically loaded ads, we can intercept requests and check whether a request loads ads. When ads are mixed into a normal request, we remove them after the request succeeds; ad-only requests we block outright. Here we use fetch as an example (for XHR requests, the ajax-hook library can be used), with the following code:

// hook fetch
const fetch_helper = {
    originalFetch: window.fetch.bind(window),
    myFetch: function (...args) {
        // fetch() accepts a string, a URL or a Request object, so normalize to a string first
        const url = args[0] instanceof Request ? args[0].url : String(args[0]);

        // Block ad-only HTTP requests
        if (url.includes('https://www.zhihu.com/commercial_api/')) {
            return Promise.reject(1);
        }

        return fetch_helper.originalFetch(...args).then((response) => {
            // For requests that mix ads with normal content, remove the ads after the request completes
            if (response.url.startsWith('https://www.zhihu.com/api/v3/feed/topstory/recommend?')) {
                setTimeout(createFuckAd('.Pc-feedAd-container', '.Pc-feedAd-card-brand--bold'), 188);
            }
            return response;
        });
    },
};

window.fetch = fetch_helper.myFetch;
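The same idea can be applied to XHR without a library. The sketch below is not how ajax-hook works internally; it is a minimal illustration that patches open and send on the prototype, and the helper name hookXhr and the __adblockUrl property are made up for this example. A blocked request is simply never sent, so the page's callbacks for it never fire, which is assumed to be acceptable for ad-only requests:

```javascript
// Sketch: block XHR requests whose URL matches a predicate.
// `hookXhr` and `__adblockUrl` are hypothetical names used only for this illustration.
const hookXhr = (XhrClass, shouldBlock) => {
    const originalOpen = XhrClass.prototype.open;
    const originalSend = XhrClass.prototype.send;

    XhrClass.prototype.open = function (method, url, ...rest) {
        // Remember the URL so that send() can decide whether to block
        this.__adblockUrl = String(url);
        return originalOpen.call(this, method, url, ...rest);
    };

    XhrClass.prototype.send = function (...args) {
        if (shouldBlock(this.__adblockUrl)) {
            // Blocked: return without handing the request to the original send()
            return undefined;
        }
        return originalSend.apply(this, args);
    };

    // Return a function that restores the original methods
    return () => {
        XhrClass.prototype.open = originalOpen;
        XhrClass.prototype.send = originalSend;
    };
};
```

In the injected page script this could be installed with hookXhr(XMLHttpRequest, (url) => url.includes('https://www.zhihu.com/commercial_api/')). Passing the class in as a parameter keeps the helper easy to try out against a stub.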

Other

We've now written the core code, and we just need to inject it at document_start. Note that we can't run this code directly inside the content script: although a content script can manipulate the page's DOM, it runs in an isolated environment, so overriding window.fetch there would not affect the page's own scripts. Instead we use chrome.runtime.getURL to get the script's URL and load it dynamically:

const s = document.createElement("script");
s.src = chrome.runtime.getURL("main.js");
s.onload = function () {
    s.parentNode.removeChild(s);
};
(document.head || document.documentElement).appendChild(s);
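For this loader to run at document_start, and for the page to be allowed to load main.js, the extension's manifest has to declare both. The fragment below is a sketch assuming Manifest V3; the extension name, version and the file name content.js (the loader above) are placeholders, and the actual project's manifest may differ:

```json
{
  "manifest_version": 3,
  "name": "zhihu-adblock-sketch",
  "version": "0.1.0",
  "content_scripts": [
    {
      "matches": ["https://www.zhihu.com/*"],
      "js": ["content.js"],
      "run_at": "document_start"
    }
  ],
  "web_accessible_resources": [
    {
      "resources": ["main.js"],
      "matches": ["https://www.zhihu.com/*"]
    }
  ]
}
```

Without the web_accessible_resources entry, the URL returned by chrome.runtime.getURL("main.js") cannot be loaded from the page, and the script tag fails silently.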

Thoughts

The above only covers how to block ads. For a good ad-blocking plugin, the real challenge is deciding which HTML and JS are ads in the first place. Looking at the source code of Adblock Plus, we find the following:

// popupBlocker.js
function checkPotentialPopup(tabId, popup)
{
  let url = popup.url || "about:blank";
  let documentHost = extractHostFromFrame(popup.sourceFrame);
  let specificOnly = !!checkWhitelisted(
    popup.sourcePage, popup.sourceFrame, null,
    contentTypes.GENERICBLOCK
  );

  let filter = defaultMatcher.matchesAny(
    parseURL(url), contentTypes.POPUP,
    documentHost, null, specificOnly
  );

  if (filter instanceof BlockingFilter)
    browser.tabs.remove(tabId);

  logRequest(
    [popup.sourcePage.id],
    {url, type: "POPUP", docDomain: documentHost, specificOnly},
    filter
  );
}

checkPotentialPopup first takes the URL of the potential pop-up and extracts the host of the frame that opened it. It then checks whether the source page is whitelisted, matches the URL against its filters, and closes the pop-up's tab if a blocking filter matches. Adblock Plus also uses TensorFlow.js to identify possible ads through machine learning:

// ml.js
const tfCore = require("@tensorflow/tfjs-core");
const tfConverter = require("@tensorflow/tfjs-converter");

for (let object of [tfCore, tfConverter])
{
  for (let property in object)
  {
    if (!Object.prototype.hasOwnProperty.call(tf, property))
      tf[property] = object[property];
  }
}
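To make the pop-up check from popupBlocker.js more concrete, here is a heavily simplified sketch of the same decision flow. The function name decidePopup, the filter format ({pattern, hosts}) and the genericBlockAllowed list are all hypothetical, and real Adblock Plus filters are far richer than substring matching. The sketch assumes, as the name specificOnly suggests, that a page exempt from generic blocking is still subject to host-specific filters:

```javascript
// Simplified sketch of a filter-based pop-up decision.
// Every name and the filter format are hypothetical; this is not Adblock Plus's engine.
const decidePopup = (popupUrl, sourceHost, filters, genericBlockAllowed) => {
    const url = popupUrl || 'about:blank';

    // If the source page is exempt from generic blocking, only host-specific filters apply
    const specificOnly = genericBlockAllowed.includes(sourceHost);

    const hit = filters.find((filter) => {
        const isSpecific = Array.isArray(filter.hosts) && filter.hosts.length > 0;
        if (specificOnly && !isSpecific) return false; // skip generic filters
        if (isSpecific && !filter.hosts.includes(sourceHost)) return false; // wrong host
        return url.includes(filter.pattern);
    });

    // `block` tells the caller whether to close the pop-up tab
    return { block: Boolean(hit), filter: hit || null, specificOnly };
};
```

A caller would close the tab when block is true and log the decision either way, mirroring the browser.tabs.remove and logRequest calls in the excerpt.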