Cendertron: slider CAPTCHA bypass strategies for dynamic crawlers

Earlier articles in the Cendertron security dynamic crawler series covered the design of the security crawler and then the construction of a crawler cluster. This article discusses strategies for bypassing slider CAPTCHAs.

The strategy and code used in this article are taken from the post How to bypass “slider CAPTCHA” with JS and Puppeteer.

Bypassing slider verification in a crawler

Human verification is one of the most common anti-crawler measures, and many sites use slider verification to check that a visitor is genuine, for example with the well-known jQuery slide plugin.

A Puppeteer-based dynamic crawler can bypass this kind of slider verification fairly easily when simulating a login. The usual steps are: move the mouse to the center of the slider handle, press the mouse button, drag the mouse along the track, and release the button.

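const puppeteer = require('puppeteer');

async function run() {
  // Launch a visible browser so the drag can be watched.
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

  // Fill in the demo form that sits behind the slide-to-submit control.
  await page.goto('http://kthornbloom.com/slidetosubmit/');
  await page.type('input[name="name"]', 'Puppeteer Bot');
  await page.type('input[name="email"]', 'bot@example.com'); // placeholder address

  // Bounding boxes of the slider track and of its draggable handle.
  const sliderElement = await page.$('.slide-submit');
  const slider = await sliderElement.boundingBox();

  const sliderHandle = await page.$('.slide-submit-thumb');
  const handle = await sliderHandle.boundingBox();

  // Press at the center of the handle...
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();
  // ...drag it one track width to the right in 10 intermediate moves...
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 10
  });
  // ...and release.
  await page.mouse.up();

  // Wait 3 seconds for the page to react (replaces the deprecated page.waitFor).
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);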
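The coordinate arithmetic in the example above can be isolated into small pure functions, which makes it easier to reuse with other slider selectors. This is a sketch under stated assumptions: the bounding boxes are assumed to have the `{ x, y, width, height }` shape returned by Puppeteer's `boundingBox()`, and `computeDragPoints` and `interpolate` are hypothetical helper names. `interpolate` reproduces the evenly spaced straight-line path that Puppeteer's `steps` option moves along between the current position and the target.

```javascript
// Hypothetical helper: derives the drag start and end points used in the
// example above from two bounding boxes of the form { x, y, width, height }.
function computeDragPoints(handle, slider) {
  const centerY = handle.y + handle.height / 2;
  return {
    // Press at the center of the slider handle.
    start: { x: handle.x + handle.width / 2, y: centerY },
    // Release one full track width to the right of the handle's left edge,
    // which pushes the handle to (or past) the end of the track.
    end: { x: handle.x + slider.width, y: centerY }
  };
}

// Hypothetical helper: evenly spaced intermediate points on the straight line
// from `from` to `to`, ending exactly at `to` (assumes steps >= 1).
function interpolate(from, to, steps) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    points.push({
      x: from.x + (to.x - from.x) * t,
      y: from.y + (to.y - from.y) * t
    });
  }
  return points;
}

// Example with made-up boxes: a 40x40 handle at the left of a 300px track.
const { start, end } = computeDragPoints(
  { x: 100, y: 200, width: 40, height: 40 },
  { x: 100, y: 200, width: 300, height: 40 }
);
console.log(start, end); // { x: 120, y: 220 } { x: 400, y: 220 }
console.log(interpolate(start, end, 4).map(p => p.x)); // [ 190, 260, 330, 400 ]
```

In a Puppeteer script, `start` would feed the first `page.mouse.move` before `page.mouse.down()`, and `end` the dragging move.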

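For a real-world case, take Taobao's registration page as an example. There the slider is rendered inside an iframe, so the elements are looked up through the frame rather than the page: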

const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

  // Override navigator.webdriver (true under automation) before page scripts run.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => false
    });
  });

  await page.goto('https://world.taobao.com/markets/all/sea/register');

  // The slider lives in an iframe, assumed here to be the second frame.
  const frame = page.frames()[1];
  await frame.waitForSelector('.nc_iconfont.btn_slide');

  const sliderElement = await frame.$('.slidetounlock');
  const slider = await sliderElement.boundingBox();

  const sliderHandle = await frame.$('.nc_iconfont.btn_slide');
  const handle = await sliderHandle.boundingBox();

  // boundingBox() is relative to the main frame, so page.mouse can drive
  // elements that sit inside the iframe.
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();
  // More intermediate steps (50) make the drag slower and smoother.
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 50
  });
  await page.mouse.up();

  // Wait 3 seconds for the page to react (replaces the deprecated page.waitFor).
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);
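Indexing `page.frames()[1]` depends on the order in which frames attach, so it can pick the wrong frame if the page embeds other iframes. One possible alternative is to search every frame for the slider selector and use the first match. `findFrameWithSelector` is a hypothetical helper; it relies only on an array of frames (as returned by `page.frames()`) and on `frame.$(selector)`, which Puppeteer provides. Here it is exercised with stub frame objects so the sketch runs without a browser.

```javascript
// Hypothetical helper: returns the first frame containing an element that
// matches `selector`, or null if no frame does. `frames` is an array such as
// the one returned by Puppeteer's page.frames().
async function findFrameWithSelector(frames, selector) {
  for (const frame of frames) {
    // frame.$() resolves to an element handle, or null when nothing matches.
    const element = await frame.$(selector);
    if (element) {
      return frame;
    }
  }
  return null;
}

// Stub frames that mimic only the part of Puppeteer's Frame API used above.
function stubFrame(name, selectors) {
  return {
    name,
    $: async selector => (selectors.includes(selector) ? { selector } : null)
  };
}

const frames = [
  stubFrame('main', ['#register-form']),
  stubFrame('ads', []),
  stubFrame('captcha', ['.nc_iconfont.btn_slide'])
];

findFrameWithSelector(frames, '.nc_iconfont.btn_slide').then(frame => {
  console.log(frame.name); // captcha
});
```

With Puppeteer this would replace the index lookup, e.g. `const frame = await findFrameWithSelector(page.frames(), '.nc_iconfont.btn_slide');`. Because the iframe may still be loading right after `page.goto`, the lookup may need a retry or a preceding wait.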

Another common type is the jigsaw slider, where a puzzle piece must be dragged into a matching gap in a picture:

const puppeteer = require('puppeteer');
const Rembrandt = require('rembrandt');

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

  // Keep the most recently downloaded image, assumed to be the original picture.
  let originalImage = '';

  await page.setRequestInterception(true);
  page.on('request', request => request.continue());
  page.on('response', async response => {
    if (response.request().resourceType() === 'image') {
      const buffer = await response.buffer().catch(() => null);
      if (buffer) originalImage = buffer;
    }
  });

  await page.goto('https://monoplasty.github.io/vue-monoplasty-slide-verify/');

  const sliderElement = await page.$('.slide-verify-slider');
  const slider = await sliderElement.boundingBox();

  const sliderHandle = await page.$('.slide-verify-slider-mask-item');
  const handle = await sliderHandle.boundingBox();

  let currentPosition = 0;
  // Best position found so far; difference is a percentage (0-100).
  const bestSlider = {
    position: 0,
    difference: 100
  };

  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();

  // Drag in 5px increments (with a little vertical jitter), screenshot the
  // puzzle at each position and compare it with the original image.
  while (currentPosition < slider.width - handle.width / 2) {
    await page.mouse.move(
      handle.x + currentPosition,
      handle.y + handle.height / 2 + Math.random() * 10 - 5
    );

    const sliderContainer = await page.$('.slide-verify');
    const sliderImage = await sliderContainer.screenshot();

    const rembrandt = new Rembrandt({
      imageA: originalImage,
      imageB: sliderImage,
      thresholdType: Rembrandt.THRESHOLD_PERCENT
    });

    const result = await rembrandt.compare();
    const difference = result.percentageDifference * 100;

    // Keep the position where the screenshot is closest to the original.
    if (difference < bestSlider.difference) {
      bestSlider.difference = difference;
      bestSlider.position = currentPosition;
    }

    currentPosition += 5;
  }

  // Move back to the best position and release there.
  await page.mouse.move(
    handle.x + bestSlider.position,
    handle.y + handle.height / 2,
    { steps: 10 }
  );
  await page.mouse.up();

  // Wait 3 seconds for the page to react (replaces the deprecated page.waitFor).
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);

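Here we use a simple image-comparison approach: while dragging, we take a screenshot of the puzzle at each position and compare it with the original image captured from the network. The position whose screenshot differs least from the original is taken as the point where the piece fills the gap, and the slider is released there.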
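The comparison step can be illustrated without a browser or Rembrandt. The sketch below is a simplified, hypothetical stand-in and does not reproduce Rembrandt's actual algorithm: `percentageDifference` returns the fraction of pixels whose RGB channels differ by more than a tolerance (an assumed default of 16), and `pickBestPosition` applies the same minimum-difference rule as the drag loop. Real screenshots would first have to be decoded into raw RGBA pixels of the same size.

```javascript
// Simplified stand-in for an image diff (not Rembrandt's actual algorithm):
// the fraction of pixels whose RGB channels differ by more than `tolerance`.
// `a` and `b` are equal-length RGBA byte arrays (4 bytes per pixel).
function percentageDifference(a, b, tolerance = 16) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('images must be RGBA buffers of the same size');
  }
  const pixelCount = a.length / 4;
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    // Compare R, G and B; alpha is ignored.
    const delta = Math.max(
      Math.abs(a[i] - b[i]),
      Math.abs(a[i + 1] - b[i + 1]),
      Math.abs(a[i + 2] - b[i + 2])
    );
    if (delta > tolerance) differing++;
  }
  return differing / pixelCount;
}

// Same selection rule as the drag loop: keep the position with the smallest
// difference. `samples` is a list of { position, difference } records.
function pickBestPosition(samples) {
  let best = { position: 0, difference: Infinity };
  for (const sample of samples) {
    if (sample.difference < best.difference) best = sample;
  }
  return best;
}

// Tiny 2-pixel "images" that are identical except for the second pixel.
const original = Uint8Array.from([10, 10, 10, 255, 200, 200, 200, 255]);
const shifted = Uint8Array.from([10, 10, 10, 255, 0, 0, 0, 255]);
console.log(percentageDifference(original, original)); // 0
console.log(percentageDifference(original, shifted)); // 0.5

console.log(pickBestPosition([
  { position: 0, difference: 42 },
  { position: 5, difference: 7 },
  { position: 10, difference: 19 }
])); // { position: 5, difference: 7 }
```

Taking the minimum rather than stopping at the first value below a fixed threshold avoids having to tune a threshold per site, at the cost of sweeping the whole track.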

Spider configuration

Cendertron provides a dedicated Slider Captcha Monkey page plug-in. To enable it, add the following parameters to the SpiderOption you pass in:

export interface SpiderOption {
  allowRedirect: boolean;
  depth: number;
  // Page plug-ins
  monkies?: {
    sliderCaptcha: {
      sliderElementSelector: string;
      sliderHandleSelector: string;
    };
  };
}
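For illustration, an option object for the slider monkey might look like the following, written in plain JavaScript. The selector values are borrowed from the first example's slide-to-submit page, the `allowRedirect` and `depth` values are arbitrary, and `hasSliderCaptchaMonkey` is a hypothetical helper rather than part of Cendertron; how the object is handed to a Cendertron spider is not covered in this article.

```javascript
// Example SpiderOption object matching the interface above. The depth value
// and the selectors (from the slide-to-submit demo) are illustrative only.
const spiderOption = {
  allowRedirect: false,
  depth: 1,
  monkies: {
    sliderCaptcha: {
      sliderElementSelector: '.slide-submit', // the slider track
      sliderHandleSelector: '.slide-submit-thumb' // the draggable handle
    }
  }
};

// Hypothetical check a caller might run before starting a crawl: the monkey
// needs both selectors to locate the track and the handle.
function hasSliderCaptchaMonkey(option) {
  const slider = option.monkies && option.monkies.sliderCaptcha;
  return Boolean(
    slider && slider.sliderElementSelector && slider.sliderHandleSelector
  );
}

console.log(hasSliderCaptchaMonkey(spiderOption)); // true
console.log(hasSliderCaptchaMonkey({ allowRedirect: true, depth: 2 })); // false
```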
