
A captcha, with a bit of simple logic, can fend off an army of scripts. But it is a double-edged sword and does not suit every situation. This article documents a production incident to show what an image captcha really costs, then discusses how to handle similar situations.

A bit of trivia: CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

How the incident unfolded

It all started with a simple requirement

The company’s new game was about to launch, and the operations team, rubbing their hands in excitement, planned a pre-registration event. You know the kind: you enter your phone number and your phone’s operating system. The requirement was so simple that the developers did not think twice about it. The only layer of protection, meant to keep scripts from hammering the endpoint, was an image captcha that had to be entered when registering.

Tragedy waiting to happen

3, 2, 1, and the event began. Things started well, with traffic flowing in and registrations being stored. After a while, customer service got busy with a flood of player feedback: where the image captcha should have been, there was only a broken-image X. Before long, the line “Is your company running its servers on potatoes?” was all over Weibo.

Thinking it over

The usual way to implement an image captcha

  • The page puts the captcha-generating URL in the src of an img tag, and the endpoint builds the image in memory. A more careful implementation also draws interference lines and noise on the image.
  • The captcha text is stored in the session, then the image is returned.
  • The user submits the form, and the submitted value is compared with the session value.

Nothing can be taken for granted

This is about as routine as it gets, but a closer look reveals that generating an image captcha is expensive.

  • Generating an image takes more memory than ordinary operations.
  • The captcha text, the noise, and the interference lines all require random numbers.
  • Transmitting the image consumes bandwidth.
  • The captcha is written to and read from the session, which means disk reads and writes with PHP’s default file-based session storage.

Trying it out

Write a simple captcha generation method:

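// Character set that skips the look-alikes 0, 1, l, o and O.
$str = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHIJKLMNPQRSTUVW";
$len = strlen($str) - 1;
$code = '';
for ($i = 0; $i < 4; $i++) {
    $code .= $str[mt_rand(0, $len)];
}

// 100x30 canvas, white background, black text with a slight random tilt.
$img = imagecreatetruecolor(100, 30);
imagefilledrectangle($img, 0, 0, 99, 29, imagecolorallocate($img, 255, 255, 255));
imagefttext($img, 20, mt_rand(-5, 5), 10, 25, imagecolorallocate($img, 0, 0, 0), '{your font path}', $code);

// $_SESSION needs an active session.
session_start();
$_SESSION['captcha'] = $code;

header("Cache-Control: no-cache");
header("Content-Type: image/png");
// Output the image, then free the resource.
imagepng($img);
imagedestroy($img);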
  • No framework or components, to keep the loading path short. The more powerful the tooling, the more it costs in performance.
  • To reduce random number generation, the captcha is only four characters long, the background and font colors are fixed, and there are no noise or interference lines.
  • After the image is generated and output, the resource is destroyed.
  • The image is never saved to disk.

Write a method that checks the captcha the user submits:

$input = $_POST['verify'] ?? '';
if ($input === '') {
    return ['code' => 0, 'msg' => 'verify empty'];
}
$code = $_SESSION['captcha'] ?? '';
if ($code === '') {
    return ['code' => 0, 'msg' => 'verify not exist'];
}
// Case-insensitive comparison of the submitted and stored values.
if (0 !== strcasecmp($input, $code)) {
    return ['code' => 0, 'msg' => 'verify failed'];
}
// One-time use: clear the stored captcha after a successful check.
unset($_SESSION['captcha']);
return ['code' => 1, 'msg' => ''];

A single call

Fetching one captcha took 184 ms. A look at the machine metrics shows nothing alarming: the 1-minute load average is 0.3.

What is the 1-minute load average?

Roughly, the average number of processes that were running or waiting for the CPU over the last minute.

For example, on a 2-core machine, a load of 2 means both cores are fully busy. A load of 0.3 means the work amounts to about a third of what one core can handle.
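To compare load figures across machines, it helps to divide the load by the number of cores. The sketch below is only an illustration: `loadPerCore` is a hypothetical helper, and the core count is passed in by hand rather than detected. `sys_getloadavg()` is PHP’s built-in way to read the load averages on Unix-like systems.

```php
<?php
// Hypothetical helper: express a load average as a share of total capacity.
function loadPerCore(float $load, int $cores): float
{
    if ($cores < 1) {
        throw new InvalidArgumentException('cores must be >= 1');
    }
    return $load / $cores;
}

// sys_getloadavg() returns the 1, 5 and 15 minute load averages
// (it is not available on Windows, hence the guard).
$load = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
if ($load !== false) {
    printf("current 1-minute load: %.2f\n", $load[0]);
}

// The figures from this article: a load of 0.3 on a 2-core machine.
printf("share of capacity in use: %.0f%%\n", loadPerCore(0.3, 2) * 100);
```

By this measure, a load of 0.3 on two cores is about 15% of the machine, and a load of 2 is 100%.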

This time, try 1,000 requests

go-stress-testing-linux -c 1000 -n 1 -u {your url}

The run took 5 s in total, and every request succeeded (HTTP status code 200). The 1-minute load average rose to 1, which is still fine.

Let’s see where the limit is

go-stress-testing-linux -c 10000 -n 1 -u {your url}

The run took 36 s in total, with 5,007 successes and 4,993 failures (HTTP status code 509).

Hardware resources are not exhausted. How can there be failed requests?

The reason is simple, and the response has already given it away: the bandwidth is saturated. The captcha image generated by this code is about 1.7K, far larger than other kinds of response data. One image is not much, but tens of thousands of them add up.

The HTTP status code already said so (509 is the non-standard “Bandwidth Limit Exceeded” code) (…

We have to find a way to get all the requests in

That takes a little rework:

  • Move the captcha generation logic from the Controller into a Laravel console command so that the PHP script does not time out.
  • Write a shell script that runs the command, bypassing the web server.

#!/bin/bash
start_at=$(date +%s)
for ((i = 1; i <= 5000; i++)); do
    php artisan {your command} > /dev/null
done
end_at=$(date +%s)
# Elapsed time in seconds
echo $((end_at - start_at))

The total time was 1,219 s.

Generating 5,000 captchas took more than 20 minutes, which on its own rules this approach out for a production server. Since Nginx was not involved, memory usage did not spike.

A round of analysis

Based on the data above, self-generated image captchas consume resources in this descending order: network bandwidth > CPU > memory > disk I/O.

My virtual machine has 2 cores, 4G of memory, and 3M of bandwidth. It can only handle about 140 captcha requests per second, and that is with an Nginx timeout of 5 minutes (which is a long time).
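A back-of-the-envelope calculation shows why bandwidth runs out first. This is a rough sketch under assumptions the article does not state: “3M” is read as 3 Mbit/s, “1.7K” as 1.7 KiB per image, and HTTP and TCP overhead are ignored. Both function names are made up for the example.

```php
<?php
// Rough upper bound on captcha images per second for a given link.
// Assumed units: link speed in Mbit/s, image size in KiB, no protocol overhead.
function maxImagesPerSecond(float $linkMbitPerSec, float $imageKiB): float
{
    $bytesPerSecond = $linkMbitPerSec * 1000000 / 8; // Mbit/s -> bytes/s
    $bytesPerImage = $imageKiB * 1024;               // KiB -> bytes
    return $bytesPerSecond / $bytesPerImage;
}

// Minimum time the link needs to deliver a given number of images.
function secondsToSend(int $images, float $linkMbitPerSec, float $imageKiB): float
{
    return $images / maxImagesPerSecond($linkMbitPerSec, $imageKiB);
}

printf("ceiling: %.0f images/s\n", maxImagesPerSecond(3.0, 1.7));
printf("10,000 images need at least %.0f s\n", secondsToSend(10000, 3.0, 1.7));
```

Under these assumptions the link tops out at roughly 215 images per second, and 10,000 images need at least about 46 s of transfer time. Real responses also carry headers and protocol overhead, so the achievable rate is lower than this ceiling.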

How to optimize

Huh? What is there to optimize, one might ask. As said above, nothing can really be taken for granted. Here are a few modest ideas, offered in the hope of drawing out better ones.

Limit the refresh frequency on the front end

I do this myself: when the captcha does not show up, I get impatient and hit refresh. But generating a captcha takes time, so a new one is requested before the previous one has had a chance to display, and the cycle starts over.

Use JavaScript to block further reloads of the img until the current load has finished. Enforcing a wait of a few seconds after each refresh avoids the problem.
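The front-end lock itself lives in JavaScript. As a complement, and not something the setup in this article used, the same minimum-interval rule can also be checked on the server, which covers clients that skip the front-end lock. A minimal sketch in PHP, where `canRefreshCaptcha`, the 2-second default, and the `captcha_issued_at` session key are all hypothetical:

```php
<?php
// Hypothetical helper: allow a new captcha only if enough time has passed
// since the previous one was issued for this session.
function canRefreshCaptcha(?float $lastIssuedAt, float $now, float $minIntervalSeconds = 2.0): bool
{
    if ($lastIssuedAt === null) {
        return true; // first request in this session
    }
    return ($now - $lastIssuedAt) >= $minIntervalSeconds;
}

// Simulated timeline. In a real endpoint, $lastIssuedAt would come from
// $_SESSION['captcha_issued_at'] (hypothetical key) and $now from microtime(true).
$lastIssuedAt = null;
foreach ([0.0, 0.5, 1.9, 2.0, 5.0] as $now) {
    if (canRefreshCaptcha($lastIssuedAt, $now)) {
        $lastIssuedAt = $now;
        echo "t={$now}s: generate a new captcha\n";
    } else {
        echo "t={$now}s: too soon, skip generation\n";
    }
}
```

When a request comes too soon, the endpoint could answer with a cheap response such as HTTP 429 instead of rendering another image, so impatient refreshing no longer costs CPU or bandwidth.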

Separate the captcha service from the main business service

This is the same reason static resources are served separately from the rest of the site. Captcha images consume bandwidth, so serving them from a separate server keeps that traffic off the main business server and raises its throughput.

Use a captcha without images

Examples: drag a slider all the way to the right, complete a puzzle (which also needs an image, but loads it only once), rotate an image until it is upright, and so on.

Use a third-party captcha service

A problem that money can solve is not really a problem.

I’ll stop there

I hope this article has been of some help to you. Thank you for reading. See you later.