Preface

In my current job, I often write scripts that perform asynchronous operations.

Most of them do bulk data modifications or handle part of a concurrency problem.

To process the data reliably, these are usually run as scheduled scripts (for example, from cron).

So here’s the question.

Possible problems

When dealing with a large amount of data, a script may take a long time to execute, or it may process the same piece of data more than once (for example, after a write error).

To avoid processing data twice, and to avoid the server load caused by too many scripts running at once, we need to limit the number of instances of a script that can run concurrently.

Approaches

Idea 1

Count the running processes that carry a given identifier. If the count exceeds a threshold, the script exits without doing any processing.

Idea 2

Record the PID of every run, storing it in a file, Redis, Memcached, or similar.

When a new process starts, look up the PIDs recorded under its identifier and check whether each of them is still running and still belongs to that identifier.

If the number of live processes exceeds the threshold, exit immediately without processing.

Practice

Idea 1 Practice

Use the Linux commands ps, grep, and wc to count the running processes that carry a given identifier.

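```php
<?php
/**
 * Whether the script may run.
 *
 * @param string $ident  identifier
 * @param int    $maxNum maximum number of concurrent runs (the count includes this process)
 * @return bool
 */
function canRun($ident, $maxNum)
{
    // Count matching processes, dropping /bin/sh wrapper lines (e.g. from cron) and the grep pipeline itself
    $cmd = sprintf('ps ax | grep %s | grep -v /bin/sh | grep -v grep | wc -l', escapeshellarg($ident));
    $fp  = popen($cmd, 'r');
    if ($fp === false) {
        return false;
    }
    $num = (int) trim(fread($fp, 4096));
    pclose($fp);
    return $num <= $maxNum;
}
```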
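Spawning a shell pipeline on every run works, but the same count can be taken in pure PHP by reading /proc directly, which avoids escaping the identifier for the shell and filtering out the grep and sh lines. The sketch below is an alternative technique, not the original approach; it assumes Linux with a readable /proc, and countProcesses() and canRunProc() are hypothetical names.

```php
<?php
/**
 * Count running processes whose command line contains $ident.
 * Hypothetical helper: a /proc-based alternative to `ps ax | grep ... | wc -l` (Linux only).
 */
function countProcesses(string $ident): int
{
    $count = 0;
    // Every numeric directory under /proc belongs to a process.
    foreach (glob('/proc/[0-9]*/cmdline') ?: [] as $path) {
        // The process may exit between glob() and the read, so ignore read failures.
        $cmdline = @file_get_contents($path);
        if ($cmdline === false || $cmdline === '') {
            continue; // kernel threads have an empty cmdline
        }
        // Arguments are NUL-separated; join them with spaces, roughly as ps displays them.
        $cmdline = str_replace("\0", ' ', $cmdline);
        if (strpos($cmdline, $ident) !== false) {
            $count++;
        }
    }
    return $count;
}

/**
 * Same contract as canRun(): the count includes the current process.
 */
function canRunProc(string $ident, int $maxNum): bool
{
    return countProcesses($ident) <= $maxNum;
}
```

A job would call it first thing, e.g. if (!canRunProc($ident, 1)) { exit(0); }.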

Idea 2 Practice

Here, Redis is used to store the PIDs.

Check whether the specified process is still running by using the /proc/{pid}/cmdline file.

```php
<?php
/**
 * Check whether a PID is still alive and belongs to a process with the given identifier.
 *
 * @param string $pid   PID
 * @param string $ident identifier
 * @return bool
 */
function isSurvive($pid, $ident)
{
    // Get the cmdline file for the specified PID
    $cmdlinePath = sprintf('/proc/%s/cmdline', $pid);
    if (!is_file($cmdlinePath)) {
        return false;
    }
    $cmdline = @file_get_contents($cmdlinePath);
    // The process may have exited after the is_file() check
    if ($cmdline === false) {
        return false;
    }
    // Check that the identifier is in cmdline (PIDs can be reused by unrelated processes)
    return strpos(trim($cmdline), $ident) !== false;
}

/**
 * Whether the script may run.
 *
 * @param string $ident  identifier
 * @param int    $maxNum maximum number of concurrent runs
 * @return bool
 */
function canRun($ident, $maxNum)
{
    // getRedis() is assumed to return an already connected Redis client
    $redis = getRedis();
    // Key holding the set of PIDs for this identifier
    $key = sprintf('php:job:%s:pid', $ident);
    // Current PID
    $currentPid = getmypid();
    // Register the current PID first, so processes starting together see each other
    $redis->sAdd($key, $currentPid);
    // Get all recorded PIDs
    $pids = $redis->sMembers($key);
    // Check whether each recorded PID is still valid
    foreach ($pids as $index => $pid) {
        if ($currentPid == $pid) {
            continue;
        }
        // Still running with the same identifier: keep it
        if (isSurvive($pid, $ident)) {
            continue;
        }
        // No longer running: drop it from the list and from Redis
        unset($pids[$index]);
        $redis->sRem($key, $pid);
    }
    // The count includes the current process
    return count($pids) <= $maxNum;
}
```
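Idea 2 also mentions keeping the PIDs in a file instead of Redis. A minimal sketch of that variant follows; the function names are hypothetical and the PID file path is whatever the caller chooses. Since a plain file has no atomic set operations, flock() is used so that only one process reads, prunes, and rewrites the list at a time.

```php
<?php
/**
 * Same idea as isSurvive(): a PID counts only if it is alive and its cmdline contains $ident.
 */
function isSurviveFile(int $pid, string $ident): bool
{
    $path = sprintf('/proc/%d/cmdline', $pid);
    if (!is_file($path)) {
        return false;
    }
    $cmdline = @file_get_contents($path);
    return $cmdline !== false && strpos($cmdline, $ident) !== false;
}

/**
 * Hypothetical file-based variant of canRun(): PIDs are stored one per line in $pidFile.
 */
function canRunWithPidFile(string $pidFile, string $ident, int $maxNum): bool
{
    $fp = fopen($pidFile, 'c+'); // create if missing, do not truncate
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX); // exclusive lock for the whole read-modify-write

    $currentPid = getmypid();
    $pids = [$currentPid];
    foreach (explode("\n", stream_get_contents($fp)) as $line) {
        $pid = (int) trim($line);
        // Keep only other PIDs that are still running with our identifier
        if ($pid > 0 && $pid !== $currentPid && isSurviveFile($pid, $ident)) {
            $pids[] = $pid;
        }
    }

    // Rewrite the file with the pruned list plus the current PID
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, implode("\n", $pids) . "\n");
    fflush($fp);
    flock($fp, LOCK_UN);
    fclose($fp);

    // The count includes the current process
    return count($pids) <= $maxNum;
}
```

Because the liveness check reads the local /proc, this variant, like the Redis one, only makes sense for scripts running on the same host.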


About the identifier

As for the identifier: when we run scheduled scripts, the only part their command lines share may be php itself. Alternatively, we may want scripts that share an identifier to be grouped into a few categories.

To meet these needs, we can use PHP’s built-in cli_set_process_title() function to set a custom process title, which is what ps shows in the COMMAND column.

In demo.php, the script uses cli_set_process_title() to set its title to {php} Job Demo.
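A minimal sketch of what demo.php might contain. The original listing is not shown, so this is a reconstruction: the title {php} Job Demo comes from the ps output below, while the sleep (5 seconds by default, or the first command-line argument) is an assumption added so the process stays alive long enough to be inspected.

```php
<?php
// demo.php: the title matches the ps output; the sleep duration is an assumption.
$title = '{php} Job Demo';

// Available in the CLI SAPI since PHP 5.5; emits a warning and returns false
// if the title cannot be changed on this platform.
cli_set_process_title($title);

// Stay alive for a few seconds (or the number given as the first argument)
// so the new title can be seen with `ps ax`.
$seconds = isset($argv[1]) ? max(0, (int) $argv[1]) : 5;
sleep($seconds);
```

For example, run php demo.php 60 & and then ps ax in the same shell.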

Now run demo.php; ps ax shows the following:

```
PID   USER     TIME  COMMAND
    1 root      0:09 php-fpm: master process (/usr/local/etc/php-fpm.conf)
    7 root      0:16 php-fpm: pool www
    8 root      0:15 php-fpm: pool www
    9 root      0:14 php-fpm: pool www
   10 root      0:00 sh
  663 root      0:00 sh
  690 root      0:00 {php} Job Demo
  691 root      0:00 ps ax
```

By setting the process title of a script, we can give that script, or a whole group of scripts, a custom identifier.

Final thoughts

Safeguards like these can have bugs of their own, so they deserve careful thought and design to keep such errors rare.