Article writing plan

To do: describe the tools used in more detail

Author: Congratulations

Intended readers

The tuning ideas in this article apply to PHP, Java, or any other language. PHP experience helps but is not required.

Business background

Framework and environment

  1. Laravel 5.7, MySQL 5.7, Redis 5, nginx 1.15
  2. CentOS 7.5 with BBR
  3. Docker, docker-compose
  4. Alibaba Cloud, 4 cores and 8 GB of memory

Problem background

PHP has OPcache enabled, Laravel's optimize command has been run, and Composer's dump-autoload command has been run.

The first thing to note is that the system environment certainly has some small problems (otherwise such a large performance gain would not be possible), but without the right tools these problems might never be discovered.

This article focuses on how to find these problems and on the reasoning behind each step.

We start by finding a suitable API or function in the system that magnifies the problem.

This API was originally designed as a health check for nginx load balancing. A stress test with ab -n 100000 -c 1000 showed that it reached only about 140 QPS.
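A minimal sketch of how the baseline figure can be captured, assuming ab (ApacheBench) is installed. The helper name parse_ab_qps and the URL in the usage comment are placeholders, not taken from the original setup; the helper only pulls the "Requests per second" line out of ab's report.

```shell
#!/bin/sh
# Hypothetical helper: read ab's report on stdin and print the QPS figure.
# ab prints a line of the form "Requests per second:    140.23 [#/sec] (mean)".
parse_ab_qps() {
    awk '/^Requests per second/ { print $4 }'
}

# Example run (placeholder URL, adjust to the real health-check route):
#   ab -n 100000 -c 1000 http://127.0.0.1/health | parse_ab_qps
```

Keeping the same -n and -c values for every run makes the figures before and after each change comparable.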

Laravel is known for poor performance, but given how simply this API is written, it should not be this slow. So I decided to find out why.

    public function getActivateStatus()
    {
        // Check that MySQL answers a trivial query.
        try {
            $result = \DB::select('select 1');
            $key = 1;
            if ($result[0]->$key !== 1) {
                throw new \Exception("Mysql check failed");
            }
        } catch (\Exception $exception) {
            \Log::critical("{$exception->getMessage()}", $exception->getTrace());
            return \response(null, 500);
        }
        // Check that Redis is reachable.
        try {
            Cache::getRedis()->connection()->exists("1");
        } catch (\Exception $exception) {
            \Log::critical("{$exception->getMessage()}", $exception->getTrace());
            return \response(null, 500);
        }
        return \response(null, 204);
    }

Symptoms and troubleshooting approach

top

The top command showed the system's CPU usage at 100%, with 80% in user mode and 20% in kernel mode, which did not look like a big problem. One thing in top's output did look strange, though. The top manual defines the column as follows:

%CPU -- CPU usage

The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.

In other words, %CPU is the share of CPU time a process used since the last screen refresh. Some php-fpm processes were in the sleep state when the screen refreshed, so php-fpm itself did not look like the problem.

pidstat

First, pick one php-fpm process, then use pidstat to examine that process in detail.

vmstat

Keep the load running and run vmstat. Nothing looks very abnormal except that context switches (cs) are a little high. Because Docker, Redis, and MySQL all run on the same machine, a cs of around 7000 is still within a reasonable range, but in (interrupts) is too high at around 14,000. Something must be triggering these interrupts.
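To watch just these two figures while the load test runs, the vmstat output can be filtered. This is a sketch; the helper name vmstat_in_cs is an assumption, and it locates the columns by their header names instead of relying on a fixed position.

```shell
#!/bin/sh
# Hypothetical helper: print the "in" and "cs" columns of vmstat output read
# from stdin. Columns are located by header name rather than fixed position.
vmstat_in_cs() {
    awk '
        $1 == "r" && $2 == "b" {                 # column header row
            for (i = 1; i <= NF; i++) {
                if ($i == "in") in_col = i
                if ($i == "cs") cs_col = i
            }
            next
        }
        in_col && $1 ~ /^[0-9]+$/ {              # one sample per line
            print "in=" $in_col, "cs=" $cs_col
        }
    '
}

# Example: five one-second samples while ab is running.
#   vmstat 1 5 | vmstat_in_cs
```

The first sample vmstat prints is an average since boot, so only the later lines reflect the load test.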

vmstat and pidstat only report overall figures, so they cannot show where the interrupts come from. The read-only file /proc/interrupts lists the system's interrupt counters, so we read it to see which interrupts are rising. The watch -d command highlights the counters that change most often.

watch -d cat /proc/interrupts

Rescheduling interrupts change the fastest. A rescheduling interrupt (RES) wakes up an idle CPU so that it can schedule new tasks. It is the mechanism the scheduler on a multiprocessor (SMP) system uses to spread tasks across CPUs, and it is a kind of inter-processor interrupt (IPI). Combined with the vmstat output, this suggests that one reason for the low QPS is that too many processes are competing for the CPU. The specific cause is still unclear, so further investigation is needed.
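When the highlighted changes are hard to follow by eye, two saved snapshots of /proc/interrupts can be compared instead. The following is a sketch under that assumption; the function name and the /tmp paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: given two saved copies of /proc/interrupts, print how
# much each interrupt line grew (summed over all CPUs), largest first.
rank_interrupt_deltas() {
    awk '
        FNR == 1 { ncpu = NF; next }             # header row: CPU0 CPU1 ...
        {
            label = $1; sub(/:$/, "", label)
            total = 0
            # Per-CPU counters follow the label; short rows (ERR, MIS) have fewer.
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            if (NR == FNR) before[label] = total
            else if (total > before[label]) print total - before[label], label
        }
    ' "$1" "$2" | sort -rn
}

# Example: take two snapshots five seconds apart during the load test.
#   cat /proc/interrupts > /tmp/irq.before; sleep 5
#   cat /proc/interrupts > /tmp/irq.after
#   rank_interrupt_deltas /tmp/irq.before /tmp/irq.after | head
```

If rescheduling interrupts dominate, the RES row appears at the top of the list.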

strace

strace traces system calls. When a process makes a system call it traps into kernel mode, which is a kind of software interrupt, so inspecting the system calls made by php-fpm lets us verify our guess.
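One way to see which system calls dominate is to save a trace of a single php-fpm worker and tally the calls by name. This is a sketch: the function name, the PID variable, and the log path are placeholders, and strace's own -c option produces a similar summary table directly.

```shell
#!/bin/sh
# Hypothetical helper: tally system calls by name from a saved strace log.
count_syscalls() {
    # Drop the "[pid N] " or "N " prefix that strace -f adds, keep the name
    # before the first "(", then count occurrences, most frequent first.
    sed -E 's/^(\[pid +[0-9]+\] +|[0-9]+ +)//' "$1" |
        grep -oE '^[a-z_][a-z0-9_]*\(' |
        tr -d '(' | sort | uniq -c | sort -rn
}

# Example: trace one worker (PID is a placeholder), stop with Ctrl-C, then tally.
#   strace -p "$PID" -o /tmp/fpm.strace
#   count_syscalls /tmp/fpm.strace | head
```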

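strace showed a large number of stat system calls, which come from OPcache checking file timestamps to detect changes. Turning timestamp validation off avoids these checks; cached scripts are then refreshed only when OPcache is reset, for example by restarting php-fpm.

    opcache.validate_timestamps="0"
    opcache.revalidate_freq="60"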
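To double-check what a given ini file actually sets, a small checker like the following can help. It is a sketch: the function name and the path in the usage comment are assumptions. The only OPcache behavior it relies on is that, with timestamp validation enabled, a revalidate_freq of 0 makes OPcache check file timestamps on every request.

```shell
#!/bin/sh
# Hypothetical helper: report the two OPcache timestamp settings found in an
# ini file and flag the combination that re-checks files on every request.
check_opcache_ini() {
    awk -F= '
        /^[ \t]*;/ { next }                      # skip ini comments
        {
            key = $1; val = $2
            gsub(/[ \t"]/, "", key); gsub(/[ \t"]/, "", val)
        }
        key == "opcache.validate_timestamps" { vt = val }
        key == "opcache.revalidate_freq"     { rf = val }
        END {
            print "validate_timestamps=" vt, "revalidate_freq=" rf
            # Validation on + frequency 0 means a timestamp check per request.
            if (vt != "" && vt != "0" && tolower(vt) != "off" && rf == "0")
                print "warning: timestamps are checked on every request"
        }
    ' "$1"
}

# Example (placeholder path, depends on the PHP image in use):
#   check_opcache_ini /usr/local/etc/php/conf.d/opcache.ini
```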

Run the ab command again to repeat the stress test.

Performance improved by 46%.

perf

This performance is still not satisfactory, so we look for further gains elsewhere, using:

perf record -g
perf report -g

The analysis report points to two things:

  1. A large number of repeated TCP connections to MySQL and Redis consume resources
  2. TCP connections created by the large number of incoming requests

phpredis

Open Laravel's config/database.php, change the Redis client from predis to phpredis (the client key in the redis section), and make sure PHP's redis extension is installed on the machine. In addition, Laravel registers a Redis facade alias, and the class provided by the redis extension is also named Redis, so the facade alias has to be renamed to something else, such as RedisL5.

Run the stress test again

Conclusion

With top, we saw that the system's CPU usage was high and that the php-fpm processes were the ones occupying the CPU, so we judged that the bottleneck was on the PHP side.

With pidstat, vmstat, and watch -d cat /proc/interrupts, we found that the dominant interrupts were rescheduling interrupts (RES).

With strace, we found that a large number of system calls were stat calls: OPcache was frequently checking file timestamps to detect changes. Changing the configuration gave a 46% performance improvement.

Finally, by inspecting the function call stacks with perf, we concluded that the large number of TCP connections to Redis might be consuming resources unnecessarily. Installing the redis extension and using phpredis as the driver for Laravel's Redis cache gave another performance improvement of nearly 50%.

We achieved our goal of a 104% performance improvement