Countdown to peace of mind

After a busy week of work, Friday finally arrived and I could look forward to a good rest.

Waking up

Just as I was settling into that peace of mind, a colleague forwarded me an Nginx 502 alert, so I hurried to investigate in production.

The Nginx error log on machine 01 at the time of the alert:

*272881176 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xx.xx.xx.xx, server: , request: "POST /xxx/xxx HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "xx.xx.xx.xx:8081"

recv() is the system call that receives the returned data, and the error can be read as: the peer reset the connection while Nginx was still reading from it.

When Nginx finds that an upstream service has dropped the connection, it returns a 502 error to the client.

So where was Nginx receiving data from? The error message makes that obvious: fastcgi://127.0.0.1:9000
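To read such a line mechanically rather than by eye, the fields can be pulled out with a regular expression. This is a minimal sketch that assumes the field layout of the sample line above; the function name `parse_upstream_error` is made up for illustration, and other Nginx error messages are laid out differently.

```python
import re

# Matches the "recv() failed" upstream error shown above.
# The layout is taken from that one sample line (an assumption).
UPSTREAM_ERROR = re.compile(
    r'recv\(\) failed \((?P<errno>\d+): (?P<reason>[^)]+)\)'
    r'.*?request: "(?P<request>[^"]*)"'
    r'.*?upstream: "(?P<upstream>[^"]*)"'
)


def parse_upstream_error(line):
    """Return errno, reason, request and upstream from one log line, or None."""
    match = UPSTREAM_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["errno"] = int(fields["errno"])
    return fields


sample = (
    '*272881176 recv() failed (104: Connection reset by peer) while reading '
    'response header from upstream, client: xx.xx.xx.xx, server: , '
    'request: "POST /xxx/xxx HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", '
    'host: "xx.xx.xx.xx:8081"'
)
print(parse_upstream_error(sample))
```

The `upstream` field is the part that points at PHP-FPM, and errno 104 is the "connection reset by peer" case discussed here.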

Thinking about why

So why did the PHP side drop the connection?

  1. Did the request run past a timeout, so that FPM actively killed the worker?
  2. Were system resources exhausted, so that the system killed the process?

For these two cases, the investigation turned up the following:

  1. The interface that raised the alert is not particularly complex, its execution time is short, and it has had no problems in the past.
  2. According to Zabbix, our instrumentation metrics, and the system load, CPU, memory, and the FPM processes all looked normal.

I also checked FPM's error log and slow log, but found nothing there either (I probably overlooked some important information here).

Clues

Since the suspicion falls on FPM, let's go through the FPM configuration file:

pid = /usr/local/var/run/php-fpm.pid
; PID file. Be sure to enable it. The path above is for macOS; the default is var/run/php-fpm.pid under the PHP installation directory, for example /usr/local/php/var/run/php-fpm.pid on CentOS.
error_log = /usr/local/var/log/php-fpm.log
; Error log. The path above is for macOS; the default is var/log/php-fpm.log under the PHP installation directory, for example /usr/local/php/var/log/php-fpm.log on CentOS.
log_level = notice
; Log level for the php-fpm.log file above. Available levels: alert (must be handled immediately), error, warning, notice, debug. Default: notice.
emergency_restart_threshold = 60
emergency_restart_interval = 60s
; If the number of child processes that exit with SIGSEGV or SIGBUS within emergency_restart_interval reaches emergency_restart_threshold, PHP-FPM restarts gracefully. These two options are usually left at their defaults. 0 disables the feature. Default value: 0 (disabled).
process_control_timeout = 0
; How long child processes wait to react to signals from the master process. Available units: s (seconds), m (minutes), h (hours), d (days). Default unit: s. Default value: 0.
daemonize = yes
; Run FPM in the background. Default value: yes. Set it to no for debugging.
; FPM can run multiple process pools with different settings. The options below can be set separately for each pool.
listen = 127.0.0.1:9000
; Address FPM listens on, which is the address Nginx passes PHP requests to. Accepted formats: 'ip:port', 'port', '/path/to/unix/socket'. It must be set for every pool. If Nginx and PHP run on different machines, set the IP here.
listen.backlog = -1
; Length of the listen() backlog (the half-open connection queue). -1 means unlimited, leaving the decision to the operating system. For background on backlog, see http://www.3gyou.cc/?p=41
listen.allowed_clients = 127.0.0.1
; IP addresses allowed to connect. To let Nginx on another host reach this FPM process, list that host's IP here. Separate multiple addresses with commas. Default value: any. If unset or empty, any server may connect.
listen.owner = www
listen.group = www
listen.mode = 0666
; Unix socket options. When connecting over TCP, these can be commented out.
user = www
group = www
; Unix user and group the FPM processes run as. The user must be set. If the group is not set, the default group of that user is used.
pm = dynamic
; How PHP-FPM manages its processes. pm can be static, dynamic, or ondemand.
; With static, pm.max_children sets a fixed number of child processes.
; With dynamic, the number of processes varies according to the following parameters:
pm.max_children = 50
; Maximum number of child processes.
pm.start_servers = 2
; Number of child processes created at startup. Default value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2.
pm.min_spare_servers = 1
; Minimum number of idle processes. If fewer are idle, new child processes are created.
pm.max_spare_servers = 3
; Maximum number of idle processes. If more are idle, the extras are cleaned up.
pm.max_requests = 10000
; Number of requests each child process serves before it is respawned. Useful for working around memory leaks in third-party modules. '0' means the process keeps accepting requests indefinitely. Equivalent to the PHP_FCGI_MAX_REQUESTS environment variable. Default value: 0.
pm.status_path = /status
; URL of the FPM status page. If not set, the status page is unavailable. Default value: none. Monitoring tools such as Munin use it.
ping.path = /ping
; URL of the FPM ping page. If not set, the ping page is unavailable. External monitors use it to check that FPM is alive and able to respond. It must begin with a slash (/).
ping.response = pong
; Body of the response to a ping request, returned as text/plain with HTTP 200. Default value: pong.
access.log = log/$pool.access.log
; Access log for each request. Disabled by default.
access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %C%%"
; Format of the access log.
slowlog = log/$pool.log.slow
; Log file for slow requests, used together with request_slowlog_timeout.
request_slowlog_timeout = 10s
; When a request runs longer than this, the PHP call stack is written in full to the slow log. '0' means off.
request_terminate_timeout = 0
; Timeout for serving a single request, after which the worker process is killed. Useful when max_execution_time in php.ini fails to stop the script for some reason. '0' means off. Try changing this option when 502 errors occur frequently.
rlimit_files = 1024
; rlimit for open file descriptors. Default value: system defined (commonly 1024). Run ulimit -n to view the value and ulimit -n 2048 to change it.
rlimit_core = 0
; Maximum core size rlimit. Available values: 'unlimited', 0, or a positive integer. Default value: system defined.
chroot =
; Directory to chroot to at startup. It must be an absolute path. If not set, chroot is not used.
chdir =
; Directory to chdir to at startup. It must be an absolute path. Default value: the current directory, or / when chroot is used.
catch_workers_output = yes
; Redirect the workers' stdout and stderr to the main error log. If not set, stdout and stderr are redirected to /dev/null as the FastCGI specification requires. Default value: not set.

From www.zybuluo.com/phper/note/…

Here are the few important items, as set in our own configuration:

pm = static
; How PHP-FPM manages its processes. pm can be static, dynamic, or ondemand.
; With static, pm.max_children sets a fixed number of child processes.

pm.max_children = 500
; Maximum number of child processes.

request_terminate_timeout = 30
; Timeout for serving a single request, after which the worker process is killed. Useful when max_execution_time in php.ini fails to stop the script for some reason. '0' means off. Try changing this option when 502 errors occur frequently.

request_slowlog_timeout = 3
; When a request runs longer than this, the PHP call stack is written in full to the slow log. '0' means off.

pm.max_requests = 10000
; Number of requests each child process serves before it is respawned. Useful for working around memory leaks in third-party modules. '0' means the process keeps accepting requests indefinitely. Equivalent to the PHP_FCGI_MAX_REQUESTS environment variable. Default value: 0.

Time to learn

This is the main configuration on our production servers. Again, the focus is request_terminate_timeout. How is it different from max_execution_time in php.ini?

The set_time_limit() function and the max_execution_time directive only affect the execution time of the script itself. Time spent outside the script, such as system calls made with system(), stream operations, and database operations, is not counted toward the maximum execution time. request_terminate_timeout, by contrast, counts all of that time.
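The gap between the two clocks can be illustrated outside PHP. The sketch below is only an analogy in Python, not PHP's own accounting: it assumes that "the execution time of the script itself" behaves roughly like the CPU time a process consumes, while request_terminate_timeout behaves like wall-clock time. `measure` and `wait_on_backend` are hypothetical names, and the sleep stands in for a slow database or stream call.

```python
import time


def measure(task):
    """Run task and return (cpu_seconds, wall_seconds) it consumed."""
    cpu_start = time.process_time()   # CPU time used by this process
    wall_start = time.monotonic()     # real elapsed time
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start
    return cpu_used, wall_used


def wait_on_backend():
    # Stand-in for a slow database or stream call: the process only waits.
    time.sleep(0.3)


cpu, wall = measure(wait_on_backend)
print(f"cpu={cpu:.3f}s wall={wall:.3f}s")
```

While the process waits, almost no CPU time accumulates but the wall clock keeps running, which is why a limit based on total elapsed time can be reached long before one that counts only the script's own execution.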

max_execution_time in our php.ini is also 30, but because request_terminate_timeout counts all elapsed time, it is effectively the shorter limit and is reached first.

Back to the point

But as mentioned, this interface is not very complicated and should not have timed out. No third-party service reported an exception at the time, the FPM error log shows no such timeout, and the load on every dependent system was still relatively low.

I finished the day's work still puzzled, then came back to write up and share the debugging experience. While searching the documentation, I came across this passage again:

Borrowing from someone else's article

Nginx 502 Bad Gateway error

php.ini and php-fpm.conf each have a relevant configuration item: max_execution_time and request_terminate_timeout, respectively.

Both items configure the maximum execution time of a PHP script. When that time is exceeded, PHP-FPM not only terminates the script but also terminates the worker process executing it. Nginx then finds that its connection to the upstream has been reset and returns a 502 error to the client.

Taking request_terminate_timeout = 30 seconds in PHP-FPM as an example, the error messages look like this:

1) Nginx error log:

2013/09/19 01:09:00 [error] 27600#0: *78887 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.101, server: test.com, request: "POST /index.php HTTP/1.1", upstream: "fastcgi://unix:/dev/shm/php-fcgi.sock:", host: "test.com", referrer: "test.com/index.php"

2) PHP-FPM error log:

WARNING:  child 25708 exited on signal 15 (SIGTERM) after 21008.883410 seconds from start
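When hunting for killed workers in the FPM log, lines like the one above can be picked out and turned into structured records. This is a minimal sketch that assumes the message format of that single sample line; `parse_child_exit` is a hypothetical helper, and real logs may carry extra prefixes such as a timestamp or pool name, which the search tolerates.

```python
import re

# Pattern modelled on the sample WARNING line above (an assumption).
CHILD_EXIT = re.compile(
    r'child (?P<pid>\d+) exited on signal (?P<signo>\d+) \((?P<name>[^)]+)\)'
    r' after (?P<uptime>[\d.]+) seconds from start'
)


def parse_child_exit(line):
    """Return pid, signal number, signal name and uptime, or None."""
    match = CHILD_EXIT.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "signal": int(match["signo"]),
        "name": match["name"],
        "uptime_seconds": float(match["uptime"]),
    }


line = ("WARNING:  child 25708 exited on signal 15 (SIGTERM) "
        "after 21008.883410 seconds from start")
event = parse_child_exit(line)
print(event)
```

Matching the timestamps of such exits against the Nginx 502 entries is one way to check whether a terminated worker explains a given reset.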

So simply increasing the values of these two items keeps a PHP script from being terminated because it takes too long to execute.

request_terminate_timeout can override max_execution_time, so if you don't want to change the global php.ini, changing the PHP-FPM configuration is enough.

Also note the max_fails and fail_timeout parameters in Nginx's upstream module. Sometimes the communication between Nginx and an upstream server (e.g., Tomcat, FastCGI) is broken only by accident, but if max_fails is set to a small value, Nginx treats the upstream server as down for the following fail_timeout period and returns a 502 error. It can therefore help to raise max_fails and lower fail_timeout.

Source: 51CTO blog, blog.51cto.com/nanchunle/1…

Making it add up

I am not familiar with Nginx's upstream module. I did find, though, that the log shows the same error on several different interfaces. It is possible that other interfaces affected this one, and that this one just happened to be caught by the alerting system.
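One way to see how widely the error is spread is to count the resets per request path. The sketch below assumes the Nginx error line layout seen earlier. The log lines in it are hypothetical, written only to exercise the code, and `count_resets_by_path` is an illustrative name.

```python
import re
from collections import Counter

# Only "recv() failed (104: ...)" entries are counted; other errors are skipped.
RESET_REQUEST = re.compile(
    r'recv\(\) failed \(104: [^)]*\).*?request: "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def count_resets_by_path(lines):
    """Count connection-reset errors per request path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = RESET_REQUEST.search(line)
        if match:
            counts[match["path"].split("?", 1)[0]] += 1
    return counts


# Hypothetical log lines, not taken from the incident.
sample_lines = [
    'recv() failed (104: Connection reset by peer) while reading response '
    'header from upstream, request: "POST /api/a HTTP/1.1"',
    'recv() failed (104: Connection reset by peer) while reading response '
    'header from upstream, request: "GET /api/b?id=1 HTTP/1.1"',
    'upstream timed out (110: Connection timed out) while reading response '
    'header from upstream, request: "GET /api/c HTTP/1.1"',
    'recv() failed (104: Connection reset by peer) while reading response '
    'header from upstream, request: "POST /api/a HTTP/1.1"',
]
print(count_resets_by_path(sample_lines).most_common())
```

If the resets cluster in time across unrelated paths, the cause is more likely to sit in the FPM pool or the machine than in any single interface.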

Reflection

My thinking about the problem may have been too one-sided. Without a systematic approach to troubleshooting, it is easy to take detours or even head in the wrong direction.

Conclusion

This is an inconclusive article.

I will try to reproduce this scenario later, so stay tuned.

Welcome to “Dumb Bear Tips”