I. Background

We recently planned to optimize our back-end interfaces, so we first needed statistics on the existing ones. All the relevant business interfaces share one URL, index.php?action=XXX, and differ only in the action parameter. As a result, the company's operations management platform could not meet our needs, so we had to compute the per-interface figures from the nginx logs ourselves.

II. Processing ideas

1. Nginx logs contain more than the relevant interface requests, so the required entries must first be filtered out by key fields.

2. Analyze the log_format defined in nginx.conf to locate the relevant fields.

3. Use regular expressions to extract the log data.

4. Process the data to produce the statistics.

(The statistics mainly cover the interface action, total number of calls, number of calls taking more than 2 s, slow-call ratio, maximum time, and average time.)
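The script in this post stores counts, totals, maxima and averages, but not the slow-call ratio listed above. As a rough sketch of step 4 on its own, assuming (action, $request_time) pairs have already been extracted from the log, the per-interface figures could be aggregated as follows. `summarizeCalls()` and its output keys are hypothetical names, not part of the original script; only the 2 s threshold comes from the article.

```php
<?php
// Sketch of step 4: aggregate per-action statistics from [action, request_time]
// pairs. summarizeCalls() and its output keys are hypothetical names; the 2 s
// threshold matches the one used in the article.
function summarizeCalls(array $calls, float $slowThreshold = 2.0): array
{
    $stats = [];
    foreach ($calls as [$action, $time]) {
        $time = (float) $time; // log values arrive as strings such as "0.254"
        if (!isset($stats[$action])) {
            $stats[$action] = ['count' => 0, 'slowCount' => 0, 'total' => 0.0, 'max' => 0.0];
        }
        $stats[$action]['count']++;
        $stats[$action]['total'] += $time;
        $stats[$action]['max'] = max($stats[$action]['max'], $time);
        if ($time > $slowThreshold) {
            $stats[$action]['slowCount']++;
        }
    }
    // Ratios and averages are derived once at the end rather than on every line
    foreach ($stats as $action => $s) {
        $stats[$action]['slowRatio'] = $s['slowCount'] / $s['count'];
        $stats[$action]['average'] = $s['total'] / $s['count'];
    }
    return $stats;
}

// Hypothetical input pairs, for illustration only
$stats = summarizeCalls([
    ['getUser', '0.25'],
    ['getUser', '2.5'],
    ['listOrders', '0.5'],
    ['getUser', '0.25'],
]);
echo json_encode($stats, JSON_PRETTY_PRINT), PHP_EOL;
```

Computing the average and ratio once after the loop avoids the repeated division the line-by-line approach performs; the result is the same.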

III. Script code

Because this script is basically a one-off, it does not try to handle every detail.
<?php
// The log_format configured in nginx.conf for this log
$configData = '[$time_local] [$request] [$status] [$http_referer] [$request_time] [$upstream_response_time]';
$regex = '/\[([^\]]*)\]/ism';
// Pulls the action name out of the request line
$actionRegex = '/(?<=action=)[^&]*/';
preg_match_all($regex, $configData, $config);
// Map each field name to its position, e.g. '$request_time' => 4
$config = array_flip($config[1]);
// The required fields ($request is needed to find the action)
$keyList = array('$request', '$request_time', '$upstream_response_time');
$fp = fopen('nginx.log', 'r');
if ($fp === false) {
    exit("Cannot open nginx.log\n");
}
$result = array();
// Read the log line by line rather than loading it all with file_get_contents() and explode()
while (($logValue = fgets($fp)) !== false) {
    preg_match_all($regex, trim($logValue), $log);
    $logList = $log[1];
    // Skip lines that do not match the configured format
    if (count($logList) !== count($config)) {
        continue;
    }
    $tmp = array();
    foreach ($keyList as $key) {
        $tmp[$key] = $logList[$config[$key]];
    }
    // Extract the action from the request line; skip requests without one
    if (!preg_match($actionRegex, $tmp['$request'], $action)) {
        continue;
    }
    $action = $action[0];
    $requestTime = (float) $tmp['$request_time'];
    // Process data
    if (isset($result[$action])) {
        $result[$action]['count']++;
        $result[$action]['allRequest'] += $requestTime;
        if ($requestTime > $result[$action]['maxRequest']) {
            $result[$action]['maxRequest'] = $requestTime;
        }
        if ($requestTime > 2) {
            $result[$action]['slowCount']++;
        }
        $result[$action]['averageRequest'] = $result[$action]['allRequest'] / $result[$action]['count'];
    } else {
        // First call of this action; it may already be a slow one
        $result[$action] = array(
            'count' => 1,
            'slowCount' => $requestTime > 2 ? 1 : 0,
            'allRequest' => $requestTime,
            'maxRequest' => $requestTime,
            'averageRequest' => $requestTime,
        );
    }
}
fclose($fp);
file_put_contents('result.log', json_encode($result));
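To see how steps 1 to 3 fit together on a single line, here is a small sketch that maps a bracketed log line onto the field names taken from the log_format string, then keeps only requests whose request line contains "action=". The sample lines and the `parseLogLine()` helper are made up for illustration, and rejecting lines whose field count differs from the format is an assumption about how malformed lines should be handled.

```php
<?php
// Sketch: map a bracketed log line onto the field names from log_format,
// then apply the "action=" filter from step 1. parseLogLine() and the sample
// lines are hypothetical, made up for illustration.
$configData = '[$time_local] [$request] [$status] [$http_referer] [$request_time] [$upstream_response_time]';
$regex = '/\[([^\]]*)\]/';

// Field name => position, e.g. '$request' => 1, '$request_time' => 4
preg_match_all($regex, $configData, $m);
$fieldIndex = array_flip($m[1]);

function parseLogLine(string $line, string $regex, array $fieldIndex): ?array
{
    preg_match_all($regex, trim($line), $m);
    // Reject lines without exactly one bracketed value per configured field
    if (count($m[1]) !== count($fieldIndex)) {
        return null;
    }
    $fields = [];
    foreach ($fieldIndex as $name => $pos) {
        $fields[$name] = $m[1][$pos];
    }
    return $fields;
}

$lines = [
    '[12/Mar/2019:10:00:01 +0800] [GET /index.php?action=getUser&id=7 HTTP/1.1] [200] [-] [0.254] [0.250]',
    '[12/Mar/2019:10:00:02 +0800] [GET /static/app.js HTTP/1.1] [200] [-] [0.001] [-]',
    'not a log line',
];

$kept = [];
foreach ($lines as $line) {
    $fields = parseLogLine($line, $regex, $fieldIndex);
    // Keep only interface requests (request line contains "action=")
    if ($fields === null || strpos($fields['$request'], 'action=') === false) {
        continue;
    }
    $kept[] = $fields;
}
print_r($kept);
```

Only the first sample line survives: the static-file request has no action, and the last line has no bracketed fields at all. Note that a value which itself contains "]" (a referer, for instance) would split into the wrong fields with this bracket-based format.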

IV. Detailed explanation of the regular expressions

$regex = '/\[([^\]]*)\]/ism';

\[ and \] match a literal "[" at the start and "]" at the end. The parentheses ( ) form a capturing subexpression, so the text between the brackets can be extracted. [^\]] is a bracket expression that matches any character except "]", and * repeats the preceding bracket expression zero or more times.

$actionRegex = '/(?<=action=)[^&]*/';

(?<=action=) is a positive lookbehind: the match starts right after the string "action=", and "action=" itself is not part of the match. [^&] is a bracket expression that matches any character except "&", and * repeats it zero or more times, so the match stops at the next "&" (or at the end of the string).
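Because [^&]* only stops at "&", it may capture more than intended when action is the last query parameter, since nginx's $request field holds the whole request line including the protocol (for example "GET /index.php?id=7&action=getUser HTTP/1.1"). The sketch below runs the action pattern and a variant that also stops at whitespace, [^&\s]*, on two hypothetical request lines. The variant is a suggested change, not part of the original script.

```php
<?php
// Quick check of the action-extraction pattern on hypothetical request lines.
// 'original' is the pattern from the script; 'variant' (a suggested change)
// also stops at whitespace so the trailing " HTTP/1.1" is not captured.
$patterns = [
    'original' => '/(?<=action=)[^&]*/',
    'variant'  => '/(?<=action=)[^&\s]*/',
];
$requests = [
    'GET /index.php?action=getUser&id=7 HTTP/1.1',
    'GET /index.php?id=7&action=getUser HTTP/1.1',
];

$found = [];
foreach ($patterns as $name => $pattern) {
    foreach ($requests as $i => $request) {
        // preg_match returns 1 on a match; $m[0] holds the matched text
        $found[$name][$i] = preg_match($pattern, $request, $m) ? $m[0] : null;
    }
}
var_export($found);
```

With the original pattern the second request yields "getUser HTTP/1.1", which would be counted as a separate interface. Also note that the lookbehind does not check what comes before "action=", so a parameter such as "subaction=" would match as well.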


V. Statistical results

After analyzing several days of Nginx logs, we took an excerpt of the statistics; with these numbers you can decide which interfaces in your own code to optimize!
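One possible way to turn result.log into a shortlist is to sort the actions by number of slow calls (breaking ties by average time) and print the slow-call ratio alongside. This is a hedged sketch: the JSON string is invented sample data in the same shape the script writes, not real results, and on a real run you would decode file_get_contents('result.log') instead.

```php
<?php
// Sketch: rank the entries of result.log to pick optimization candidates.
// The JSON below is made-up sample data in the shape the script writes; on real
// data you would use json_decode(file_get_contents('result.log'), true).
$json = '{"getUser":{"count":4,"slowCount":1,"allRequest":4.0,"maxRequest":2.5,"averageRequest":1.0},'
      . '"listOrders":{"count":4,"slowCount":3,"allRequest":10,"maxRequest":4,"averageRequest":2.5},'
      . '"ping":{"count":10,"slowCount":0,"allRequest":0.1,"maxRequest":0.02,"averageRequest":0.01}}';
$result = json_decode($json, true);

$rows = [];
foreach ($result as $action => $s) {
    $rows[] = [
        'action'    => $action,
        'count'     => (int) $s['count'],
        'slowCount' => (int) $s['slowCount'],
        'slowRatio' => $s['slowCount'] / $s['count'],
        'average'   => (float) $s['averageRequest'],
        'max'       => (float) $s['maxRequest'],
    ];
}

// Most slow calls first; ties broken by the higher average time
usort($rows, function (array $a, array $b): int {
    return [$b['slowCount'], $b['average']] <=> [$a['slowCount'], $a['average']];
});

foreach ($rows as $r) {
    printf("%-12s calls=%d slow=%d (%.0f%%) avg=%.3fs max=%.3fs\n",
        $r['action'], $r['count'], $r['slowCount'], $r['slowRatio'] * 100, $r['average'], $r['max']);
}
```

Sorting by slow-call count favors interfaces that hurt users most often; sorting by slowRatio instead would surface rarely called but consistently slow interfaces.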

VI. Conclusion

Regular expressions still feel shaky every time I use them, so test them on real samples before relying on them!