Using Nginx to build a front-end log statistics (tracking / data collection) service
Have you ever wondered where the data that "supports" your decisions at work comes from? If the business involves web services, one source of that data is the request logs on your servers. If we separate the traffic dedicated to statistics from the rest, and let some servers focus on receiving these "statistics-type" requests, the logs they produce are the "tracking logs" (dotting logs).
This article shows how to use Nginx in a container to build a simple statistics (tracking / data collection) service for front-end use, without introducing extra technology stacks or adding maintenance costs.
Before we begin
Have you ever thought about this problem: when a page contains many tracking events, countless requests are fired the moment the page opens. In anything short of a broadband environment the user experience evaporates, and the tracking server faces what amounts to a DDoS from friendly fire.
Therefore, in recent years some companies have switched their reporting scheme from GET to POST and, combined with a self-developed SDK, "package and merge" the statistics on the client and report incremental logs at a certain frequency. This largely solves the front-end performance problem and reduces the pressure on the server.
Five years ago I shared how to build easily extensible front-end statistics scripts; take a look if you are interested.
Problems with POST requests in Nginx environments
You might be confused by the title of this section. POST interactions with Nginx are common. What’s the problem?
Let’s do a little experiment and start an Nginx service using a container:
docker run --rm -it -p 3000:80 nginx:1.19.3-alpine
Then use curl to simulate a typical POST request from everyday business:
curl -d '{"key1":"value1", "key2":"value2"}' -X POST http://localhost:3000
You should see the following return result:
<html>
<head><title>405 Not Allowed</title></head>
<body>
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.19.3</center>
</body>
</html>
Looking at the Nginx source files http/modules/ngx_http_stub_status_module.c and http/ngx_http_special_response.c, we can see the relevant implementation:
static ngx_int_t
ngx_http_stub_status_handler(ngx_http_request_t *r)
{
    size_t             size;
    ngx_int_t          rc;
    ngx_buf_t         *b;
    ngx_chain_t        out;
    ngx_atomic_int_t   ap, hn, ac, rq, rd, wr, wa;

    if (!(r->method & (NGX_HTTP_GET|NGX_HTTP_HEAD))) {
        return NGX_HTTP_NOT_ALLOWED;
    }
    ...
}
...
static char ngx_http_error_405_page[] =
"<html>" CRLF
"<head><title>405 Not Allowed</title></head>" CRLF
"<body>" CRLF
"<center><h1>405 Not Allowed</h1></center>" CRLF
;

#define NGX_HTTP_OFF_4XX    (NGX_HTTP_LAST_3XX - 301 + NGX_HTTP_OFF_3XX)
...
    ngx_string(ngx_http_error_405_page),
    ngx_string(ngx_http_error_406_page),
...
Yes, by default Nginx does not accept POST requests for static resources and returns a 405 error code, in line with RFC 7231. That is why, in general, we bring in a dynamic language such as Lua/Java/PHP/Go/Node to help handle and parse them.
So how do we solve this? Can a lightweight, high-performance Nginx alone be made to accept POST requests, without any external help?
Let Nginx “natively” support POST requests
To make the configuration easier to follow, we will use compose for the rest of the Nginx experiments. Before writing the compose file, we need the default configuration; the following command saves the configuration file of the specified Nginx version to the current directory:
docker run --rm -it nginx:1.19.3-alpine cat /etc/nginx/conf.d/default.conf > default.conf
The default configuration file is as follows:
server {
    listen       80;
    server_name  localhost;

    #charset koi8-r;
    #access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    #error_page  404              /404.html;

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #
    #location ~ \.php$ {
    #    proxy_pass   http://127.0.0.1;
    #}

    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #
    #location ~ \.php$ {
    #    root           html;
    #    fastcgi_pass   127.0.0.1:9000;
    #    fastcgi_index  index.php;
    #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
    #    include        fastcgi_params;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    #location ~ /\.ht {
    #    deny  all;
    #}
}
Simplifying it a little, and adding the line error_page 405 =200 $uri;, gives us a much smaller configuration file:
server {
listen 80;
server_name localhost;
charset utf-8;
location / {
return 200 "soulteary";
}
error_page 405 =200 $uri;
}
Rewrite the command from the beginning of this section as a docker-compose.yml, and add a volumes entry that maps the configuration file into the container, so we can verify it with compose:
Version: "3" Services: NGX: Image: nginx:1.19.3- Alpine Restart: Always ports: -3000 :80 Volumes: - ./default.conf/:/etc/nginx/conf.d/default.confCopy the code
Start the service with docker-compose up, and then use the previous curl to simulate a POST to verify that the request is normal.
curl -d '{"key1":"value1", "key2":"value2"}' -H "Content-Type: application/json" -H "origin:gray.baai.ac.cn" -X POST http://localhost:3000
soulteary
In addition to the string “soulteary” being returned, Nginx logs will also have a normal-looking record:
ngx_1  | 192.168.16.1 - - [31/Oct/2020:14:24:48 +0000] "POST / HTTP/1.1" 200 9 "-" "curl/7.64.1" "-"
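The record above comes from the compose foreground output. If you prefer to run the stack in the background (docker-compose up -d), the same stream can be followed with the compose log command; ngx is the service name from the compose file above:
docker-compose logs -f ngx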
However, if you look carefully, you will find that the log does not contain the data we sent. How do we solve that?
Resolve missing POST data in Nginx logs
This is an old issue: for performance reasons, the default Nginx log format does not include the POST body, and Nginx will not even read the request body unless a directive such as proxy_pass needs it.
Execute the following command first:
docker run --rm -it nginx:1.19.3-alpine cat /etc/nginx/nginx.conf
You can see that the default log_format rule indeed records nothing from the POST body:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
#gzip on;
include /etc/nginx/conf.d/*.conf;
}
Add a new log format that includes the POST body variable ($request_body), and add a proxy_pass path so that Nginx actually runs the logic that reads the POST body.
For easier maintenance, we merge the previous configuration into this file and define a path named /internal-api-path:
user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" $request_body';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       80;
        server_name  localhost;
        charset      utf-8;

        location / {
            proxy_pass http://127.0.0.1/internal-api-path;
        }

        location /internal-api-path {
            # access_log off;
            default_type application/json;
            return 200 '{"code": 0, data:"soulteary"}';
        }

        error_page 405 =200 $uri;
    }
}
After saving the new configuration file as nginx.conf, adjust the configuration information of the volumes in compose and use docker-compose up again to start the service.
volumes:
  - ./nginx.conf:/etc/nginx/nginx.conf
Using curl to simulate the previous POST request, we can see that the Nginx log has two more records. The first record contains the POST data we need:
192.168.192.1 - - [31/Oct/2020:15:05:48 +0000] "POST / HTTP/1.1" 200 29 "-" "curl/7.64.1" "-" {\x22key1\x22:\x22value1\x22, \x22key2\x22:\x22value2\x22}
127.0.0.1 - - [31/Oct/2020:15:05:48 +0000] "POST /internal-api-path HTTP/1.0" 200 29 "-" "curl/7.64.1" "-"
But there are still quite a few imperfections:
- The server still accepts GET requests, so we have to perform a lot of "discard actions" during log processing, and the temporary storage wastes disk space unnecessarily.
- The internal path used to activate Nginx's POST body parsing can be called directly by anyone, producing equally meaningless logs.
- Worse, the body data in the log (the \x22 escapes above) needs extra transcoding before it can be parsed, which costs unnecessary performance.
Let’s continue to solve these problems.
Improved Nginx configuration to optimize logging
First, add the escape=json parameter to the log format, so that Nginx escapes the logged request data as JSON strings (instead of the \xXX hex escapes above):
log_format main escape=json '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" $request_body';
Then add the access_log off; directive to the path that does not need logging, to avoid unnecessary records:
location /internal-api-path {
access_log off;
default_type application/json;
return 200 '{"code": 0, data:"soulteary"}';
}
Next, use the Nginx map directive together with a conditional to skip logging for non-POST requests and to reject them outright:
map $request_method $loggable {
    default 0;
    POST    1;
}
...
server {
    location / {
        if ( $request_method !~ ^POST$ ) { return 405; }
        access_log /var/log/nginx/access.log main if=$loggable;
        proxy_pass http://127.0.0.1/internal-api-path;
    }
    ...
}
Run the curl request again, and you can see that the body is now recorded in the log as readable JSON:
192.168.224.1 - - [31/Oct/2020:15:19:59 +0000] "POST / HTTP/1.1" 200 29 "" "curl/7.64.1" "" {\"key1\":\"value1\", \"key2\":\"value2\"}
At the same time, non-POST requests are no longer recorded, and they receive a 405 error status.
At this point you may wonder why this 405 is not rewritten to 200 like the earlier one. That is because this 405 is "set manually" by our return directive when the condition is triggered, rather than being a result produced by Nginx's own request-processing logic.
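A quick way to confirm this behaviour (assuming the service is still published on port 3000 as in the compose file) is to send a plain GET to the root path and look at the status line:
curl -i http://localhost:3000/
# the status line should now read "405 Not Allowed" instead of being rewritten to 200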
The current Nginx configuration is as follows:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main escape=json '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" $request_body';
sendfile on;
keepalive_timeout 65;
map $request_method $loggable {
default 0;
POST 1;
}
server {
listen 80;
server_name localhost;
charset utf-8;
location / {
if ( $request_method !~ ^POST$ ) { return 405; }
access_log /var/log/nginx/access.log main if=$loggable;
proxy_pass http://127.0.0.1/internal-api-path;
}
location /internal-api-path {
access_log off;
default_type application/json;
return 200 '{"code": 0, "data":"soulteary"}';
}
error_page 405 =200 $uri;
}
}
But does it really end there?
Simulate common cross-domain requests from front-end clients
Open the familiar Baidu homepage and enter the following code in the browser console to simulate a common cross-origin business request:
async function testCorsPost(url = '', data = {}) {
const response = await fetch(url, {
method: 'POST',
mode: 'cors',
cache: 'no-cache',
credentials: 'same-origin',
headers: { 'Content-Type': 'application/json' },
redirect: 'follow',
referrerPolicy: 'no-referrer',
body: JSON.stringify(data)
});
return response.json();
}
testCorsPost('http://localhost:3000', { hello: "soulteary" }).then(data => console.log(data));
After executing the code, you should see a classic error message:
Access to fetch at 'http://localhost:3000/' from origin 'https://www.baidu.com' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
POST http://localhost:3000/ net::ERR_FAILED
If you look at the Network panel, you will see two new requests that failed:
- Request URL: http://localhost:3000/
- Request Method: OPTIONS
- Status Code: 405 Not Allowed
- Request URL: http://localhost:3000/
- Request Method: POST
- No response result
Let’s go ahead and tweak the configuration to solve this common problem.
Use Nginx to solve front-end cross-domain problems
We first adjust the filtering rules to allow OPTIONS requests to be processed.
if ( $request_method !~ ^(POST|OPTIONS)$ ) { return 405; }
Cross-origin requests are a common front-end scenario. Many people lazily use "*" to solve the problem, but modern browsers such as recent versions of Chrome refuse such lax rules in some scenarios, and for business security we generally configure on the server a whitelist of domains that are allowed to make cross-origin requests. Following the approach above, we can easily define an Nginx map like the one below to reject all unauthorized cross-origin requests from the front end:
map $http_origin $corsHost {
    default 0;
    "~(.*).soulteary.com" 1;
    "~(.*).baidu.com"     1;
}

server {
    ...
    location / {
        ...
        if ( $corsHost = 0 ) { return 405; }
        ...
    }
}
There is a catch here: rules in Nginx configuration are not executed quite like code in a high-level programming language, where statements run sequentially and have a clear "priority/override" relationship. So, for the front end to be able to call the interface and submit data normally, the rules need to be written as follows, at the cost of a few redundant lines:
if ( $corsHost = 0 ) { return 405; }

if ( $corsHost = 1 ) {
    # allow the cross-origin request, but do not allow credentials (cookies)
    add_header 'Access-Control-Allow-Credentials' 'false';
    add_header 'Access-Control-Allow-Headers' 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Mx-ReqToken,X-Requested-With,Date,Pragma';
    add_header 'Access-Control-Allow-Methods' 'POST,OPTIONS';
    add_header 'Access-Control-Allow-Origin' '$http_origin';
}

# OPTIONS (preflight) requests return 204 with an empty body
if ( $request_method = 'OPTIONS' ) {
    add_header 'Access-Control-Allow-Credentials' 'false';
    add_header 'Access-Control-Allow-Headers' 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Mx-ReqToken,X-Requested-With,Date,Pragma';
    add_header 'Access-Control-Allow-Methods' 'POST,OPTIONS';
    add_header 'Access-Control-Allow-Origin' '$http_origin';
    add_header 'Access-Control-Max-Age' 1728000;
    add_header 'Content-Type' 'text/plain charset=UTF-8';
    add_header 'Content-Length' 0;
    return 204;
}
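To watch the preflight path work outside the browser, we can replay it with curl. This is only a rough check under the assumptions above: the whitelisted baidu.com origin and the service still listening on port 3000.
curl -i -X OPTIONS \
  -H "Origin: https://www.baidu.com" \
  -H "Access-Control-Request-Method: POST" \
  http://localhost:3000/
# expected: a 204 response carrying the Access-Control-Allow-* headers defined above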
If we execute the previous JavaScript code in the page again, the request now goes through and the front end receives the data:
{code: 0, data: "soulteary"}
And the Nginx log gains an additional record, as expected:
172.20.0.1 - - [31/Oct/2020:15:49:17 +0000] "POST / HTTP/1.1" 200 31 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36" "" {\"hello\":\"soulteary\"}
And if we go back to the plain curl command from before and simulate a pure API call, we will find it now gets a 405 error response. That is because the request does not carry an Origin header, so it cannot identify its source; adding that data with the -H parameter gets us the expected result:
curl -d '{"key1":"value1", "key2":"value2"}' -H "Content-Type: application/json" -H "origin:www.baidu.com" -X POST http://localhost:3000/
{"code": 0, "data":"soulteary"}
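Conversely, an Origin that is not on the whitelist should still be rejected; here evil.example.com is just a hypothetical domain used for illustration:
curl -i -d '{"key1":"value1"}' \
  -H "Content-Type: application/json" \
  -H "origin: evil.example.com" \
  -X POST http://localhost:3000/
# expected: a 405 response, because $corsHost stays 0 for this origin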
Relatively complete Nginx configuration
By now we have implemented the essential collection functionality. The Nginx configuration that meets these basic requirements is as follows:
user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  escape=json  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" $request_body';

    sendfile        on;
    keepalive_timeout  65;

    map $request_method $loggable {
        default 0;
        POST    1;
    }

    map $http_origin $corsHost {
        default 0;
        "~(.*).soulteary.com" 1;
        "~(.*).baidu.com"     1;
    }

    server {
        listen       80;
        server_name  localhost;
        charset      utf-8;

        location / {
            if ( $request_method !~ ^(POST|OPTIONS)$ ) { return 405; }
            access_log /var/log/nginx/access.log main if=$loggable;

            if ( $corsHost = 0 ) { return 405; }

            if ( $corsHost = 1 ) {
                # allow the cross-origin request, but do not allow credentials (cookies)
                add_header 'Access-Control-Allow-Credentials' 'false';
                add_header 'Access-Control-Allow-Headers' 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Mx-ReqToken,X-Requested-With,Date,Pragma';
                add_header 'Access-Control-Allow-Methods' 'POST,OPTIONS';
                add_header 'Access-Control-Allow-Origin' '$http_origin';
            }

            # OPTIONS (preflight) requests return 204 with an empty body
            if ( $request_method = 'OPTIONS' ) {
                add_header 'Access-Control-Allow-Credentials' 'false';
                add_header 'Access-Control-Allow-Headers' 'Accept,Authorization,Cache-Control,Content-Type,DNT,If-Modified-Since,Keep-Alive,Origin,User-Agent,X-Mx-ReqToken,X-Requested-With,Date,Pragma';
                add_header 'Access-Control-Allow-Methods' 'POST,OPTIONS';
                add_header 'Access-Control-Allow-Origin' '$http_origin';
                add_header 'Access-Control-Max-Age' 1728000;
                add_header 'Content-Type' 'text/plain charset=UTF-8';
                add_header 'Content-Length' 0;
                return 204;
            }

            proxy_pass http://127.0.0.1/internal-api-path;
        }

        location /internal-api-path {
            access_log off;
            default_type application/json;
            return 200 '{"code": 0, "data":"soulteary"}';
        }

        error_page 405 =200 $uri;
    }
}
Used together with a container, we only need to add one more route definition dedicated to health checks to get a simple and stable collection service, ready to be wired into downstream data transfer and processing pipelines.
location /health {
access_log off;
return 200;
}
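A quick manual check of the new route (assuming the same 3000:80 port mapping as before) looks like this:
curl -i http://localhost:3000/health
# expected: a 200 response with an empty body, and nothing written to the access log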
The compose file, in turn, only gains a few lines for the health check definition:
Version: "3" Services: NGX: Image: nginx:1.19.3- Alpine Restart: Always ports: -3000 :80 Volumes: - /etc/localtime:/etc/localtime:ro - /etc/timezone:/etc/timezone:ro - ./nginx.conf:/etc/nginx/nginx.conf healthcheck: test: wget --spider localhost/health || exit 1 interval: 5s timeout: 10s retries: 3Copy the code
With Traefik, instances can be easily scaled horizontally to handle more requests. Check out my previous posts if you’re interested.
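As a very rough sketch of what scaling could look like with compose alone (assuming Traefik already fronts the service and discovers containers by labels, and that the fixed ports: mapping has been removed so the replicas do not fight over host port 3000):
docker-compose up -d --scale ngx=3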
Finally
This article has only scratched the surface of data collection; more may be covered later. I have to go earn my kid's cat food, so that's it for now.
–EOF
I now run a small tinkering group that has gathered some friends who like to tinker with things.
With no ads, we talk about software, HomeLab and programming problems, and occasionally share information about tech salons in the group.
Friends who like to tinker are welcome to scan the QR code to add me as a friend. (Please note where you came from and why, otherwise the request will not be approved.)
All about joining the tinkering group
This article is published under the Attribution 4.0 International (CC BY 4.0) license.
Author: Su Yang
Created: November 1, 2020 | Word count: 12,976 words | Reading time: 26 minutes | Permalink: soulteary.com/2020/11/01/…