Collecting and analyzing Nginx logs in real time to automatically block risky IPs

Original article: blog.piaoruiqing.com/2019/11/17/…

Preface

This article shares a solution, and the experience of putting it into practice, for automatically collecting and analyzing Nginx logs and blocking risky IP addresses in real time.

After reading this article, you will know:

  • A log collection solution.
  • A simple approach to assessing IP risk.
  • Strategies and a solution for blocking IP addresses.

To follow this article, you need:

  • Familiarity with programming.
  • Familiarity with common Linux commands.
  • A basic understanding of Docker.

Background

While analyzing the Nginx access log, I noticed a large number of invalid requests that returned 404, with URLs made up of random sensitive words. These requests had recently become more frequent, and whenever I banned a batch of IPs by hand, new IPs soon took their place.

This led to the idea of analyzing the Nginx logs automatically and blocking IPs in real time.

Requirements

No. | Requirement | Notes
1 | Nginx log collection | Many options exist; the author chose the one best suited to a personal server: filebeat + redis
2 | Real-time log analysis | Consume the logs from redis in real time and parse out the data needed for analysis
3 | IP risk assessment | Assess the risk of an IP along several dimensions: number of requests, IP geolocation, and usage type
4 | Real-time blocking | Block risky IPs for different lengths of time

Analysis

A few characteristics stand out in the logs:

No. | Characteristic | Description | Notes
1 | Frequent requests | Several or even dozens of requests per second | Normal traffic also has bursts, but they do not last long
2 | Sustained requests | The requests go on for a long time | Same as above
3 | Mostly 404s | Most of the requested URLs do not exist and contain sensitive terms such as admin, login, phpMyAdmin, backup | This is rare in normal traffic
4 | Abnormal IPs | The ASN shows that the requesting IPs do not belong to ordinary individual users | Lookups generally return usage types such as COM (commercial), DCH (data center/web hosting/transit), SES (search engine spider)

Note: the IP analysis here uses the free version of the IP2Location database, which is described in detail later.

Solution

Log collection

Source: the author's website is deployed with Docker, and Nginx is the only entry point, so it logs every request.

Collection: because resources are limited, the author chose Filebeat, a lightweight log collection tool, to collect the Nginx logs and write them to Redis.

Risk assessment

The Monitor service assesses risk based on the URL, the IP address, and the historical score, and calculates a final danger factor.

IP blocking

When the Monitor detects a dangerous IP address (its danger factor exceeds the threshold), it calls the Actuator to block that IP. The duration of the block is calculated from the danger factor.
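
The article does not give the formula that turns the danger factor into a blocking duration. The following is only a minimal sketch of one possible mapping. The threshold of 20, the base duration of 10 minutes, the doubling step of 5 points, the 7-day cap, and the function name banSeconds are all illustrative assumptions, not the author's values.

```javascript
// Hypothetical tuning values; none of them come from the article.
const THRESHOLD = 20;                 // danger factor above which an IP is blocked
const BASE_SECONDS = 600;             // block length just above the threshold
const STEP = 5;                       // every STEP points above the threshold doubles the block
const MAX_SECONDS = 7 * 24 * 3600;    // upper bound of 7 days

// Returns the block duration in seconds, or 0 when the IP should not be blocked.
function banSeconds(dangerFactor) {
    if (dangerFactor <= THRESHOLD) {
        return 0;
    }
    const doublings = Math.floor((dangerFactor - THRESHOLD) / STEP);
    return Math.min(BASE_SECONDS * 2 ** doublings, MAX_SECONDS);
}

console.log(banSeconds(30)); // 2400
```

Growing the duration exponentially keeps first offenders out for only a short time while repeat or high-scoring offenders stay blocked much longer; a linear mapping would work just as well if that fits the traffic better.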

Implementation

Log collection

Filebeat is deployed with Docker Swarm as follows (other services are omitted to keep the listing short):

version: '3.5'
services:
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.4.2
    deploy:
      resources:
        limits:
          memory: 64M
      restart_policy:
        condition: on-failure
    volumes:
    - $PWD/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    - $PWD/filebeat/data:/usr/share/filebeat/data:rw
    - $PWD/nginx/logs:/logs/nginx:ro
    environment:
      TZ: Asia/Shanghai
    depends_on:
    - nginx
  • image: the image and version.
  • deploy.resources.limits.memory: the memory limit.
  • $PWD/filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro: filebeat.yml is the configuration file that describes the log source and destination. $PWD is the current directory, that is, the directory in which docker stack deploy is run. ro means read-only.
  • $PWD/filebeat/data:/usr/share/filebeat/data:rw: the data directory must be persisted so that Filebeat still knows where it last read in the log after the container is deleted and redeployed. rw means read-write.
  • $PWD/nginx/logs:/logs/nginx:ro: maps the nginx log directory into the Filebeat container.
  • environment.TZ: the time zone.

The contents of filebeat.yml are as follows:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /logs/nginx/access.log
  json.keys_under_root: true
  json.overwrite_keys: true

output.redis:
  hosts: ["redis-server"]
  password: "{your redis password}"
  key: "filebeat:nginx:accesslog"
  db: 0
  timeout: 5
  • filebeat.inputs: defines the inputs.

  • paths: the log paths.

  • json.keys_under_root: places the parsed log fields at the root of the JSON document (if it is not set, all of the data is placed under a child node). Note: the author configured nginx to write its logs in JSON format. The reference configuration is as follows:

    log_format  main_json  escape=json
    '{'
       '"@timestamp":"$time_iso8601",'
       '"http_host":"$http_host",'
       '"remote_addr":"$remote_addr",'
       '"request_uri":"$request_uri",'
       '"request_method":"$request_method",'
       '"server_protocol":"$server_protocol",'
       '"status":$status,'
       '"request_time":"$request_time",'
       '"body_bytes_sent":$body_bytes_sent,'
       '"http_referer":"$http_referer",'
       '"http_user_agent":"$http_user_agent",'
       '"http_x_forwarded_for":"$http_x_forwarded_for"'
    '} ';
  • json.overwrite_keys: lets the log fields overwrite the keys generated by Filebeat; it is used here to override the @timestamp field.

  • output.redis: defines the output.
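
With this configuration, each access-log line should reach redis as a JSON document whose log fields sit at the top level. The sketch below shows how a consumer might pull out the fields needed for risk assessment. The sample entry is made up, the function name parseAccessLog is hypothetical, and reading the entry from redis is left out so that the example does not depend on a particular client library.

```javascript
// Made-up sample entry in the shape produced by the main_json log_format.
// Filebeat adds further fields of its own, which are omitted here.
const sample = JSON.stringify({
    '@timestamp': '2019-11-17T02:00:00.000Z',
    remote_addr: '203.0.113.7',
    request_uri: '/phpMyAdmin/index.php',
    request_method: 'GET',
    status: 404,
    http_user_agent: 'curl/7.58.0'
});

// Extracts the fields the risk assessment needs; returns null for bad input.
function parseAccessLog(raw) {
    let entry;
    try {
        entry = JSON.parse(raw);
    } catch (err) {
        return null; // not valid JSON, skip it
    }
    if (entry === null || typeof entry !== 'object' || typeof entry.remote_addr !== 'string') {
        return null; // not an access-log entry
    }
    return {
        time: entry['@timestamp'],
        ip: entry.remote_addr,
        uri: entry.request_uri,
        status: Number(entry.status)
    };
}

console.log(parseAccessLog(sample));
```

Returning null instead of throwing lets the consumer skip a malformed entry and keep processing the rest of the queue.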

View redis data after successful deployment:

[Copyright Notice]

This article was published on PiaoRuiQing's blog. Non-commercial reprinting is allowed, but reprints must retain the original author, PiaoRuiQing, and the link blog.piaoruiqing.com. For authorization inquiries or cooperation, please contact [email protected].

Risk assessment

The Monitor service is written in Java, deployed in Docker, and interacts with the Actuator service through HTTP.

Risk assessment needs to integrate multiple dimensions:

No. | Dimension | Strategy
1 | IP geolocation | The users of a Chinese-language website are mostly in China, so an IP located abroad deserves more caution.
2 | Usage type | Look up what the IP is used for; types such as DCH (data center/web hosting/transit) and SES (search engine spider) raise the risk score.
3 | Requested resource | The requested resource does not exist and the path contains sensitive words such as admin, login, phpMyAdmin, backup.
4 | Request frequency and duration | Frequent, sustained requests raise the score.
5 | Historical score | The historical score is factored into the current score.
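
The article does not show how request frequency is measured. One common way is a sliding-window counter per IP. The sketch below assumes that approach; the class name RequestCounter and the window length used in the example are hypothetical.

```javascript
// Counts requests per IP inside a sliding time window.
class RequestCounter {
    constructor(windowMs) {
        this.windowMs = windowMs;
        this.hits = new Map(); // ip -> request timestamps in ms, oldest first
    }

    // Records one request and returns how many requests this IP has made
    // within the window that ends at `nowMs`.
    record(ip, nowMs) {
        const cutoff = nowMs - this.windowMs;
        const recent = (this.hits.get(ip) || []).filter((t) => t > cutoff);
        recent.push(nowMs);
        this.hits.set(ip, recent);
        return recent.length;
    }
}

const counter = new RequestCounter(1000); // 1-second window
counter.record('203.0.113.7', 0);
console.log(counter.record('203.0.113.7', 500)); // 2
```

Entries for IPs that never return stay in the map, so a long-running service would also need to evict idle IPs from time to time.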

Looking up the IP geolocation

Finding out where an IP address is located is easy. Many data service websites offer free plans, such as IpInfo, and there are also free IP databases that can be downloaded, such as IP2Location.

The author used the free IP2Location database.

ip_from and ip_to are the start and end of an IP range and are stored as decimal integers. In MySQL, the inet_aton('your IP') function converts an IP address to this decimal form. For example:

set @a:= inet_aton('172.217.6.78');
SELECT * FROM ip2location_db11 WHERE ip_from <= @a AND ip_to >= @a LIMIT 1;
ip_from | ip_to | country_code | country_name | region_name | city_name | latitude | longitude | zip_code | time_zone
2899902464 | 2899910655 | US | United States | California | Mountain View | 37.405992 | -122.07852 | 94043 | -07:00
  • Because the table is large, adding LIMIT 1 is recommended.
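
The same conversion can also be done in application code, for example to compare an IP against ip_from and ip_to without going through MySQL. A minimal sketch follows; the function name ipToInt is hypothetical, and only full four-part IPv4 addresses are handled.

```javascript
// Converts a dotted-quad IPv4 address to the unsigned 32-bit integer that
// MySQL's inet_aton() returns for a full four-part address.
function ipToInt(ip) {
    const parts = ip.split('.');
    if (parts.length !== 4) {
        throw new Error(`not an IPv4 address: ${ip}`);
    }
    return parts.reduce((acc, part) => {
        const octet = Number(part);
        if (!/^\d{1,3}$/.test(part) || octet > 255) {
            throw new Error(`not an IPv4 address: ${ip}`);
        }
        // Multiply instead of shifting: << works on signed 32-bit values
        // and would go negative for addresses above 127.255.255.255.
        return acc * 256 + octet;
    }, 0);
}

console.log(ipToInt('172.217.6.78')); // 2899904078
```

The result for 172.217.6.78 falls between the ip_from and ip_to values of the row returned by the query above.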

Looking up the AS, ASN, and usage type

Most free lookup services either do not return the ASN or do not say what it is used for. Free ASN databases exist as well, but they still lack the usage type. The author therefore took a roundabout route.

IP2Location provides two free databases that help here, IP2Location™ LITE IP-ASN and IP2Proxy™ LITE:

IP2Location™ LITE IP-ASN: this database provides a reference for determining the autonomous system and its number (ASN).

IP2Proxy™ LITE: this database contains IP addresses that are used as open proxies. It includes the proxy type, country, region, city, ISP, domain, usage type, ASN, and last-seen time for all public IPv4 and IPv6 addresses.

IP2Location™ LITE IP-ASN does not give the usage type of an IP, and IP2Proxy™ LITE may not contain the IP in question. Combining the two, however, allows a rough guess at what an IP is used for:

  1. First, look up the ASN of the IP in IP2Location™ LITE IP-ASN.

    set @a:= inet_aton('172.217.6.78');
    SELECT * FROM ip2location_asn WHERE ip_from <= @a AND ip_to >= @a LIMIT 1;
    ip_from | ip_to | cidr | asn | as
    2899904000 | 2899904255 | 172.217.6.0/24 | 15169 | Google LLC
  2. Using the ASN and the IP, query IP2Proxy™ LITE for the two records in the same ASN that are nearest to the IP, one on each side:

    set @a:= inet_aton('172.217.6.78');
    SELECT * FROM ip2proxy_px8 WHERE ip_from >= @a AND asn = 15169 ORDER BY ip_from ASC LIMIT 1;
    SELECT * FROM ip2proxy_px8 WHERE ip_from <= @a AND asn = 15169 ORDER BY ip_from DESC LIMIT 1;
    ip_from | ip_to | proxy_type | country_code | country_name | region_name | city_name | isp | domain | usage_type | asn | as | last_seen
    2899904131 | 2899904131 | PUB | US | United States | California | Mountain View | Google LLC | google.com | DCH | 15169 | Google LLC | 30

    ip_from | ip_to | proxy_type | country_code | country_name | region_name | city_name | isp | domain | usage_type | asn | as | last_seen
    2899904015 | 2899904015 | PUB | US | United States | California | Mountain View | Google LLC | google.com | DCH | 15169 | Google LLC | 30
  3. Calculate the absolute value of the difference between the IP in each proxy record and the current IP.

    IP | proxy IP | abs(IP - proxy IP)
    2899904078 | 2899904131 | 53
    2899904078 | 2899904015 | 63

    If the difference is small, the IP is assumed to have the same usage type as the proxy IP. What counts as "small" can be tuned to the situation, for example a difference within 65535.
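
The three steps above can be sketched as a small function that takes the decimal IP and the neighbouring proxy records and returns the usage type of the nearest one. The function name guessUsageType and the record field names ipFrom and usageType are hypothetical, and the default cut-off is the example value of 65535 mentioned above.

```javascript
// Example cut-off from the text; tune it to the situation.
const MAX_DISTANCE = 65535;

// `ip` is the decimal form of the address. `neighbours` are the proxy records
// found on either side of it, e.g. [{ ipFrom: 2899904131, usageType: 'DCH' }].
// Returns the usage type of the nearest record, or null if none is close enough.
function guessUsageType(ip, neighbours, maxDistance = MAX_DISTANCE) {
    let best = null;
    for (const record of neighbours) {
        const distance = Math.abs(ip - record.ipFrom);
        if (distance <= maxDistance && (best === null || distance < best.distance)) {
            best = { distance, usageType: record.usageType };
        }
    }
    return best === null ? null : best.usageType;
}

const neighbours = [
    { ipFrom: 2899904131, usageType: 'DCH' },
    { ipFrom: 2899904015, usageType: 'DCH' }
];
console.log(guessUsageType(2899904078, neighbours)); // DCH
```

This is a heuristic: a null result only means that no nearby proxy record was found, not that the IP is harmless.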

Composite score

The composite scoring rules can be adjusted to fit the actual scenario.

No. | Category | Scoring rule (1-10 points)
1 | IP geolocation | For example: 5 points for domestic, 10 points for foreign; can be refined by region
2 | Usage type | For example: 2 points for ISP/MOB, 5 points for COM, 10 points for DCH
3 | Requested resource | For example: 5 points for a 404, 10 points if the path contains sensitive words
4 | Request frequency and duration | Scored from the average number of requests over a period of time
5 | Historical score |

Items 1-5 above are combined into the final score, either by simple addition or as a weighted sum.
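
The example rules above can be sketched in code as follows. The function names resourceScore and compositeScore, the 1-point fallback for requests that are not 404s, and the frequency and history values in the example are assumptions made for illustration; the other numbers come from the example rules in the table.

```javascript
const SENSITIVE_WORDS = ['admin', 'login', 'phpmyadmin', 'backup'];

// Resource score following the example rule: 5 points for a 404,
// 10 points if the missing path also contains a sensitive word.
// The 1 point for everything else is an assumption.
function resourceScore(status, uri) {
    if (status !== 404) {
        return 1;
    }
    const path = uri.toLowerCase();
    return SENSITIVE_WORDS.some((word) => path.includes(word)) ? 10 : 5;
}

// Weighted sum of the category scores. A category without an explicit
// weight counts once, so with no weights this is the plain sum.
function compositeScore(scores, weights = {}) {
    return Object.keys(scores).reduce(
        (total, key) => total + scores[key] * (weights[key] ?? 1),
        0
    );
}

const scores = {
    location: 10,                                  // abroad
    usage: 10,                                     // DCH
    resource: resourceScore(404, '/phpMyAdmin/'),  // 10
    frequency: 6,                                  // made-up value
    history: 4                                     // made-up value
};
console.log(compositeScore(scores));                  // 40
console.log(compositeScore(scores, { resource: 2 })); // 50
```

Passing weights lets one category, such as the requested resource, count more than the others without changing the per-category rules.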

IP blocking

The author uses iptables + ipset to block IPs. The Actuator service is written in Node and runs on the host. The Monitor, running in Docker, talks to it over HTTP. Part of the blocking code is as follows:

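'use strict';

const net = require('net');
const express = require('express');
const shell = require('shelljs');
const router = express.Router();

// POST /blacklist/:name/:ip?timeout=<seconds>
// Adds the IP to the named ipset; the entry expires after `timeout` seconds.
router.post('/blacklist/:name/:ip', function (req, res, next) {
    let name = req.params.name;
    let ip = req.params.ip;
    let timeout = Number.parseInt(req.query.timeout, 10);
    // These values end up in a shell command, so reject anything unexpected.
    if (!/^[\w-]+$/.test(name) || net.isIP(ip) === 0 || !(timeout >= 0)) {
        return res.status(400).send('bad request\n');
    }
    // -exist: do not fail if the IP is already in the set.
    let cmd = `ipset -exist add ${name} ${ip} timeout ${timeout}`;
    console.log(cmd);
    shell.exec(cmd);
    res.send('ok\n');
});

module.exports = router;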
  • name: the name of the blacklist.
  • timeout: the expiry time, in seconds.

Plenty of stubborn IPs still scan the author's website frequently, but they are now blocked automatically within a few seconds of being detected, and the results so far are satisfactory.

Conclusion

  • Crawlers, bots, vulnerability scanners, and the like put unnecessary load on a website and can even put it at risk, so they cannot be ignored. Absolute security is hard to achieve, but try to be at least more secure than others.
  • Blocking is a blunt instrument and must be applied with restraint; otherwise it will drive away legitimate users.
  • Besides blocking, you can also divert the trouble elsewhere: redirect risky IPs with a 302 to gitPage (a backup site), so that even if a legitimate user is blocked by mistake, the site is still reachable.
  • There are countless roads, but safety comes first.

If this article helped you, please give it a thumbs up (~ ▽ ~)

Recommended reading

  • Repeatedly suit! How to migrate data
  • How to Learn to program
  • Open API Gateway Practice (1): Designing an API gateway
  • Open API Gateway Practice (2): Replay attacks and defense
  • Open API Gateway Practice (3): Rate limiting
  • Build K8S from scratch with official documentation
  • Kubernetes(2) Application deployment
  • How do I access the service from outside

You are welcome to follow the author's public account, "Code Like Poetry".
