Over the course of a week at the company, we sorted out our own logging scheme

Requirements

  1. The deployment is split into an intranet and an extranet
  2. Logs are classified into real-time and non-real-time logs
  3. Real-time logs are transmitted to the intranet through a message queue channel and, after processing, stored in ES
  4. Non-real-time logs are uploaded to Tencent Cloud COS and synchronized to the intranet for subsequent log analysis; after processing, they are stored in ES and in Ceph object storage on the intranet

Technology selection

Ubuntu 18.04; any other system with rsyslog 8.x will do, or you can install it yourself

Using the system's built-in service: rsyslog 8.4.20

Logstash 7.13.0

Elasticsearch 7.8.0

Log format

HOSTNAME PROGRAM JSON (the JSON value is a string)

The reason for using JSON strings is that they are much easier for downstream programs to process, even though they take up a bit more space.

Workflow

The program writes to rsyslog -> rsyslog forwards to Logstash -> Logstash routes real-time logs to the message queue and non-real-time logs to object storage

A sample log line

data yq-data: {"type": "off-line", "time": "2021-06-23T07:03:55.122584Z", "msg": "I am a log message", "level": "DEBUG"}
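Since everything after the hostname and program tag is plain JSON, downstream consumers can parse a line trivially. A minimal Python sketch using the sample line above:

import json

# A raw line in the agreed format: HOSTNAME PROGRAM JSON-string
raw = ('data yq-data: {"type": "off-line", "time": "2021-06-23T07:03:55.122584Z", '
       '"msg": "I am a log message", "level": "DEBUG"}')

# Split off the hostname and the program tag, then parse the JSON payload directly
hostname, program, payload = raw.split(" ", 2)
event = json.loads(payload)
print(hostname, program.rstrip(":"), event["level"], event["msg"])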

Configure rsyslog

vim /etc/rsyslog.conf

Add the following configuration

local7.* @@logstash-host:5514

logstash-host:5514 is the address and port where Logstash's syslog input listens; local7 is the rsyslog facility we use. The double @@ forwards over TCP (a single @ would use UDP).

Add an rsyslog configuration under /etc/rsyslog.d

vim /etc/rsyslog.d/20-default.conf and enter the following:

template(name="DynaFile" type="string" string="/var/log/%programname%/%programname%.log")

local7.* action(type="omfile" dynaFile="DynaFile")

After configuring, restart the service: systemctl restart rsyslog

With the DynaFile template above, each program's logs are written to their own file under /var/log, named after the PROGRAM field: /var/log/PROGRAM/PROGRAM.log

Why do this? Besides sending logs to Logstash, we also keep a local copy. After all, there is no guarantee that Logstash works 100% of the time; when something goes wrong, we can re-ingest the files with Filebeat or feed them directly to our own log processor.

Take Python logging as an example:

SysLogHandler(facility=SysLogHandler.LOG_LOCAL7, address=("172.16.102.227", 514))
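A fuller minimal sketch of the handler wired into a logger, emitting the JSON payload described earlier (the ident prefix is what rsyslog records as %programname%; the trailing colon and space matter):

import json
import logging
from logging.handlers import SysLogHandler

# Send records to rsyslog over UDP on the local7 facility
handler = SysLogHandler(facility=SysLogHandler.LOG_LOCAL7,
                        address=("172.16.102.227", 514))
# rsyslog derives %programname% from this tag prefix
handler.ident = "yq-data: "

logger = logging.getLogger("yq-data")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# The payload itself is the JSON string format described earlier
logger.debug(json.dumps({
    "type": "off-line",
    "time": "2021-06-23T07:03:55.122584Z",
    "msg": "I am a log message",
    "level": "DEBUG",
}))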

Configure logrotate

As the name suggests, logrotate rotates logs. We configure daily rotation and keep 30 days of logs; see the logrotate documentation for details.

vim /etc/logrotate.d/custom-log

/var/log/yq*/*.log
{
    rotate 30
    daily
    missingok
    notifempty
    delaycompress
    nocompress
    postrotate
        systemctl kill -s HUP rsyslog.service
    endscript
}

Why yq* here? We have many projects, so all of them share a common prefix; of course, you can pick your own convention.

Configure Logstash

vim /etc/logstash/conf.d/my.conf

input {
  syslog {
    port => 5514
  }
}

filter {
  json {
    source => "message"
  }
  prune {
    whitelist_names => [ "msg", "logsource", "program", "time", "level", "type" ]
  }
  mutate {
    remove_field => ["@timestamp"]
  }
}

output {
  if [type] == "off-line" {
    elasticsearch {
      hosts => ["es-host"]
      index => "my-log"
    }
    s3 {
      endpoint => "https://cos.ap-nanjing.myqcloud.com"
      access_key_id => "XXXX"
      secret_access_key => "XXXX"
      region => "ap-nanjing"
      bucket => "XXXX"
      # rotate the uploaded file every 10 minutes
      time_file => 10
      codec => "json_lines"
      canned_acl => "public-read"
    }
  }
  if [type] == "real-time" {
    rabbitmq {
      exchange => "my-exchange"
      host => "localhost"
      exchange_type => "direct"
      key => "logstash"
      user => "admin"
      password => "admin"
    }
  }
  stdout {}
}

Debug mode: /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/my.conf

Remove stdout {} in production.

systemctl restart logstash
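For the real-time branch, the intranet side still needs a consumer that drains RabbitMQ and indexes into ES. A minimal sketch, assuming the pika and elasticsearch 7.x Python clients; the queue name log-consumer is made up here, while the exchange, routing key, and credentials mirror the Logstash output above:

import json

import pika
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-host:9200"])

credentials = pika.PlainCredentials("admin", "admin")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", credentials=credentials))
channel = connection.channel()

# Declarations must match what the Logstash rabbitmq output created
channel.exchange_declare(exchange="my-exchange", exchange_type="direct", durable=True)
channel.queue_declare(queue="log-consumer", durable=True)
channel.queue_bind(queue="log-consumer", exchange="my-exchange", routing_key="logstash")

def on_message(ch, method, properties, body):
    # Each message body is one JSON event produced by Logstash
    es.index(index="my-log", body=json.loads(body))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="log-consumer", on_message_callback=on_message)
channel.start_consuming()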

Docker migration scheme

Take docker-compose as an example:

logging:
  driver: "syslog"
  options:
    syslog-address: "udp://127.0.0.1:514"
    tag: yq-service-manager
    syslog-facility: local7

Follow-up

You can query and analyze the logs in ES, and view them visually in Kibana.
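For example, a minimal query sketch with the elasticsearch Python client, assuming the my-log index from the Logstash config above and that time is mapped as a date (otherwise drop the sort):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-host:9200"])

# Last 10 DEBUG entries from the yq-data program, newest first
resp = es.search(index="my-log", body={
    "query": {
        "bool": {
            "must": [
                {"match": {"level": "DEBUG"}},
                {"match": {"program": "yq-data"}},
            ]
        }
    },
    "sort": [{"time": {"order": "desc"}}],
    "size": 10,
})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["time"], hit["_source"]["msg"])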

A few more notes

  • Logstash does not need to be deployed on every server; on average, several rsyslog nodes can share a single Logstash node
  • The rsyslog service does not write to a remote host directly: under high-concurrency log writes, remote syslog copes poorly and may even drop data
  • You can use Flink or Spark to process the offline logs.

I have used Flink: a cluster with 10 slots on a 4-core 8 GB server processes about 1K records per second on average, and this rate grows with the number of slots, ES performance aside.