Over the course of a week at the company, we worked out our own logging scheme.
Requirements
- The environment is split into an intranet and an extranet
- Logs are classified into real-time logs and non-real-time logs
- Real-time logs are transmitted to the Intranet through the message queue channel and stored in ES after processing
- Non-real-time logs are uploaded to Tencent Cloud COS object storage and synchronized to the intranet for later analysis; after processing they are stored in ES and Ceph object storage on the intranet
Technology selection
Ubuntu 18.04 (any system that ships rsyslog 8.x will do; you can also install it yourself)
Rsyslog 8.4.20 (the system's built-in service)
Logstash 7.13.0
Elasticsearch 7.8.0
Log format
HOSTNAME PROGRAME <JSON string>
We use a JSON string as the payload because it is much easier for downstream programs to process, even though it costs a little extra space.
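As a sketch of producing this payload in Python (the helper name and defaults are my own; the field names follow the sample entry shown later):

```python
import json
from datetime import datetime, timezone

def build_payload(msg: str, level: str = "DEBUG", log_type: str = "off-line") -> str:
    # Serialize the log record as a JSON string so downstream
    # consumers (Logstash, Flink, ...) can parse it uniformly.
    record = {
        "type": log_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "msg": msg,
        "level": level,
    }
    return json.dumps(record)

print(build_payload("I am a log message"))
```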
Workflow
The program writes to rsyslog -> rsyslog forwards to Logstash -> Logstash shunts real-time logs to the message queue and non-real-time logs to object storage
A sample log line
data yq-data: {"type": "off-line", "time": "2021-06-23T07:03:55.122584Z", "msg": "I am a log message", "level": "DEBUG"}
Configuring rsyslog
vim /etc/rsyslog.conf
Add the following configuration
local7.* @@logstash-host:5514
logstash-host:5514 is the host and port where Logstash's syslog input listens (the @@ prefix means TCP). local7 is the rsyslog facility we use.
Add an rsyslog configuration under /etc/rsyslog.d
vim /etc/rsyslog.d/20-default.conf and enter the following:
template(name="DynaFile" type="string" string="/var/log/%programname%/%programname%.log")
local7.* action(type="omfile" dynaFile="DynaFile")
After configuring, restart the service: systemctl restart rsyslog
With this template, each program's logs are also written to a local file, /var/log/PROGRAME/PROGRAME.log, keyed on the PROGRAME field of the log line.
Why do this? Besides sending logs to Logstash, we also keep a local copy. After all, there is no guarantee that Logstash works 100% of the time; when something goes wrong, we can re-ingest the local files with Filebeat or feed them directly to our own log processor.
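A minimal re-ingestion sketch (the parsing approach is an assumption based on the format above): since each locally stored line ends with the JSON payload, you can slice from the first "{" and parse:

```python
import json

def replay(path: str):
    # Re-parse locally stored syslog lines, e.g. after a Logstash outage.
    # Each line ends with the JSON payload, so slice from the first "{".
    with open(path, encoding="utf-8") as f:
        for line in f:
            start = line.find("{")
            if start == -1:
                continue  # skip non-JSON lines
            try:
                yield json.loads(line[start:])
            except json.JSONDecodeError:
                continue  # skip truncated or garbled records

# Example: feed the records back into your own processing pipeline
# for record in replay("/var/log/yq-data/yq-data.log"):
#     handle(record)
```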
Take Python's logging module as an example:
import logging.handlers
handler = logging.handlers.SysLogHandler(facility=logging.handlers.SysLogHandler.LOG_LOCAL7, address=("172.16.102.227", 514))
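A self-contained sketch of the whole send path (the port and message are illustrative; in production you would point the handler at your rsyslog address as above, and rsyslog plays the listener's role):

```python
import json
import logging
import logging.handlers
import socket
import threading

HOST, PORT = "127.0.0.1", 5140  # stand-in for the rsyslog address

# A throwaway UDP listener so the example is self-contained.
received = []
ready = threading.Event()

def listen():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((HOST, PORT))
        ready.set()
        data, _ = s.recvfrom(4096)
        received.append(data.decode())

t = threading.Thread(target=listen)
t.start()
ready.wait()

logger = logging.getLogger("yq-data")
logger.setLevel(logging.DEBUG)
handler = logging.handlers.SysLogHandler(
    facility=logging.handlers.SysLogHandler.LOG_LOCAL7,
    address=(HOST, PORT),  # a (host, port) tuple uses UDP by default
)
logger.addHandler(handler)

# Send the JSON payload as the message body, matching the format above.
logger.debug(json.dumps({"type": "real-time", "msg": "I am a log message", "level": "DEBUG"}))
handler.close()
t.join()
print(received[0])  # "<191>..." = facility local7 (23 * 8) + severity debug (7)
```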
Configuring logrotate
As the name suggests, logrotate rotates log files. We rotate daily and keep 30 days of logs; see the logrotate documentation for details.
vim /etc/logrotate.d/custom-log
/var/log/yq*/*.log
{
rotate 30
daily
missingok
notifempty
delaycompress
nocompress
postrotate
systemctl kill -s HUP rsyslog.service
endscript
}
Why the yq* wildcard? We have many projects, and they all share a common name prefix; of course, adapt this to your own naming.
Configuring logstash
vim /etc/logstash/conf.d/my.conf
input {
  syslog {
    port => 5514
  }
}

filter {
  json {
    source => "message"
  }
  prune {
    whitelist_names => [ "msg", "logsource", "program", "time", "level", "type" ]
  }
  mutate {
    remove_field => ["@timestamp"]
  }
}

output {
  if [type] == "off-line" {
    elasticsearch {
      hosts => ["es-host"]
      index => "my-log"
    }
    s3 {
      endpoint => "https://cos.ap-nanjing.myqcloud.com"
      access_key_id => "XXXX"
      secret_access_key => "XXXX"
      region => "ap-nanjing"
      bucket => "XXXX"
      # rotate the uploaded file every 10 minutes
      time_file => 10
      codec => "json_lines"
      canned_acl => "public-read"
    }
  }
  if [type] == "real-time" {
    rabbitmq {
      exchange => "my-exchange"
      host => "localhost"
      exchange_type => "direct"
      key => "logstash"
      user => "admin"
      password => "admin"
    }
  }
  stdout {}
}
Debug it with /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/my.conf
Remove stdout {} in production.
systemctl restart logstash
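To smoke-test the syslog input, you can push a hand-built RFC3164 line over TCP. In this sketch a throwaway local socket stands in for Logstash so it is self-contained; in reality you would connect to logstash-host:5514:

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 5514  # in production: the Logstash syslog input

received = []
ready = threading.Event()

def fake_logstash():
    # Stand-in for Logstash's syslog TCP input, just for this demo.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    received.append(conn.recv(4096).decode())
    conn.close()
    srv.close()

t = threading.Thread(target=fake_logstash)
t.start()
ready.wait()

# <190> = facility local7 (23 * 8) + severity info (6)
line = '<190>Jun 23 07:03:55 data yq-data: {"type": "real-time", "msg": "hi", "level": "DEBUG"}\n'
with socket.create_connection((HOST, PORT)) as c:
    c.sendall(line.encode())
t.join()
print(received[0])
```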
Docker migration scheme
Take docker-compose as an example
logging:
  driver: "syslog"
  options:
    syslog-address: "udp://127.0.0.1:514"
    tag: yq-service-manager
    syslog-facility: local7
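For context, a fuller service sketch (the service and image names are made up). Note that sending to udp://127.0.0.1:514 assumes rsyslog has the imudp module enabled and listening on that port:

```yaml
services:
  yq-service-manager:
    image: my-registry/yq-service-manager:latest  # illustrative image name
    logging:
      driver: "syslog"
      options:
        syslog-address: "udp://127.0.0.1:514"
        tag: yq-service-manager
        syslog-facility: local7
```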
Follow-up
You can query and analyze the logs in ES, and view them visually in Kibana.
A few things worth mentioning
- Logstash does not have to be deployed on every server; on average, several servers can share one Logstash node
- Applications write to the local rsyslog rather than to a remote syslog host; under highly concurrent writes, remote syslog copes poorly and may even drop data
- Offline logs can be processed with Flink or Spark.
I have used Flink: a cluster with 10 slots on a 4-core, 8 GB server averages about 1K records per second, and throughput grows as slots are added (leaving ES performance out of the picture).