1 Background
As the number of customers grows and their businesses become more complex, traditional server-level monitoring is no longer enough: its granularity is too coarse, and when an alarm fires the root cause still has to be dug out by hand. To keep customer businesses running healthily, we need to collect server system logs and business logs, analyze and process them, locate the cause of a fault as soon as it occurs, and notify the people responsible. How to collect log files, how to parse them, and how to alert the right owners immediately when a fault occurs has become a problem many companies have to face, and this is exactly what a log monitoring system is for.
2 Log monitoring system architecture design
2.1 Architecture Composition
2.2 Architecture Strategy
A log collection client is configured on each data source to gather raw logs and forward them to a message queue. Kafka is chosen as the MQ to buffer and distribute the log messages. LogStash is deployed on the back end to subscribe to the log messages in Kafka topics and write them to ES for storage.
ElastAlert checks whether log messages contain errors or exceptions and notifies the relevant contacts by email or SMS. ES feeds data to Grafana/Kibana for visualization, and users can query real-time logs through a Web interface.
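ElastAlert is driven by rule files; its deployment is not covered in the installation chapter below, but a minimal rule could look like the following sketch. The index pattern, thresholds, and mail address are illustrative assumptions, not part of this deployment.

```yaml
# nginx-5xx-frequency.yaml -- hypothetical ElastAlert rule (values are placeholders)
es_host: 10.1.1.13        # ES address used later in this article
es_port: 9200
name: nginx-5xx-frequency
type: frequency           # fire when num_events matches occur within timeframe
index: nginx-access-*     # assumed index pattern written by Logstash
num_events: 5
timeframe:
  minutes: 5
filter:
- query:
    query_string:
      query: "status:[500 TO 599]"
alert:
- "email"
email:
- "ops@example.com"
```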
3 Log monitoring system introduction
3.1 Data Collection
The data collection layer is located on the service server cluster. A data collection tool is installed on each service server to collect logs, including application logs, system logs, and access logs. The collected logs are then sent to the message queue cluster.
The selection of the data collection agent should consider:
- Deployment method
- Deployment difficulty
- Level of intrusion into the service
- Resource consumption
At present, the mainstream open source log collection tools are Logstash, Filebeat, Fluentd, and so on.
Logstash
Logstash is an open source data collection engine with real-time pipelining capabilities. It dynamically consolidates data from different sources and normalizes it to destinations of your choice.
- The advantage is flexibility: it has many plug-ins, detailed documentation, and a straightforward configuration format, so it can be used in a wide variety of scenarios. In addition, the whole ELK stack is widely used in many companies, so plenty of learning resources can be found online.
- The disadvantage is performance and resource consumption. Although its performance has improved greatly in recent years, it is still much slower than its alternatives, which can be a problem with large data volumes. It also does not currently support caching.
Filebeat
Filebeat is a lightweight log shipper that makes up for Logstash's shortcomings. It can push logs to Kafka, Logstash, Elasticsearch, or Redis.
- The advantage is that it is a single binary with no dependencies. It consumes very few resources, and although it is still young, its simplicity means very little can go wrong with it, so its reliability is high.
- The disadvantage is that Filebeat's scope of application is limited, so it can run into problems in certain scenarios. For example, when Logstash is used as the downstream pipeline, performance issues may also appear. That said, its scope keeps expanding: at first it could only send logs to Logstash and Elasticsearch, and now it can also send them to Kafka and Redis.
Fluentd
Fluentd was created primarily to use JSON as the log output whenever possible, so the transport tool and its downstream pipeline do not have to guess the types of fields in substrings. It provides libraries for almost every language, which means you can plug it into custom programs.
- The advantage is that Fluentd plug-ins are developed in Ruby and are very easy to write and maintain, so there are a great many of them and almost every source and target store has a plug-in. This also means Fluentd can be used to connect everything together.
- The disadvantage is that Fluentd is not very flexible. However, unstructured data can still be parsed with regular expressions. Although its performance is good in most scenarios, it is not the best: buffering exists only on the output side, and the single-threaded core together with the Ruby GIL in plug-in implementations means performance is limited on large nodes.
To sum up, Filebeat is lightweight, takes up few resources, is highly reliable, and is easy to deploy, so Filebeat is chosen as the tool for the data collection layer.
3.2 Message Queue
As the business scale expands, the daily log volume keeps growing and more product lines are connected to the log service. During traffic peaks, write performance to ES drops and the CPU becomes saturated, putting the cluster at risk of going down at any time. A message queue is therefore introduced for peak shaving. Raw logs are sent to the Kafka+Zookeeper cluster and stored centrally; FileBeat acts as the message producer, and the stored messages can be consumed at any time.
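How long the buffered messages remain consumable is governed by the topic's retention. As a sketch (assuming the Kafka cluster deployed in section 4.5 and a topic named elk-nginx-access), retention could be raised to seven days like this:

```bash
# raise log retention of the topic to 7 days (604800000 ms); run inside a broker container
docker exec -it kafka1 /opt/kafka/bin/kafka-configs.sh --zookeeper zoo1:2181 \
    --entity-type topics --entity-name elk-nginx-access \
    --alter --add-config retention.ms=604800000
```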
3.3 Data Analysis
LogStash, acting as the consumer, pulls raw logs from the Kafka+ZooKeeper cluster in real time, parses, cleans, and filters them according to rules, and forwards the cleaned logs to the Elasticsearch cluster.
3.4 Persistent Data Storage
After receiving the data from Logstash, the Elasticsearch cluster writes it to disk, builds the indexes, and stores the structured data.
3.5 Data Query display
Kibana/Grafana are visual data presentation platforms that read data from the Elasticsearch cluster on request and perform visualization and multidimensional analysis.
4 Installation and Deployment
4.1 Environment Preparations
Server | Software | Deployment method |
---|---|---|
10.1.1.11 | Nginx + Filebeat | binary |
10.1.1.12 | Kafka + Zookeeper | docker-compose |
10.1.1.13 | ES + Logstash | docker-compose + binary |
10.1.1.14 | Kibana + Grafana | binary |
4.2 Docker Environment Deployment
Deployed on 10.1.1.12 and 10.1.1.13
4.2.1 Installing Docker
[root@kaf_zoo ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@kaf_zoo ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@kaf_zoo ~]# yum -y install docker-ce
[root@kaf_zoo ~]# docker -v
Docker version 20.10.6, build 370c289
4.2.2 Configuring the accelerator
[root@kaf_zoo ~]# sudo mkdir -p /etc/docker
[root@kaf_zoo ~]# sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://su9ppkb0.mirror.aliyuncs.com"]
}
EOF
[root@kaf_zoo ~]# systemctl daemon-reload
[root@kaf_zoo ~]# systemctl start docker
[root@kaf_zoo ~]# systemctl enable docker
4.2.3 Installing docker-compose
[root@kaf_zoo ~]# yum install docker-compose -y
[root@kaf_zoo ~]# docker-compose -v
docker-compose version 1.18.0, build 8dd22a9
4.3 Deploying an ES Cluster
4.3.1 Environment Configuration
#Optimized kernel for ES support
[root@es_logst es]# echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
[root@es_logst es]# sysctl -p
#Configure variables
[root@es_logst es]# echo 'ELK_VERSION=7.5.1' > .env
#Enabling IPv4 Forwarding
[root@es_logst es]# echo "net.ipv4.ip_forward = 1" >> /usr/lib/sysctl.d/00-system.conf
[root@es_logst es]# systemctl restart network
[root@es_logst es]# systemctl restart docker
4.3.2 Preparing a Directory
mkdir -p /data/es/data-es{1,2,3}
mkdir -p /data/es/config
mkdir -p /data/es/elasticsearch
4.3.3 Preparing the Configuration File
cat /data/es/docker-compose.yml
version: '3.3'
services:
  es01:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es01
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es1
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic
  es02:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es02
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es2
        target: /usr/share/elasticsearch/data
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic
  es03:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es03
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es3
        target: /usr/share/elasticsearch/data
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic
volumes:
  data-es1:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es1
  data-es2:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es2
  data-es3:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es3
networks:
  elastic:
    driver: bridge
/data/es/elasticsearch/Dockerfile
ARG ELK_VERSION=7.5.1
# https://github.com/elastic/elasticsearch-docker
# FROM docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
FROM elasticsearch:${ELK_VERSION}
# Add your elasticsearch plugins setup here
# Example: RUN elasticsearch-plugin install analysis-icu
/data/es/config/elasticsearch.yml
cluster.name: "es-docker-cluster"
#Set the bind address of Elasticsearch; 0.0.0.0 listens on all interfaces
network.host: 0.0.0.0
Directory overview
[root@es_logst data]# pwd
/data
[root@es_logst data]# tree
.
`-- es
|-- config
| `-- elasticsearch.yml
|-- data-es1
|-- data-es2
|-- data-es3
|-- docker-compose.yml
`-- elasticsearch
`-- Dockerfile
6 directories, 3 files
4.3.4 Starting the ES Cluster
[root@es_logst es]# docker-compose up -d
Starting es02 ...
Starting es03 ...
Starting es01 ... done
[root@es_logst es]# docker-compose ps
Name             Command                        State   Ports
--------------------------------------------------------------------------------------------------
es01   /usr/local/bin/docker-entr ...   Up      0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp
es02   /usr/local/bin/docker-entr ...   Up      9200/tcp, 9300/tcp
es03   /usr/local/bin/docker-entr ...   Up      9200/tcp, 9300/tcp
[root@es_logst es]# curl 10.1.1.13:9200
{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "P5FnRclnSBCkO_wPAMJPow",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
The ES cluster is deployed.
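The cluster state can also be checked with the standard cluster and cat APIs (output omitted):

```bash
[root@es_logst es]# curl http://10.1.1.13:9200/_cluster/health?pretty
[root@es_logst es]# curl http://10.1.1.13:9200/_cat/nodes?v
```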
4.4 Deploying Kibana
4.4.1 Installing Kibana
[root@kibana_gra ~]# mkdir /data/kibana && cd /data/kibana
[root@kibana_gra kibana]# wget https://artifacts.elastic.co/downloads/kibana/kibana-7.5.1-x86_64.rpm
#Install
[root@kibana_gra kibana]# yum install -y kibana-7.5.1-x86_64.rpm
[root@kibana_gra kibana]# systemctl enable kibana.service
[root@kibana_gra kibana]# systemctl start kibana.service
#Modify the configuration file
[root@kibana_gra kibana]# grep -Ev "^#|^$" /etc/kibana/kibana.yml
server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://10.1.1.13:9200"]
i18n.locale: "zh-CN"
#Configure /etc/hosts so the ES node names resolve
10.1.1.13 es01 es02 es03
4.4.2 Installing Nginx
#Since version 5.5, Kibana no longer ships with authentication (the official X-Pack solution is paid), so an Nginx reverse proxy is used here for authentication.
#Yum install nginx
[root@kibana_gra ~]# yum install -y nginx
#Configure the Kibana user name and password for login authentication
[root@kibana_gra ~]# yum install -y httpd-tools
[root@kibana_gra ~]# mkdir -p /etc/nginx/passwd
[root@kibana_gra ~]# htpasswd -c -b /etc/nginx/passwd/kibana.passwd kibana xxzx@789
#Go to the nginx conf.d directory and create the kibana.conf file
[root@kibana_gra ~]# vim /etc/nginx/conf.d/kibana.conf
server {
    listen 10.58.96.183:5601;
    auth_basic "Kibana Auth";
    auth_basic_user_file /etc/nginx/passwd/kibana.passwd;
    location / {
        proxy_pass http://127.0.0.1:5601;
        proxy_redirect off;
    }
}
[root@kibana_gra conf.d]# systemctl start nginx
[root@kibana_gra conf.d]# systemctl enable nginx
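A quick way to confirm that the proxy and basic authentication work is to hit Kibana's status API through Nginx with the credentials created above (a sketch; adjust the listen address if yours differs):

```bash
# expect HTTP 200 with a JSON status body; without -u the request should be rejected with 401
curl -u kibana:xxzx@789 http://10.58.96.183:5601/api/status
```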
4.4.3 Accessing Kibana
4.5 Deploying the Kafka Cluster
4.5.1 Preparing the Configuration File
[root@kaf_zoo kafka]# cat docker-compose.yml
version: '2'
services:
  zoo1:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo1
    container_name: zoo1
    ports:
      - "2184:2181"
    volumes:
      - "/data/kafka/volume/zoo1/data:/data"
      - "/data/kafka/volume/zoo1/datalog:/datalog"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.11
  zoo2:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo2
    container_name: zoo2
    ports:
      - "2185:2181"
    volumes:
      - "/data/kafka/volume/zoo2/data:/data"
      - "/data/kafka/volume/zoo2/datalog:/datalog"
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.12
  zoo3:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo3
    container_name: zoo3
    ports:
      - "2186:2181"
    volumes:
      - "/data/kafka/volume/zoo3/data:/data"
      - "/data/kafka/volume/zoo3/datalog:/datalog"
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.13
  kafka1:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka1
    container_name: kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka1
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_LISTENERS: PLAINTEXT://kafka1:9092
    volumes:
      - /data/kafka/logs/kafka1/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.14
  kafka2:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka2
    container_name: kafka2
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka2
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9093
      KAFKA_LISTENERS: PLAINTEXT://kafka2:9093
    volumes:
      - /data/kafka/logs/kafka2/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.15
  kafka3:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka3
    container_name: kafka3
    ports:
      - "9094:9094"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka3
      KAFKA_ADVERTISED_PORT: 9094
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka3:9094
      KAFKA_LISTENERS: PLAINTEXT://kafka3:9094
    volumes:
      - /data/kafka/logs/kafka3/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.16
networks:
  kafka:
    external:
      name: kafka
4.5.2 Starting the Kafka Cluster
#Create a network
[root@kaf_zoo kafka]# docker network create --subnet=172.19.0.0/24 kafka
#Start the cluster
[root@kaf_zoo kafka]# docker-compose up -d
Creating zoo2   ... done
Creating zoo3   ... done
Creating kafka1 ... done
Creating zoo1   ... done
Creating kafka2 ... done
Creating kafka3 ... done
#Check the cluster status
[root@kaf_zoo kafka]# docker-compose ps
 Name            Command                         State   Ports
----------------------------------------------------------------------------------------------------------------------
kafka1   start-kafka.sh                  Up      0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
kafka2   start-kafka.sh                  Up      0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
kafka3   start-kafka.sh                  Up      0.0.0.0:9094->9094/tcp, :::9094->9094/tcp
zoo1     /bin/sh -c /usr/sbin/sshd ...   Up      0.0.0.0:2184->2181/tcp, :::2184->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
zoo2     /bin/sh -c /usr/sbin/sshd ...   Up      0.0.0.0:2185->2181/tcp, :::2185->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
zoo3     /bin/sh -c /usr/sbin/sshd ...   Up      0.0.0.0:2186->2181/tcp, :::2186->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
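Before connecting Filebeat, the cluster can be smoke-tested from inside one of the broker containers. This is a sketch and assumes the wurstmeister/kafka image keeps the Kafka scripts under /opt/kafka/bin:

```bash
# create a throwaway topic replicated across all three brokers, then list topics
[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-topics.sh \
    --zookeeper zoo1:2181 --create --topic smoke-test --partitions 3 --replication-factor 3
[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-topics.sh --zookeeper zoo1:2181 --list
```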
4.6 Deploying Filebeat
4.6.1 Installing Filebeat
[root@file_ng filebeat]# mkdir /data/filebeat && cd /data/filebeat
[root@file_ng filebeat]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.8.5-linux-x86_64.tar.gz
[root@file_ng filebeat]# tar zxf filebeat-6.8.5-linux-x86_64.tar.gz
[root@file_ng filebeat]# mv filebeat-6.8.5-linux-x86_64 /usr/local/filebeat
4.6.2 Configuring Filebeat
Back up the default configuration file
[root@file_ng filebeat]# mv filebeat.yml filebeat.yml.bak
Create a new configuration file that reads the Nginx logs
filebeat.inputs:
- type: log
  # nginx access log
  enabled: true
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  paths:
    - /var/log/nginx/access.log
  fields:
    source: nginx-access

setup.ilm.enabled: false

output.kafka:
  enabled: true
  hosts: ["10.1.1.12:9092", "10.1.1.12:9093", "10.1.1.12:9094"]
  topic: "elk-%{[fields.source]}"
  partition.hash:
    reachable_only: true
  compression: gzip
  max_message_bytes: 1000000
  bulk_max_size: 2048
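The input above sets json.keys_under_root, so it expects the Nginx access log to be written in JSON. The Nginx side is not shown in this article; a possible log_format is sketched below, with field names chosen to match what the Logstash filter in section 4.7 consumes (client_ip, status, size, responsetime, upstreamtime, http_user_agent). Treat it as an assumption, not the original configuration:

```nginx
# /etc/nginx/nginx.conf, inside the http block (escape=json requires nginx >= 1.11.8)
log_format json_access escape=json
  '{"@timestamp":"$time_iso8601",'
  '"client_ip":"$remote_addr",'
  '"status":"$status",'
  '"size":"$body_bytes_sent",'
  '"responsetime":"$request_time",'
  '"upstreamtime":"$upstream_response_time",'
  '"http_user_agent":"$http_user_agent",'
  '"request":"$request"}';

access_log /var/log/nginx/access.log json_access;
```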
4.6.3 Starting Filebeat
[root@file_ng filebeat]# nohup ./filebeat -e -c filebeat.yml &
[1] 6624
[root@file_ng filebeat]# nohup: ignoring input and appending output to 'nohup.out'
Filebeat now runs normally and delivers logs to Kafka as expected.
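Delivery can be double-checked by consuming a few messages from the topic Filebeat writes to. With fields.source set to nginx-access above, the topic name resolves to elk-nginx-access; the script path again assumes the wurstmeister/kafka image:

```bash
[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka1:9092 --topic elk-nginx-access --from-beginning --max-messages 5
```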
4.7 Deploying LogStash
4.7.1 Installing LogStash
[root@es_logst ~]# yum install java -y
[root@es_logst ~]# mkdir /data/logstash && cd /data/logstash
[root@es_logst logstash]# wget https://artifacts.elastic.co/downloads/logstash/logstash-7.0.0.tar.gz
[root@es_logst logstash]# tar zxf logstash-7.0.0.tar.gz
[root@es_logst logstash]# mv logstash-7.0.0 /usr/local/logstash
4.7.2 Configuring LogStash
[root@es_logst logstash]# cd /usr/local/logstash/config/
[root@es_logst config]# mv logstash-sample.conf logstash-sample.conf.bak
Create the logstash-sample.conf configuration file
input {
  kafka {
    bootstrap_servers => "10.1.1.12:9092,10.1.1.12:9093,10.1.1.12:9094"
    auto_offset_reset => "latest"
    topics_pattern => "elk-.*"
    codec => "json"
    consumer_threads => 5
    decorate_events => "true"
  }
}

filter {
  geoip {
    target => "geoip"
    source => "client_ip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    remove_field => [ "[geoip][latitude]", "[geoip][longitude]", "[geoip][country_code]", "[geoip][country_code2]", "[geoip][country_code3]", "[geoip][timezone]", "[geoip][continent_code]", "[geoip][region_code]" ]
  }
  mutate {
    convert => [ "size", "integer" ]
    convert => [ "status", "integer" ]
    convert => [ "responsetime", "float" ]
    convert => [ "upstreamtime", "float" ]
    convert => [ "[geoip][coordinates]", "float" ]
    remove_field => [ "ecs", "agent", "host", "cloud", "@version", "input", "logs_type" ]
  }
  useragent {
    source => "http_user_agent"
    target => "ua"
    remove_field => [ "[ua][minor]", "[ua][major]", "[ua][build]", "[ua][patch]", "[ua][os_minor]", "[ua][os_major]" ]
  }
}

output {
  elasticsearch {
    # Logstash writes to ES
    hosts => ["10.1.1.13:9200"]
    index => "%{[fields][source]}-%{+YYYY-MM-dd}"
  }
  stdout {
    codec => rubydebug
  }
}
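The start command is not shown in the article; the following sketch works with the paths used above (running LogStash under systemd or another supervisor would be more robust):

```bash
[root@es_logst logstash]# cd /usr/local/logstash
[root@es_logst logstash]# nohup ./bin/logstash -f config/logstash-sample.conf > logstash.out 2>&1 &
```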
4.8 Configuring Kibana
4.9 Simulating Faults
Simulate a LogStash failure
After waiting for a while and checking in Kibana, the logs collected by Filebeat during this period were not written to ES and were not displayed by Kibana.
Start LogStash again.
After LogStash recovers, it consumes the backlog of messages from Kafka and writes them to ES. Kibana then displays all the log information normally, so a failure of a single component does not cause log loss.
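The backlog that accumulates while LogStash is down can be observed as consumer-group lag on the Kafka side (a sketch; the Logstash kafka input uses the consumer group id logstash by default):

```bash
# LAG shows how many messages are waiting to be consumed; it should drop back toward 0 after LogStash restarts
[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-consumer-groups.sh \
    --bootstrap-server kafka1:9092 --describe --group logstash
```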
5 Fault Records
5.1 Kibana Startup Fails
5.1.1 Symptom
Kibana server is not ready yet
5.1.2 Error Logs
[root@kibana_gra log]# journalctl -u kibana
Apr 27 14:58:24 kibana_gra kibana[12671]: {"type":"log","@timestamp":"2021-04-27T06:58:24Z","tags":["warning","migrations"],"pid":12671,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana."}
5.1.3 Solution
[root@es_logst logstash]# curl -XDELETE http://localhost:9200/.kibana_2
{"acknowledged":true}
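Before running the delete above, the .kibana* indices can be listed to confirm which migration index is stuck (deleting .kibana_2 is safe only when no other Kibana instance is actually migrating):

```bash
[root@es_logst logstash]# curl "http://localhost:9200/_cat/indices/.kibana*?v"
```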