1 Background

As the number of customers grows and their business becomes more complex, traditional server-level monitoring is no longer enough: its granularity is too coarse, and when an alarm fires the root cause still has to be investigated manually. To further support customers' business and keep it running healthily, we need to collect system logs and business logs from the servers, analyze and process them, locate the cause of a fault as soon as it occurs, and notify the responsible personnel. How to collect log files, how to parse them, and how to alert the right people the moment a fault occurs has become a problem many companies must face. This is why a log monitoring system is needed.

2 Log monitoring system architecture design

2.1 Architecture Composition

2.2 Architecture Strategy

Configure a log collection client on the data source side to collect raw logs and aggregate them to a message queue (MQ). Kafka is selected as the MQ to cache and distribute log messages. Logstash is deployed on the back end to subscribe to log messages in Kafka topics and write them to ES storage.

ElastAlert determines whether log messages contain errors or exceptions and notifies the relevant contacts by email or SMS. ES provides data to Grafana/Kibana for display, and users can query real-time logs through a Web interface.
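To make the alerting piece concrete, the following is a minimal sketch of what an ElastAlert rule for this kind of setup might look like; the rule name, index pattern, threshold, and mail address are illustrative assumptions rather than part of this deployment:

# Hypothetical ElastAlert rule: mail the on-call contact when 5xx errors spike
name: nginx-5xx-spike                # rule name (assumed)
type: frequency                      # fire when num_events occur within timeframe
index: nginx-access-*                # index pattern written by Logstash (assumed)
num_events: 50
timeframe:
  minutes: 5
filter:
- query:
    query_string:
      query: "status: [500 TO 599]"
alert:
- "email"
email:
- "ops@example.com"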

3 Log monitoring system introduction

3.1 Data Collection

The data collection layer is located on the service server cluster. A data collection tool is installed on each service server to collect logs, including application logs, system logs, and access logs. The collected logs are then sent to the message queue cluster.

When selecting a data collection tool (agent), consider:

  • Deployment method
  • Deployment difficulty
  • Level of service intrusion
  • Resource consumption

At present, the mainstream open source log collection tools are Logstash, Filebeat, Fluentd, and so on.

Logstash

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash dynamically consolidates data from different data sources and normalizes it to a destination of your choice.

  • The advantage is that it is flexible, with many plug-ins, detailed documentation, and a straightforward configuration format, which allows it to be used in a variety of scenarios. Besides, the whole ELK technology stack is widely used in many companies, so plenty of related learning resources can be found online.

  • The disadvantage is its performance and resource consumption. Although its performance has improved greatly in recent years, it is still much slower than its replacements, and it can be a problem in the case of large data volumes. Another problem is that it does not currently support caching.

Filebeat

Filebeat is a lightweight log shipper that makes up for Logstash's shortcomings; it can push logs to Kafka, Logstash, Elasticsearch, or Redis.

  • The advantage is that it is a single binary with no dependencies. It consumes very few resources, and although it is still quite young, very little can go wrong with it because of its simplicity, so its reliability is very high.
  • The disadvantage is that Filebeat's scope is quite limited, so it can run into problems in certain scenarios. For example, when Logstash is used as the downstream pipeline, performance issues can also arise. That said, Filebeat's scope keeps expanding: at first it could only send logs to Logstash and Elasticsearch, and now it can also send them to Kafka and Redis.

Fluentd

Fluentd was created primarily to use JSON as the log output whenever possible, so that the transport tool and its downstream pipeline do not have to guess the field types in substrings. It provides libraries for almost any language, which means you can plug it into custom programs.

  • The advantage is that Fluentd plug-ins are developed in Ruby and are very easy to write and maintain, so there are a lot of them, and almost every source and target store has a plug-in. This also means Fluentd can be used to connect everything together.
  • The disadvantage is that Fluentd is not very flexible, although unstructured data can still be parsed with regular expressions. Its performance is good in most scenarios but not the best: buffering exists only on the output side, and the single-threaded core together with the Ruby GIL in plug-in implementations means performance is limited on large nodes.

To sum up, Filebeat takes up few resources, is lightweight, is highly reliable, and is easy to deploy, so Filebeat is used as the tool at the data collection layer.

3.2 Message Queue

As the business scale expands, the daily log volume keeps increasing and more product lines connect to the log service. During traffic peaks, write performance to ES degrades and the CPU becomes saturated, putting the cluster at risk of going down at any time. Therefore, a message queue is introduced for peak shaving and load leveling. Raw logs are sent to the Kafka+Zookeeper cluster and stored centrally; Filebeat acts as the message producer, and the stored messages can be consumed at any time.
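As an illustration only (the topic name, partition count, and retention below are assumptions, and in practice the topic may simply be auto-created by the producer), a dedicated log topic could be sized along these lines:

# Illustrative: a log topic with 3 partitions, 2 replicas and 7-day retention
kafka-topics.sh --create --bootstrap-server kafka1:9092 \
  --topic elk-nginx-access --partitions 3 --replication-factor 2 \
  --config retention.ms=604800000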

3.3 Data Analysis

As a consumer, LogStash pulls original logs from the Kafka+ ZooKeeper cluster node in real time, analyzes, cleans, and filters the obtained original logs according to rules, and forwards the cleaned logs to the Elasticsearch cluster.

3.4 Persistent Data Storage

After receiving data from Logstash, the Elasticsearch cluster writes it to disk, creates an index library, and stores the structured data in the cluster.
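A quick way to confirm that indexes are being created and documents persisted is to ask ES directly; a hedged example using the addresses from this deployment (the index name pattern depends on the Logstash output template):

curl '10.1.1.13:9200/_cat/indices?v'
curl '10.1.1.13:9200/nginx-access-*/_count?pretty'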

3.5 Data Query display

Kibana/Grafana is a visual data presentation platform that reads data from the Elasticsearch cluster on request and then performs visualization and multidimensional analysis.
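Under the hood, Kibana and Grafana issue queries against the same ES indexes. As a rough sketch, the equivalent of a dashboard panel showing the latest 5xx responses could be expressed as the query below (the index pattern is assumed from the Logstash output template):

curl -s '10.1.1.13:9200/nginx-access-*/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 5,
  "sort": [ { "@timestamp": "desc" } ],
  "query": { "range": { "status": { "gte": 500 } } }
}'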

4 Installation and Deployment

4.1 Environment Preparations

Server       Software             Deployment method
10.1.1.11    Nginx + Filebeat     binary
10.1.1.12    Kafka + Zookeeper    docker-compose
10.1.1.13    ES + Logstash        docker-compose + binary
10.1.1.14    Kibana + Grafana     binary

4.2 Docker Environment Deployment

Docker is deployed on 10.1.1.12 and 10.1.1.13.

4.2.1 Installing Docker

[root@kaf_zoo ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@kaf_zoo ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@kaf_zoo ~]# yum -y install docker-ce
[root@kaf_zoo ~]# docker -v
Docker version 20.10.6, build 370c289

4.2.2 Configuring the Image Accelerator

[root@kaf_zoo ~]# sudo mkdir -p /etc/docker
[root@kaf_zoo ~]# sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://su9ppkb0.mirror.aliyuncs.com"]
}
EOF
[root@kaf_zoo ~]# systemctl daemon-reload
[root@kaf_zoo ~]# systemctl start docker
[root@kaf_zoo ~]# systemctl enable docker

4.2.3 Installing docker-compose

[root@kaf_zoo ~]# yum install docker-compose -y
[root@kaf_zoo ~]# docker-compose -v
docker-compose version 1.18.0, build 8dd22a9

4.3 Deploying an ES Cluster

4.3.1 Environment Configuration

# Kernel optimization required by ES
[root@es_logst es]# echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
[root@es_logst es]# sysctl -p

# Configure the version variable
[root@es_logst es]# echo 'ELK_VERSION=7.5.1' > .env

# Enable IPv4 forwarding
[root@es_logst es]# echo "net.ipv4.ip_forward = 1" >> /usr/lib/sysctl.d/00-system.conf
[root@es_logst es]# systemctl restart network
[root@es_logst es]# systemctl restart docker

4.3.2 Preparing a Directory

mkdir -p /data/es/data-es{1,2,3}
mkdir -p /data/es/config
mkdir -p /data/es/elasticsearch

4.3.3 Preparing the Configuration File

cat /data/es/docker-compose.yml

version: '3.3'
services:
  es01:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es01
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es1
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic
  es02:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es02
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es2
        target: /usr/share/elasticsearch/data
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic
  es03:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    container_name: es03
    volumes:
      - type: bind
        source: /data/es/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: data-es3
        target: /usr/share/elasticsearch/data
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - elastic

volumes:
  data-es1:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es1
      
  data-es2:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es2
      
  data-es3:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/es/data-es3

networks:
  elastic:
    driver: bridge

/data/es/elasticsearch/Dockerfile

ARG ELK_VERSION=7.5.1
# https://github.com/elastic/elasticsearch-docker
# FROM docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION}
FROM elasticsearch:${ELK_VERSION}
# Add your elasticsearch plugins setup here
# Example: RUN elasticsearch-plugin install analysis-icu

/data/es/config/elasticsearch.yml

cluster.name: "es-docker-cluster"
# Bind address of Elasticsearch; 0.0.0.0 makes it listen on all interfaces
network.host: 0.0.0.0

Directory overview

[root@es_logst data]# pwd
/data
[root@es_logst data]# tree
.
`-- es
    |-- config
    |   `-- elasticsearch.yml
    |-- data-es1
    |-- data-es2
    |-- data-es3
    |-- docker-compose.yml
    `-- elasticsearch
        `-- Dockerfile

6 directories, 3 files

4.3.4 Starting the ES Cluster

[root@es_logst es]# docker-compose up -d
Starting es02 ...
Starting es03 ...
Starting es01 ... done
[root@es_logst es]# docker-compose ps
Name              Command                        State   Ports
--------------------------------------------------------------------------------------------------
es01   /usr/local/bin/docker-entr ...   Up   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp
es02   /usr/local/bin/docker-entr ...   Up   9200/tcp, 9300/tcp
es03   /usr/local/bin/docker-entr ...   Up   9200/tcp, 9300/tcp
[root@es_logst es]# curl 10.1.1.13:9200
{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "P5FnRclnSBCkO_wPAMJPow",
  "version" : {
    "number" : "7.5.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
    "build_date" : "2019-12-16T22:57:37.835892Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

The ES cluster is deployed.
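Optionally, cluster formation can also be checked through the cluster health and nodes APIs (a green status with three nodes is expected here):

[root@es_logst es]# curl '10.1.1.13:9200/_cluster/health?pretty'
[root@es_logst es]# curl '10.1.1.13:9200/_cat/nodes?v'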

4.4 Deploying Kibana

4.4.1 Installing Kibana

[root@kibana_gra ~]# mkdir /data/kibana && cd /data/kibana
[root@kibana_gra kibana]# wget https://artifacts.elastic.co/downloads/kibana/kibana-7.5.1-x86_64.rpm
# Install and start Kibana
[root@kibana_gra kibana]# yum install -y kibana-7.5.1-x86_64.rpm
[root@kibana_gra kibana]# systemctl enable kibana.service
[root@kibana_gra kibana]# systemctl start kibana.service

# Modify the configuration file
[root@kibana_gra kibana]# grep -Ev "^#|^$" /etc/kibana/kibana.yml
server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://10.1.1.13:9200"]
i18n.locale: "zh-CN"

# Configure /etc/hosts
10.1.1.13 es01 es02 es03

4.4.2 Installing Nginx

# Kibana itself has not provided authentication since version 5.5 (the official X-Pack feature is paid), so an nginx reverse proxy is used for authentication.
# Install nginx via yum
[root@kibana_gra ~]# yum install -y nginx

# Configure the kibana user name and password for login
[root@kibana_gra ~]# yum install -y httpd-tools
[root@kibana_gra ~]# mkdir -p /etc/nginx/passwd
[root@kibana_gra ~]# htpasswd -c -b /etc/nginx/passwd/kibana.passwd kibana xxzx@789

# Go to the nginx conf.d directory and create the kibana.conf file
[root@kibana_gra ~]# vim /etc/nginx/conf.d/kibana.conf
server {
    listen 10.58.96.183:5601;
    auth_basic "Kibana Auth";
    auth_basic_user_file /etc/nginx/passwd/kibana.passwd;
    location / {
        proxy_pass http://127.0.0.1:5601;
        proxy_redirect off;
    }
}

[root@kibana_gra conf.d]# systemctl start nginx
[root@kibana_gra conf.d]# systemctl enable nginx

4.4.3 Accessing Kibana
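Access can be verified through the nginx proxy with the credentials created above; a simple hedged check from the command line:

[root@kibana_gra ~]# curl -u kibana:xxzx@789 -I http://10.58.96.183:5601

A 2xx or 3xx response indicates that the proxy and basic authentication are working; visiting the same address in a browser should prompt for the user name and password.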

4.5 Deploying the Kafka Cluster

4.5.1 Preparing the Configuration File

[root@kaf_zoo kafka]# cat docker-compose.yml
version: '2'

services:
  zoo1:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo1
    container_name: zoo1
    ports:
      - "2184:2181"
    volumes:
      - "/data/kafka/volume/zoo1/data:/data"
      - "/data/kafka/volume/zoo1/datalog:/datalog"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.11

  zoo2:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo2
    container_name: zoo2
    ports:
      - "2185:2181"
    volumes:
      - "/data/kafka/volume/zoo2/data:/data"
      - "/data/kafka/volume/zoo2/datalog:/datalog"
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.12

  zoo3:
    image: wurstmeister/zookeeper
    restart: always
    hostname: zoo3
    container_name: zoo3
    ports:
      - "2186:2181"
    volumes:
      - "/data/kafka/volume/zoo3/data:/data"
      - "/data/kafka/volume/zoo3/datalog:/datalog"
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    networks:
      kafka:
        ipv4_address: 172.19.0.13

  kafka1:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka1
    container_name: kafka1
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka1
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_LISTENERS: PLAINTEXT://kafka1:9092
    volumes:
      - /data/kafka/logs/kafka1/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.14

  kafka2:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka2
    container_name: kafka2
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka2
      KAFKA_ADVERTISED_PORT: 9093
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9093
      KAFKA_LISTENERS: PLAINTEXT://kafka2:9093
    volumes:
      - /data/kafka/logs/kafka2/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.15

  kafka3:
    image: wurstmeister/kafka
    restart: always
    hostname: kafka3
    container_name: kafka3
    ports:
      - "9094:9094"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka3
      KAFKA_ADVERTISED_PORT: 9094
      KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka3:9094
      KAFKA_LISTENERS: PLAINTEXT://kafka3:9094
    volumes:
      - /data/kafka/logs/kafka3/logs:/kafka
    external_links:
      - zoo1
      - zoo2
      - zoo3
    networks:
      kafka:
        ipv4_address: 172.19.0.16

networks:
  kafka:
    external:
      name: kafka

4.5.2 Starting the Kafka Cluster

# Create a network
[root@kaf_zoo kafka]# docker network create --subnet=172.19.0.0/24 kafka

# Start the cluster
[root@kaf_zoo kafka]# docker-compose up -d
Creating zoo1   ... done
Creating zoo2   ... done
Creating zoo3   ... done
Creating kafka1 ... done
Creating kafka2 ... done
Creating kafka3 ... done

# View cluster status
[root@kaf_zoo kafka]# docker-compose ps
 Name             Command                        State   Ports
------------------------------------------------------------------------------------------------------------------------
kafka1   start-kafka.sh                  Up   0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
kafka2   start-kafka.sh                  Up   0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
kafka3   start-kafka.sh                  Up   0.0.0.0:9094->9094/tcp, :::9094->9094/tcp
zoo1     /bin/sh -c /usr/sbin/sshd ...   Up   0.0.0.0:2184->2181/tcp, :::2184->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
zoo2     /bin/sh -c /usr/sbin/sshd ...   Up   0.0.0.0:2185->2181/tcp, :::2185->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
zoo3     /bin/sh -c /usr/sbin/sshd ...   Up   0.0.0.0:2186->2181/tcp, :::2186->2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
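As an optional check, the brokers can be asked for the current topic list (the kafka-topics.sh path below is where the wurstmeister/kafka image ships it and may differ for other images):

[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka1:9092 --list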

4.6 Deploying Filebeat

4.6.1 Installing Filebeat

[root@file_ng filebeat]# mkdir /data/filebeat && cd /data/filebeat
[root@file_ng filebeat]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.8.5-linux-x86_64.tar.gz
[root@file_ng filebeat]# tar zxf filebeat-6.8.5-linux-x86_64.tar.gz
[root@file_ng filebeat]# mv filebeat-6.8.5-linux-x86_64 /usr/local/filebeat

4.6.2 Configuring Filebeat

Back up the default configuration file

[root@file_ng filebeat]# mv filebeat.yml filebeat.yml.bak

New configuration file to read nginx logs

filebeat.inputs:
- type: log
  enabled: true
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
  paths:
    - /var/log/nginx/access.log
  fields:
    source: nginx-access

setup.ilm.enabled: false

output.kafka:
  enabled: true
  hosts: ["10.1.1.12:9092"."10.1.1.12:9093"."10.1.1.12:9094"]
  topic: "elk-%{[fields.source]}"
  partition.hash:
    reachable_only: true
  compression: gzip
  max_message_bytes: 1000000
  bulk_max_size: 2048

4.6.3 Starting Filebeat

[root@file_ng filebeat]# nohup ./filebeat -e -c filebeat.yml &
[1] 6624
[root@file_ng filebeat]# nohup: ignoring input and appending output to 'nohup.out'

Filebeat now runs normally and outputs logs to Kafka as expected.
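One hedged way to confirm this is to consume a few messages from the topic Filebeat writes to (elk-nginx-access, per the topic template above); the tool path assumes the wurstmeister/kafka image:

[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka1:9092 --topic elk-nginx-access --from-beginning --max-messages 5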

4.7 Deploying LogStash

4.7.1 Installing LogStash

[root@es_logst ~]# yum install java -y
[root@es_logst ~]# mkdir /data/logstash && cd /data/logstash
[root@es_logst logstash]# wget https://artifacts.elastic.co/downloads/logstash/logstash-7.0.0.tar.gz
[root@es_logst logstash]# tar zxf logstash-7.0.0.tar.gz
[root@es_logst logstash]# mv logstash-7.0.0 /usr/local/logstash

4.7.2 Configuring LogStash

[root@es_logst logstash]# cd /usr/local/logstash/config/
[root@es_logst config]# mv logstash-sample.conf logstash-sample.conf.bak

Create the logstash-sample.conf configuration file

input {
  kafka {
    bootstrap_servers => "10.1.1.12:9092,10.1.1.12:9093,10.1.1.12:9094"
    auto_offset_reset => "latest"
    topics_pattern => "elk-.*"
    codec => "json"
    consumer_threads => 5
    decorate_events => "true"
  }
}

filter {
  geoip {
    target => "geoip"
    source => "client_ip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    remove_field => [ "[geoip][latitude]", "[geoip][longitude]", "[geoip][country_code]", "[geoip][country_code2]", "[geoip][country_code3]", "[geoip][timezone]", "[geoip][continent_code]", "[geoip][region_code]" ]
  }
  mutate {
    convert => [ "size", "integer" ]
    convert => [ "status", "integer" ]
    convert => [ "responsetime", "float" ]
    convert => [ "upstreamtime", "float" ]
    convert => [ "[geoip][coordinates]", "float" ]
    remove_field => [ "ecs", "agent", "host", "cloud", "@version", "input", "logs_type" ]
  }
  useragent {
    source => "http_user_agent"
    target => "ua"
    remove_field => [ "[ua][minor]", "[ua][major]", "[ua][build]", "[ua][patch]", "[ua][os_minor]", "[ua][os_major]" ]
  }
}

output {
  elasticsearch {
    # LogStash writes to ES
    hosts => ["10.1.1.13:9200"]
    index => "%{[fields][source]}-%{+YYYY-MM-dd}"
  }
  stdout {
    codec => rubydebug
  }
}
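A minimal sketch of validating and starting the pipeline with this configuration (paths as installed above; -t only checks the syntax and exits):

[root@es_logst config]# /usr/local/logstash/bin/logstash -t -f /usr/local/logstash/config/logstash-sample.conf
[root@es_logst config]# nohup /usr/local/logstash/bin/logstash -f /usr/local/logstash/config/logstash-sample.conf &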

4.8 Configuring Kibana

4.9 Simulating Faults

Simulate a LogStash failure

After waiting for some time, checking Kibana showed that the logs collected by Filebeat were no longer being written to ES or displayed by Kibana.

Start LogStash again.

After LogStash recovers, it consumes the messages that accumulated in Kafka during the failure and writes them to ES. Kibana can then display all the log information normally, which avoids log loss caused by the failure of an individual component.
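The catch-up behaviour can also be observed from the Kafka side by describing the consumer group; LogStash's Kafka input uses the group id "logstash" by default (the path again assumes the wurstmeister/kafka image), and the LAG column should drop back to zero once the backlog is consumed:

[root@kaf_zoo kafka]# docker exec -it kafka1 /opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server kafka1:9092 --describe --group logstash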

5 Fault Records

5.1 Kibana Startup Fails

5.1.1 Symptom

Kibana server is not ready yet

5.1.2 Error Logs

[root@kibana_gra log]# journalctl -u kibana
Apr 27 14:58:24 kibana_gra kibana[12671]: {"type":"log","@timestamp":"2021-04-27T06:58:24Z","tags":["warning","migrations"],"pid":12671,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana."}

5.1.3 Solution

[root@es_logst logstash]# curl -XDELETE http://localhost:9200/.kibana_2
{"acknowledged":true}
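Before deleting, it can be worth confirming which .kibana indices actually exist, for example:

[root@es_logst logstash]# curl 'http://localhost:9200/_cat/indices/.kibana*?v'

After deleting .kibana_2, restart Kibana so that it can retry the migration, as suggested by the log message above.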