Graylog is an open-source, professional log aggregation, analysis, audit, display, and alerting tool, similar to ELK but simpler. This article covers how to deploy Graylog, how to use it, and gives a brief overview of the Graylog workflow.
This article is quite long. A total of three machines are used, hosting a Kafka cluster (2.3), an ES cluster (7.11.2), a MongoDB replica set (4.2), and a Graylog cluster (4.0.2). The logs collected are Kubernetes logs, shipped into Kafka by Filebeat (7.11.2) running as a DaemonSet. The article starts with deployment and walks you step by step through how Graylog is deployed and how to use it at a basic level.
Graylog introduction
Components
As the architecture diagram shows, Graylog consists of three parts:
- MongoDB stores the Graylog console configuration, Graylog cluster state, and other metadata
- ES stores the log data and handles retrieval
- Graylog itself acts as the relay in between
There is not much to say about what MongoDB and ES do, so the focus here is on Graylog's own components and their functions.
- Inputs: the log data sources. Data can be collected actively through Graylog's Sidecar, or pushed in via Beats, syslog, and so on
- Extractors: log field extraction and conversion, mainly JSON parsing, key-value parsing, timestamp parsing, and regex parsing
- Streams: rules that route matching log messages into a specified index
- Indices: persistent data storage settings, such as index name, expiration (rotation/retention) policy, number of shards, number of replicas, flush interval, etc.
- Outputs: forwarding of log data, for example sending a parsed Stream on to other Graylog clusters
- Pipelines: in charge of data cleaning; set filtering rules, add or remove fields, apply conditional filtering, and write custom functions
- Sidecar: a lightweight log collector
- Lookup Tables: value lookups, such as Whois queries on IP addresses and threat-intelligence monitoring based on source IPs
- Geolocation: visualizes geographic locations based on source IP addresses
The workflow
Graylog collects logs through Inputs, for example from Kafka or Redis, or directly from Filebeat. An Input can be configured with Extractors that extract and convert log fields; multiple Extractors can be defined and they run in order. Incoming messages are then matched against the rules defined in each Stream and routed into the Stream they match; a Stream can be bound to an index, so the messages end up in the corresponding ES index. In the console you can then view the logs by selecting the Stream by name.
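As a quick illustration of this flow (not part of the deployment below), assuming you had a Raw/Plaintext TCP input listening on port 5555, you could push a test message in and watch the Stream rules route it to an index:
echo "hello graylog" | nc <graylog-host> 5555   # <graylog-host> is whichever node runs the input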
Install MongoDB
According to the official Graylog documentation, the version to install is 4.2.x.
Time synchronization
Install ntpdate
yum install ntpdate -y
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Add to a scheduled task
# crontab -e
5 * * * * ntpdate -u ntp.ntsc.ac.cn
Configure the yum repository and install
# vim /etc/yum.repos.d/mongodb-org-4.2.repo
[mongodb-org-4.2]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/x86_64/
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc
gpgcheck=1
enabled=1
Then install
yum makecache
yum -y install mongodb-org
Then start
systemctl daemon-reload
systemctl enable mongod.service
systemctl start mongod.service
systemctl --type=service --state=active | grep mongod
Modify the configuration file to set up the replica set
# vim /etc/mongod.conf
# mongod.conf
# for documentation of all options, see:
# http://docs.mongodb.org/manual/reference/configuration-options/
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo

# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.

#security:

#operationProfiling:

replication:
  replSetName: graylog-rs  # the replica set name

#sharding:

## Enterprise-Only Options

#auditLog:

#snmp:
Initialize the replica set
> use admin;
switched to db admin
> rs.initiate( {
...   _id : "graylog-rs",
...   members: [
...     { _id: 0, host: "10.0.105.74:27017" },
...     { _id: 1, host: "10.0.105.76:27017" },
...     { _id: 2, host: "10.0.105.96:27017" }
...   ]
... })
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1615885669, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1615885669, 1)
}
Verify the replica set status
If everything went well, the cluster will have two roles: one node is Primary and the others are Secondary. You can check with the command
rs.status()
Returns a bunch of messages like this:
"members": [{"_id" : 0."name" : "10.0.105.74:27017"."health" : 1."state" : 1."stateStr" : "PRIMARY"."uptime" : 623."optime" : {
"ts" : Timestamp(16158858291),"t" : NumberLong(1)},"optimeDate" : ISODate("2021-03-16T09:10:29Z"),
"syncingTo" : ""."syncSourceHost" : ""."syncSourceId" : - 1."infoMessage" : ""."electionTime" : Timestamp(16158856791),"electionDate" : ISODate("2021-03-16T09:07:59Z"),
"configVersion" : 1."self" : true."lastHeartbeatMessage" : ""
},
{
"_id" : 1."name" : "10.0.105.76:27017"."health" : 1."state" : 2."stateStr" : "SECONDARY"."uptime" : 162."optime" : {
"ts" : Timestamp(16158858291),"t" : NumberLong(1)},"optimeDurable" : {
"ts" : Timestamp(16158858291),"t" : NumberLong(1)},"optimeDate" : ISODate("2021-03-16T09:10:29Z"),
"optimeDurableDate" : ISODate("2021-03-16T09:10:29Z"),
"lastHeartbeat" : ISODate("The 2021-03-16 T09:" 690 z"),
"lastHeartbeatRecv" : ISODate("The 2021-03-16 T09:10:30. 288 z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : ""."syncingTo" : "10.0.105.74:27017"."syncSourceHost" : "10.0.105.74:27017"."syncSourceId" : 0."infoMessage" : ""."configVersion" : 1
},
{
"_id" : 2."name" : "10.0.105.96:27017"."health" : 1."state" : 2."stateStr" : "SECONDARY"."uptime" : 162."optime" : {
"ts" : Timestamp(16158858291),"t" : NumberLong(1)},"optimeDurable" : {
"ts" : Timestamp(16158858291),"t" : NumberLong(1)},"optimeDate" : ISODate("2021-03-16T09:10:29Z"),
"optimeDurableDate" : ISODate("2021-03-16T09:10:29Z"),
"lastHeartbeat" : ISODate("The 2021-03-16 T09:" 690 z"),
"lastHeartbeatRecv" : ISODate("The 2021-03-16 T09:10:30. 286 z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : ""."syncingTo" : "10.0.105.74:27017"."syncSourceHost" : "10.0.105.74:27017"."syncSourceId" : 0."infoMessage" : ""."configVersion" : 1}]Copy the code
Create a user
Just run it on any machine
use admin
db.createUser({user: "admin".pwd: "Root_1234", roles: ["root"]})
db.auth("admin"."Root_1234")
Then, without exiting, create another user for Graylog to connect with
use graylog
db.createUser("graylog", {
"roles" : [{
"role" : "dbOwner"."db" : "graylog"
}, {
"role" : "readWrite"."db" : "graylog"}]})Copy the code
Generate the keyFile
openssl rand -base64 756 > /var/lib/mongo/access.key
Modify the permissions
chown -R mongod.mongod /var/lib/mongo/access.key
chmod 600 /var/lib/mongo/access.key
Once this key is generated, you need to copy it to the other two machines and change the permissions as well
scp -r /var/lib/mongo/access.key 10.0.105.76:/var/lib/mongo/
After the key file is copied, modify the configuration file
# vim /etc/mongod.conf
Add the following configuration
security:
  keyFile: /var/lib/mongo/access.key
  authorization: enabled
All three machines need to be set up this way, and then restart the service
systemctl restart mongod
Then log in and verify two things (a quick sketch of both checks follows the list):
- Check whether the authentication succeeds
- Check whether the replica set status is normal
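A minimal sketch of both checks, using the admin user created above (point it at any replica set member):
mongo --host 10.0.105.74 --port 27017 -u admin -p 'Root_1234' --authenticationDatabase admin
# inside the shell: one member should be PRIMARY, the other two SECONDARY, all with health: 1
rs.status()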
If the above checks pass, the MongoDB 4.2 replica set installed via yum is ready. Now on to the ES cluster.
Deploying an Es Cluster
The ES version used is 7.11.x, the latest at the time of writing.
System optimization
- Kernel parameter optimization
# vim /etc/sysctl.conf
fs.file-max=655360
vm.max_map_count=655360
vm.swappiness = 0
- Modify the limits
# vim /etc/security/limits.conf
* soft nproc 655350
* hard nproc 655350
* soft nofile 655350
* hard nofile 655350
* hard memlock unlimited
* soft memlock unlimited
- Add a regular user; ES needs to be started as a non-root user
groupadd es
useradd -g es es
echo 123456 | passwd es --stdin
- Install the JDK
yum install -y java-1.8.0-openjdk-devel.x86_64
Setting environment Variables
# vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
Uploading a Compressed package
ES download address: artifacts.elastic.co/downloads/e…
Unpack it
tar zxvf elasticsearch-7.11.2-linux-x86_64.tar.gz -C /usr/local/
Modify the permissions
chown -R es.es /usr/local/elasticsearch-7.11.2
Modifying ES Configurations
Configure the cluster
# vim /usr/local/elasticsearch-7.11.2/config/elasticsearch.yml
cluster.name: graylog-cluster
node.name: node03
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["10.0.105.74", "10.0.105.76", "10.0.105.96"]
cluster.initial_master_nodes: ["10.0.105.74", "10.0.105.76"]
http.cors.enabled: true
http.cors.allow-origin: "*"
Modify the JVM heap size (in config/jvm.options under the ES directory)
-Xms16g  # set to half of the host's memory
-Xmx16g
Use Systemd to manage services
# vim /usr/lib/systemd/system/elasticsearch.service
[Unit]
Description=elasticsearch server daemon
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=es
Group=es
LimitMEMLOCK=infinity
LimitNOFILE=655350
LimitNPROC=655350
ExecStart=/usr/local/elasticsearch-7.11.2/bin/elasticsearch
Restart=always

[Install]
WantedBy=multi-user.target
Start it and enable it at boot
systemctl daemon-reload
systemctl enable elasticsearch
systemctl start elasticsearch
Now let's verify:
# curl -XGET http://127.0.0.1:9200/_cluster/health?pretty
{
  "cluster_name" : "graylog-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
At this point, es is installed
Deploy the Kafka cluster
Since I'm reusing the same machines and the Java environment is already installed, I won't repeat that step here.
Downloading the Installation package
kafka: https://www.dogfei.cn/pkgs/kafka_2.12-2.3.0.tgz
zookeeper: https://www.dogfei.cn/pkgs/apache-zookeeper-3.6.0-bin.tar.gz
Unpack them
tar zxvf kafka_2.12-2.3.0.tgz -C /usr/local/
tar zxvf apache-zookeeper-3.6.0-bin.tar.gz -C /usr/local/
Modifying a Configuration File
kafka
# grep -v -E "^#|^$" /usr/local/kafka_2.12-2.3.0/config/server.properties
broker.id=1
listeners=PLAINTEXT://10.0.105.74:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/data/kafka/data
num.partitions=8
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
message.max.bytes=20971520
log.retention.hours=1
log.retention.bytes=1073741824
log.segment.bytes=536870912
log.retention.check.interval.ms=300000
zookeeper.connect=10.0.105.74:2181,10.0.105.76:2181,10.0.105.96:2181
zookeeper.connection.timeout.ms=1000000
zookeeper.sync.time.ms=2000
group.initial.rebalance.delay.ms=0
log.cleaner.enable=true
delete.topic.enable=true
zookeeper
"^ # # grep -v - E | ^ $"/usr/local/apache - they are - 3.6.0 - bin/conf/zoo. The CFGtickTime=10000 initLimit=10 syncLimit=5 dataDir=/data/zookeeper/data clientPort=2181 admin.serverPort=8888 Server. 1 = 10.0.105.74:22888-33888 for server 2 = 10.0.105.76:22888:33888 server. 3 = 10.0.105.96:22888-33888Copy the code
Well, don't forget to create the corresponding data and log directories on each node (a sketch follows).
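A minimal sketch of those directories; note that ZooKeeper also needs a myid file in its dataDir whose value matches the server.N entry of that node:
mkdir -p /data/zookeeper/data /data/zookeeper/logs /data/kafka/data /data/kafka/logs
echo 1 > /data/zookeeper/data/myid   # use 2 on 10.0.105.76 and 3 on 10.0.105.96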
Add systemd unit files
kafka
# cat /usr/lib/systemd/system/kafka.service
[Unit]
Description=Kafka
After=zookeeper.service
[Service]
Type=simple
Environment=LOG_DIR=/data/kafka/logs
WorkingDirectory=/usr/local/kafka_2.12-2.3.0
ExecStart=/usr/local/kafka_2.12-2.3.0/bin/kafka-server-start.sh /usr/local/kafka_2.12-2.3.0/config/server.properties
ExecStop=/usr/local/kafka_2.12-2.3.0/bin/kafka-server-stop.sh
Restart=always

[Install]
WantedBy=multi-user.target
zookeeper
# cat /usr/lib/systemd/system/zookeeper.service
[Unit]
Description=zookeeper.service
After=network.target
[Service]
Type=forking
Environment=ZOO_LOG_DIR=/data/zookeeper/logs
ExecStart=/usr/local/apache-zookeeper-3.6.0-bin/bin/zkServer.sh start
ExecStop=/usr/local/apache-zookeeper-3.6.0-bin/bin/zkServer.sh stop
Restart=always

[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl start zookeeper
systemctl start kafka
systemctl enable zookeeper
systemctl enable kafka
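Once ZooKeeper and Kafka are up on all three nodes, a quick sanity check is to create the topic Filebeat will write to (the name matches the Filebeat output below; the partition and replication values are just an example) and list the topics:
/usr/local/kafka_2.12-2.3.0/bin/kafka-topics.sh --create --bootstrap-server 10.0.105.74:9092 --topic dev-k8s-log --partitions 8 --replication-factor 2
/usr/local/kafka_2.12-2.3.0/bin/kafka-topics.sh --list --bootstrap-server 10.0.105.74:9092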
Deploy filebeat
Since Kubernetes logs are being collected, Filebeat is deployed as a DaemonSet. An example DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: filebeat
  name: filebeat
  namespace: default
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
      name: filebeat
    spec:
      affinity: {}
      containers:
      - args:
        - -e
        - -E
        - http.enabled=true
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker.elastic.co/beats/filebeat:7.11.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - |
              #!/usr/bin/env bash -e
              curl --fail 127.0.0.1:5066
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: filebeat
        resources:
          limits:
            cpu: "1"
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          privileged: false
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/filebeat/filebeat.yml
          name: filebeat-config
          readOnly: true
          subPath: filebeat.yml
        - mountPath: /usr/share/filebeat/data
          name: data
        - mountPath: /opt/docker/containers/
          name: varlibdockercontainers
          readOnly: true
        - mountPath: /var/log
          name: varlog
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: filebeat
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - configMap:
          defaultMode: 384
          name: filebeat-daemonset-config
        name: filebeat-config
      - hostPath:
          path: /opt/docker/containers
          type: ""
        name: varlibdockercontainers
      - hostPath:
          path: /var/log
          type: ""
        name: varlog
      - hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
        name: data
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
Configmap reference
apiVersion: v1
data:
  filebeat.yml: |
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/*.log
      # multiline merge
      multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
      multiline.negate: true
      multiline.match: after
      multiline.timeout: 30
      fields:
        # custom field so that the k8s input logs can be identified downstream
        service: k8s-log
      # disable collection of host.xxxx fields
      #publisher_pipeline.disable_host: true
    processors:
      - add_kubernetes_metadata:
          # add k8s metadata fields
          default_indexers.enabled: true
          default_matchers.enabled: true
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - drop_fields:
          # delete unnecessary fields
          fields: ["host", "tags", "ecs", "log", "prospector", "agent", "input", "beat", "offset"]
          ignore_missing: true
    output.kafka:
      hosts: ["10.0.105.74:9092", "10.0.105.76:9092", "10.0.105.96:9092"]
      topic: "dev-k8s-log"
      compression: gzip
      max_message_bytes: 1000000
kind: ConfigMap
metadata:
  labels:
    app: filebeat
  name: filebeat-daemonset-config
  namespace: default
Then apply the manifests and the pods will start up (a sketch follows).
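A minimal sketch, assuming the ConfigMap and DaemonSet above are saved as filebeat-configmap.yaml and filebeat-daemonset.yaml (the file names are illustrative) and that the filebeat ServiceAccount referenced by the DaemonSet already exists:
kubectl apply -f filebeat-configmap.yaml -f filebeat-daemonset.yaml
# one filebeat pod should come up on every node
kubectl get pods -n default -l app=filebeat -o wide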
Deploy the Graylog cluster
Import the RPM package
rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-4.0-repository_latest.rpm
The installation
yum install graylog-server -y
Start it and enable it at boot
systemctl enable graylog-server
systemctl start graylog-server
Generate the secret keys
Generate two secret keys for root_password_sha2 and password_secret in the configuration file, respectively
# echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1
# pwgen -n -1 -s 40 1    # if the command is missing, install pwgen (e.g. on Ubuntu: apt install pwgen)
Modifying a Configuration File
# vim /etc/graylog/server/server.conf
is_master = false
# set to true on the master node; there is only one master in the cluster
node_id_file = /etc/graylog/server/node-id
password_secret = iMh21uM57Pt2nMHDicInjPvnE8o894AIs7rJj9SW
# the password_secret generated above
root_password_sha2 = 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
# the SHA-256 hash generated above
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 0.0.0.0:9000
http_publish_uri = http://10.0.105.96:9000/
# use this node's own address here
web_enable = true
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 2
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 5000
output_flush_interval = 120
output_fault_count_threshold = 8
output_fault_penalty_seconds = 120
processbuffer_processors = 20
outputbuffer_processors = 40
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
# replace <password> with the password set for the graylog user in MongoDB
mongodb_uri = mongodb://graylog:<password>@10.0.105.74:27017,10.0.105.76:27017,10.0.105.96:27017/graylog?replicaSet=graylog-rs
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32
elasticsearch_hosts = http://10.0.105.74:9200,http://10.0.105.76:9200,http://10.0.105.96:9200
elasticsearch_discovery_enabled = true
Note how MongoDB and ES are connected here. Since both are deployed as clusters, the cluster connection strings are used (replace <password> with the graylog user's password):
mongodb_uri = mongodb://graylog:<password>@10.0.105.74:27017,10.0.105.76:27017,10.0.105.96:27017/graylog?replicaSet=graylog-rs
elasticsearch_hosts = http://10.0.105.74:9200,http://10.0.105.76:9200,http://10.0.105.96:9200
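After editing the configuration, restart graylog-server on each node and watch its log to make sure it connects to MongoDB and ES (the log path is the RPM default):
systemctl restart graylog-server
tail -f /var/log/graylog-server/server.log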
The Graylog server configuration is now complete; the remaining setup is done in the Graylog console. To reach the console of the cluster, put a proxy in front of it; Nginx can be used as the Graylog proxy:
user nginx;
worker_processes 2;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 65535;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;
    include /etc/nginx/conf.d/*.conf;

    upstream graylog_servers {
        server 10.0.105.74:9000;
        server 10.0.105.76:9000;
        server 10.0.105.96:9000;
    }

    server {
        listen 80 default_server;
        server_name <your-domain>;    # the domain name you set up

        location / {
            proxy_set_header Host $http_host;
            proxy_set_header X-Forwarded-Host $host;
            proxy_set_header X-Forwarded-Server $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Graylog-Server-URL http://$server_name/;
            proxy_pass http://graylog_servers;
        }
    }
}
After that, restart Nginx (as sketched below) and open the address in a browser. The user name is admin and the password is the one you hashed with SHA-256 earlier (enter the plaintext password, not the hash).
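A minimal sketch of applying the Nginx change, assuming the configuration above went into the default nginx.conf:
nginx -t && systemctl restart nginx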
Getting logs into Graylog
Configuring the Input Source
System -> Inputs
Raw/Plaintext Kafka -> Launch new input
Set kafka and ZooKeeper addresses, set topic names, and save
All nodes should show the input as running (you can also confirm that messages are reaching Kafka; see the check below)
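If no messages show up, a quick way to confirm that Filebeat is actually producing to the topic is to consume a few messages straight from Kafka (the path assumes the install location used above):
/usr/local/kafka_2.12-2.3.0/bin/kafka-console-consumer.sh --bootstrap-server 10.0.105.74:9092 --topic dev-k8s-log --max-messages 5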
Create indexes
System/indices
Set the index information: index name, number of replicas, number of shards, and the rotation/retention policy, then create the index set
Create Streams
Add rules
Save it, and then you can see the logs on the home page
Conclusion
At this point, the complete deployment process is finished. This article first explained how Graylog is deployed and then how to use it at a basic level; other features, such as extracting log fields, will be explored later, so stay tuned.