Log-pilot collects K8s logs

At this point our ELK stack is deployed. Now we deploy log-pilot, which collects container logs and pushes them to Logstash; Logstash processes the logs and sends them to ES.

When a container starts with the appropriate label configured, log-pilot resolves the label and fetches that container's logs.

Features

Log-pilot provides an automatic discovery mechanism: once a container is labeled, the collection component detects it and starts collecting. Log-pilot keeps a checkpoint on the log file handles it traces and automatically tags the log data. By adding a tag to a container, the tag is recorded in every log entry, so the data can be distinguished by tag when the logs are read back out. Configuration takes effect dynamically: when containers scale out or in, log-pilot automatically handles log duplication, log loss, and source tagging.

Labels

aliyun.logs.$name=$path: $name is the log name and may contain only 0-9, a-z, A-Z, and hyphens (-). $path is the log path to collect; for example, /var/log/he.log and /var/log/*.log are both valid values, but /var/log is not, because the path cannot be a bare directory. stdout is a special value representing standard output.

aliyun.logs.$name.format: the log format. Currently supported: none (plain text) and json (each line is a complete JSON string).

aliyun.logs.$name.tags: additional fields to report, in the format k1=v1,k2=v2, separated by commas (,). For example, aliyun.logs.access.tags="name=hello,stage=test". If you use Elasticsearch as the log store, the target tag has a special meaning: it specifies the Elasticsearch index.
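
To make the label syntax concrete, here is a minimal sketch of starting a plain Docker container with these labels; the image is illustrative, and the tag values reuse the example from above:

# declare a log stream named "access" from stdout and attach tags to it
docker run -d \
  --label aliyun.logs.access=stdout \
  --label aliyun.logs.access.tags="name=hello,stage=test" \
  nginx:alpine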

Usage

  1. Configure a log-pilot DaemonSet and publish it to K8s, so that every node runs a collection component (see the YAML below)
  2. Add labels to the Docker containers; the key question is how to add the label

PILOT_LOG_PREFIX: "aliyun,custom" — modifying this environment variable changes the label prefix; the default is aliyun (some versions do not support it). See the sketch below.
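
A minimal sketch of setting this variable in the log-pilot container spec (the custom value shown is an assumption for illustration):

env:
- name: PILOT_LOG_PREFIX
  # accept both the default "aliyun" prefix and a custom one
  value: "aliyun,custom"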

docker pull log-pilot:0.9.6-filebeat

Deployment YAML

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: log-pilot
  namespace: kube-system
  labels:
    k8s-app: log-pilot
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: log-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: dashboard-admin
      containers:
      - name: log-pilot
        # for available versions, see https://github.com/AliyunContainerService/log-pilot/releases
        image: log-pilot:latest
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:
        - name: "NODE_NAME"
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "LOGGING_OUTPUT"
          value: "logstash"
        - name: "LOGSTASH_HOST"
          value: "10.90.7.0.x.x"
        - name: "LOGSTASH_PORT"
          value: "5044"
        - name: "LOGSTASH_LOADBALANCE"
          value: "true"
        #- name: "FILEBEAT_OUTPUT"
        #  value: "elasticsearch"
        #- name: "ELASTICSEARCH_HOST"
        #  value: "elasticsearch"
        #- name: "ELASTICSEARCH_PORT"
        #  value: "9200"
        #- name: "ELASTICSEARCH_USER"
        #  value: "elastic"
        #- name: "ELASTICSEARCH_PASSWORD"
        #  value: "changeme"
        volumeMounts:
        - name: sock
          mountPath: /var/run/docker.sock
        - name: root
          mountPath: /host
          readOnly: true
        - name: varlib
          mountPath: /var/lib/filebeat
        - name: varlog
          mountPath: /var/log/filebeat
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
      - name: root
        hostPath:
          path: /
      - name: varlib
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate

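Assuming the manifest above is saved as log-pilot.yaml (the file name is an assumption), deploy it and confirm a collector pod runs on every node:

# apply the DaemonSet and check one pod per node comes up
kubectl apply -f log-pilot.yaml
kubectl -n kube-system get pods -o wide | grep log-pilot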

Inject the environment variable configuration into the application Deployment; assume the application name is monitor-center.

- name: aliyun_logs_monitor-center-stdout
  # collect the console (stdout)
  value: "stdout"
- name: aliyun_logs_monitor-center-tomcat
  # collect the specified directory
  value: "/usr/local/tomcat/logs/*.log"
- name: aliyun_logs_monitor-center-netcore
  # collect the specified directory
  value: "/app/logs/*.log"
- name: aliyun_logs_monitor-center-java
  # collect the specified directory
  value: "/logs/*.log"
- name: aliyun_logs_monitor-center-stdout_tags
  # tags for the aliyun_logs_monitor-center-stdout console collection above; same idea below
  value: "app=monitor-center,lang=all,sourceType=stdout"
- name: aliyun_logs_monitor-center-tomcat_tags
  value: "app=monitor-center,lang=java,sourceType=log"
- name: aliyun_logs_monitor-center-netcore_tags
  value: "app=monitor-center,lang=net,sourceType=log"
- name: aliyun_logs_monitor-center-java_tags
  value: "app=monitor-center,lang=java,sourceType=log"

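For context, a sketch of where these entries sit in the application's Deployment; the image name is hypothetical, and per the log-pilot docs file-based collection generally expects the log directory to be declared as a volume (emptyDir here):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitor-center
spec:
  selector:
    matchLabels:
      app: monitor-center
  template:
    metadata:
      labels:
        app: monitor-center
    spec:
      containers:
      - name: monitor-center
        image: monitor-center:latest   # hypothetical image
        env:
        - name: aliyun_logs_monitor-center-stdout
          value: "stdout"
        - name: aliyun_logs_monitor-center-stdout_tags
          value: "app=monitor-center,lang=all,sourceType=stdout"
        # the tomcat log path from above, exposed as a volume so log-pilot can read it
        volumeMounts:
        - name: tomcat-logs
          mountPath: /usr/local/tomcat/logs
      volumes:
      - name: tomcat-logs
        emptyDir: {}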

Check whether Filebeat is configured correctly:

kubectl -n kube-system get pod | grep log-pilot
kubectl -n kube-system exec -it log-pilot-nspdv sh
cat /etc/filebeat/filebeat.yml

Other knowledge points

Common log collection components

  1. Filebeat
  2. Logstash
  3. log-pilot
  4. Fluentd

Elasticsearch Curator

A Python tool for ES (up to 7.x) that makes index management easier, without sending HTTP requests directly. However, the tool is considered outdated on higher versions.

ElastAlert

An ES extension that matches specific patterns in logs and generates alerts.

Elastic Beats

Beats are lightweight shippers that sit at the data source, such as log files (Filebeat), network data (Packetbeat), and server metrics (Metricbeat).

Hence the community's ELKB concept; using log-pilot to collect container data already follows the ELKB idea.

https://www.cnblogs.com/sanduzxcvbnm/p/12076383.html

About types

The 5.x versions can create multiple types per index; 6.x can create only one type.

ES is index-based; it does not need types (the analogue of a table in a relational database) to speed up queries.
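
For example, on 7.x an index is created with its mappings directly, with no type name in the path (index and field names here are illustrative):

PUT /app-logs
{
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "app":     { "type": "keyword" }
    }
  }
}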

About nodes

ES nodes can act as master nodes, data nodes, and ingest/coordinating nodes. Roles can be specified explicitly; by default a node serves all of them. The elected master is responsible for synchronizing cluster state, and coordinating nodes are responsible for forwarding requests. If a coordinating node also takes part in data processing, its load can become too high to forward requests, which hurts overall performance. In clusters with many nodes (say, more than 10), it is worth dedicating nodes to the coordinating role; since they store no data, their CPU and memory requirements are modest.
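
A sketch of a dedicated coordinating node in elasticsearch.yml, using the pre-7.9 role flags that match this document's ES vintage:

# elasticsearch.yml — all specialized roles off; the node only routes requests
node.master: false
node.data: false
node.ingest: false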

It is better to put Nginx in front as a load balancer, round-robining requests across the nodes rather than sending every request to a single node, as sketched below.
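
A minimal Nginx sketch under that assumption (node addresses are hypothetical; round-robin is Nginx's default upstream policy):

upstream elasticsearch {
    server 10.0.0.1:9200;
    server 10.0.0.2:9200;
    server 10.0.0.3:9200;
}
server {
    listen 9200;
    location / {
        proxy_pass http://elasticsearch;
    }
}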

About node rebalancing

A brief outage does not need to trigger cluster rebalancing: you can either disable allocation before shutting a node down, or configure a delay before reallocation triggers.
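
For example, the reallocation delay can be set through the index settings API (applied to all indices here; the 5-minute value is illustrative):

PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}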

About automatic index creation

If automatic index creation is disallowed, ELK logs pushed to ES will not be visible in the front end until you create the indexes manually. This behavior can be turned on or off in the configuration.
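
It can also be toggled dynamically; a sketch enabling auto-creation cluster-wide:

PUT /_cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true"
  }
}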

Default heap size

ES's default heap maximum and minimum are both 2 GB, so if you bring it up for testing without knowing about the heap, the Docker container may fail to start because the default is too large.
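
The heap can be overridden at container start via ES_JAVA_OPTS; a small-machine test sketch (the image tag is illustrative):

docker run -d --name es-test \
  -p 9200:9200 \
  -e discovery.type=single-node \
  -e ES_JAVA_OPTS="-Xms512m -Xmx512m" \
  elasticsearch:7.5.0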

Modifying ES under Docker

If you want to adjust the ES JVM parameters, follow these steps (see the sketch after this section):

  1. Stop the container, then rm the container
  2. Modify the configuration file to comment out the security-related settings, otherwise the new container will not start
  3. Modify the JVM settings and start the container again
  4. Copy the keywords-related files back into the container
  5. Modify the configuration file to restore the commented-out settings
  6. Restart the container

Because security checks were enabled, the keywords files disappeared when the container was destroyed; note that installed plug-ins were gone as well.

As for stored data, map it to the local host. Whenever the cluster loses a node, the indexes rebalance onto the other nodes; when a new node joins, the data rebalances again. Data that lived on a failed node is rewritten elsewhere once the node rejoins. In short, under Docker you do not need to worry much about the data files.

This is effectively a redeployment. There is also a non-canonical way to operate without destroying the container, which is less stable; the recommendation is to plan the JVM size from the start. If you do need to modify it, try the non-standard route first, i.e. editing the files inside the Docker container, which is somewhat involved but keeps the data in place. If the cluster has many nodes and replicated shards, the container can simply be destroyed without losing data.
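
A sketch of the redeploy path, assuming the data directory is mapped to a host volume (container name, host path, and image tag are illustrative):

# remove the old container; data survives on the host volume
docker stop es && docker rm es
# start a new one with the adjusted heap
docker run -d --name es \
  -p 9200:9200 -p 9300:9300 \
  -e ES_JAVA_OPTS="-Xms1g -Xmx1g" \
  -v /data/es:/usr/share/elasticsearch/data \
  elasticsearch:7.5.0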

Problem summary

Here we record solutions to common problems.

search.max_buckets is too small

trying to create too many buckets. must be less than or equal to: [100000] but was [100001]. this limit can be set by changing the [search.max_buckets] cluster level setting.

The error above is caused by the search.max_buckets setting. A transient change modifies it temporarily:

curl -X PUT -H 'Content-Type: application/json' http://ip:port/_cluster/settings -d '{ "transient": { "search.max_buckets": 217483647 } }'

Via Kibana, persistent makes the configuration survive restarts:

PUT /_cluster/settings
{
  "persistent": {
    "search.max_buckets": 217483647
  }
}

Or change the configuration file directly

Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket.html

The shard cap is too small

The number of shards needs to be adjusted. In 7.5 the default cap is 2,000 shards; once it is exceeded, new shards cannot be created in the cluster:

PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "max_shards_per_node":10000
    }
  }
}

The write queue is too small

Change the write queue size by modifying the configuration file: thread_pool.write.queue_size: 1000, as below.
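
A sketch of the corresponding entry; thread pool sizes are static node settings, so a restart is required after editing:

# elasticsearch.yml
thread_pool.write.queue_size: 1000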

Modify the default result window

index.max_result_window: 1000000 — by default only 10,000 results can be returned. This configuration may need to be raised if the call chains are long (e.g. SkyWalking).
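
It can also be changed per index through the settings API; a sketch applying it to all indices:

PUT /_all/_settings
{
  "index.max_result_window": 1000000
}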

ELK architecture and setup references: https://www.one-tab.com/page/tto_XdDeQlS44BY-ziLvKg, https://www.one-tab.com/page/Fb3B3qd2Q9yR9W92dZ2pYQ