
Preface

Learning never ends, and whatever form it takes, it only becomes part of your own knowledge system and accumulation once you produce some output. This article is one output of my process of learning ELK, organized according to the 3W1H framework (What, Why, Where, How) I usually apply when learning a new technology.

What is ELK?

ELK is a complete log collection, analysis, and presentation solution from Elastic; the name is an acronym of ElasticSearch, Logstash, and Kibana. In addition, Filebeat is one of the most widely used log shippers: compared with Logstash it is more lightweight and consumes fewer resources. This article uses Logstash as the collection component.

The ELK components

ElasticSearch is a near-real-time (NRT) distributed search and analytics engine that can be used for full-text search, structured search, and analytics. It is built on top of the full-text search library Apache Lucene and written in Java.

Logstash is a near-real-time data collection pipeline that collects, parses, and filters data and sends it on to ES.

Kibana is a web-based visualization platform that provides analysis and presentation for ElasticSearch: it searches and interacts with the data stored in ElasticSearch indices and builds tables, charts, and dashboards on top of it.
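As a quick illustration of the kind of full-text search ElasticSearch exposes over its REST API (the index name my-logs and the field message are placeholders for this example):

curl -X GET "http://localhost:9200/my-logs/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": { "message": "connection timeout" }
  }
}'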

ELK architecture diagram

Why use ELK?

As our system architecture keeps evolving, from a monolith to distributed systems, microservices, service meshes, and so on, the volume of logs generated by user access keeps growing. We urgently need a platform that can query and analyze logs quickly and accurately.

A complete log analysis platform should include the following features:

Collection – collect log data from multiple sources (system error logs + service access logs)
Transmission – transfer log data to the log platform reliably
Storage – store the log data
Analysis – support analysis through a UI
Alerting – report errors and raise alerts

ELK provides a complete solution covering all of the above. It is open source, its components work well together, and it effectively meets the needs of many applications; it is currently a mainstream log system. ELK has also traditionally served as an open source alternative to Splunk, the leader in log analysis.

Where is ELK used?

The core application scenario of ELK is collecting, analyzing, and displaying the logs of large hardware and software systems. In recent years, as the number of Internet users has grown rapidly, more scenarios have been explored. Big data has become very popular, and all kinds of big data products are in use; Elasticsearch has proven capable of handling huge amounts of data, and for many workloads it is more convenient and faster than Hadoop. As a result, ELK is also used in other scenarios, such as SIEM: many companies use ELK for security data analysis, for example enterprise intrusion detection, abnormal traffic analysis, and user behavior analysis.

How to set up ELK?

We will build it from scratch, based on a real project.

Practical project introduction

Search for, analyze, and display service system logs (system logs + user access logs) in real time.

Actual project analysis

The business system logs are stored in a log table in an Oracle database. We need to use Logstash to collect the log table data from Oracle, send the data collected by Logstash to ElasticSearch, and then query, analyze, and present the data in ES through Kibana.

Set up ElasticSearch

Download ES from the official website. This article uses elasticsearch-6.4.3.tar.gz as an example.

Decompress the archive and edit the ES core configuration file:

tar -zxvf elasticsearch-6.4.3.tar.gz
cd /usr/local/elasticsearch-6.4.3/config
vim elasticsearch.yml

The configuration is as follows:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: zkc-elasticsearch
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-0
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /usr/local/elasticsearch-6.4.3/data
#
# Path to log files:
#
path.logs: /usr/local/elasticsearch-6.4.3/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# Allow cross-origin requests (used by tools such as elasticsearch-head):
#
http.cors.enabled: true
http.cors.allow-origin: "*"
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
#cluster.initial_master_nodes: ["node-0"]
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
Install the IK Chinese analyzer, because the default ES analyzer cannot analyze Chinese text properly. Download the IK analyzer from GitHub; this article uses elasticsearch-analysis-ik-6.4.3.zip as an example.

Unzip IK:
unzip elasticsearch-analysis-ik-6.4.3.zip -d /usr/local/elasticsearch-6.4.3/plugins/ik/
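Once ES is running (see the startup steps below), you can check that the plugin works with the _analyze API. A minimal sketch, assuming the node address used later in this article; ik_max_word is the analyzer name registered by the IK plugin:

curl -X GET "http://192.168.184.145:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}'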
ES cannot be started as the root user, so create a normal user and grant it ownership of the installation directory:
useradd esuser
chown -R esuser /usr/local/elasticsearch-6.4.3/
I am testing in a virtual machine, so I lower the ES JVM heap settings; if you have enough memory you can skip this step:
vim jvm.options
The configuration is as follows:
-Xms128M
-Xmx128M
Configure the other OS-level parameters required to start ES:
vim /etc/security/limits.conf
The configuration is as follows:
# /etc/security/limits.conf
#
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
#
#Also note that configuration files in /etc/security/limits.d directory,
#which are read in alphabetical order, override the settings in this
#file in case the domain is the same or more specific.
#That means for example that setting a limit for wildcard domain here
#can be overriden with a wildcard setting in a config file in the
#subdirectory, but a user specific setting here can be overriden only
#with a user specific setting in the subdirectory.
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4

* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096


# End of file
               
vim /etc/sysctl.conf
The configuration is as follows:

# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
#
vm.max_map_count=262145

Apply the configuration:
sysctl -p
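To quickly confirm the changes took effect (a sanity check; limits.conf is only re-read on a new login session):

sysctl vm.max_map_count   # should print 262145
su - esuser
ulimit -n                 # should print 65536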
Switch to the new user and start ES:
su esuser
cd /usr/local/elasticsearch-6.4.3/bin/
./elasticsearch
After startup, check the console output and access ES at http://192.168.184.145:9200.
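You can verify it from the shell as well; the response is a small JSON document containing the node name, cluster name, and version configured above:

curl http://192.168.184.145:9200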


Set up Logstash
Download the archive from the official website. This article uses logstash-6.4.3.tar.gz as an example.

Decompress the archive and move it to /usr/local/:
tar -zxvf logstash-6.4.3.tar.gz
mv logstash-6.4.3 /usr/local/
Create a sync directory, which will later hold the JDBC driver jar and the sync configuration files:
mkdir sync
Create and edit the sync configuration file:
cd sync
vim logstash-db-sync.conf
The configuration is as follows:
input {
  jdbc {
    # Oracle database URL and SID
    jdbc_connection_string => "jdbc:oracle:thin:@172.16.4.29:1521:urpdb"
    # Username and password
    jdbc_user => "USR_JWJC_DEV"
    jdbc_password => "JWJCDEV1234"
    # Location of the database driver; can be an absolute or relative path
    jdbc_driver_library => "/usr/local/logstash-6.4.3/sync/ojdbc8-12.2.0.1.jar"
    # Driver class name
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # Enable paging
    jdbc_paging_enabled => "true"
    # Page size
    jdbc_page_size => "1000"
    # Path of the SQL file to execute
    statement_filepath => "/usr/local/logstash-6.4.3/sync/jwf_log.sql"
    # Schedule (cron-like: minute hour day-of-month month day-of-week); all * means run every minute
    schedule => "* * * * *"
    # Index type
    type => "_doc"
    # Whether to track the value of a column between runs
    use_column_value => true
    # File that records the last tracked value
    last_run_metadata_path => "/usr/local/logstash-6.4.3/sync/track_time"
    # Name of the tracking column
    tracking_column => "ID"
    # Type of the tracking column
    tracking_column_type => "numeric"
    # Whether to clear the tracking record on startup
    clean_run => false
    # Whether to convert column names to lowercase
    lowercase_column_names => false
  }
}
output {
  # ES output
  elasticsearch {
    # ES address
    hosts => ["192.168.184.145:9200"]
    # Index name
    index => "jwf-logs"
    # Document ID
    document_id => "%{ID}"
  }
  # Also print the events to stdout as JSON lines
  stdout {
    codec => json_lines
  }
}

Copy the database driver jar referenced in the configuration into the sync directory; use the driver that matches your actual database.
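For the Oracle setup above this is simply (assuming the jar has already been downloaded to the current directory):

cp ojdbc8-12.2.0.1.jar /usr/local/logstash-6.4.3/sync/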

Edit the SQL used for synchronization:
vim jwf_log.sql
SELECT * from T_SYSTEM_REQUEST_LOG WHERE ID > :sql_last_value
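The exact columns of T_SYSTEM_REQUEST_LOG depend on the business system; the sync only requires a monotonically increasing numeric ID column to match the tracking_column above. A hypothetical minimal table might look like this (all columns except ID are made up for illustration):

CREATE TABLE T_SYSTEM_REQUEST_LOG (
  ID           NUMBER PRIMARY KEY,   -- monotonically increasing, used as the tracking column
  REQUEST_URL  VARCHAR2(512),        -- hypothetical
  REQUEST_TIME DATE,                 -- hypothetical
  LOG_CONTENT  CLOB                  -- hypothetical
);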
Start Logstash and check whether the ES index and data are correct:
cd bin/
./logstash -f /usr/local/logstash-6.4.3/sync/logstash-db-sync.conf
Use elasticsearch-head or the ES REST API directly to check whether the jwf-logs index exists.
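For example, via the REST API (a quick sanity check; the document count depends on your data):

curl "http://192.168.184.145:9200/_cat/indices?v"
curl "http://192.168.184.145:9200/jwf-logs/_count"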

Set up Kibana
Download the archive from the official website. This article uses kibana-6.4.3-linux-x86_64.tar.gz as an example.

Decompress it:
tar -zxvf kibana-6.4.3-linux-x86_64.tar.gz
Edit the Kibana configuration file:
cd /usr/local/kibana-6.4.3-linux-x86_64/config/
vim kibana.yml
The configuration is as follows. By default Kibana can only connect to ES on the local machine, so server.host and elasticsearch.url need to be set:
# Kibana is served by a back end server. This setting specifies the port to use.
#server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "192.168.184.145"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URL of the Elasticsearch instance to use for all your queries.
elasticsearch.url: "http://192.168.184.145:9200"

# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true

# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"

# The default application to load.
#kibana.defaultAppId: "home"

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"

# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key

# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key

# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full

# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500

# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000

# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]

# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}

# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000

# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000

# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false

# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid

# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout

# Set the value of this setting to true to suppress all logging output.
#logging.silent: false

# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false

# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false

# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000

# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"

Start Kibana:

cd bin/
./kibana

Open the Kibana home page and create an index pattern for the index you want to query. Once the index pattern is created, go to Discover to query the records matched by it.
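For this project the index pattern is based on the index created by Logstash, and you can filter records in the Discover search bar with Lucene query syntax. A small sketch (ID comes from the log table above; any other field names depend on your table columns):

index pattern: jwf-logs*
example Discover query: ID:[1000 TO *]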

Note:

The versions of the ELK components must be consistent; otherwise compatibility errors may occur.