Prepare the environment
Deploy the software and environment
- Redis, MySQL, Grafana, Ansible, exporters, Prometheus
Set the hostname
hostnamectl set-hostname prome-master01
Set the time zone
[root@prometheus_master01 ~]# timedatectl
      Local time: 2021-03-27 22:39:41 CST
  Universal time: 2021-03-27 14:39:41 UTC
        RTC time: 2021-03-27 14:39:41
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
timedatectl set-timezone Asia/Shanghai
Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
setenforce 0
sed -i '/^SELINUX/s/enforcing/disabled/' /etc/selinux/config
getenforce
Disable sshd DNS reverse lookups
sed -i 's/^#UseDNS yes/UseDNS no/' /etc/ssh/sshd_config
systemctl restart sshd
Build phase
Configure a domestic (Aliyun) YUM mirror
mkdir /tmp/yum_repo_bk
/bin/mv -f /etc/yum.repos.d/* /tmp/yum_repo_bk
# base repo
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
# epel repo
wget -O /etc/yum.repos.d/epel-7.repo https://mirrors.aliyun.com/repo/epel-7.repo
yum makecache
Install required tools
# rzsz
yum -y install lrzsz yum-utils
Prepare data directories
mkdir -pv /opt/tgzs
mkdir -pv /opt/app
Remove the size limits on the shell history file
cat <<EOF >> /etc/profile
export HISTFILESIZE=
export HISTSIZE=
EOF
source /etc/profile
Download Prometheus and related components
# Prometheus download address: https://github.com/prometheus/prometheus/releases/tag/v2.25.2
wget -O /opt/tgzs/prometheus-2.25.2.linux-amd64.tar.gz https://github.com/prometheus/prometheus/releases/download/v2.25.2/prometheus-2.25.2.linux-amd64.tar.gz
# node_exporter
wget -O /opt/tgzs/node_exporter-1.1.2.linux-amd64.tar.gz https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
# alertmanager
wget -O /opt/tgzs/alertmanager-0.21.0.linux-amd64.tar.gz https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
# pushgateway
wget -O /opt/tgzs/pushgateway-1.4.0.linux-amd64.tar.gz https://github.com/prometheus/pushgateway/releases/download/v1.4.0/pushgateway-1.4.0.linux-amd64.tar.gz
# process-exporter
wget -O /opt/tgzs/process-exporter-0.7.5.linux-amd64.tar.gz https://github.com/ncabatoff/process-exporter/releases/download/v0.7.5/process-exporter-0.7.5.linux-amd64.tar.gz
# blackbox_exporter
wget -O /opt/tgzs/blackbox_exporter-0.18.0.linux-amd64.tar.gz https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
# redis_exporter
wget -O /opt/tgzs/redis_exporter-v1.20.0.linux-amd64.tar.gz https://github.com/oliver006/redis_exporter/releases/download/v1.20.0/redis_exporter-v1.20.0.linux-amd64.tar.gz
# mysqld_exporter
wget -O /opt/tgzs/mysqld_exporter-0.12.1.linux-amd64.tar.gz https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
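Every GitHub release tarball above follows the same `<name>-<version>.linux-amd64.tar.gz` naming scheme, so the download commands can be generated instead of typed by hand. A small sketch (hypothetical helper, not part of the original tutorial; redis_exporter is excluded because it keeps the `v` prefix in its tarball name):

```python
# Sketch: generate the wget commands above from a component -> (repo, version) map.
RELEASES = {
    "prometheus": ("prometheus/prometheus", "2.25.2"),
    "node_exporter": ("prometheus/node_exporter", "1.1.2"),
    "alertmanager": ("prometheus/alertmanager", "0.21.0"),
    "pushgateway": ("prometheus/pushgateway", "1.4.0"),
}

def wget_command(name: str, repo: str, version: str, dest: str = "/opt/tgzs") -> str:
    # GitHub release URL pattern: .../releases/download/v<version>/<tarball>
    tarball = f"{name}-{version}.linux-amd64.tar.gz"
    url = f"https://github.com/{repo}/releases/download/v{version}/{tarball}"
    return f"wget -O {dest}/{tarball} {url}"

for name, (repo, version) in RELEASES.items():
    print(wget_command(name, repo, version))
```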
Install MySQL and configure it
# download the MySQL yum repo package
wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# install the repo
yum localinstall mysql57-community-release-el7-8.noarch.rpm -y
# check that the mysql repo was installed successfully
yum repolist enabled | grep "mysql.*-community.*"
# install MySQL server
yum install mysql-community-server -y
# start mysqld
systemctl start mysqld
systemctl status mysqld
# MySQL 5.7 generates a default password for root in /var/log/mysqld.log.
# Find the default root password there, then log in to change it:
mysql -uroot -p
# The default password policy requires uppercase and lowercase letters, digits,
# and special characters, with a minimum length of 8 characters; otherwise
# "ERROR 1819 (HY000): Your password does not satisfy the current policy
# requirements" is raised.
# If you do not need the password policy, add the following to my.cnf to disable it:
echo -e "validate_password=off\ncharacter_set_server=utf8\ninit_connect='SET NAMES utf8'\nskip-name-resolve\n" >> /etc/my.cnf
systemctl restart mysqld
mysql -uroot -p
# change the root password and allow remote login
alter user 'root'@'localhost' identified by '123123';
grant all privileges on *.* to root@'%' identified by '123123' with grant option;
flush privileges;
Install Redis and configure it
yum -y install redis
Compile and install Redis-6.2.1
- Docs: redis.io/download
# install build dependencies
yum install -y gcc gcc-c++ tcl
wget -O /opt/tgzs/redis-6.2.1.tar.gz https://download.redis.io/releases/redis-6.2.1.tar.gz
cd /opt/tgzs/
# extract redis
tar xf redis-6.2.1.tar.gz
# enter the unpacked directory
cd redis-6.2.1
# Allocator: if the MALLOC environment variable is set, it is used when building redis.
# libc is not the default allocator; the default is jemalloc, because jemalloc has
# been shown to have fewer fragmentation problems than libc.
# But if you have no jemalloc and only libc, make will of course fail,
# so add this parameter:
make MALLOC=libc -j 20
# or simply:
make -j 20
# create the install directory
mkdir -p /usr/local/redis
make PREFIX=/usr/local/redis install
# add redis to PATH
echo 'export PATH=$PATH:/usr/local/redis/bin' >> /etc/profile
source /etc/profile
# copy a trimmed default configuration file to /etc
egrep -v "^$|#" redis.conf > redis_sample.conf
# change the listening address to 0.0.0.0 and run as a daemon
sed -i s/bind\ 127.0.0.1/bind\ 0.0.0.0/g redis_sample.conf
sed -i s/daemonize\ no/daemonize\ yes/g redis_sample.conf
/bin/cp -f redis_sample.conf /etc/redis_6379.conf
/bin/cp -f redis_sample.conf /etc/redis_6479.conf
# set the log files
sed -i s@logfile\ ""@logfile\ "/opt/logs/redis_6379.log"@g /etc/redis_6379.conf
sed -i s@logfile\ ""@logfile\ "/opt/logs/redis_6479.log"@g /etc/redis_6479.conf
# set the data directories
sed -i s@dir\ ./@dir\ /var/lib/redis_6379@g /etc/redis_6379.conf
sed -i s@dir\ ./@dir\ /var/lib/redis_6479@g /etc/redis_6479.conf
# set the port of the second instance
sed -i 's/port 6379/port 6479/g' /etc/redis_6479.conf
mkdir /var/lib/redis_6379
mkdir /var/lib/redis_6479
mkdir /opt/logs
cat <<EOF > /etc/systemd/system/redis_6379.service
[Unit]
Description=The redis-server Process Manager
After=syslog.target network.target

[Service]
Type=forking
ExecStart=/usr/local/redis/bin/redis-server /etc/redis_6379.conf
#ExecStop=/usr/local/redis/bin/redis-shutdown

[Install]
WantedBy=multi-user.target
EOF
cat <<EOF > /etc/systemd/system/redis_6479.service
[Unit]
Description=The redis-server Process Manager
After=syslog.target network.target

[Service]
Type=forking
ExecStart=/usr/local/redis/bin/redis-server /etc/redis_6479.conf
#ExecStop=/usr/local/redis/bin/redis-shutdown

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
# enable and start redis
systemctl enable redis_6379
systemctl enable redis_6479
systemctl start redis_6379
systemctl start redis_6479
systemctl status redis_6379
systemctl status redis_6479
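The two instances above differ only by port, so the per-port edits and systemd units follow a template. A minimal sketch (hypothetical helper mirroring the sed/cat steps above, not part of the original):

```python
# Sketch: render the per-port overrides and systemd unit for any list of ports.
UNIT_TEMPLATE = """[Unit]
Description=The redis-server Process Manager
After=syslog.target network.target

[Service]
Type=forking
ExecStart=/usr/local/redis/bin/redis-server /etc/redis_{port}.conf

[Install]
WantedBy=multi-user.target
"""

def render_unit(port: int) -> str:
    # systemd unit text for /etc/systemd/system/redis_<port>.service
    return UNIT_TEMPLATE.format(port=port)

def render_overrides(port: int) -> dict:
    # the values the sed commands above patch into redis_<port>.conf
    return {
        "port": str(port),
        "logfile": f"/opt/logs/redis_{port}.log",
        "dir": f"/var/lib/redis_{port}",
    }

for port in (6379, 6479):
    print(f"--- redis_{port}.service ---")
    print(render_unit(port))
```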
Install Grafana and configure it
Install Grafana 7 via RPM
# download address: https://grafana.com/grafana/download
wget -O /opt/tgzs/grafana-7.5.1-1.x86_64.rpm https://dl.grafana.com/oss/release/grafana-7.5.1-1.x86_64.rpm
sudo yum install grafana-7.5.1-1.x86_64.rpm
Create the database in MySQL
CREATE DATABASE IF NOT EXISTS grafana DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Modify the configuration file and fill in the MySQL connection details
# /etc/grafana/grafana.ini
[database]
type = mysql       # default is sqlite3
name = grafana
user = root
password = 123123
Start the service
systemctl start grafana-server
systemctl enable grafana-server
systemctl status grafana-server
Check whether logs report errors
tail -f /var/log/grafana/grafana.log
Add a hosts entry on your laptop
# windows
C:\Windows\System32\drivers\etc\hosts
192.168.0.112 grafana.prome.me
Laptop Browser access
http://grafana.prome.me:3000/?orgId=1 (default username/password: admin/admin)
Known issue: panels cannot be edited in Google Chrome
- Issue github.com/grafana/gra…
Install Ansible and batch-install node_exporter
Write the node hostnames to /etc/hosts
Echo "192.168.0.112 prome-master01" >> /etc/hosts echo "192.168.0.113 prome-node01" >> /etc/hostsCopy the code
Generate an SSH key on the master and copy it to the nodes
ssh-keygen
ssh-copy-id prome-node01
ssh-copy-id prome-master01
Install Ansible on the master
# /etc/ansible/ansible.cfg
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
Playbook execution requires an inventory file
cat <<EOF > /opt/tgzs/host_file
prome-master01
prome-node01
EOF
# test that ansible can reach the nodes
ansible -i host_file all -m ping
Set the syslog and Logrotate services
ansible-playbook -i host_file init_syslog_logrotate.yaml
Deploy node_exporter with the Ansible service-deployment playbook
ansible-playbook -i host_file service_deploy.yaml -e "tgz=node_exporter-1.1.2.linux-amd64.tar.gz" -e "app=node_exporter"
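The same playbook invocation deploys any of the downloaded tarballs; only the `tgz` and `app` extra-vars change. A sketch (hypothetical helper, not from the original) that derives the command from a tarball name:

```python
import re

# Sketch: build the ansible-playbook command for any tarball in /opt/tgzs.
def deploy_command(tgz: str) -> str:
    # strip "-<version>.linux-amd64.tar.gz" (with an optional "v" prefix on the
    # version, as in redis_exporter-v1.20.0...) to recover the app name
    app = re.sub(r"-v?\d.*$", "", tgz)
    return (
        "ansible-playbook -i host_file service_deploy.yaml "
        f'-e "tgz={tgz}" -e "app={app}"'
    )

print(deploy_command("node_exporter-1.1.2.linux-amd64.tar.gz"))
```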
Check the node_exporter service status
ansible -i host_file all -m shell -a " ps -ef |grep node_exporter|grep -v grep "
Check /metrics on port 9100 in the browser
node01.prome.me:9100/metrics
master01.prome.me:9100/metrics
Install Prometheus and configure it
Deploy Prometheus using Ansible
Yaml -e "TGZ = Prometheus -2.25.2.linux-amd64.tar.gz" -e "app= Prometheus" ansible-playbook -I host_file service_deploy.yaml -e "TGZ = Prometheus -2.25.2.linux-amd64.tar.gz" -e "app= Prometheus"Copy the code
To view the page
http://master01.prome.me:9090/
Prometheus configuration file parsing
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s
  # query log file
  query_log_file: /opt/logs/prometheus_query_log
  # global label group:
  # all data collected through this instance is tagged with these labels
  external_labels:
    account: 'huawei-main'
    region: 'beijng-01'

# Alertmanager information section
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "localhost:9093"

# rule files section (alerting and recording rules)
rule_files:
  - /etc/prometheus/rules/record.yml
  - /etc/prometheus/rules/alert.yml

# scrape configuration section
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

# remote read section
remote_read:
  # prometheus
  - url: http://prometheus/v1/read
    read_recent: true
  # m3db
  - url: "http://m3coordinator-read:7201/api/v1/prom/remote/read"
    read_recent: true

# remote write section
remote_write:
  - url: "http://m3coordinator-write:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000
      max_samples_per_send: 60000
    write_relabel_configs:
    - source_labels: [__name__]
      separator: ;
      # drop metrics whose names match these prefixes
      regex: '(kubelet_|apiserver_|container_fs_).*'
      replacement: $1
      action: drop
- A Prometheus instance can therefore serve the following purposes
| Configuration sections used | Purpose |
|---|---|
| collection section | Collector; data is stored locally |
| collection section + remote write section | Collector + forwarder; data is stored locally and remotely |
| remote read section | Querier; queries remote storage data |
| collection section + remote read section | Collector + querier; queries local data and remote storage data |
| collection section + Alertmanager section + rule files section | Collector + alert trigger; queries local data, generates alerts, and sends them to Alertmanager |
| remote read section + Alertmanager section + rule files section | Remote alert trigger; queries remote data and sends alerts to Alertmanager |
| remote read section + remote write section + recording rules section | Pre-aggregator; writes the resulting metrics to remote storage |
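The combinations in the table can be checked mechanically. A small sketch (hypothetical helper, not from the original, using the real top-level config keys) that maps the sections present in a config to the roles above:

```python
# Sketch: derive what a Prometheus instance does from which top-level
# configuration sections it defines.
def roles(sections: set) -> list:
    out = []
    if "scrape_configs" in sections:
        out.append("collector")
    if "remote_write" in sections:
        out.append("forwarder")
    if "remote_read" in sections:
        out.append("remote querier")
    # alerting needs both the Alertmanager section and rule files
    if {"alerting", "rule_files"} <= sections:
        out.append("alert trigger")
    return out

print(roles({"scrape_configs", "remote_write"}))
print(roles({"remote_read", "alerting", "rule_files"}))
```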
Prepare the Prometheus configuration file and add the two node_exporter targets
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - scheme: http
    timeout: 10s
    api_version: v1
    static_configs:
    - targets: []
scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 192.168.26.112:9100
    - 192.168.26.113:9100
Hot-reload the configuration file
# start Prometheus with --web.enable-lifecycle, then reload the config with:
curl -X POST http://localhost:9090/-/reload
Check the targets Up status on the page
- Visit master01.prome.me:9090/targets
Fields on the targets page
- Job: the job group
- Endpoint: the instance address
- State: whether the last scrape succeeded
- Labels: the label set
- Last Scrape: time since the last scrape
- Scrape Duration: how long the last scrape took
- Error: the scrape error, if any
Obtain targets details from the API
- run
008_get_targets_from_prome.py
status: normal num: 1/2 endpoint: http://172.20.70.205:9100/metrics state: up
labels: {'instance': '192.168.26.112:9100', 'job': 'prometheus'}
lastScrape: 2021-03-29T18:20:04.304025213+08:00 lastScrapeDuration: 0.011969003 lastError:
status: normal num: 2/2 endpoint: http://172.20.70.215:9100/metrics state: up
labels: {'instance': '192.168.26.113:9100', 'job': 'prometheus'}
lastScrape: 2021-03-29T18:20:06.845862504+08:00 lastScrapeDuration: 0.012705335 lastError:
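The original 008_get_targets_from_prome.py is not shown; a minimal sketch of what such a script might do with the /api/v1/targets response (the sample below is hand-built to mirror the output above):

```python
import json

# sample shaped like the Prometheus /api/v1/targets response
sample = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"scrapeUrl": "http://192.168.26.112:9100/metrics", "health": "up",
   "labels": {"instance": "192.168.26.112:9100", "job": "prometheus"},
   "lastError": ""},
  {"scrapeUrl": "http://192.168.26.113:9100/metrics", "health": "up",
   "labels": {"instance": "192.168.26.113:9100", "job": "prometheus"},
   "lastError": ""}
]}}
""")

def summarize(resp: dict) -> dict:
    # count healthy targets and collect errors from the unhealthy ones
    targets = resp["data"]["activeTargets"]
    up = [t for t in targets if t["health"] == "up"]
    errors = [t["lastError"] for t in targets if t["health"] != "up"]
    return {"total": len(targets), "up": len(up), "down_errors": errors}

print(summarize(sample))
```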
- Add a deliberately wrong target and observe the result, for example
abc:9100
status: abnormal num: 1/3 endpoint: http://abc:9100/metrics state: down
labels: {'instance': 'abc:9100', 'job': 'prometheus'}
lastScrape: 2021-03-29T18:24:08.365229831+08:00 lastScrapeDuration: 0.487732313
lastError: Get "http://abc:9100/metrics": dial tcp: lookup abc on 114.114.114.114:53: no such host
status: normal num: 2/3 endpoint: http://192.168.26.112:9100/metrics state: up
labels: {'instance': '192.168.26.112:9100', 'job': 'prometheus'}
lastScrape: 2021-03-29T18:??:??.304044469+08:00 lastScrapeDuration: 0.012483866 lastError:
status: normal num: 3/3 endpoint: http://192.168.26.113:9100/metrics state: up
labels: {'instance': '192.168.26.113:9100', 'job': 'prometheus'}
lastScrape: 2021-03-29T18:??:??.845860017+08:00 lastScrapeDuration: 0.010381262 lastError:
- The up metric can be used to calculate the target scrape success rate
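One common PromQL expression for this is sum(up) / count(up) (or avg(up)); the same arithmetic over raw sample values, as a sketch:

```python
# Sketch: scrape success rate from the current values of the `up` series
# (1 = target scraped successfully, 0 = scrape failed).
def success_rate(up_samples: list) -> float:
    return sum(up_samples) / len(up_samples)

# e.g. 2 of 3 targets up, matching the abc:9100 example above
print(success_rate([0, 1, 1]))
```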
Scrape Prometheus's own metrics
- 192.168.26.112:9090
- 192.168.26.113:9090