This is the 16th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

Items that may need to be queried after a problem occurs with the Hexagon database

1. Hardware information

1.1 the CPU information

Run the lscpu command to obtain details about the CPU. The result is as follows

[root@registry ~]# lscpu
Copy the code

1.2 Memory Information

Run the free -m command to obtain memory information. The command output is as follows:

[root@registry ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3789         306        2512         124         970        3081
Swap:             0           0           0

Copy the code

1.3 Disk Information

You can run the df command to obtain the disk space and inode usage. As shown below.

#Example View disk space information
[root@registry ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   36G   21G   15G  59% /
devtmpfs                 1.9G     0  1.9G   0% /dev
#View the inode usage[root@registry ~]# df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/centos-root 18411520 211354 18200166  2% / devtmpfs 480814 359 480455 1% /devCopy the code

1.4 Directory Permissions

If the permission of the database directory is incorrect, the database may fail to run properly or inexplicable problems may occur. You can run the ls command to obtain the permission of the database directory.

Generally, run ls -L $PGHOME to obtain the information about the files in the home directory of the Hangao database. If the environment variables are not set or do not take effect, go to the opt directory to find the installation directory of the database, enter the installation directory, and run ls -L. As shown below.

[highgo@registry ~]$ ls -l $PGHOME
total 24
drwxr-xr-x  2 highgo highgo 4096 Feb 17 13:53 bin
drwxr-xr-x  2 highgo highgo   28 Feb 17 13:53 conf
drwx------ 21 highgo highgo 4096 Apr 14 16:15 data
drwxr-xr-x  4 highgo highgo   32 Feb 17 13:53 diag
drwxr-xr-x  3 highgo highgo  135 Feb 17 15:06 etc
drwxr-xr-x  2 highgo highgo   62 Feb 17 13:53 icon
drwxr-xr-x  4 highgo highgo 4096 Feb 17 13:53 include
drwxr-xr-x  4 highgo highgo 4096 Feb 17 13:53 lib
drwxr-xr-x  5 highgo highgo   49 Feb 17 13:53 share
#Go to the installation directory[highgo@registry ~]$CD /opt/HighGo5.6.5/ [highgo@registry HighGo5.6.5]$ls -l total 24 drwxr-xr-x 2 highGo highgo 4096 Feb 17 13:53 bin drwxr-xr-x 2 highgo highgo 28 Feb 17 13:53 conf drwx------ 21 highgo highgo 4096 Apr 14 16:15 data drwxr-xr-x 4 highgo highgo 32 Feb 17 13:53 diag drwxr-xr-x 3 highgo highgo 135 Feb 17 15:06 etc drwxr-xr-x 2 highgo highgo 62 Feb 17 13:53 icon drwxr-xr-x 4 highgo highgo 4096 Feb 17 13:53 include drwxr-xr-x 4 highgo highgo 4096 Feb 17 13:53 lib drwxr-xr-x 5 highgo highgo 49 Feb 17 13:53 shareCopy the code

2. Operating system information

When a database problem occurs, the operating system information helps engineers determine whether the problem is caused by the database itself or by external causes.

2.1 Operating system version

You can obtain the operating system version by directly reading the version file under etc. As shown below.

[root@registry ~]# cat /etc/*-release CentOS Linux Release 7.6.1810 (Core) NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE=" RHEL Fedora "VERSION_ID="7" PRETTY_NAME=" centos Linux 7 (Core)" centos Linux Release 7.6.1810 (Core)Copy the code

2.2 Operating System Logs

Operating system logs help you determine whether the database problems are related to the operating system or hardware. The operating system log path is /var/log/messages. You can run the cat command to print logs. Because there are many logs, you are advised to copy log files directly.

[root@registry ~]# cat /var/log/messages
Apr 15 03:20:02 registry systemd: Started Session 259 of user root.
Apr 15 03:50:01 registry systemd: Started Session 262 of user root.
Apr 15 04:30:01 registry systemd: Started Session 267 of user root.
Copy the code

2.3 Obtaining OS Kernel Parameters

In some cases, the setting of operating system kernel parameters may affect the normal running of the database. Kernel parameters need to provide two parts of information: the current running parameters and the parameters written in sysctl.conf. As follows:

#If the content is large, use the following method to save the information to/TMP /sysctl. TXT
[root@registry ~]# sysctl -a >/tmp/sysctl.txt
#Gets the current parameter Settings in sysctl.conf
[root@registry ~]# cat /etc/sysctl.conf
Copy the code

2.4 Obtaining firewall Status and Rules

Incorrect firewall Settings may block the database connection. When the database is running properly but the client cannot connect to the database, you need to check the firewall status and rules. As follows:

[root@registry ~]# firewall-cmd --state
running
[root@registry ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3 enp0s8
  sources: 
  services: ssh dhcpv6-client
  ports: 
Copy the code

3. Database information

The database version, parameters, logs, control files, and license information can be used to locate database problems. The following information is required for single-node replication, cluster replication, and stream replication. The following information is required for the active and standby servers of cluster replication and stream replication.

3.1 Database Version

Log in to the database using PSQL or other clients on the server and execute the following SQL statements to obtain the version information of the database. The SQL statements used in the following statements can be executed on any client that can be connected to the database. As follows:

[root@registry ~]# PSQL -u sysdba highgo PSQL (4.3.4.7) Type "help" for help.highgo =# select kernel_version(); Kernel_version -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- HighGo Database V4.3 Release 4.3.4.7-64 - bit Production (1 row)Copy the code

3.2 Database startup information

Database startup time

To obtain the database startup time, run the following statement:

highgo=# select * from pg_postmaster_start_time(); Pg_postmaster_start_time -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - the 2020-04-20 09:27:11. 278359 + 8 (1 row)Copy the code

Configuration file loading time

Use the following statement to get the load time of the database configuration file.

highgo=# select pg_conf_load_time(); Pg_conf_load_time -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - the 2020-04-20 09:27:10. 493388 + 8 (1 row)Copy the code

3.3 Database Parameters

Database parameters can help engineers locate problems caused by database parameter Settings. The parameters required by Hangao database include: parameters currently used by the database and parameters used in parameter files.

Run the “show all;” command to run the current parameters of the database. You can obtain it from any client that can connect to the database or save it to a file in the following ways.

[root@registry ~]# psql -U sysdba highgo -c "show all;" > /tmp/hgdb.conf
Copy the code

Conf, postgresql.auto. Conf, and pg_hba.conf files are stored in the data directory of the database.

3.4 Database Logs

Database logs can provide detailed information for troubleshooting database projects. Generally, database logs are stored in the DATA directory. If you are not sure where to store the logs and which logs are required, use the following methods to confirm.

#The log_directory parameter shows the location where logs are stored. The location is a relative path with data as the origin[root@registry ~]# PSQL -u sysdba highgo PSQL (4.3.4.7) Type "help" for help.highgo =# show log_directory; log_directory --------------- log (1 row)#View the log format. The parameter after % indicates how long the log is stored. The common suffixes are w (1 week), D (1 month), j (1 year),
highgo=# show log_filename ;
   log_filename    
-------------------
 hgdblog-%d.log
(1 row)
#View the log name that is currently in use
highgo=# select pg_current_logfile();
  pg_current_logfile   
-----------------------
 log/postgresql-16.csv
(1 row)
Copy the code

You can only copy the logs within the fault period. If you cannot confirm the time or the same fault occurs multiple times, you are advised to copy all logs in the log directory.

3.5 Control Files

Control file One of the important files in the database to help determine the cause of database problems. The control file cannot be read directly. You need to run the pg_conftroldata command to obtain the control file. Because there is a large amount of content, you can write the content to the file in redirection mode. As follows:

[root@registry ~]# pg_controldata >/tmp/hgdb_control.txt
Copy the code

3.6 license information

After the license expires, the database automatically shuts down. In this case, you can run the check_lic command to obtain license information. As follows:

[root@registry ~]# check_lic

############## HIGHGODB LICENSE ###################License Status :Normal License Validity :Unlimited Database Version:Highgo Database 4.7.6 Official Edition## # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
Copy the code

4. Information to be collected for cluster and stream replication

When the hangao database cluster or flow replication is faulty, in addition to collecting the above information (both the active and standby nodes), additional information such as flow replication status information, recovery.conf file, and repmgr configuration file need to be collected.

4.1 Information to be queried for cluster and Stream Replication

Example Query the status of nodes

#Example Query the status of nodeshighgo=# select * from pg_stat_replication; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin  | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | re play_lag | sync_priority | sync_state -------+----------+---------+------------------+---------------+-----------------+-------------+------------------------ -------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+--- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- 24544 | 443 | sysdba | walreceiver | 192.168.80.44 | | 45060 | 2020-04-20 09:27:12. 934557 + 08 | | streaming | 0/4 e000140 | 0/4 e000140 | 0/4 e000140 | 0/4 e000140 | | | | | 0 async (1 row)
#Example Query the delay of the active and standby nodes
highgo=# SELECT pg_wal_lsn_diff (pg_current_wal_insert_lsn(), replay_lsn ) AS lag_bytes, application_name FROM pg_stat_replication;
 lag_bytes | application_name 
-----------+------------------
         0 | walreceiver
(1 row)

Copy the code

The recovery.conf file exists in the data directory of the standby node in cluster and flow replication. Copy the file down and send it to Hangao engineers.

4.2 Additional Information required by the Cluster

The conf file hg_repMgr. conf exists in the conf directory of the active and standby nodes of the cluster database. Copy the file and send it to the engineer.

The following files:

[root@registry conf]# PWD /opt/HighGo4.3.4.7-see/conf [root@registry conf]# ls -l hg_repMgr. conf -rw-r--r-- 1 root root 15233 Apr 13 15:24 hg_repmgr.confCopy the code

5. Performance related

This section describes the information you need to obtain when a performance problem occurs in the Hango database. The following sections, which involve operating system commands, are the most commonly used commands.

5.1 CPU Load

Top command can retrieve the current system of each system resource usage, you can use the following command to get the top results, the following command to the database performance problems during execution, the execution time for 5 minutes, do not shut down during the execution of, don’t perform other operations, the execution is completed, can produce top in the current directory. TXT this file, Please copy the file and send it to Engineer Henco.

[root@registry conf]# top -d 3 -n 100 -b -i -c -H > top.txt

Copy the code

5.2 Memory Usage

Run the free command to obtain the memory usage of the operating system. Run the following command to obtain the memory usage, save it to a file, and send it to the engineer.

[root@registry conf]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3789         327        1439         185        2022        2987
Swap:             0           0           0
Copy the code

5.3 Disk Read/Write Status

Iostat can obtain the read and write status of the disk. Run the following command to obtain the read and write status of the disk. The following command will obtain the read and write status of the disk within one minute.

[root@registry conf]# iostat -d -x -k 1 60 >diskio.txt

Copy the code

5.4 Obtaining an SQL Execution Plan

If the execution speed of an SQL statement is slow during the system running, you can run the explain command to obtain the execution plan of the SQL statement. The execution plan obtained by the explain command is not the actual execution plan. This method is applicable to the situation where the execution time of the SQL statement is long and the execution plan needs to be obtained quickly. To obtain the real execution plan, use the analyze option of the explain command. The following are examples of the two execution plans:

#Gets the execution plan in the plannerhighgo=# explain select * from hgtest limit 1; The QUERY PLAN -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Limit (cost = 0.00.. 0.02 rows = 1 width = 8)   ->Seq Scan on hgTest (cost=0.00.. 154053.60 rows = 9999860 width = 8)
(2 rows)

#Get the actual execution planhighgo=# explain analyze select * from hgtest limit 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Limit (cost = 0.00.. 0.02 Rows =1 width=8) (actual time=0.022.. 0.024 rows = 1 loops = 1)   ->Seq Scan on hgTest (cost=0.00.. 154053.60 rows=9999860 width=8) (actual time=0.021.. 0.021 rows = 1 loops = 1)Planning time: 0.080ms Execution time: 0.068ms (4 rows)Copy the code

5.5 Obtaining Slow SQL Files in Batches

By setting database parameters, you can batch obtain SQL statements that take longer than a certain period of time. The setting method is as follows:

#Modify the postgresql.conf configuration file and set the following parameters
#Enabling logging
logging_collector = on
#Set the log output format. The formats are stderr (default), csvlog, and syslogLog_destination = 'csvlog'#Set the log storage location. The following Settings indicate that logs are stored$PGDATAIn the hgdb_log log fileLog_directory = 'hgdb_log'#Setting log truncation
log_truncate_on_rotation = on
#Set the name of the logLog_filename = '% highgodb_ d.l og'#Sets the level of SQL statements to be traced. Levels include None (default, only error messages are recorded), DDL, mod, and all
log_statement = all
#Record SQL statements executed over the following time, in milliseconds, and adjust the time as needed
log_min_duration_statement = 5000
Copy the code

After the preceding parameters are set, the database needs to be restarted to take effect. The logs are stored in the hgdb_log directory under the data directory.