
A lifelong learner, practitioner, and sharer committed to the path of technology, an original blogger who is busy and sometimes lazy, and a teenager who is occasionally boring and sometimes humorous.


Analyzing and troubleshooting system faults in a Linux environment

1. Analyze log files

Log files are used to record various running messages in the Linux system. Different log files record different types of information, such as Linux kernel messages, user login time, and program errors.

Log files are a great help in diagnosing and resolving problems in your system, because programs running on Linux systems often write system messages and error messages to the corresponding log files.

In Linux, log data falls into three types:

  • Kernel and system logs: This log data is centrally managed by the rsyslog system service. Kernel messages and messages from various system programs are recorded to the locations specified in the main configuration file /etc/rsyslog.conf.

  • User logs: Record information about users logging in to and out of the Linux system, including user names, terminals, login times, source hosts, and processes in use.

  • Program logs: Some applications manage their own log files independently (rather than handing them to the rsyslog service) to record event messages while the program runs.

By default, the Linux system itself and most server programs keep their log files in the /var/log/ directory. Some programs share one log file, some use a log file of their own, and some large server programs have more than one log file and therefore create subdirectories in /var/log/ to store them.

1.1 Common Log Files

  • /var/log/messages: Records Linux kernel messages and common log information from various applications, including startup messages, I/O errors, network errors, and program failures.

  • /var/log/cron: Records events generated by crond scheduled tasks.

  • /var/log/dmesg: Records events during the Linux system boot process.

  • /var/log/maillog: Records email activity entering or leaving the system.

  • /var/log/lastlog: Records the most recent login of each user.

  • /var/log/secure: Records security events related to user authentication.

  • /var/log/wtmp: Records each user's logins and logouts, as well as system startup and shutdown events.

  • /var/log/btmp: Records failed and incorrect login attempts and authentication events.

By analyzing log files, you can search for key information, debug system services, and determine the cause of faults. Most log files are in text format, so you can view their content with text-processing tools such as tail, more, less, and cat.
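As a rough illustration of that workflow, the sketch below combines grep and tail to show the most recent lines that mention a keyword. The helper name search_log and the sample log lines are made up for this example; on a real system you would point it at a file such as /var/log/messages.

```shell
# Show the last N lines of a log file that contain a keyword (case-insensitive).
# search_log is a hypothetical helper, not a standard command.
search_log() {
    # $1 = log file, $2 = keyword, $3 = number of matches to show (default 10)
    grep -i -- "$2" "$1" | tail -n "${3:-10}"
}

# Demo on a throwaway sample file so the sketch runs without root access.
# These log lines are invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun 17 17:12:01 localhost kernel: example driver loaded
Jun 17 17:13:40 localhost kernel: example I/O error on device sda
Jun 17 17:14:02 localhost sshd[1501]: error: example failure message
Jun 17 17:15:10 localhost crond[1620]: example scheduled job started
EOF

search_log "$sample" "error" 5
rm -f "$sample"
```

For a log that is still being written, tail -f on the file shows new lines as they arrive.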

The kernel and system log functions are provided by the rsyslog-5.8.10-8.el6.x86_64 software package installed by default. The configuration file used by the rsyslog service is /etc/rsyslog.conf.

[root@localhost ~]# grep -v "^$" /etc/rsyslog.conf
# rsyslog v5 configuration file
# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
#### MODULES ####
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imklog   # provides kernel logging support (previously done by rklogd)
#$ModLoad immark  # provides --MARK-- message capability
# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514
# Provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually not required,
# not useful and an extreme performance hit
#$ActionFileEnableSync on
# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf
#### RULES ####
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                                                 /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages
# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure
# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog
# Log cron stuff
cron.*                                                  /var/log/cron
# Everybody gets emergency messages
*.emerg                                                 *
# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          /var/log/spooler
# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
#$WorkDirectory /var/lib/rsyslog # where to place spool files
#$ActionQueueFileName fwdRule1 # unique name prefix for spool files
#$ActionQueueMaxDiskSpace 1g   # 1gb space limit (use as much as possible)
#$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
#$ActionQueueType LinkedList   # run asynchronously
#$ActionResumeRetryCount -1    # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @@remote-host:514
# ### end of the forwarding rule ###
# A template to for higher precision timestamps + severity logging
$template SpiceTmpl,"%TIMESTAMP%.%TIMESTAMP:::date-subseconds% %syslogtag% %syslogseverity-text%:%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n"
:programname, startswith, "spice-vdagent" /var/log/spice-vdagent.log;SpiceTmpl

As can be seen from the /etc/rsyslog.conf configuration file, the log files managed by the rsyslogd service are the main log files in the Linux system, recording the kernel, user authentication, mail, and scheduled task messages in the Linux system.
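To see at a glance which messages go to which file, it can help to strip the configuration down to its active rules. The sketch below is one possible way to do that; list_rules is a hypothetical helper, and it only handles the simple "selector action" lines of the legacy format shown here.

```shell
# List the active "selector -> action" pairs from rsyslog.conf-style text on stdin.
# list_rules is a hypothetical helper; it skips comments, blank lines,
# $-directives, and property-based filters that start with ":".
list_rules() {
    awk '!/^[#$:]/ && NF >= 2 { print $1 " -> " $NF }'
}

# Demo on a few lines taken from the configuration file discussed in this section.
list_rules <<'EOF'
#kern.*                                                 /dev/console
*.info;mail.none;authpriv.none;cron.none                /var/log/messages
authpriv.*                                              /var/log/secure
mail.*                                                  -/var/log/maillog
cron.*                                                  /var/log/cron
$IncludeConfig /etc/rsyslog.d/*.conf
EOF
```

On a real host the same function could be fed the whole file with list_rules < /etc/rsyslog.conf.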

In the Linux kernel, log messages are assigned priority levels according to their importance (the smaller the number, the higher the priority and the more important the message):

  • 0 EMERG (emergency): A situation that makes the host system unusable.

  • 1 ALERT: A problem that must be addressed immediately.

  • 2 CRIT: A serious error.

  • 3 ERR: An error that occurs during operation.

  • 4 WARNING: An important event that may affect system functions and that users need to be told about.

  • 5 NOTICE: An event that does not affect normal functions but deserves attention.

  • 6 INFO: General information.

  • 7 DEBUG: Program or system debugging information.
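The numbering explains how a selector such as *.info behaves: it matches messages at level info and at every more important (lower-numbered) level. A minimal sketch of that comparison follows; severity_num and at_least are hypothetical helper names written for this example, not part of rsyslog.

```shell
# Map a syslog severity name to its numeric level (0 = most important).
# severity_num is a hypothetical helper written for this example.
severity_num() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       echo "unknown severity: $1" >&2; return 1 ;;
    esac
}

# Succeeds when message level $1 is at least as important as threshold $2.
at_least() {
    [ "$(severity_num "$1")" -le "$(severity_num "$2")" ]
}

at_least err info   && echo "err is matched by a *.info rule"
at_least debug info || echo "debug is not matched by a *.info rule"
```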

Kernel messages and most system messages are logged to the common log file /var/log/messages, while messages from other programs are logged to separate log files. Log messages can also be written to specific storage devices or sent directly to specific users.

Most log files that are centrally managed by the rsyslog service use the same format. In the common log file /var/log/messages, each line represents one log message and contains the following four fields.

  • Timestamp: The date and time the message was sent.

  • Hostname: The name of the computer that generated the message.

  • Subsystem name: The name of the application that sent the message.

  • Message: Indicates the specific content of the message.
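The four fields can be pulled apart with awk. The following is a sketch that assumes the traditional format (a three-part timestamp, then the host name, then a tag ending in a colon); split_syslog_line is a hypothetical helper and the sample message is made up.

```shell
# Split traditional-format syslog lines (read from stdin) into four fields.
# split_syslog_line is a hypothetical helper for this example.
split_syslog_line() {
    awk '{
        timestamp = $1 " " $2 " " $3   # e.g. "Jun 17 17:14:02"
        host = $4
        tag = $5
        sub(/:$/, "", tag)             # drop the trailing colon of the tag
        # Everything after the fifth field is the message text.
        msg = $0
        for (i = 1; i <= 5; i++) sub(/^[ ]*[^ ]+[ ]*/, "", msg)
        printf "time=%s|host=%s|subsystem=%s|message=%s\n", timestamp, host, tag, msg
    }'
}

echo 'Jun 17 17:14:02 localhost sshd[1501]: example message text' | split_syslog_line
```

On a real system the input would typically come from something like tail -n 20 /var/log/messages.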

2. User logs

2.1 Viewing the Current Login Users using the users, who, and w commands

The users command prints the names of the users currently logged in. Each displayed user name corresponds to one login session, so a user with more than one session appears that many times.

[root@localhost ~]# users
root root root

The who command reports information about each user currently logged in to the system. This lets you spot unauthorized users and audit them. The default output of who includes the user name, terminal, login date and time, and remote host.

[root@localhost ~]# who
root     tty1         2016-06-17 17:14 (:0)
root     pts/0        2016-06-17 17:16 (:0.0)
root     pts/1        2016-06-17 17:18 (172.20.10.5)
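When many users are logged in, summarizing the who output makes unusual sessions easier to spot. The sketch below counts sessions per user and lists sessions that come from a remote host rather than a local display; both function names are hypothetical, and the input is the sample output from this section.

```shell
# Count login sessions per user from who-style output on stdin.
# sessions_per_user and remote_sessions are hypothetical helpers.
sessions_per_user() {
    awk '{ count[$1]++ } END { for (u in count) print u, count[u] }' | sort
}

# Print user, terminal, and source for sessions whose source (the last field,
# in parentheses) is not a local X display such as (:0) or (:0.0).
remote_sessions() {
    awk '$NF ~ /^\(/ && $NF !~ /^\(:/ { print $1, $2, $NF }'
}

# On a live system: who | sessions_per_user
sample='root     tty1         2016-06-17 17:14 (:0)
root     pts/0        2016-06-17 17:16 (:0.0)
root     pts/1        2016-06-17 17:18 (172.20.10.5)'

printf '%s\n' "$sample" | sessions_per_user
printf '%s\n' "$sample" | remote_sessions
```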

2.2 Using the w Command

The w command displays information about each user currently logged in to the system and the process each user is running.

[root@localhost ~]# w
 17:50:35 up 37 min,  3 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty1     :0               17:14   37:45   2.32s  2.32s /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-0vxga4/datab
root     pts/0    :0.0             17:16   34:07   0.00s  0.00s /bin/bash
root     pts/1    172.20.10.5      163.00 s  0.13s  0.09s w

2.3 Querying the Login history of a User using the last and lastb commands

The last command queries the records of users who have successfully logged in to the system, with the most recent logins displayed first. Running last regularly keeps you up to date on who has logged in to the Linux host. A login by an unauthorized user means that the host may have been compromised.

[root@localhost ~]# last
root     pts/1        172.20.10.5      Fri Jun 17 17:18   still logged in
root     pts/0        :0.0             Fri Jun 17 17:16   still logged in
root     tty1         :0               Fri Jun 17 17:14   still logged in
reboot   system boot  2.6.32-431.el6.x Fri Jun 17 17:12 - 17:51  (00:38)
root     pts/2        172.20.10.5      Fri Jun 17 14:36 - down   (02:35)
root     pts/1        172.20.10.5      Fri Jun 17 12:23 - 14:43  (00:20)
root     pts/0        :0.0             Fri Jun 17 12:22 - down   (04:49)
root     tty1         :0               Fri Jun 17 12:21 - down   (04:50)
reboot   system boot  2.6.32-431.el6.x Fri Jun 17 12:19 - 17:12  (04:52)
root     pts/1        192.168.0.133    Sat Jun  4 14:35 - crash (12+21:43)
root     pts/0        :0.0             Sat Jun  4 14:35 - crash (12+21:44)
root     tty1         :0               Sat Jun  4 14:35 - crash (12+21:44)
reboot   system boot  2.6.32-431.el6.x Sat Jun  4 14:31 - 17:12 (13+02:40)
root     pts/1        192.168.0.133    Sat Jun  4 12:41 - down   (01:49)
root     pts/0        :0.0             Sat Jun  4 12:40 - down   (01:50)
root     tty1         :0               Sat Jun  4 12:40 - down   (01:51)
reboot   system boot  2.6.32-431.el6.x Sat Jun  4 12:35 - 14:31  (01:55)
root     pts/1        192.168.0.133    Sat Jun  4 12:11 - down   (00:23)
root     pts/2        172.20.10.5      Sat Jun  4 09:32 - 12:30  (02:57)
root     pts/1        172.20.10.5      Sat Jun  4 07:48 - 09:57  (02:08)
root     pts/0        :0.0             Sat Jun  4 07:48 - down   (04:46)
root     tty1         :0               Sat Jun  4 06:45 - down   (05:49)
reboot   system boot  2.6.32-431.el6.x Sat Jun  4 06:40 - 12:34  (05:54)

wtmp begins Sat Jun  4 06:40:32 2016

The lastb command queries failed login records, such as attempts with an incorrect user name or password.

[root@localhost ~]# lastb
root     tty1         :0               Fri Jun 17 12:21 - 12:21  (00:00)    
root     tty1         :0               Sat Jun  4 14:34 - 14:34  (00:00)    

btmp begins Sat Jun  4 14:34:54 2016
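Long lastb listings are easier to read once they are grouped. The sketch below counts failed attempts per user and source; failed_by_source is a hypothetical helper, and it assumes every record has a value in the third (source) column, as in the sample output from this section.

```shell
# Count failed login attempts per user and source from lastb-style output on stdin.
# failed_by_source is a hypothetical helper; on a live system: lastb | failed_by_source
failed_by_source() {
    # Skip blank lines and the trailing "btmp begins ..." / "wtmp begins ..." line.
    awk 'NF >= 3 && $1 != "btmp" && $1 != "wtmp" { n[$1 " " $3]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

failed_by_source <<'EOF'
root     tty1         :0               Fri Jun 17 12:21 - 12:21  (00:00)
root     tty1         :0               Sat Jun  4 14:34 - 14:34  (00:00)

btmp begins Sat Jun  4 14:34:54 2016
EOF
```

Each output line holds the number of attempts followed by the user and the source, with the most frequent combination first.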

3. Program log

On Linux systems, some applications do not use the rsyslog service to manage logs. Instead, the program maintains its own log files.

When reviewing logs, pay attention to the following signs:

  • A user logs in to the system at an unusual time, or the user's login IP address differs from previous ones.

  • Records of user login failures, especially repeated failures.

  • Unauthorized or improper use of superuser permissions.

  • Records of unauthorized or illegal restarts of network services.

  • Abnormal or incomplete log records, or a log file such as wtmp with records missing from the middle.
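Repeated login failures in particular can be counted automatically. The sketch below assumes the "Failed password for ... from ADDRESS port ..." lines that sshd writes to /var/log/secure; repeated_failures is a hypothetical helper, and the sample lines are made up for the demo.

```shell
# Report source addresses with at least $1 failed SSH password attempts (default 3).
# Reads /var/log/secure-style lines on stdin; repeated_failures is a hypothetical helper.
repeated_failures() {
    awk -v min="${1:-3}" '
        /Failed password/ {
            # The source address is the word that follows "from".
            for (i = 1; i < NF; i++)
                if ($i == "from") { hits[$(i + 1)]++; break }
        }
        END { for (ip in hits) if (hits[ip] >= min) print hits[ip], ip }
    ' | sort -rn
}

# Demo with invented log lines and a threshold of 2.
repeated_failures 2 <<'EOF'
Jun 17 12:21:03 localhost sshd[2101]: Failed password for root from 172.20.10.5 port 51234 ssh2
Jun 17 12:21:09 localhost sshd[2101]: Failed password for root from 172.20.10.5 port 51234 ssh2
Jun 17 12:25:40 localhost sshd[2188]: Failed password for invalid user admin from 192.168.0.133 port 40022 ssh2
Jun 17 12:26:02 localhost sshd[2190]: Accepted password for root from 172.20.10.5 port 51240 ssh2
EOF
```

On a real host the input would be the log itself, for example repeated_failures 5 < /var/log/secure, run as root because the file has restricted access.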

4. Troubleshooting system startup faults: MBR sector faults

4.1 Backing up MBR Sector Data

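Because the MBR sector contains the partition table of the entire hard disk, the backup file of this sector must be stored on another storage device; otherwise, the backup file cannot be read during recovery. In the commands that follow, /dev/sda3 is the partition used to hold the backup; on a real system it should be a partition on a different disk from /dev/sda.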

[root@localhost ~]# mkdir /backup
[root@localhost ~]# mount /dev/sda3 /backup/
[root@localhost ~]# dd if=/dev/sda of=/backup/sda.mbr.bak bs=512 count=1
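The dd step can be practised safely on a scratch image file before touching a real disk. This is a minimal sketch under that assumption: backup_first_sector is a hypothetical wrapper, and disk.img stands in for the disk device.

```shell
# Save the first 512-byte sector of a disk (or disk image) to a backup file.
# backup_first_sector is a hypothetical wrapper around the dd step.
backup_first_sector() {
    # $1 = source disk or image, $2 = backup file
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Demo on a scratch image so no real disk is touched; file names are made up.
work=$(mktemp -d)
dd if=/dev/urandom of="$work/disk.img" bs=512 count=4 2>/dev/null   # fake 2 KiB "disk"
backup_first_sector "$work/disk.img" "$work/sda.mbr.bak"

# The backup should be exactly one sector long.
echo "backup size: $(wc -c < "$work/sda.mbr.bak" | tr -d ' ') bytes"
rm -rf "$work"
```

Checking the size of the backup file is a quick sanity test: an MBR backup made with bs=512 count=1 is always 512 bytes.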

4.2 Simulating MBR sector faults

Using the dd command, you can manually overwrite the MBR sector to simulate a damaged MBR (remember to back up the MBR sector first and save the backup file to another hard disk).

[root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=512 count=1

After completing the preceding operation, restart the system. The message “Operating system not found” is displayed, indicating that the operating system cannot be found and the host cannot start.

4.3 Restoring MBR Sector Data from the Backup File

Boot the host from the installation disc. When the installation wizard screen is displayed, select Rescue installed system to boot the Linux operating system in rescue mode.

Then press Enter to accept the default language and keyboard layout. When the prompt is displayed, select NO. The system then automatically searches for the Linux partition on the hard disk and tries to mount it to the /mnt/sysimage directory (select Continue to confirm). When the Rescue window is displayed, click OK.

After you click Skip, you enter the bash shell environment at the bash-4.1# prompt. From there, run the appropriate commands to mount the hard disk partition where the backup file is stored and restore the data to the hard disk /dev/sda.

After the restoration is complete, run the exit command to exit the temporary shell environment, and run the reboot command. The system automatically restarts.
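The restore step described above is the backup dd command with its input and output swapped. The sketch below rehearses the whole cycle (backup, simulated damage, restore) on a scratch image file; restore_first_sector and all file names are hypothetical. In the rescue shell, the backup partition would first have to be mounted, and its device name depends on the system.

```shell
# Write a 512-byte backup back to the first sector of a disk or disk image.
# restore_first_sector is a hypothetical wrapper; conv=notrunc keeps an image
# file from being truncated and is harmless on a block device.
restore_first_sector() {
    # $1 = backup file, $2 = target disk or image
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}

work=$(mktemp -d)
dd if=/dev/urandom of="$work/disk.img" bs=512 count=4 2>/dev/null
cp "$work/disk.img" "$work/original.img"          # reference copy for comparison
dd if="$work/disk.img" of="$work/sda.mbr.bak" bs=512 count=1 2>/dev/null

# Simulate the fault: zero the first sector only.
dd if=/dev/zero of="$work/disk.img" bs=512 count=1 conv=notrunc 2>/dev/null
cmp -s "$work/disk.img" "$work/original.img" || echo "first sector damaged"

# Restore from the backup and confirm the image matches the reference copy again.
restore_first_sector "$work/sda.mbr.bak" "$work/disk.img"
cmp -s "$work/disk.img" "$work/original.img" && echo "disk image restored"
rm -rf "$work"
```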

5. Forgetting the password of user root

If you forget the password of the root user, you cannot log in to the Linux system as root to perform management and maintenance tasks; at most, you can log in as another user with limited privileges. If there are other accounts with root privileges, or accounts that are permitted to change the root password, you can log in with one of them and reset the password of the root user.

5.1 Resetting the Password of the root User in Single-user Mode

Specific steps:

1. Restart the host. When the GRUB menu is displayed, press the ↑ or ↓ arrow key to cancel the countdown, select the operating system entry, and press e to enter editing mode.

2. Locate the line that begins with kernel, press e, and add the boot parameter “single” to the end of the line. “single” can also be replaced by the letter “S” or the number “1”.

3. Press Enter and then press b to boot the system into single-user mode. The system goes straight to the shell environment (no password verification is required).

4. In a single-user shell environment, run the passwd root command to reset the password of user root.

5.2 Resetting the Password of the root User in Rescue Mode

If you boot from the RHEL 6 installation disc and enter the shell environment in rescue mode, you only need to switch to the root directory of the Linux system to be repaired (for example, with chroot /mnt/sysimage) and run the passwd root command to reset the password of user root. Alternatively, edit the /etc/shadow file, delete the password field of user root, and log in with an empty password after the system restarts.
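For the second approach, the password is the second colon-separated field of the account's line in /etc/shadow. The sketch below shows the edit on made-up shadow entries rather than on the real file; clear_password_field is a hypothetical helper and the hash is a placeholder.

```shell
# Blank the password field (second colon-separated field) of one account
# in shadow-format text read from stdin.
# clear_password_field is a hypothetical helper for this example.
clear_password_field() {
    awk -F: -v user="$1" 'BEGIN { OFS = ":" } $1 == user { $2 = "" } { print }'
}

# Demo on invented entries; the hash below is a placeholder, not a real one.
clear_password_field root <<'EOF'
root:$6$examplesalt$examplehash:16969:0:99999:7:::
bin:*:15980:0:99999:7:::
EOF
```

On a real system it is safer to copy the shadow file first, write the filtered result to a new file, and replace the original only after checking it.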

