Use Strace to track interactions between user processes and the Linux kernel.

Hand putting a Linux file folder into a drawer

The system call is a programming way for a program to request services from the kernel, and Strace is a powerful tool that lets you track the interactions between user processes and the Linux kernel.

To understand how an operating system works, you first need to understand how system calls work. One of the main functions of an operating system is to provide an abstraction mechanism for user programs.

Operating systems can be broadly divided into two modes:

  • Kernel mode: A powerful privileged mode used by operating system kernels
  • User mode: Where most user applications run users mostly use command-line utilities and graphical user interfaces (GUIs) to perform everyday tasks. The system calls run silently in the background, interacting with the kernel to get the job done.

System calls are very similar to function calls, which means they both accept and process parameters and then return values. The only difference is that system calls go into the kernel, while function calls don’t. Switching from user space to kernel space is done using a special trap mechanism.

Most system calls are hidden from the user by using the system library, also known as glibc on Linux systems. Although system calls are generic in nature, the mechanism by which they are made depends largely on the machine (architecture).

This article explores some practical examples by using some general commands and using Strace to analyze the system calls made by each command. These examples use Red Hat Enterprise Linux, but the commands should run the same on other Linux distributions:

[root@sandbox ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
[root@sandbox ~]#
[root@sandbox ~]# uname -r3.10.0-1062. El7. X86_64 / root @ the sandbox ~#
Copy the code

First, make sure you have the necessary tools installed on your system. You can verify that Strace is installed using the RPM command below. If installed, you can check the version number of the Strace utility using the -v option:

[root@sandbox ~]# rpm -qa | grep -i straceThe strace - 4.12-9. El7. X86_64 [root @ the sandbox ~]#
[root@sandbox ~]# strace -VStrace -- version 4.12 [root@sandbox ~]#
Copy the code

If not, run the following command to install:

yum install strace
Copy the code

For the purposes of this example, create a test directory in/TMP and use the touch command to create two files:

[root@sandbox ~]# cd /tmp/
[root@sandbox tmp]#
[root@sandbox tmp]# mkdir testdir
[root@sandbox tmp]#
[root@sandbox tmp]# touch testdir/file1
[root@sandbox tmp]# touch testdir/file2
[root@sandbox tmp]#
Copy the code

(I use the/TMP directory because it is accessible to everyone, but you can choose another directory as needed.)

Verify that the file has been created using the ls command in the testdir directory:

[root@sandbox tmp]# ls testdir/
file1  file2
[root@sandbox tmp]#
Copy the code

You may be using the ls command every day without realizing the role that system calls play underneath it. Abstractly, this command works like this:

Command line tools -> Call a function from the system library (glibc) -> Call a system call

The ls command internally calls functions from a system library on Linux (that is, glibc). These libraries call the system calls that do most of the work.

If you want to know which functions are called from the glibc library, use the ltrace command, followed by the regular ls testdir/ command:

ltrace ls testdir/
Copy the code

If ltrace is not installed, type the following command to install it:

yum install ltrace
Copy the code

A lot of output is heaped onto the screen; Don’t worry. Just keep going. Some of the important library functions in the ltrace command output that are relevant to this example include:

opendir("testdir/")                                  = { 3 }
readdir({ 3 })                                       = { 101879119, "." }
readdir({ 3 })                                       = { 134, ".." }
readdir({ 3 })                                       = { 101879120, "file1" }
strlen("file1")                                      = 5
memcpy(0x1665be0, "file1\0", 6)                      = 0x1665be0
readdir({ 3 })                                       = { 101879122, "file2" }
strlen("file2")                                      = 5
memcpy(0x166dcb0, "file2\0", 6)                      = 0x166dcb0
readdir({ 3 })                                       = nil
closedir({ 3 })                                         
Copy the code

You can probably get a sense of what’s going on by looking at the output above. The Opendir library function opens a directory named testdir and then calls the readdir function, which reads the contents of the directory. Finally, there is a call to the Closedir function, which closes the previously open directory. Ignore the other Strlen and memcpy features for now.

You can see which library functions are being called, but this article will focus on system calls made by system library functions.

Similarly, to understand which system calls were called, simply place strace before the ls testdir command, as shown below. Once again, a bunch of garbled characters are thrown onto your screen. You can do this by following these steps:

[root@sandbox tmp]# strace ls testdir/
execve("/usr/bin/ls"["ls"."testdir/"], [/* 40 vars */]) = 0
brk(NULL)                               = 0x1f12000
<<< truncated strace output >>>
write(1, "file1 file2\n", 13file1  file2
)          = 13
close(1)                                = 0
munmap(0x7fd002c8d000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
[root@sandbox tmp]#
Copy the code

The output on the screen after running the strace command is the system call running the ls command. Each system call provides a specific purpose for the operating system, which can be roughly divided into the following sections:

  • Process management system calls
  • File management system call
  • Directory and file system management system calls
  • Other system calls

An easier way to analyze the information displayed on the screen is to record the output to a file using Strace’s handy -O flag. Add an appropriate filename after the -o flag, then run the command again:

[root@sandbox tmp]# strace -o trace.log ls testdir/
file1  file2
[root@sandbox tmp]#
Copy the code

This time, no output interferes with the screen, and the ls command works as expected, displaying the file name and logging all output to the file trace.log. With a simple ls command, the file has nearly 100 lines:

[root@sandbox tmp]# ls -l trace.log
-rw-r--r--. 1 root root 7809 Oct 12 13:52 trace.log
[root@sandbox tmp]#
[root@sandbox tmp]# wc -l trace.log
114 trace.log
[root@sandbox tmp]#
Copy the code

Let’s look at the first line of this example trace.log file:

execve("/usr/bin/ls"["ls"."testdir/"], [/* 40 vars */]) = 0
Copy the code
  • The first word in the lineexecveIs the name of the system call being executed.
  • The text in parentheses is the parameter supplied to the system call.
  • symbol=After the figure (in this case is0) isexecveThe return value of the system call.

Now the output doesn’t seem too scary, right? You can apply the same logic to other lines.

For now, focus on the single command you call, ls testdir. You know the directory name used by the ls command, so why not use grep in the trace.log file to find testdir and see the result? Let’s look at each line of the result in detail:

[root@sandbox tmp]# grep testdir trace.log
execve("/usr/bin/ls"["ls"."testdir/"], [/* 40 vars */]) = 0
stat("testdir/", {st_mode=S_IFDIR|0755, st_size=32, ... }) = 0 openat(AT_FDCWD,"testdir/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
[root@sandbox tmp]#
Copy the code

Reviewing the analysis of execve above, can you tell me what this system call does?

execve("/usr/bin/ls"["ls"."testdir/"], [/* 40 vars */]) = 0
Copy the code

You don’t have to remember all the system calls or what they do, because you can refer to the documentation when you need it. Man pages can save you! Before running the man command, ensure that the following packages are installed:

[root@sandbox tmp]# rpm -qa | grep -i man-pagesMan - pages - 3.53-5. El7. Noarch [root @ the sandbox TMP]#
Copy the code

Remember, you need to add a 2 between the man command and the system call name. If you read the man page of the man command using man man, you’ll see that section 2 is reserved for system calls. Similarly, if you need information about library functions, add a 3 between man and the library function name.

Here are the chapter numbers of the manual and the types of pages it contains:

  • 1: Executable program or shell command
  • 2: system calls (functions provided by the kernel)
  • 3: library call (function inside a program’s library)
  • 4: special files (usually present in/dev)

Run the following man command with the system call name to view the documentation for the system call:

man 2 execve
Copy the code

According to the Execve man page, this will execute the program passed in the parameters (ls in this case). You can provide additional parameters to ls, such as testdir in this example. Therefore, this system call only runs ls with testdir as an argument:

execve - execute program

DESCRIPTION
       execve()  executes  the  program  pointed to by filename
Copy the code

The next system call, named stat, takes the testdir argument:

stat("testdir/", {st_mode=S_IFDIR|0755, st_size=32, ... }) = 0Copy the code

Use man 2 stat to access the document. Stat is the system call to get the status of a file, and remember that everything in Linux is a file, including a directory.

Next, the OpenAT system call opens Testdir. Pay close attention to the returned 3. This is a file descriptor that will be used in future system calls:

openat(AT_FDCWD, "testdir/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
Copy the code

So far so good. Now, open the trace.log file and go to the line after the Openat system call. You’ll see the GetDents system call called, which does most of the work required to execute the ls testdir command. Now, get getDents with grep from the trace.log file:

[root@sandbox tmp]# grep getdents trace.log
getdents(3, /* 4 entries */, 32768)     = 112
getdents(3, /* 0 entries */, 32768)     = 0
[root@sandbox tmp]#
Copy the code

Getdents’ man page describes it as “getting catalog entries,” and that’s what you do. Note that the GetDents argument is 3, which is the file descriptor from the OpenAT system call above.

Now that you have the directory list, you need a way to display it on the terminal. So grep is used in the log to search for another system call to write to the terminal, write:

[root@sandbox tmp]# grep write trace.log
write(1, "file1 file2\n", 13)          = 13
[root@sandbox tmp]#
Copy the code

In these parameters, you can see the file names that will be displayed: file1 and file2. For the first parameter (1), remember that in Linux, when any process is running, three file descriptors are opened for it by default. Here is the default file descriptor:

  • 0: Standard input
  • 1: Standard output
  • 2: Standard error

Therefore, the write system call will display file1 and file2 on the standard display (this terminal, identified by 1).

Now you know which system call does most of the work for the ls testdir/ command. But what about the other 100 + system calls in the trace.log file? The operating system has to do a lot of housekeeping to run a process, so a lot of what you see in this log file is process initialization and cleanup. Read the entire trace.log file and try to understand how the ls command works.

Now that you know how to analyze system calls for a given command, you can apply that knowledge to other commands to understand which system calls are being executed. Strace provides a number of useful command line flags to make it easier for you to use, some of which are described below.

By default, Strace does not contain all system call information. However, it has a convenient -v redundancy option that provides additional information in each system call:

strace -v ls testdir
Copy the code

It is good practice to always use the -f option when running the strace command. It allows Strace to trace any child processes created by the process currently being traced:

strace -f ls testdir
Copy the code

Suppose you just need the name of the system call, the number of times it was run, and the percentage of time each system call took. You can use the -c flag to get these statistics:

strace -c ls testdir/
Copy the code

Suppose you want to focus on a particular system call, such as the Open system call, and ignore the rest. You can follow the name of the system call with the -e flag:

[root@sandbox tmp]# strace -e open ls testdir
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
file1  file2
+++ exited with 0 +++
[root@sandbox tmp]#
Copy the code

What if you want to focus on multiple system calls? Don’t worry, you can also use the -e command line flag and separate the names of the two system calls with commas. For example, to see the Write and GetDents system calls:

[root@sandbox tmp]# strace -e write,getdents ls testdir
getdents(3, /* 4 entries */, 32768)     = 112
getdents(3, /* 0 entries */, 32768)     = 0
write(1, "file1 file2\n", 13file1  file2
)          = 13
+++ exited with 0 +++
[root@sandbox tmp]#
Copy the code

So far, these examples are explicitly running the commands that have been traced. But what about keeping track of commands that have been run and are being executed? For example, what if you want to track a daemon used to run a process for a long time? To do this, Strace provides a special -p flag to which you can supply a process ID.

Instead of running strace on a daemon, our example uses the cat command, which normally displays the contents of the file if you take the file name as an argument. If no argument is given, the cat command waits on the terminal for the user to enter text. After the text is entered, it repeats the given text until the user presses Ctrl + C to exit.

Run the cat command from a terminal. It will show you a prompt and wait there (remember cat is still running and hasn’t quit yet) :

[root@sandbox tmp]# cat
Copy the code

On the other terminal, use the ps command to find the process identifier (PID) :

[root@sandbox ~]# ps -ef | grep cat
root      22443  20164  0 14:19 pts/0    00:00:00 cat
root      22482  20300  0 14:20 pts/1    00:00:00 grep --color=auto cat
[root@sandbox ~]#
Copy the code

Now run Strace on the running process with the -p flag and PID (found above using PS). After strace is run, the output describes the contents of the connected process and its PID. Now, Strace is tracking system calls made by the cat command. The first system call you see is read, which is waiting for input from the file descriptor 0 (standard input, which is the terminal from which the cat command is run) :

[root@sandbox ~]# strace -p 22443
strace: Process 22443 attached
read(0,
Copy the code

Now, go back to the terminal where you ran cat and type some text. I entered x0x0 for demonstration purposes. Notice how cat simply repeats what I typed. So, x0x0 occurs twice. I typed the first one, and the second is the repeated output of the cat command:

[root@sandbox tmp]# cat
x0x0
x0x0
Copy the code

Return to the terminal connecting strace to the CAT process. You will now see two additional system calls: the earlier READ system call, which now reads X0x0 from the terminal, and the second, write, which writes X0x0 back to the terminal, and then a new read, which is waiting to be read from the terminal. Note that standard input (0) and standard output (1) are both in the same terminal:

[root@sandbox ~]# strace -p 22443
strace: Process 22443 attached
read(0, "x0x0\n", 65536)                = 5
write(1, "x0x0\n"= 5, 5)read(0,
Copy the code

Imagine how helpful this would be if you ran Strace against the daemon to see everything it was doing in the background. Kill cat by pressing Ctrl + C; This will also terminate your Strace session because the process is no longer running.

To see the timestamps of all system calls, simply use the -t option with strace:

[root@sandbox ~]#strace -t ls testdir/

14:24:47 execve("/usr/bin/ls"["ls"."testdir/"], [/* 40 vars */]) = 0
14:24:47 brk(NULL)                      = 0x1f07000
14:24:47 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2530bc8000
14:24:47 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
14:24:47 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
Copy the code

What if you want to know how long it takes between system calls? Strace has a handy -r command that shows the time it takes to execute each system call. Very useful, isn’t it?

[root@sandbox ~]#strace -r ls testdir/0.000000 execve ("/usr/bin/ls"["ls"."testdir/"40 vars], [/ * * /]) = 0, 0.000368 BRK (NULL) = 0 x1966000 0.000073 mmap (NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 1, 0) = 0 x7fb6b1155000 access (0.000047"/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000119 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
Copy the code

conclusion

The Strace utility is very helpful in understanding system calls on Linux. For its other command-line flags, refer to the man pages and online documentation.


Via: opensource.com/article/19/…

By Gaurav Kamathe, lujun9972

This article is originally compiled by LCTT and released in Linux China