Preface
Linux has three classic text-processing tools, sometimes called the "three swordsmen": grep, sed, and awk.
Sed performs non-interactive string substitution, and grep provides efficient filtering. Awk, in contrast, is a powerful text analysis tool that is especially good at analyzing data and generating reports.
Awk is more capable than most common Linux commands. In this article, I won't scare you by dwelling on the fact that awk is also a programming language. Just think of it as another powerful text analysis tool for Linux.
Before we use it, let’s take a look at what AWK does:
- Format given text into the layout we expect and print it as a report.
- Analyze and process system logs, quickly extract the data we care about, and generate statistics.
- Compute statistics conveniently, such as website visits or the number of requests per IP.
- Combine with other tools to quickly summarize and analyze system status, so that you know the state of the system like the back of your hand.
- Use the expressive power of a scripting language, with loops, conditionals, and arrays, to analyze more complex data.
…
Awk can do more than that, of course. Once you are fluent with it, you can carry out efficient data analysis and statistics as you see fit.
However, it is important to note that AWK is not a panacea. It is better at handling formatted text, such as logs and CSV data.
Let's take a look at how awk works, so that the examples later on are easier to follow.
Awk basic command format
The following figure illustrates how AWK works in detail
Awk processes its input in the following order:
- First, awk executes the commands in the {} block marked by the BEGIN keyword.
- After the BEGIN block completes, awk starts reading data line by line. By default, the content delimited by \n is read as one record, which is simply the concept of a line.
- Each record is split into fields by the specified delimiter; a field is the concept of a column.
- Awk loops over the commands in the body block, executing the body once for each line read, until all input has been consumed.
- Finally, awk executes the END block, which usually prints the final result.
Awk is input-driven: the body is executed as many times as there are input lines.
In the examples below, keep in mind that a Record is a row, a Field is a column, BEGIN is the pre-processing stage, body is the stage where AWK actually works, and END is the final processing stage.
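The three stages can be seen in a minimal sketch. The three-line input here is made up purely for illustration.

```shell
# Made-up three-line input; the body runs once per line.
result=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "start" }              # pre-processing: runs once, before any input
      { count++; print NR ": " $0 }  # body: runs once for every record (line)
END   { print "lines: " count }      # final stage: runs once, after the last record
')
echo "$result"
```

The body prints three lines because there are three input lines, and END reports the count accumulated by the body.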
Hands-on Practice – Introduction
From here on, let's go straight to hands-on practice. As an example, I'll save the following information to file.txt
Ok, let's start with the simplest and most commonly used awk example: printing columns 1, 4, and 8.
The awk program, braces included, should be enclosed in single quotes so that the shell does not expand $1 and friends. $1 .. $N refer to the first through Nth columns, and $0 represents the entire row.
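As a sketch of the column-printing command: the contents of file.txt are not shown here, so the sample below is a hypothetical stand-in, assumed to have eight whitespace-separated columns (permissions, links, owner, size, month, day, year, name).

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

# Print columns 1, 4 and 8 of every line.
cols=$(printf '%s\n' "$data" | awk '{ print $1, $4, $8 }')
echo "$cols"

# $0 is the whole line, so this reproduces the input unchanged.
whole=$(printf '%s\n' "$data" | awk '{ print $0 }')
```

With a real file, the same program would be run as awk '{ print $1, $4, $8 }' file.txt.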
Next, take a look at one of awk's more useful features: formatted output. It works like C's printf, a style I personally prefer over C++ stream formatting.
%s is a string placeholder, and -4 means a column width of 4 with left alignment. More complex formats can be written as needed; we won't go into detail here.
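A small sketch of formatted output, again on hypothetical sample data. The widths 6 and 5 are arbitrary choices for this illustration.

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

# %-6s : string, width 6, left-aligned
# %5d  : integer, width 5, right-aligned
# printf does not append a newline, so \n is written explicitly.
formatted=$(printf '%s\n' "$data" | awk '{ printf "%-6s|%5d|%s\n", $3, $4, $8 }')
echo "$formatted"
```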
Hands-on Practice – Intermediate
(1) Filtering records
Some data may not be what you want; you can filter it as needed.
The filter condition prints the rows whose column 3 is root and whose column 6 is 10.
Awk supports the usual comparison operators: ==, !=, >, <, >=, <=. As before, $0 represents the contents of the entire line.
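The filter described here might look like the following sketch. The data is hypothetical; column 3 is assumed to be the owner, column 4 the size, and column 6 the day.

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

# A pattern without an action prints the whole matching line ($0).
matched=$(printf '%s\n' "$data" | awk '$3 == "root" && $6 == 10')
echo "$matched"

# Other comparison operators work the same way; here: size of at least 2048.
big=$(printf '%s\n' "$data" | awk '$4 >= 2048 { print $8 }')
echo "$big"
```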
(2) Built-in variables
Awk has some built-in variables to facilitate data processing.
Filter the rows whose column 3 is root, as well as line 2, and print the line number. NR is the number of the current row (record), and NF is the number of columns (fields) in the current row.
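A sketch of NR and NF in use, on hypothetical data with eight columns per line:

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

# Select rows owned by root, plus line 2, and print:
# line number (NR), number of fields (NF), and the name column.
numbered=$(printf '%s\n' "$data" | awk '$3 == "root" || NR == 2 { print NR, NF, $8 }')
echo "$numbered"
```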
Our data is not always delimited by spaces. We can specify the delimiter using the FS variable.
For example, we can specify 2019 as the separator, which splits the line into two parts, and print them with * in place of 2019.
The separator can also be specified with the -F option (uppercase; lowercase -f names a script file instead).
If you need to specify multiple delimiters, you can write -F '[;:]', which splits on either ; or :.
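The three ways of setting the separator can be sketched as follows. The input line is made up, and 2019 is assumed to occur exactly once in it.

```shell
# One made-up line in which 2019 appears exactly once.
line='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt'

# Set the separator through the FS variable, then join the two parts with *.
with_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "2019" } { print $1 "*" $2 }')

# The same thing using the -F command-line option.
with_flag=$(printf '%s\n' "$line" | awk -F 2019 '{ print $1 "*" $2 }')

# Several delimiters at once: the bracket expression splits on ; or :
second=$(printf 'a;b:c\n' | awk -F '[;:]' '{ print $2, NF }')

echo "$with_fs"; echo "$with_flag"; echo "$second"
```

FS is set inside BEGIN so that it is already in effect when the first line is split.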
Awk can also specify a delimiter for its output, which is set by the OFS variable.
On output, the fields listed in a print statement are separated by the string specified by OFS.
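A brief sketch of OFS on hypothetical data. Note that OFS is only inserted between print arguments separated by commas.

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt'

# The commas in print are replaced by OFS on output.
csv=$(printf '%s\n' "$data" | awk 'BEGIN { OFS = "," } { print $3, $4, $8 }')
echo "$csv"
```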
Actual combat – Advanced
(1) Condition matching
Match all files of the root user, as well as the first line of the file.
The condition selects the rows whose third column contains root; the ~ operator performs a regular expression match.
Awk can also match whole lines the way grep does, by giving a bare /pattern/.
In addition, a pattern such as /Aug|Dec/ matches multiple keywords.
A pattern can be negated with the ! symbol.
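These matching forms can be sketched on hypothetical data, where column 3 is assumed to be the owner and column 5 the month:

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

# Column 3 matches the regular expression root, or it is the first line.
root_or_first=$(printf '%s\n' "$data" | awk '$3 ~ /root/ || NR == 1 { print $8 }')

# grep-like: the pattern is tested against the whole line; | means "or".
by_month=$(printf '%s\n' "$data" | awk '/Aug|Dec/ { print $8 }')

# Negation: lines that do NOT match, and columns that do NOT match.
negated=$(printf '%s\n' "$data" | awk '!/Aug|Dec/ { print $8 }')
not_root=$(printf '%s\n' "$data" | awk '$3 !~ /root/ { print $8 }')

echo "$root_or_first"; echo "$by_month"; echo "$negated"; echo "$not_root"
```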
Let's do something interesting: we can split the text into multiple files. For example, the file can be split into several files by month (column 5).
Awk supports the redirection symbol >, which here sends each row to a file named after the month. Of course, you can also write only selected columns to the file.
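A sketch of splitting by month, under a few assumptions: the data is hypothetical, the output goes to a temporary directory rather than the current one, and the dir variable and the .txt and .names suffixes are choices made for this illustration.

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

workdir=$(mktemp -d)   # scratch directory so nothing is overwritten

# Whole rows go to <month>.txt; the parentheses keep the file name
# expression unambiguous across awk implementations.
printf '%s\n' "$data" | awk -v dir="$workdir" '{ print > (dir "/" $5 ".txt") }'

# Only the name column goes to <month>.names.
printf '%s\n' "$data" | awk -v dir="$workdir" '{ print $8 > (dir "/" $5 ".names") }'

cat "$workdir/Aug.names"
```

Within a single awk run, > truncates a file only the first time it is opened, so later rows for the same month are appended.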
(3) if statement
For complex conditions, you can use awk's if statement. Awk is powerful because it is a script interpreter with the capabilities of a general scripting language. A slightly more complex condition can be used to split the file.
Note that the if statement must appear inside the braces of an action.
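One possible shape of such a split, as a sketch on hypothetical data. The file names aug.txt, dec.txt, and other.txt, and the dir variable, are assumptions made for this illustration.

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt
drwxr-xr-x 2 root 4096 Aug 12 2019 logs
-rw-r--r-- 1 bob 512 Jan 25 2018 todo.txt'

workdir=$(mktemp -d)   # scratch directory for the output files

# The if / else if / else chain lives inside the action braces.
printf '%s\n' "$data" | awk -v dir="$workdir" '
{
    if ($5 ~ /Aug/)       print > (dir "/aug.txt")
    else if ($5 ~ /Dec/)  print > (dir "/dec.txt")
    else                  print > (dir "/other.txt")
}'

cat "$workdir/other.txt"
```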
Compute the total size of all *.c and *.h files in the current directory.
Column 5 is the file size, which is added to the sum variable for each line read. In the END phase, sum is printed, which is the total size of all files.
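A sketch of the size total. To keep it reproducible, it creates two small files in a temporary directory instead of using the current directory; the file names and sizes are made up. It assumes the usual ls -l layout, in which the size is the fifth column.

```shell
workdir=$(mktemp -d)
printf 'abc'   > "$workdir/a.c"    # 3 bytes
printf 'hello' > "$workdir/b.h"    # 5 bytes

# sum starts out empty (treated as 0) and accumulates column 5 of each line.
total=$(ls -l "$workdir"/*.c "$workdir"/*.h | awk '{ sum += $5 } END { print sum }')
echo "$total"
```

In a source directory, the equivalent would be ls -l *.c *.h piped into the same awk program.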
As another example, count how much memory is used by each user's processes. Note that the value is in the RSS column.
Arrays and for loops are used here. It is worth noting that awk arrays can be understood as dictionaries or maps, and keys can be numbers or strings.
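A sketch of the per-user total. Real ps output differs from machine to machine, so this uses a made-up excerpt in the column layout of ps aux, where the user is column 1 and RSS is column 6.

```shell
# Made-up excerpt in the shape of "ps aux" output (an assumption).
ps_output='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 300 ? Ss 10:00 0:01 init
alice 20 0.1 0.2 2000 500 pts/0 S 10:01 0:00 bash
root 35 0.0 0.1 1500 200 ? S 10:02 0:00 cron'

# mem is an associative array keyed by user name.
# for (user in mem) visits keys in no particular order, hence the sort.
usage=$(printf '%s\n' "$ps_output" | awk '
NR > 1 { mem[$1] += $6 }                              # skip the header line
END    { for (user in mem) print user, mem[user], "KB" }
' | sort)
echo "$usage"
```

On a live system, the input would come from ps aux instead of the variable.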
A simple example can demonstrate awk's support for string operations.
Awk has built-in support for a series of string functions: length returns the length of a string, and toupper converts a string to uppercase.
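For instance, on hypothetical data where column 3 is the owner and column 8 the name, length and toupper could be combined like this:

```shell
# Hypothetical sample data standing in for file.txt (an assumption).
data='-rw-r--r-- 1 root 1024 Aug 10 2019 notes.txt
-rw-r--r-- 1 alice 2048 Dec 10 2019 report.txt'

# Upper-case the owner and print the length of the name.
strings=$(printf '%s\n' "$data" | awk '{ print toupper($3), length($8) }')
echo "$strings"
```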
To understand how AWK works as a whole, let’s look at another comprehensive example. Suppose we have a student report card:
Because this sample program is a bit complex, it is not easy to read on the command line. We also want to use this case to introduce another way of running awk. Our awk script is as follows:
The result of executing awk is as follows
We can write complex awk statements into the script file cal.awk and run the script file with the -f option.
The script has three parts: BEGIN, body, and END.
This simple example fully embodies awk's working mechanism, and I hope it helps you really understand how awk works.
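The report card and the script themselves are not reproduced here, so the following is only a sketch of what such a setup could look like. The score data, the column meanings (name, math, English), and the contents of cal.awk are all assumptions; only the file name cal.awk and the -f option come from the text.

```shell
workdir=$(mktemp -d)

# Hypothetical report card: name, math score, English score.
cat > "$workdir/scores.txt" <<'EOF'
Tom 80 90
Ann 70 100
EOF

# Hypothetical cal.awk with the three parts: BEGIN, body, END.
cat > "$workdir/cal.awk" <<'EOF'
BEGIN { printf "%-6s %5s %5s %5s\n", "NAME", "MATH", "ENG", "TOTAL" }
{
    total = $2 + $3          # per-student total
    math += $2; eng += $3    # running sums per subject
    printf "%-6s %5d %5d %5d\n", $1, $2, $3, total
}
END { printf "%-6s %5d %5d\n", "SUM", math, eng }
EOF

# -f reads the awk program from a file instead of the command line.
report=$(awk -f "$workdir/cal.awk" "$workdir/scores.txt")
echo "$report"
```

BEGIN prints the header once, the body prints one row per student while accumulating sums, and END prints the summary row.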
Summary
Through the above examples, we learned how AWK works, so let’s summarize a few concepts and common knowledge points.
(I) Built-in variables
- Each line of content is called a record (Record).
- Each column in a row, separated by a delimiter, is called a field (Field).
With these concepts in mind, let’s summarize some important built-in variables:
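- NR: the number of the current record (the line number)
- NF: the number of fields (columns) in the current record
- RS: the input record separator, a newline by default
- FS: the input field separator, whitespace by default
- OFS: the output field separator, a single space by default
- ORS: the output record separator, a newline by default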
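RS and ORS do not appear in the earlier examples, so here is a minimal sketch on a made-up input string in which records are separated by semicolons:

```shell
# Records end at ";" on input (RS) and are terminated by "|" on output (ORS).
# The NF pattern skips any empty record.
joined=$(printf 'a b;c d;' | awk 'BEGIN { RS = ";"; ORS = "|" } NF { print NR ":" $1 }')
echo "$joined"
```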
Awk provides the printf function for formatted output; its usage is essentially the same as in C.
Basic usage
Common format specifiers:
- %d: decimal signed integer
- %u: decimal unsigned integer
- %f: floating-point number
- %s: string
- %c: single character
- %e: floating-point number in exponential notation
- %x, %X: unsigned hexadecimal integer
- %o: unsigned octal integer
- %g: automatically selects the more suitable representation
- \n: newline character
- \t: tab character
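Several of these specifiers can be tried in a single BEGIN block, which needs no input. The values are arbitrary.

```shell
# printf takes a format string followed by one argument per specifier.
specs=$(awk 'BEGIN {
    printf "%d|%5.1f|%s|%c|%x|%X|%o|%e\n", 42, 3.14159, "awk", "xyz", 255, 255, 8, 1234.5
}')
echo "$specs"
```

Here %5.1f pads to width 5 with one decimal place, and %c given a string prints only its first character.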
Awk is more than just a Linux command-line tool. It is a scripting language that supports the usual control structures of a programming language, such as conditionals (if/else) and loops (for, while).
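A small sketch of these control structures, run entirely inside BEGIN. The variable names are arbitrary.

```shell
sums=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) {      # for loop over 1..5
        if (i % 2) odd += i         # conditional: odd numbers
        else       even += i        # ... and even numbers
    }
    while (n < 3) n++               # while loop; n starts out as 0
    print odd, even, n
}')
echo "$sums"
```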
Awk has a large number of useful functions built in. It also supports custom functions, allowing you to write your own functions to extend the built-in functions.
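As a sketch of a custom function: max is a hypothetical helper written for this illustration, not a built-in, and the three input numbers are made up.

```shell
largest=$(printf '3\n10\n7\n' | awk '
function max(a, b) { return a > b ? a : b }   # user-defined helper
{ m = max(m, $1 + 0) }                        # $1 + 0 forces a numeric comparison
END { print m }
')
echo "$largest"
```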
Here is a quick list of some of the more commonly used string functions:
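- index(s, t): the position of the first occurrence of t in s, or 0 if t does not occur
- length(s): the length of the string s
- split(s, a, sep): splits s into the array a using the separator sep and returns the number of elements
- substr(s, p, n): the substring of s that starts at position p (counting from 1) and is at most n characters long
- tolower(s): s converted to lowercase
- toupper(s): s converted to uppercase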
This is only a brief summary of some commonly used string functions. For concrete usage, refer to the earlier examples and apply them to real problems.
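The functions listed can be exercised together on a made-up string:

```shell
funcs=$(awk 'BEGIN {
    s = "Hello,World"
    # Positions are counted from 1.
    print index(s, "World"), length(s), substr(s, 1, 5), tolower(s), toupper(s)
    n = split(s, parts, ",")        # parts[1] = "Hello", parts[2] = "World"
    print n, parts[1], parts[2]
}')
echo "$funcs"
```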