grep, sed, and awk are often called the text-processing trio of the shell. Roughly speaking, grep finds text, sed replaces it, and awk collates it into reports.
This article covers awk and sed.
The overall design of awk and sed
Both tools process text serially, as a stream, holding one line in a buffer at a time, which lets them handle large files. Text is processed line by line, and the concepts involved are:
- file
- line
- column
awk and sed are stream-oriented text-processing tools that can print, transform, or modify an input stream.
awk
The main use of awk is to read text and produce statistical reports.
Built-in variables
Let's start with awk's built-in variables, which fall into three groups: files, lines, and columns.
- File-related variables
  - FNR: the line number within the current file. When several files are processed, each file is counted separately, starting from 1.
  - FILENAME: the name of the file currently being read.
- Line-related variables
  - $0: the entire current line.
  - NR (Number of Records): the line number of the current line, counting from 1 across all input.
  - RS (Record Separator): the input line separator. The default is a newline (\n).
- Column-related variables
  - $1 to $n: the 1st to nth field of the current line.
  - NF (Number of Fields): the number of columns in the current line.
  - FS (Field Separator): the input field separator. If not specified, fields are split on spaces or tabs.
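To make these variables concrete, here is a minimal sketch. The file names one.txt and two.txt and their contents are made up for illustration; they are not part of the article's data.

```shell
# Create two small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/one.txt"
printf 'f\n' > "$tmpdir/two.txt"

# For every line, print the file name, the per-file line number (FNR),
# the overall line number (NR), and the number of fields (NF).
vars_out=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR, NF }' one.txt two.txt)
echo "$vars_out"
rm -rf "$tmpdir"
```

FNR restarts at 1 for two.txt, while NR keeps counting across both files.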
Usage
The basic usage is: awk 'BEGIN{ … } pattern{ … } END{ … }' file_name
Output is produced with print or printf, the statements used most often in awk.
Preprocessing BEGIN
The BEGIN block is typically used to initialize variables before the text stream is processed, such as setting the column separator FS: awk 'BEGIN{FS=":"}'
Post-processing END
The END block runs after the text stream has been processed, for example to print a total:
# awk 'BEGIN{total=0}{total++}END {print total}' access.log
495903
There are 495,903 lines in access.log.
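BEGIN and END can be combined in one command. The sketch below assumes made-up colon-separated input of the form name:count; it sets FS in BEGIN, sums the second column in the main block, and prints the result in END.

```shell
# Hypothetical input: three lines of name:count.
sum_out=$(printf 'a:1\nb:2\nc:3\n' |
    awk 'BEGIN { FS = ":"; total = 0 } { total += $2 } END { print NR, total }')
echo "$sum_out"   # prints: 3 6
```

In END, NR still holds the number of the last line read, so it doubles as a line count.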
How to match
awk matches a pattern against each line of the text stream. It supports:
- Regular expression matching, written as
/pattern/
# awk -F : '/root/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
# awk -F : '/^root/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
The first command matches lines that contain root anywhere; the second matches only lines that begin with root.
- Operator (comparison) matching
awk -F : '$3<10{print $0}' /etc/passwd
Only lines that satisfy the condition $3<10 are printed.
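The two kinds of match can also be applied to a single field or combined. This is a small sketch on made-up passwd-style data (the alice entry is hypothetical, and nothing is read from the real /etc/passwd).

```shell
# Hypothetical passwd-style sample data.
sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Regular expression applied to one field: the login shell ends in "bash".
bash_users=$(printf '%s\n' "$sample" | awk -F : '$7 ~ /bash$/ { print $1 }')

# A comparison and a regular expression combined with &&.
low_uid=$(printf '%s\n' "$sample" | awk -F : '$3 < 10 && /^root/ { print $1 }')

echo "$bash_users"
echo "$low_uid"
```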
Some of awk's built-in functions
awk provides arithmetic functions, time functions, and string functions such as length, substr, and index. For details on how to use them, see: www.tutorialspoint.com/awk/awk_bui…
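As a quick illustration of the three string functions named above, here is a sketch on a made-up input line.

```shell
# length: number of characters; substr: characters 1 to 5; index: position of "world".
str_out=$(echo 'hello world' |
    awk '{ print length($0), substr($0, 1, 5), index($0, "world") }')
echo "$str_out"   # prints: 11 hello 7
```

Positions in awk strings start at 1, not 0.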
Expressions in {}
Expressions and function calls can be written in the BEGIN, main, and END blocks, with statements separated by semicolons (;). Variables are shared by the three blocks and keep their values from BEGIN through the main block to END, in order of execution.
If syntax
awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
Loop syntax
awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'
The syntax is similar to C's. awk also supports:
- for loops
- do-while loops
- break, to exit a loop early
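The loop forms listed above can be sketched as follows; the loop bounds are arbitrary values chosen for the demonstration.

```shell
loop_out=$(awk 'BEGIN {
    # for loop with break: stop as soon as i reaches 4
    for (i = 1; i <= 5; i++) {
        if (i == 4) break
        printf "%d ", i
    }
    # do-while: the body always runs at least once
    j = 0
    do { j++ } while (j < 3)
    print j
}')
echo "$loop_out"   # prints: 1 2 3 3
```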
Arrays
awk has associative arrays, so indexes don't have to be a set of consecutive numbers; you can use strings or numbers as array indexes. There is also no need to declare the size of an array in advance, because it can grow and shrink at run time. Just use an array as follows:
array_name[index] = value
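A minimal sketch of an associative array in use, counting made-up HTTP method names; the array name count is chosen only for this illustration.

```shell
arr_out=$(printf 'GET\nPOST\nGET\nGET\n' | awk '
    { count[$1]++ }                    # string index; the element is created on first use
    END {
        print count["GET"], count["POST"]
        delete count["POST"]           # arrays can shrink at run time
        present = ("POST" in count)    # 1 if the index exists, 0 otherwise
        print present
    }')
echo "$arr_out"
```

The first output line is 3 1, and the second is 0 because the POST element has been deleted.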
Example: access log statistics report
Let's look at an example.
The server keeps an access log of requests. Some of the requests are part of a traffic attack, and those requests carry a special header, "TEST". We want to print statistics for the attack traffic, broken down by HTTP request path. This can be done with awk as follows:
# tail access.log
"9.134.77.72" "127.0.0.1" "-" "[11/Jan/2022:21:20:16 +0800]" "/api/test1" "200" "2" "2914" "-" "Prometheus/2.17.1" "TEST"
"9.134.77.72" "127.0.0.1" "-" "[11/Jan/2022:21:20:16 +0800]" "/api/test2" "200" "2" "2914" "-" "Prometheus/2.17.1" "TEST"
...
# grep "TEST" access.log | wc -l    # count the attack requests: 500,000
# # zgrep can also search compressed files
# # egrep, like grep -E, enables extended regular expressions
# # Write the awk script
# cat access.awk
BEGIN {
    FS = "\" \""
    printf "%-10s %-10s HandledTime ResponseBytes\n", "Request", "Total"
}
/TEST/ {    # match lines containing TEST
    REQUEST[$5] += 1
    REQUEST1[$5] += $7    # handle time, ms
    REQUEST2[$5] += $8    # response bytes
    handledTime += $7
    bytes += $8
}
END {
    for (u in REQUEST)
        printf "%-10s %-10d %-10d %-10d\n", u, REQUEST[u], REQUEST1[u], REQUEST2[u]
    printf "handledTime: %d min\n", handledTime/1000/60
    printf "handledBytes: %d MB\n", bytes/1000/1000
}
# awk -f access.awk access.log    # run the script
# # final output
api/test1 10000 200000 2000000
api/test2 6900 30000 200000
...
handledTime: 39 min
handledBytes: 960 MB
# # Once the output above is saved to a file (stat.txt here), sort can find the most frequently called request routes
# sort -k 2 -rn stat.txt |head -n 20
Awk summary
As the example shows, awk has a few options worth knowing:
- -F: sets the field separator, which follows the option
- -f: awk takes its script on the command line inside single quotes (''), or from a script file selected with -f.
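The two options can be used together. In this sketch the script file is a temporary file created just for the demonstration, and the comma-separated input is made up.

```shell
# Write a one-line awk script to a temporary file (hypothetical script).
script=$(mktemp)
printf '%s\n' '{ print $2 }' > "$script"

# -F sets the field separator to a comma; -f reads the script from the file.
f_out=$(echo 'a,b,c' | awk -F , -f "$script")
rm -f "$script"
echo "$f_out"   # prints: b
```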
awk's syntax is not as difficult as it might seem. It can pick out the lines of interest from a stream, process them, and output statistical reports.
sed
Having covered the basics and usage of awk, we'll move on to sed, another stream-processing tool.
The main use of sed is to modify text.
sed checks each line of text against a pattern and performs a command on the lines that match. The general form is: sed [option] '/pattern/command' file
For example:
# sed '/abc/p' test.txt    # /abc/ is the pattern to match; p is the print command
Main functions
sed provides many commands, including p (print), d (delete), and s (substitute).
The most commonly used sed feature is s, substitution.
Matching in sed
Lines can be selected by line number or by a /pattern/ string, as shown below:
- Text matching: the same idea as in awk, written /pattern/. With the -E option the pattern can be an extended regular expression.
- Line matching: match by line number, for example 1p
- Multi-line matching: match a block of lines from a start address to an end address. For example, 1,10p matches lines 1 to 10.
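The address forms above can be sketched as follows, using made-up input. The -n option is used so that only the lines printed by p appear.

```shell
# Line-number range: print lines 2 to 4 of a five-line input.
addr_lines=$(printf '%s\n' 1 2 3 4 5 | sed -n '2,4p')

# Pattern range: print from the line matching /start/ through the line matching /end/.
addr_block=$(printf 'before\nstart\nmid\nend\nafter\n' | sed -n '/start/,/end/p')

echo "$addr_lines"
echo "$addr_block"
```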
Options
There are some useful options:
- By default sed does not modify the original file; it only prints the result to the console. Add the -i option to modify the text file in place.
- By default every line is printed after it has been processed. The -n option suppresses this automatic output, so -n and the p command are generally used together to print only the matching lines.
- The -e option supplies a command and can be given more than once. Unlike awk, which supports expressions, sed has only simple commands, but several of them can be combined.
- The -f option reads the commands from a file.
- The -E option enables extended regular expressions.
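A short sketch of -n, -e, and -E on made-up input. It assumes a sed that accepts -E, which current GNU and BSD versions do; -i is left out because its argument syntax differs between GNU and BSD sed.

```shell
input='hello
nice to test'

# -n with p: print only the lines that match.
opt_n=$(printf '%s\n' "$input" | sed -n '/nice/p')

# Two -e commands applied in order to every line.
opt_e=$(printf '%s\n' "$input" | sed -e 's/hello/hi/' -e 's/nice/good/')

# -E: extended regular expression with alternation and a capture group.
opt_E=$(printf '%s\n' "$input" | sed -E 's/(hello|nice)/[\1]/')

echo "$opt_n"
echo "$opt_e"
echo "$opt_E"
```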
Main commands
p
# echo hello >> test.txt
# echo "nice to test" >> test.txt
# sed 'p' test.txt    # each line appears twice: the automatic output plus the output of p
hello
hello
nice to test
nice to test
# sed -n '2p' test.txt
nice to test
2p matches the second line and prints it; -n suppresses the automatic output of every line, so the line appears only once.
d
# sed -i '1d' test.txt
# # d deletes the matching line, so it is not printed
# cat test.txt
nice to test
# sed -i '/nice/d' test.txt
# cat test.txt
# # the file is now empty
d deletes the matching line, so that line is not printed. After the two operations, both lines have been deleted.
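The same d command can be tried without touching a file by feeding sed from a pipe. The input here is made up, and the 2,3d range is an extra illustration of combining d with a line range.

```shell
# Delete by line number, by pattern, and by line range.
d_first=$(printf 'hello\nnice to test\n' | sed '1d')
d_regex=$(printf 'hello\nnice to test\n' | sed '/nice/d')
d_range=$(printf 'a\nb\nc\nd\n' | sed '2,3d')

echo "$d_first"
echo "$d_regex"
echo "$d_range"
```

Without -i nothing is modified on disk, which makes this a safe way to check a delete command before applying it in place.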
s
With s, the command comes first, followed by the pattern to match and the replacement text. The syntax is s/pattern/replacement/
# sed 's/test/hello/' test.txt    # replace only the first test on each line with hello
hello
nice to hello
# sed -i 's/test/hello/' test.txt
# # with -i the file is edited in place and nothing is printed to the console
# sed 's/test/hello/g' test.txt    # g replaces every match on the line
# sed 's/test/hello/2g' test.txt    # replace the second match and every match after it
# sed 's/test/hello/1g' test.txt    # replace from the first match onward, the same as g
# sed 's/test/hello/ig' test.txt    # i ignores case
- s/test/hello/ replaces only the first match on the line; it is equivalent to s/test/hello/1
- s/test/hello/g replaces every test on the line with hello
- s/test/hello/2g replaces every match from the second one through the last one on the line (combining a number with g is a GNU sed extension)
- s/test/hello/ig ignores case when matching, so Test is replaced as well (the i flag is a GNU sed extension)
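The difference between the flags is easiest to see on a line with several matches. This sketch uses a made-up line and only the flags that any POSIX sed accepts (no flag, g, and a number); the 2g and i forms above need GNU sed.

```shell
line='test test Test test'

# No flag: only the first match is replaced.
s_first=$(echo "$line" | sed 's/test/hello/')
# g: every match is replaced (Test is untouched because matching is case-sensitive).
s_all=$(echo "$line" | sed 's/test/hello/g')
# 2: only the second match is replaced.
s_second=$(echo "$line" | sed 's/test/hello/2')

echo "$s_first"    # hello test Test test
echo "$s_all"      # hello hello Test hello
echo "$s_second"   # test hello Test test
```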
Less common features
There are other, less common commands, including a, i, r, and w, among others; check the manual when you need them.
Conclusion
This article introduced the basic principles and common usage of awk and sed, including matching rules, main commands, and basic syntax. I hope it helps.