By all accounts, grep, sed, and awk are the text-processing trio of the shell. They are used to find, replace, and summarize text, respectively.

This article covers awk and sed.

The overall design concept of awk and sed

Both tools process large text serially, as a stream, using a line buffer. In other words, they handle text one line at a time, which involves concepts such as:

  • file
  • line
  • column

Awk and sed are stream-oriented text processing tools: each reads an input stream and can print, transform, or modify it.

awk

The main function of awk is to read text and produce statistical reports.

Built-in variables

Let's start with awk's built-in variables, which can be grouped into file, line, and column variables:

  • File-related variables
    • FNR: the line number within the current file; when several files are processed, it restarts from 1 for each file.
    • FILENAME: the name of the file currently being read.
  • Line-related variables
    • $0: the entire current line.
    • NR (number of records): the number of the current line, counting from 1 and continuing across files.
    • RS (record separator): the input line separator; the default is the newline character \n.
  • Column-related variables
    • $1-$n: the 1st to nth fields of the current line.
    • NF (number of fields): the number of columns in the current line.
    • FS (field separator): the input field delimiter; if not specified, fields are split on spaces and tabs.
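To make the difference between NR and FNR concrete, here is a minimal sketch. The file names and contents are made up for illustration and are not from the article.

```shell
# Create two small sample files in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmp/one.txt"
printf 'f g h i\n' > "$tmp/two.txt"

# For every line, print the file name, the global line number (NR),
# the per-file line number (FNR), the field count (NF) and the first field.
out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, NF, $1 }' one.txt two.txt)
echo "$out"
# one.txt 1 1 3 a
# one.txt 2 2 2 d
# two.txt 3 1 4 f
```

NR keeps counting when awk moves on to two.txt, while FNR starts again from 1.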

Usage

The basic usage is: awk 'BEGIN{...} pattern{...} END{...}' file_name

The print and printf statements are used for output and are among the most frequently used parts of awk.

Preprocessing BEGIN

It is typically used to initialize variables before the text stream is processed, such as setting the column separator FS: awk 'BEGIN{FS=":"}'
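As a small illustration of setting FS in BEGIN, the sketch below splits passwd-style lines on colons. The two input lines are invented sample data.

```shell
# Set the field separator before any input is read, then print
# the first and third fields of each line.
users=$(printf 'root:x:0:0\nalice:x:1000:1000\n' |
  awk 'BEGIN { FS = ":" } { print $1, $3 }')
echo "$users"
# root 0
# alice 1000
```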

Post-processing END

It runs after the text stream has been processed, for example to print a total:

#  awk 'BEGIN{total=0}{total++}END {print total}' access.log 
495903

There are 495,903 lines in access.log.

How to match

A pattern selects which lines of the text stream are processed. Awk supports:

  • Regular-expression matching, as commonly used on Linux, written as /pattern/
# awk -F : '/root/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
# awk -F : '/^root/{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash

The first command matches lines that contain root anywhere; the second matches only lines that begin with root.

  • Comparison-operator matching: awk -F : '$3<10{print $0}' /etc/passwd

Only lines whose third field is less than 10 are printed.
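Because the contents of /etc/passwd differ between machines, a self-contained sketch of both kinds of pattern may help. The three lines in `data` are sample records, not real accounts.

```shell
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# Regex pattern: lines containing "root" anywhere; print the login name.
anywhere=$(printf '%s\n' "$data" | awk -F : '/root/ { print $1 }')
# Anchored regex: only lines that begin with "root".
starts=$(printf '%s\n' "$data" | awk -F : '/^root/ { print $1 }')
# Comparison pattern: lines whose third field is below 10.
low=$(printf '%s\n' "$data" | awk -F : '$3 < 10 { print $1 }')

echo "$anywhere"   # root and operator, one per line
echo "$starts"     # root
echo "$low"        # root
```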

Some of awk's built-in functions

Awk provides arithmetic functions, time functions, and string functions such as length, substr, and index. For usage details, see: www.tutorialspoint.com/awk/awk_bui…
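The three string functions named above can be tried without any input file. This is only a quick sketch; the string `hello world` is an arbitrary example.

```shell
funcs=$(awk 'BEGIN {
    s = "hello world"
    print length(s)          # number of characters in s
    print substr(s, 1, 5)    # 5 characters starting at position 1
    print index(s, "world")  # 1-based position of "world" in s
}')
echo "$funcs"
# 11
# hello
# 7
```

Note that string positions in awk start at 1, not 0.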

Expressions in {}

Statements can be written in the BEGIN block, the stream processing block, and the END block, separated by semicolons. Variables are shared by the three blocks, which run in order: BEGIN first, then stream processing, then END.
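A minimal sketch of a variable flowing through the three blocks follows. The names `limit`, `count` and `sum` and the input numbers are illustrative assumptions.

```shell
result=$(printf '3\n8\n12\n' | awk '
BEGIN { limit = 5; count = 0 }       # runs once, before any input
$1 > limit { count++; sum += $1 }    # runs for each line above the limit
END { print count, sum }             # runs once, after all input
')
echo "$result"
# 2 20
```

The value set in BEGIN is visible in the per-line block, and the totals built there are still available in END.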

If syntax

awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'

Loop syntax

awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'

The syntax is similar to that of C. Awk also supports:

  • The for loop
  • The do-while loop
  • break, to exit a loop
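The loop forms listed above might look like this in practice. It is a small sketch with arbitrary bounds, run entirely in a BEGIN block so no input is needed.

```shell
loops=$(awk 'BEGIN {
    # for loop: sum the numbers 1 to 5
    for (i = 1; i <= 5; i++) total += i
    print total
    # do-while loop: the body always runs at least once
    n = 0
    do { n++ } while (n < 3)
    print n
    # break: leave the loop as soon as i reaches 4
    for (i = 1; i <= 10; i++) { if (i == 4) break }
    print i
}')
echo "$loops"
# 15
# 3
# 4
```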

Arrays

Awk has associative arrays, so indexes do not have to be a set of consecutive numbers; you can use strings or numbers as array indexes. In addition, there is no need to declare the size of an array in advance, because it can grow and shrink at run time. An array is used as follows:

array_name[index] = value
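As an illustration of string indexes, the sketch below counts how often each word appears in the first column. The array name `seen` and the input words are made up for this example.

```shell
counts=$(printf 'GET\nPOST\nGET\nGET\n' | awk '
{ seen[$1]++ }                          # the element is created on first use
END {
    for (k in seen) print k, seen[k]    # iteration order is unspecified
}' | sort)
echo "$counts"
# GET 3
# POST 1
```

The output is piped through sort because awk does not guarantee any particular order for `for (k in array)`.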

Example: access log statistics report

Let's look at an example.

The server keeps an access log of incoming requests. Some of the requests are traffic attacks, which carry a special header, "TEST". We want to print statistics on the different HTTP requests made by these attacks. We can do this with awk as follows:

# tail access.log
"9.134.77.72" "127.0.0.1" "-" "[11/Jan/2022:21:20:16 +0800]" "/api/test1" "200" "2" "2914" "-" "Prometheus/2.17.1" "TEST"
"9.134.77.72" "127.0.0.1" "-" "[11/Jan/2022:21:20:16 +0800]" "/api/test2" "200" "2" "2914" "-" "Prometheus/2.17.1" "TEST"
...
# grep "TEST" access.log | wc -l   # count the attack requests: about 500,000
## zgrep can also search compressed files
## egrep, like grep -E, enables extended regular expressions

# Write the awk processing script
# cat access.awk
BEGIN {
    FS = "\" \""
    printf "%-10s %-10s HandledTime ResponseBytes\n", "Request", "Total"
}
/TEST/ {                        # match lines containing TEST
    REQUEST[$5] += 1
    REQUEST1[$5] += $7          # handle time, ms
    REQUEST2[$5] += $8          # response bytes
    handledTime += $7
    bytes += $8
}
END {
    for (u in REQUEST)
        printf "%-10s %-10d %-10d %-10d\n", u, REQUEST[u], REQUEST1[u], REQUEST2[u]
    printf "handledTime: %d min\n", handledTime/1000/60
    printf "handledBytes: %d MB\n", bytes/1000/1000
}
# awk -f access.awk access.log   # run the script

## final output
api/test1 10000 200000 2000000
api/test2 6900 30000 200000
...
handledTime: 39 min
handledBytes: 960 MB

# Once the above output is saved to a file, sort can be used to find the most frequently called request routes
# sort -k 2 -rn stat.txt |head -n 20
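To try the same idea without a real log, here is a scaled-down sketch. The four log lines, the IP addresses and the array names `hits`, `ms` and `bytes` are invented for illustration; only the quoted-field layout and the use of fields 5, 7 and 8 follow the article's example.

```shell
tmp=$(mktemp -d)
cat > "$tmp/access.log" <<'EOF'
"10.0.0.1" "127.0.0.1" "-" "[11/Jan/2022:21:20:16 +0800]" "/api/test1" "200" "2" "2914" "-" "agent" "TEST"
"10.0.0.1" "127.0.0.1" "-" "[11/Jan/2022:21:20:17 +0800]" "/api/test2" "200" "5" "100" "-" "agent" "TEST"
"10.0.0.2" "127.0.0.1" "-" "[11/Jan/2022:21:20:18 +0800]" "/api/test1" "200" "3" "86" "-" "agent" "TEST"
"10.0.0.3" "127.0.0.1" "-" "[11/Jan/2022:21:20:19 +0800]" "/api/test1" "200" "9" "500" "-" "agent" "-"
EOF

# Split on the three characters quote-space-quote, keep only lines that
# contain TEST, and accumulate per-route count, handle time and bytes.
report=$(awk '
BEGIN { FS = "\" \"" }
/TEST/ { hits[$5]++; ms[$5] += $7; bytes[$5] += $8 }
END { for (u in hits) print u, hits[u], ms[u], bytes[u] }
' "$tmp/access.log" | sort -k 2 -rn)
echo "$report"
# /api/test1 2 5 3000
# /api/test2 1 5 100
```

The last log line has no TEST marker, so it is left out of the totals.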

Awk summary

As you can see from the example, awk has several options that can be configured:

  • -F: followed by the field delimiter
  • -f: awk takes the script in '' on the command line by default; use -f to read it from a script file instead.
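Since the two options differ only in case, a short side-by-side sketch may help. The file names `data.csv` and `sum.awk` are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'a,1\nb,2\n' > "$tmp/data.csv"
# A one-line awk program stored in a file, to be loaded with -f.
printf '{ sum += $2 } END { print sum }\n' > "$tmp/sum.awk"

# -F (uppercase) sets the field delimiter; the program is given inline.
inline=$(awk -F , '{ print $1 }' "$tmp/data.csv")
# -F and -f together: delimiter from -F, program read from the file.
from_file=$(awk -F , -f "$tmp/sum.awk" "$tmp/data.csv")

echo "$inline"      # a and b, one per line
echo "$from_file"   # 3
```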

Awk's syntax is not as difficult as it might seem: it can match the lines of interest, process them as a stream, and output statistical reports.

sed

Having covered the basics and usage of awk, we'll move on to sed, another stream-processing tool.

The main use of sed is to modify text.

Sed matches each line of text against a pattern and runs a command on the lines that match: sed [option] '/pattern/command' file

For example:

# sed '/abc/p' test.txt   # /abc/ is the pattern to match; p is the print command

Main functions

Sed provides many commands, including p (print), d (delete), and s (substitute).

The most commonly used sed feature is s (substitution).

Matching in sed

Lines can be selected by line number or by /pattern/, as shown below:

  • Text matching, with the same meaning as in awk: /pattern/. With the -E option, extended regular expressions can also be used.
  • Line-number matching, for example 1p
  • Multi-line matching, written as start,end, which selects the block of lines from start to end. For example, 1,10p matches lines 1-10.
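The address forms listed above can be compared on a tiny input. This is a sketch with invented sample text; -n is used so that only the lines selected by p are shown.

```shell
text='start
alpha
end
beta'

by_number=$(printf '%s\n' "$text" | sed -n '2p')             # line 2 only
by_range=$(printf '%s\n' "$text" | sed -n '1,2p')            # lines 1 to 2
by_regex=$(printf '%s\n' "$text" | sed -n '/^be/p')          # lines starting with "be"
by_block=$(printf '%s\n' "$text" | sed -n '/start/,/end/p')  # from "start" to "end"

echo "$by_number"   # alpha
echo "$by_range"    # start, alpha
echo "$by_regex"    # beta
echo "$by_block"    # start, alpha, end
```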

Options

There are some useful options:

  • By default, sed does not modify the original file; it only prints the result to the console. Add the -i option to modify the text file in place.
  • By default, every input line is printed after it has been processed. The -n option suppresses this automatic printing, so -n is generally used together with the p command.
  • -e command: specifies a command to run, and can be given more than once. Unlike awk, which can use expressions, sed only has simple commands, but several of them can be combined.
  • -f file: reads the commands from a script file.
  • -E: enables extended regular expressions in patterns.
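The options above could be exercised as in the sketch below. The file names are hypothetical, and `-i.bak` (in-place edit that keeps a backup with the given suffix) is used here instead of a bare `-i` on the assumption that the attached-suffix form is accepted by both GNU and BSD sed.

```shell
tmp=$(mktemp -d)
printf 'hello\nnice to test\n' > "$tmp/test.txt"
printf 's/hello/bye/\n' > "$tmp/cmds.sed"

# -n with p: print only the matching line.
only=$(sed -n '/nice/p' "$tmp/test.txt")
# Two -e commands, applied in order to each line.
multi=$(sed -e 's/hello/hi/' -e 's/nice/good/' "$tmp/test.txt")
# -f: read the commands from a script file.
scripted=$(sed -f "$tmp/cmds.sed" "$tmp/test.txt")
# -E: extended regex, so | means alternation without a backslash.
ext=$(sed -E 's/hello|nice/X/' "$tmp/test.txt")
# -i.bak: edit the file in place and keep the original as test.txt.bak.
sed -i.bak 's/test/demo/' "$tmp/test.txt"
after=$(cat "$tmp/test.txt")
backup=$(cat "$tmp/test.txt.bak")
```

After the last command, `after` holds the edited text ("nice to demo" on the second line) while `backup` still holds the original.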

Main commands

p

#echo hello >>test.txt
#echo "nice to test" >>test.txt
# sed 'p' test.txt   # p prints each line in addition to the automatic output
hello
hello
nice to test
nice to test
# sed '2p' -n test.txt
nice to test

2p matches the second line and prints it; -n suppresses the automatic printing of every line.

d

# sed  -i '1d' test.txt
# #d will not print the original line   
# cat test.txt                  
nice to test
# sed  '/nice/d' -i test.txt
# cat test.txt
## the file is now empty

The d command deletes the matched line, so that line is not printed. After the two operations, both lines have been deleted.

s

The s command is written with the command letter first, followed by the pattern and the replacement. The syntax is s/pattern/replacement/

# sed  's/test/hello/' test.txt   # replace only the first test in each line with hello
hello
nice to hello
# sed  -i 's/test/hello/' test.txt
## -i writes to the file and prints nothing to the console
# sed  's/test/hello/g' test.txt   # g replaces all matches in the line
# sed  's/test/hello/2g' test.txt  # replace the second match and every later one
# sed  's/test/hello/1g' test.txt  # start from the first match, so the same as g
# sed  's/test/hello/ig' test.txt  # ignore case

  • s/test/hello/ replaces only the first match in the line, equivalent to s/test/hello/1
  • s/test/hello/g replaces every test in the line with hello
  • s/test/hello/2g replaces the second through the last match of test in the line with hello
  • s/test/hello/ig ignores case when matching, so Test is replaced as well
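The effect of each flag is easiest to see on a line with several matches. The sketch below assumes GNU sed, since combining a number with g and the case-insensitive I flag are extensions that other sed implementations may not support.

```shell
line='test test test'

first=$(echo "$line" | sed 's/test/hello/')          # first match only
all=$(echo "$line" | sed 's/test/hello/g')           # every match
second=$(echo "$line" | sed 's/test/hello/2')        # second match only
from_second=$(echo "$line" | sed 's/test/hello/2g')  # second match onwards
nocase=$(echo 'Test test' | sed 's/test/hello/Ig')   # ignore case, every match

echo "$first"        # hello test test
echo "$all"          # hello hello hello
echo "$second"       # test hello test
echo "$from_second"  # test hello hello
echo "$nocase"       # hello hello
```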

Less common commands

There are many other, less common commands, including a (append), i (insert), r (read a file), and w (write to a file), and possibly others, which you can look up in the manual when you need them.
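For orientation only, here is a brief sketch of a, i, w and r. It uses the form where the text for a and i goes on the line after a backslash; the input lines and the file name `matched.txt` are made up.

```shell
tmp=$(mktemp -d)

# a: append a line of text after line 1.
appended=$(printf 'one\ntwo\n' | sed '1a\
added after line 1')
# i: insert a line of text before line 1.
inserted=$(printf 'one\ntwo\n' | sed '1i\
added before line 1')
# w: write the matching lines to a file.
printf 'one\ntwo\n' | sed -n "/two/w $tmp/matched.txt"
written=$(cat "$tmp/matched.txt")
# r: read that file and output its contents after line 1.
merged=$(printf 'A\nB\n' | sed "1r $tmp/matched.txt")
```

Here `appended` is one, added after line 1, two; `inserted` is added before line 1, one, two; `written` is two; and `merged` is A, two, B.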

Conclusion

This article introduced the basic principles and common usage of awk and sed, including matching rules, main commands, and basic syntax. I hope it helps.