Part One: awk introduction and expression examples

  • A language with a weird name

  • Pattern scanning and processing: process data and generate reports.

Awk is not just a Linux command but also a programming language. It is used to process data and generate reports (much as one might use Excel). The data can come from one or more files, from standard input, or from a pipe. Awk can be run by typing a program directly on the command line, or the program can be saved in an awk script file for more complex tasks.

Sed, by comparison, is a stream editor.

Introduction to AWK environment

The AWK covered in this article is GAWK, the GNU version of AWK.

[root@creditease awk]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@creditease awk]# uname -r
3.10.0-862.el7.x86_64
[root@creditease awk]# ll `which awk`
lrwxrwxrwx. 1 root root 4 Nov  7 14:47 /usr/bin/awk -> gawk
[root@creditease awk]# awk --version
GNU Awk 4.0.2

Second, awk format

An awk command is made up of a pattern, an action, or a combination of a pattern and an action.

  • The pattern is similar to pattern matching in sed. It can be an expression or a regular expression enclosed between two forward slashes. For example, NR==1 is a pattern; you can think of it as a condition.

  • The action consists of one or more statements enclosed in braces and separated by semicolons. The general form is: awk [options] 'pattern {action}' file

Third, records and fields

Name     Meaning
record   A record, i.e. one line
field    A field (also called a region or column)

1) NF (number of fields) holds the number of fields (columns) in the current line. $NF refers to the last field.

2) The $ sign selects a field (column), for example $1 or $NF.

3) NR (number of records) holds the line number. awk stores the number of the record being processed in the built-in variable NR and increments it by 1 for each line read.

4) FS (set with -F) is the field separator, i.e. the character or pattern used to split a line into columns.
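The four items above can be seen together in one short sketch. The two sample lines are made up for illustration; any '#'-separated text would behave the same way.

```shell
# Hypothetical sample data: two '#'-separated records of different length.
tmp=$(mktemp)
printf 'ABC#DEF#GHI\nBAC#DEF\n' > "$tmp"

# For each record print: record number (NR), field count (NF),
# the first field ($1) and the last field ($NF).
fields=$(awk -F '#' '{print NR, NF, $1, $NF}' "$tmp")
echo "$fields"
rm -f "$tmp"
```

Because NF differs per line, $NF picks GHI on the first line and DEF on the second.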

3.1 Specifying the delimiter

[root@creditease awk]# awk -F "#" '{print $NF}' awk.txt 
GKL$123
GKL$213
GKL$321
[root@creditease awk]# awk -F '[#$]' '{print $NF}' awk.txt 
123
213
321

3.2 Basic conditions and actions

[root@creditease awk]# cat awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" 'NR==1{print $1}' awk.txt
ABC

3.3 Conditions Only

[root@creditease awk]# awk -F "#" 'NR==1' awk.txt
ABC#DEF#GHI#GKL$123

When no action is given, the default action is {print $0}.

3.4 Action Only

[root@creditease awk]# awk -F "#" '{print $1}' awk.txt
ABC
BAC
CBA

When no pattern is given, every line is processed.

3.5 Multiple patterns and actions

[root@creditease awk]# awk -F "#" 'NR==1{print $NF}NR==3{print $NF}' awk.txt
GKL$123
GKL$321

3.6 Understanding of $0

$0 in awk represents the entire line

[root@creditease awk]# awk '{print $0}' awk_space.txt
ABC DEF GHI GKL$123
BAC DEF GHI GKL$213
CBA DEF GHI GKL$321

3.7 FNR

FNR is similar to NR, except that it does not keep counting across multiple files: it restarts at 1 for each file.

[root@creditease awk]# awk '{print NR}' awk.txt awk_space.txt 
1
2
3
4
5
6
[root@creditease awk]# awk '{print FNR}' awk.txt awk_space.txt
1
2
3
1
2
3

Fourth, regular expressions and operators

Like sed, awk matches input text with patterns. It supports a large set of regular expression metacharacters, most of them the same as those supported by sed. Regular expressions are an essential tool for using the "Three Swordsmen" (grep, sed, and awk).

Regular expression metacharacters supported by awk

Metacharacters that awk does not support by default and that require an extra option

Metacharacter   Function                                     Example
x{m}            x repeated exactly m times                   /cool{5}/
x{m,}           x repeated at least m times                  /(cool){2,}/
x{m,n}          x repeated at least m and at most n times    /(cool){5,6}/

Explanation: x can be a single character or a parenthesized string. /cool{5}/ matches coo followed by five l's (coolllll); /(cool){2,}/ matches coolcool, coolcoolcool, and so on; /(cool){5,6}/ matches cool repeated five or six times. Older gawk releases need the --posix or --re-interval option for these interval expressions; from gawk 4.0 onward they are enabled by default.

By default, the regular expression is matched against the whole line, and the action is executed if any part of the line matches. Sometimes, however, only a particular column should be matched against the regular expression.

For example:

To find the lines of /etc/passwd whose fifth column ($5) matches the string mail, a matching operator is needed. awk has exactly two operators for matching regular expressions.

Regular expression matching operators
~    True when the record or field matches the regular expression.
!~   The opposite of ~: true when there is no match.
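A minimal sketch of the fifth-column case described above. It uses a made-up, passwd-style sample instead of the real /etc/passwd so that the result is predictable; the three accounts are hypothetical.

```shell
# Hypothetical passwd-style sample (colon-separated, comment in field 5).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
root:x:0:0:root:/root:/bin/bash
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
bob:x:1000:1000:Bob:/home/bob/mail:/bin/bash
EOF

# Whole-line match: any line containing "mail" anywhere.
whole=$(awk -F ':' '/mail/ {print $1}' "$tmp")
# Field match with ~ : only lines whose fifth field matches /mail/.
field=$(awk -F ':' '$5 ~ /mail/ {print $1}' "$tmp")
# Negated field match with !~ : lines whose fifth field does not match.
other=$(awk -F ':' '$5 !~ /mail/ {print $1}' "$tmp")

echo "whole line: $whole"
echo "field 5:    $field"
echo "not field 5: $other"
rm -f "$tmp"
```

The bob line contains mail only in its home directory, so the whole-line pattern selects it while the $5 test does not.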

4.1 Regular expression examples

1) Display the GHI column in awk.txt

[root@creditease awk]# cat awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '{print $3}' awk.txt 
GHI
GHI
GHI
[root@creditease awk]# awk -F "#" '{print $(NF-1)}' awk.txt 
GHI
GHI
GHI

2) Display the line containing 321

[root@creditease awk]# awk '/321/{print $0}' awk.txt 
CBA#DEF#GHI#GKL$321

3) Use # as a delimiter to display rows that begin with B in the first column or end with 1 in the last column

[root@creditease awk]# awk -F "#" '$1~/^B/{print $0}$NF~/1$/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

4) Use # as a delimiter to display the lines whose first column starts with B or C

[root@creditease awk]# awk -F "#" '$1~/^B|^C/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^[BC]/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^(B|C)/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1!~/^A/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

Fifth, comparison expressions

Because awk is a programming language, it can make more complex judgments: when a condition is true, awk executes the associated action. Most often the judgment is about a field, for example printing the records with a score of 80 or more, which requires comparing that field with a value.
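The score example mentioned above might look like the following sketch. The score sheet (names and numbers) is invented for illustration.

```shell
# Hypothetical score sheet: name and score separated by a space.
tmp=$(mktemp)
printf 'tom 79\namy 80\nbob 95\n' > "$tmp"

# The comparison $2 >= 80 is the pattern; the action runs only when it is true.
passed=$(awk '$2 >= 80 {print $1, $2}' "$tmp")
echo "$passed"
rm -f "$tmp"
```

Because the second field looks like a number, awk compares it numerically rather than as a string.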

The following table lists the relational operators awk provides for comparing numbers and strings, as well as for matching regular expressions. An expression evaluates to 1 if it is true and 0 otherwise, and awk executes the associated action only when the expression is true.

Relational operators supported by AWK

Operator   Meaning                    Example
<          Less than                  x<y
<=         Less than or equal to      x<=y
==         Equal to                   x==y
!=         Not equal to               x!=y
>=         Greater than or equal to   x>=y
>          Greater than               x>y
5.1 Comparison expression examples

Display lines 2 and 3 of awk.txt

This can be done with NR or with a range pattern of the form /start/,/end/.

[root@creditease awk]# awk 'NR==2{print $0}NR==3{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk 'NR>=1{print $0}' awk.txt 
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk '/BAC/,/CBA/{print $0}' awk.txt 
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
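Lines 2 and 3 can also be selected by record number alone. The sketch below assumes the same three-line content as awk.txt and shows two standard awk forms that the transcript above does not use: combining two comparisons with &&, and a range pattern built from NR tests.

```shell
# Recreate the three sample lines in a temporary file.
tmp=$(mktemp)
printf 'ABC#DEF#GHI#GKL$123\nBAC#DEF#GHI#GKL$213\nCBA#DEF#GHI#GKL$321\n' > "$tmp"

# Both conditions must hold: record number between 2 and 3.
by_cmp=$(awk 'NR>=2 && NR<=3' "$tmp")
# Range pattern: start at record 2, stop after record 3.
by_range=$(awk 'NR==2,NR==3' "$tmp")

echo "$by_cmp"
rm -f "$tmp"
```

Both commands rely on the default action {print $0}, so no action block is written.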

Part Two: awk modules, variables, and execution

The complete AWK structure diagram is as follows:

BEGIN module

The BEGIN module executes before awk reads any input. It is often used to set the values of built-in variables such as ORS, RS, FS, and OFS. A program that contains only a BEGIN module can be run without any input file.

Awk built-in variables (predefined variables)

Variable          Meaning
$0                The current record (the whole line)
$1, $2, ..., $n   The nth field of the current record; fields are separated by FS
FS                Input field separator (field separator); the default is a space
NF                Number of fields in the current record, i.e. how many columns it has (number of fields)
NR                Number of records read so far, i.e. the line number, starting at 1 (number of records)
RS                Input record separator (record separator); the default is a newline
OFS               Output field separator (output field separator); the default is also a space
FNR               Record number within the current file; it restarts for each file
FILENAME          Name of the file currently being processed

Note: FS and RS both accept regular expressions.
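As a small illustration of setting built-in variables in BEGIN, the sketch below treats FS as a regular expression and sets OFS for the output. The input line is the first line of the sample file used throughout; the choice of "-" as OFS is arbitrary.

```shell
line='ABC#DEF#GHI#GKL$123'

# FS is a bracket expression: split on either '#' or '$'.
# OFS is placed between the comma-separated arguments of print.
joined=$(echo "$line" | awk 'BEGIN{FS="[#$]"; OFS="-"} {print $1, $NF, NF}')
echo "$joined"
```

With both separators in FS the line splits into five fields, so $NF is 123 rather than GKL$123.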

2.1 First function: Define built-in variables

[root@creditease awk]# awk 'BEGIN{RS="#"}{print $0}' awk.txt 
ABC
DEF
GHI
GKL$123
BAC
DEF
GHI
GKL$213
CBA
DEF
GHI
GKL$321

2.2 Second function: print a marker line

[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}' awk.txt 
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321

2.3 awk can do arithmetic

[root@creditease files]# awk 'BEGIN{a=8; b=90; print a+b,a-c,a/b,a%b}'
98 8 0.0888889 8

Note: c is never assigned, so awk treats it as 0 and a-c evaluates to 8.

END module

The END module executes after awk has read all of the input. It is usually used to print a final result (a sum, the contents of an array), or to print a closing marker that pairs with the one printed in the BEGIN module.

3.1 First function: print a marker line

[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}END{print "=======end======"}' awk.txt
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
=======end======

3.2 Second function: accumulation

1) Count the blank lines in the /etc/services file

With grep, sed, and awk:

[root@creditease awk]# grep "^$" /etc/services |wc -l
17
[root@creditease awk]# sed -n '/^$/p' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/{i=i+1}END{print i}' /etc/services
17

2) 1+2+3+...+100=5050. How can this be computed with awk?

[root@creditease awk]# seq 100|awk '{i=i+$0}END{print i}'
5050

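Four, awk summary

1, Write one BEGIN module and one END module. awk does accept repeated blocks such as BEGIN{}BEGIN{} or END{}END{} and runs them in the order written, but a single block of each is easier to read.

2, The "find whom {do what}" module (pattern {action}) can appear multiple times.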

Five, awk execution process summary

The awk execution process:

1. Command-line assignments (-F or -v)

2. Execute the contents of the BEGIN module

3. Start reading the file

4. Test whether the condition (pattern) holds

  • If true, execute the corresponding action
  • Read the next line and repeat
  • Continue until the end of the last file

5. Execute the contents of the END module
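The order of the steps above can be made visible with a program that prints a label at each stage. This is only an illustrative sketch; the variable name tag and the two input lines are invented.

```shell
order=$(printf 'x\ny\n' | awk -v tag="v" '
  BEGIN   { printf "BEGIN(%s) ", tag }  # runs before any input is read; the -v assignment is already visible
  NR == 1 { printf "first " }           # pattern is true only for record 1
          { printf "line%d ", NR }      # no pattern: runs for every record
  END     { print "END" }               # runs once after the last record
')
echo "$order"
```

For each record, awk tests the rules from top to bottom, which is why "first" appears before "line1".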

Part Three: awk arrays and syntax

Awk array

1.1 Array Structure

people["police"]=110

people["doctor"]=120

[root@creditease awk]# awk 'BEGIN{word[0]="credit"; word[1]="easy"; print word[0],word[1]}'
credit easy
[root@creditease awk]# awk 'BEGIN{word[0]="credit"; word[1]="easy"; for(i in word)print word[i]}'
credit
easy

1.2 Array classification

Indexed array: numeric subscripts

Associative array: string subscripts

1.3 awk associative arrays

The text below has a letter on the left and a number on the right. The task: add together the numbers that follow the same letter and output the totals in alphabetical order.

a  1
b  3
c  2
d  7
b  5
a  3 
g  2
f  6

Use an array: a[$1]=a[$1]+$2 (equivalently, a[$1]+=$2)

[root@creditease awk]# awk '{a[$1]=a[$1]+$2}END{for(i in a)print i,a[i]}' jia.txt
a 4
b 8
c 2
d 7
f 6
g 2
Note: the for(i in a) loop does not visit the elements in the order they appear in the text. To sort the output, pipe the command into sort.
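Since the loop order of for(i in a) is not guaranteed, the alphabetical requirement is safest met by sorting the result. A sketch using the same eight input lines (LC_ALL=C is set here only to make the sort order independent of the locale):

```shell
# Same letter/number pairs as the sample text.
tmp=$(mktemp)
printf 'a 1\nb 3\nc 2\nd 7\nb 5\na 3\ng 2\nf 6\n' > "$tmp"

# Sum the second column per letter, then sort the totals by letter.
sums=$(awk '{a[$1]+=$2} END{for(i in a) print i, a[i]}' "$tmp" | LC_ALL=C sort)
echo "$sums"
rm -f "$tmp"
```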

1.4 awk indexed arrays

Indexed arrays use numeric subscripts. Generate the numbers 1-10 with seq and display only the odd-numbered lines:

[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1; i<=NR; i+=2){print a[i]}}'
1
3
5
7
9

Generate the numbers 1-10 with seq and leave out the last three lines:

[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1; i<=NR-3; i++){print a[i]}}'
1
2
3
4
5
6
7

1.5 awk arrays in practice: deduplication

a++ and ++a

[root@creditease awk]# awk 'BEGIN{print a++}'
0
[root@creditease awk]# awk 'BEGIN{print ++a}'
1
[root@creditease awk]# awk 'BEGIN{a=1; b=a++; print a,b}'
2 1
[root@creditease awk]# awk 'BEGIN{a=1; b=++a; print a,b}'
2 2
Note: b=a++ assigns first and then increments, i.e. b=a; a=a+1. b=++a increments first and then assigns, i.e. a=a+1; b=a.

Deduplicate the following text on the second column

[root@creditease awk]# cat qc.txt 
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895 
2018/11/12   xiaoli     13373305025
2018/11/19   xiaozhao   15512013263
2018/11/26   xiaoliu    18615517895
2018/12/01   xiaoma     16965564525
2018/12/09   xiaowang   17712215986
2018/11/24   xiaozhao   15512013263
Solution 1:
[root@creditease awk]# awk '!a[$2]++' qc.txt
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895
2018/11/19   xiaozhao   15512013263
2018/12/01   xiaoma     16965564525
Analysis: !a[$2]++ is a pattern (condition). In a[$2]++ the "++" comes after, so the current value of a[$2] is used first and 1 is added afterwards. The first time a name appears a[$2] is empty, so !a[$2] is true and the line is printed; on later occurrences a[$2] is non-zero, !a[$2] is false, and the line is skipped. Written out step by step, the command is equivalent to:
[root@creditease awk]# awk '{if(!a[$2])print $0; a[$2]=a[$2]+1}' qc.txt

Solution 2:
[root@creditease awk]# awk '++a[$2]==1' qc.txt
2018/10/20   xiaoli     13373305025
2018/10/25   xiaowang   17712215986
2018/11/01   xiaoliu    18615517895
2018/11/19   xiaozhao   15512013263
2018/12/01   xiaoma     16965564525
Analysis: ++a[$2]==1 is a pattern (condition); it can also be written as (a[$2]=a[$2]+1)==1. In ++a[$2] the "++" comes first, so 1 is added before the comparison, and the line is printed only when the result equals 1, that is, the first time the name appears.

Solution 3:
[root@creditease awk]# awk '{a[$2]=$0}END{for(i in a){print a[i]}}' qc.txt
2018/11/12   xiaoli     13373305025
2018/11/26   xiaoliu    18615517895
2018/12/01   xiaoma     16965564525
2018/12/09   xiaowang   17712215986
2018/11/24   xiaozhao   15512013263
Note: this method keeps the last occurrence of each duplicated line, so the lines shown are the ones nearest the end of the text.
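The difference between keeping the first occurrence and keeping the last one is easier to see on a tiny input. The three lines below are invented for illustration, and the array names seen and v are arbitrary.

```shell
# Hypothetical sample: the key in column 1 repeats on lines 1 and 3.
data=$(printf 'k1 first\nk2 only\nk1 last\n')

# Keep the first line seen for each key (input order is preserved).
keep_first=$(printf '%s\n' "$data" | awk '!seen[$1]++')
# Keep the last line seen for each key; for-in order is unspecified, so sort.
keep_last=$(printf '%s\n' "$data" | awk '{v[$1]=$0} END{for(k in v) print v[k]}' | LC_ALL=C sort)

echo "$keep_first"
echo "$keep_last"
```

The first form prints lines as it reads them; the second holds everything until END, which is why its output order has to be fixed up afterwards.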

1.6 awk processing multiple files (arrays, NR, FNR)

Use awk to take the first column of file1.txt and the second column of file2.txt and redirect the result to a new file, new.txt

[root@creditease awk]# cat file1.txt 
a b
c d
e f
g h
i j
[root@creditease awk]# cat file2.txt 
1 2
3 4
5 6
7 8
9 10
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2}' file1.txt file2.txt
a 2
c 4
e 6
g 8
i 10
Analysis: NR==FNR is true while the first file is being processed; NR!=FNR is true while the second file is being processed.
Note: when the two files have different numbers of lines, put the file with more lines first and the file with fewer lines second.
Put the output into a new file, new.txt:
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2>"new.txt"}' file1.txt file2.txt
[root@creditease awk]# cat new.txt 
a 2
c 4
e 6
g 8
i 10
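A common variant of the same two-file idiom ends the first rule with next, so the second rule needs no NR!=FNR test. The sketch below uses two small made-up files in a temporary directory rather than file1.txt and file2.txt.

```shell
# Two hypothetical three-line files.
dir=$(mktemp -d)
printf 'a b\nc d\ne f\n' > "$dir/f1"
printf '1 2\n3 4\n5 6\n' > "$dir/f2"

# First file: remember column 1 by line number, then skip to the next record.
# Second file: print the remembered value next to column 2.
merged=$(awk 'NR==FNR{a[FNR]=$1; next} {print a[FNR], $2}' "$dir/f1" "$dir/f2")
echo "$merged"
rm -rf "$dir"
```

One caveat of the NR==FNR test in either form: if the first file is empty, the condition is also true for the second file.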

1.7 Using awk to analyze a log file and count visits per website

[root@creditease awk]# cat url.txt 
http://www.baidu.com
http://mp4.video.cn
http://www.qq.com
http://www.crediteasy.com
http://mp3.music.com
http://www.qq.com
http://www.qq.com
http://www.crediteasy.com
http://www.crediteasy.com
http://mp4.video.cn
http://mp3.music.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
[root@creditease awk]# awk -F "[/]+" '{h[$2]++}END{for(i in h) print i,h[i]}' url.txt
www.qq.com 3
www.baidu.com 5
mp4.video.cn 2
mp3.music.com 2
www.crediteasy.com 3
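For a log analysis the counts are usually wanted in ranked order. One way, sketched here on a shortened made-up list of URLs, is to print the count first and pipe the result through sort -rn.

```shell
# Hypothetical access list: three hosts with 3, 2 and 1 hits.
tmp=$(mktemp)
printf 'http://www.qq.com\nhttp://www.baidu.com\nhttp://www.qq.com\nhttp://mp3.music.com\nhttp://www.qq.com\nhttp://www.baidu.com\n' > "$tmp"

# Count hits per host ($2 when splitting on runs of "/"),
# print "count host", and sort numerically in descending order.
top=$(awk -F '[/]+' '{h[$2]++} END{for(i in h) print h[i], i}' "$tmp" | sort -rn)
echo "$top"
rm -f "$tmp"
```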

Awk simple syntax

2.1 The sub and gsub functions

Substitution

sub(r, s, target)
gsub(r, s, target)

[root@creditease awk]# cat sub.txt 
ABC DEF AHI GKL$123
BAC DEF AHI GKL$213
CBA DEF GHI GKL$321
[root@creditease awk]# awk '{sub(/A/,"a"); print $0}' sub.txt
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
[root@creditease awk]# awk '{gsub(/A/,"a"); print $0}' sub.txt
aBC DEF aHI GKL$123
BaC DEF aHI GKL$213
CBa DEF GHI GKL$321
Note: sub replaces only the first match in the line, like sed 's###'; gsub replaces every match in the line, like sed 's###g'.
[root@creditease awk]# awk '{sub(/A/,"a",$1); print $0}' sub.txt
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
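Both functions also return the number of substitutions they made, which can be useful for counting matches. A short sketch on two made-up lines (the variable name n is arbitrary):

```shell
# gsub returns how many replacements it performed on the current line.
counts=$(printf 'ABC DEF AHI\nCBA DEF GHI\n' | awk '{n = gsub(/A/, "a"); print n, $0}')
echo "$counts"
```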

Practice:

0001|20081223efskjfdj|EREADFASDLKJCV
0002|20081208djfksdaa|JDKFJALSDJFsddf
0003|20081208efskjfdj|EREADFASDLKJCV
0004|20081211djfksdaa1234|JDKFJALSDJFsddf
Using '|' as the separator, remove the digits in front of the letters in the second field and leave everything else unchanged. The output should be:
0001|efskjfdj|EREADFASDLKJCV
0002|djfksdaa|JDKFJALSDJFsddf
0003|efskjfdj|EREADFASDLKJCV
0004|djfksdaa1234|JDKFJALSDJFsddf
Methods:
awk -F '|' 'BEGIN{OFS="|"}{sub(/[0-9]+/,"",$2); print $0}' sub_hm.txt
awk -F '|' -v OFS="|" '{sub(/[0-9]+/,"",$2); print $0}' sub_hm.txt

2.2 Usage of if and else

Content:

AA

BC

AA

CB

CC

AA

Results:

AA YES

BC NO YES

AA YES

CB NO YES

CC NO YES

AA YES

1)
[root@creditease awk]# awk '{if($0~/AA/){print $0" YES"}else{print $0" NO YES"}}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: this uses if and else. If $0 matches AA, print $0" YES"; otherwise print $0" NO YES".
2)
[root@creditease awk]# awk '$0~/AA/{print $0" YES"}$0!~/AA/{print $0" NO YES"}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: this uses a regular expression match and its negation. If $0 matches AA, print YES; otherwise print "NO YES".

2.3 Usage of next

The same task as above, implemented with next

next: stop processing the current line, skipping all the code after it, and move on to the next line

[root@creditease awk]# awk '$0~/AA/{print $0" YES"; next}{print $0" NO YES"}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: {print $0" NO YES"} has no pattern, so by default it runs for every line. When the current line matches $0~/AA/, the action {print $0" YES"; next} runs, and because it contains next, the actions after it are skipped for that line. In short: if the pattern in front of next matches, what follows is not executed; if it does not match, what follows is executed.

2.4 printf (output without a newline) and next

printf: does not append a newline after printing

In the following text, if the Description: line is empty, merge it with the line that follows.

Packages: Hello-1
Owner: me me me me
Other: who care?
Description:
Hello world!
Other2: don't care
Required result:
Packages: hello-1
Owner: me me me
Other: Who care
Description: Hello world!
Origial-Owner: me me me me
Other2: don't care
1)
[root@creditease awk]# awk '/^Desc.*:$/{printf $0}!/Desc.*:$/{print $0}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Analysis: lines that match /^Desc.*:$/ are output with printf (no newline); lines that do not match are printed whole with print.
2) Using if and else
[root@creditease awk]# awk '{if(/Des.*:$/){printf $0}else{print $0}}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
3) Using next
[root@creditease awk]# awk '/Desc.*:$/{printf $0; next}{print $0}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Note: this can be shortened to awk '/Desc.*:$/{printf $0; next}1' printf.txt, because the pattern 1 is always true and its default action is {print $0}.
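One caution about printf $0: the line itself is used as the format string, so a line containing % may be misinterpreted. A more defensive sketch passes the line as an argument to an explicit "%s" format; the two input lines are invented to include a percent sign.

```shell
# The shell printf below emits two lines: "Description:" and "100% done".
out=$(printf 'Description:\n100%% done\n' |
      awk '/^Desc.*:$/{printf "%s", $0; next} {print}')
echo "$out"
```

The joining behaviour is the same as in the commands above; only the way the text reaches printf differs.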

2.5 Count after deduplication and redirect the results to specified files

For the text below, count how many times each item occurs, then write the items that occur more than 2 times to gt2.txt and the items that occur 2 times or fewer to le2.txt

[root@creditease files]# cat qcjs.txt 
aaa
bbb
ccc
aaa
ddd
bbb
rrr
ttt
ccc
eee
ddd
rrr
bbb
rrr
bbb
[root@creditease awk]# awk '{a[$1]++}END{for(i in a){if(a[i]>2){print i,a[i]>"gt2.txt"}else{print i,a[i]>"le2.txt"}}}' qcjs.txt 
[root@creditease awk]# cat gt2.txt 
rrr 3
bbb 4
[root@creditease awk]# cat le2.txt
aaa 2
ccc 2
eee 1
ttt 1
ddd 2
Note: inside an action {...}, the output of print can be redirected directly to a new file; the file name must be enclosed in double quotation marks, for example {print $1 >"xin.txt"}.

Three, precautions for awk

a) NR==FNR ## true while the first file is being processed

b) NR!=FNR ## NR is not equal to FNR: true while a later file is being processed

c) {a=1; a[NR]} is an error: a variable and an array cannot share the same name within one command

e) Inside an action {...}, the output of print can be redirected directly to a new file; the file name must be enclosed in double quotation marks, for example {print $1 >"xin.txt"}

f) When the pattern (condition) evaluates to 0, the following action is not executed; when it is non-zero, the action is executed.

Author: Qin Wei

Source: Creditease Institute of Technology