I. Introduction to awk and expression examples
A language with a weird name
Awk is a pattern scanning and processing language: it scans and processes data and generates reports.
Awk is not just a Linux command but a programming language. It can process data and generate reports (similar to what you might build in Excel). The data can come from one or more files, directly from standard input, or from a pipe. Awk programs can be typed directly on the command line or, for more complex tasks, written into awk script files.
Sed, by comparison, is a stream editor.
1. Introduction to the awk environment
The awk covered in this article is gawk, the GNU version of awk.
[root@creditease awk]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@creditease awk]# uname -r
3.10.0-862.el7.x86_64
[root@creditease awk]# ll `which awk`
lrwxrwxrwx. 1 root root 4 Nov 7 14:47 /usr/bin/awk -> gawk
[root@creditease awk]# awk --version
GNU Awk 4.0.2
2. Awk format
An awk command consists of a pattern, an action, or a pattern combined with an action.
- A pattern is similar to address matching in sed. It can be an expression, or a regular expression between two forward slashes. In NR==1{print $1}, NR==1 is the pattern; you can think of it as a condition.
- An action consists of one or more statements enclosed in braces and separated by semicolons.
The general format is: awk [options] 'pattern{action}' file
3. Records and fields
Name | Meaning |
---|---|
record | A record: one line of input |
field | A field (column) within a record |
1) NF (number of fields) is the number of fields (columns) in the current record. $NF is the last field.
2) The $ sign refers to a field: $1 is the first field, $2 the second, and $NF the last.
3) NR (number of record) is the record (line) number. Awk keeps the number of the current record in the built-in variable NR, which increases by 1 each time a record is read.
4) FS (set with -F) is the field separator: the character or regular expression used to split a record into fields.
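Here is a minimal sketch of these variables. It assumes a POSIX-compatible awk and passes the three lines of the awk.txt sample (shown below) through printf, so no file is needed. For each record it prints NR, NF, $1 and $NF, using # as FS. The function name show_fields is only illustrative.

```shell
#!/bin/sh
# Sketch: print NR, NF, the first field and the last field of each record.
# The three input lines mirror awk.txt from this article.
show_fields() {
    printf '%s\n' 'ABC#DEF#GHI#GKL$123' 'BAC#DEF#GHI#GKL$213' 'CBA#DEF#GHI#GKL$321' |
        awk -F '#' '{ print NR, NF, $1, $NF }'
}

show_fields
# 1 4 ABC GKL$123
# 2 4 BAC GKL$213
# 3 4 CBA GKL$321
```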
3.1 Specifying the delimiter
[root@creditease awk]# awk -F "#" '{print $NF}' awk.txt
GKL$123
GKL$213
GKL$321
[root@creditease awk]# awk -F '[#$]' '{print $NF}' awk.txt
123
213
321
3.2 Conditions and actions
[root@creditease awk]# cat awk.txt
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" 'NR==1{print $1}' awk.txt
ABC
3.3 Condition only
[root@creditease awk]# awk -F "#" 'NR==1' awk.txt
ABC#DEF#GHI#GKL$123
When only a pattern is given, the default action {print $0} is used.
3.4 Action only
[root@creditease awk]# awk -F "#" '{print $1}' awk.txt
ABC
BAC
CBA
When the pattern is omitted, the action is applied to every line.
3.5 Multiple patterns and actions
[root@creditease awk]# awk -F "#" 'NR==1{print $NF}NR==3{print $NF}' awk.txt
GKL$123
GKL$321
3.6 Understanding $0
$0 in awk represents the entire current line (record).
[root@creditease awk]# awk '{print $0}' awk_space.txt
ABC DEF GHI GKL$123
BAC DEF GHI GKL$213
CBA DEF GHI GKL$321
3.7 FNR
FNR is similar to NR, except that it does not keep counting across multiple files: it restarts at 1 for each file.
[root@creditease awk]# awk '{print NR}' awk.txt awk_space.txt
1
2
3
4
5
6
[root@creditease awk]# awk '{print FNR}' awk.txt awk_space.txt
1
2
3
1
2
3
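This small sketch shows the difference between NR and FNR. It creates two hypothetical temporary files with mktemp, which stand in for awk.txt and awk_space.txt. Each output line shows the file name, NR and FNR. The helper nr_fnr is an illustrative name and is not part of awk.

```shell
#!/bin/sh
# Sketch: compare NR (counts across all files) with FNR (restarts per file).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'x\ny\nz\n' > "$tmpdir/one.txt"
printf 'p\nq\n' > "$tmpdir/two.txt"

nr_fnr() {
    # split() strips the directory part of FILENAME to keep the output short.
    awk '{ n = split(FILENAME, parts, "/"); print parts[n], NR, FNR }' "$@"
}

nr_fnr "$tmpdir/one.txt" "$tmpdir/two.txt"
# one.txt 1 1
# one.txt 2 2
# one.txt 3 3
# two.txt 4 1
# two.txt 5 2
```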
4. Regular expressions and operators
Like sed, awk can use pattern matching to select input text. Awk supports regular expression patterns, and most of its metacharacters are the same as the ones sed supports. Regular expressions are an essential tool for mastering the "three swordsmen" of text processing (grep, sed and awk).
Awk supports the common regular expression metacharacters.
The interval (repetition) metacharacters below are a special case. Older gawk versions did not support them by default and required an extra option (--posix or --re-interval). Since gawk 4.0 they are enabled by default.
Metacharacter | Function | Example | Explanation |
---|---|---|---|
x{m} | x repeated exactly m times | /cool{5}/ | x can be a single character or a group in parentheses. In /cool{5}/ the {5} applies only to l, so it matches coo followed by five l's: coolllll. |
x{m,} | x repeated at least m times | /(cool){2,}/ | Matches coolcool, coolcoolcool, and so on. |
x{m,n} | x repeated at least m times and at most n times | /(cool){5,6}/ | Matches cool repeated 5 or 6 times. |
By default, a regular expression is matched against the whole line, and the action runs if there is a match. Sometimes, however, only one specific field (column) should be matched against the regular expression.
For example:
Suppose you want the lines in /etc/passwd whose fifth field ($5) contains the string mail. For this you need a match operator. Awk has exactly two operators for matching regular expressions:
Regex match operator | Meaning |
---|---|
~ | Matches a field (or other expression) against a regular expression. |
!~ | The opposite of ~: true when the expression does not match. |
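Here is a minimal sketch of both operators. It uses three hypothetical /etc/passwd-style lines instead of the real file, because the contents of the real file vary by system.

```shell
#!/bin/sh
# Sketch: ~ tests one field against a regex, !~ is its negation.
# sample() emits hypothetical passwd-style lines (7 ':'-separated fields).
sample() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/bash' \
        'mail:x:8:12:mail:/var/spool/mail:/sbin/nologin' \
        'alice:x:1000:1000:Alice:/home/alice:/bin/bash'
}

# Print the user name when the 5th field contains "mail"
sample | awk -F ':' '$5 ~ /mail/ { print $1 }'
# mail

# Print the user name when the 5th field does NOT contain "mail"
sample | awk -F ':' '$5 !~ /mail/ { print $1 }'
# root
# alice
```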
4.1 Regular expression examples
1) Display the GHI column in awk.txt
[root@creditease awk]# cat awk.txt
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '{print $3}' awk.txt
GHI
GHI
GHI
[root@creditease awk]# awk -F "#" '{print $(NF-1)}' awk.txt
GHI
GHI
GHI
2) Display the line containing 321
[root@creditease awk]# awk '/321/{print $0}' awk.txt
CBA#DEF#GHI#GKL$321
3) Using # as the delimiter, display the lines whose first field starts with B or whose last field ends with 1
[root@creditease awk]# awk -F "#" '$1~/^B/{print $0}$NF~/1$/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
4) Using # as the delimiter, display the lines whose first field starts with B or C
[root@creditease awk]# awk -F "#" '$1~/^B|^C/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^[BC]/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1~/^(B|C)/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk -F "#" '$1!~/^A/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
5. Comparison expressions
Awk is a programming language and can make more complex decisions. When a condition is true, awk performs the associated action. Conditions mostly test a field. For example, to print scores of 80 or more, you compare that field with 80.
The following table lists the relational operators awk uses to compare numbers and strings; regular expression matches behave the same way. An expression evaluates to 1 if it is true and 0 otherwise, and awk executes the associated action only when the expression is true.
Relational operators supported by awk
Operator | Meaning | Example |
---|---|---|
< | Less than | x<y |
<= | Less than or equal to | x<=y |
== | Equal to | x==y |
!= | Not equal to | x!=y |
>= | Greater than or equal to | x>=y |
> | Greater than | x>y |
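Here is a sketch of the score example mentioned above. It assumes a hypothetical two-column "name score" input.

```shell
#!/bin/sh
# Sketch: relational operators used as patterns.
# scores() emits hypothetical "name score" lines.
scores() {
    printf '%s\n' 'alice 85' 'bob 72' 'carol 80' 'dave 91'
}

# Fields that look like numbers are compared numerically.
scores | awk '$2 >= 80 { print $1 }'
# alice
# carol
# dave

# Combine a string comparison (!=) with a numeric one (<) using &&.
scores | awk '$1 != "bob" && $2 < 90 { print $1, $2 }'
# alice 85
# carol 80
```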
5.1 Comparison expression examples
Display lines 2 and 3 of awk.txt
Either compare NR, or use a range pattern of the form /start/,/end/:
[root@creditease awk]# awk 'NR==2{print $0}NR==3{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk 'NR>=2&&NR<=3{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
[root@creditease awk]# awk '/BAC/,/CBA/{print $0}' awk.txt
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
II. Awk modules, variables, and execution
1. The complete structure of an awk program is: awk 'BEGIN{...} pattern{action} END{...}' file
2. BEGIN module
The BEGIN block runs before awk reads any input. It is often used to set built-in variables such as ORS, RS, FS and OFS. If a program contains only a BEGIN block, awk reads no input, so you can omit the input files.
Awk built-in variables (predefined variables)
Variable | Meaning |
---|---|
$0 | The current record (the whole line) |
$1, $2, ... | The nth field of the current record, split by FS |
FS | Input field separator; the default is a space. field separator |
NF | Number of fields in the current record, i.e. how many columns it has. number of fields |
NR | Number of records read so far, i.e. the line number, starting at 1. number of records |
RS | Input record separator; the default is a newline. record separator |
OFS | Output field separator; also a space by default. output field separator |
FNR | Record number within the current file; restarts at 1 for each file. |
FILENAME | Name of the file being processed |
Note: FS and RS both accept regular expressions (for RS this is a gawk extension).
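Here is a minimal sketch of setting FS and OFS in BEGIN, with the goal of turning the # separators of awk.txt into commas. The function name to_csv is illustrative. OFS only takes effect once awk rebuilds the record.

```shell
#!/bin/sh
# Sketch: set FS and OFS in BEGIN to convert '#'-separated lines to ','.
# Assigning $1 = $1 makes awk rebuild $0 with OFS; a plain print $0
# would output the original line unchanged.
to_csv() {
    awk 'BEGIN { FS = "#"; OFS = "," } { $1 = $1; print }'
}

printf '%s\n' 'ABC#DEF#GHI#GKL$123' | to_csv
# ABC,DEF,GHI,GKL$123
```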
2.1 First use: setting built-in variables
[root@creditease awk]# awk 'BEGIN{RS="#"}{print $0}' awk.txt
ABC
DEF
GHI
GKL$123
BAC
DEF
GHI
GKL$213
CBA
DEF
GHI
GKL$321
2.2 Second use: printing a start marker
[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}' awk.txt
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
2.3 Arithmetic in awk
[root@creditease files]# awk 'BEGIN{a=8; b=90; print a+b,a-b,a/b,a%b}'
98 -82 0.0888889 8
3. END module
The END block runs after awk has read all input files. It is usually used to print a result, such as a sum or the contents of an array. Like BEGIN, it can also print a marker line.
3.1 First use: printing an end marker
[root@creditease awk]# awk 'BEGIN{print "=======start======"}{print $0}END{print "=======end======"}' awk.txt
=======start======
ABC#DEF#GHI#GKL$123
BAC#DEF#GHI#GKL$213
CBA#DEF#GHI#GKL$321
=======end======
3.2 Second use: accumulation
1) Count the blank lines in /etc/services
Using grep, sed and awk:
[root@creditease awk]# grep "^$" /etc/services |wc -l
17
[root@creditease awk]# sed -n '/^$/p' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/' /etc/services |wc -l
17
[root@creditease awk]# awk '/^$/{i=i+1}END{print i}' /etc/services
17
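The awk version above has one quirk: if the file has no blank lines, i is never set, and END prints an empty line. Here is a sketch that prints 0 instead. The name count_blank is illustrative.

```shell
#!/bin/sh
# Sketch: count blank lines. An unset variable is "" when printed,
# so "i + 0" forces a numeric result (0 when nothing matched).
count_blank() {
    awk '/^$/ { i++ } END { print i + 0 }'
}

printf 'a\n\nb\n\n' | count_blank
# 2
printf 'a\nb\n' | count_blank
# 0
```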
2) Compute 1 + 2 + 3 + ... + 100 = 5050 with awk
[root@creditease awk]# seq 100|awk '{i=i+$0}END{print i}'
5050
4. Summary of awk modules
1. A program can contain more than one BEGIN{} or END{} block (for example BEGIN{}BEGIN{}). They run in the order in which they appear.
2. The pattern{action} part ("find whom, then do what") can also appear more than once.
5. Summary of the awk execution process
The awk execution process:
1. Process command-line assignments (-F or -v).
2. Execute the contents of the BEGIN block.
3. Start reading the input file(s).
4. For each line, check whether the pattern (condition) is true:
- If it is, execute the corresponding action.
- Read the next line and repeat,
- until the end of the last file is reached.
5. Execute the contents of the END block.
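You can watch the order above with a small sketch. The variable tag, the function trace_order and the two input lines are all made up for illustration.

```shell
#!/bin/sh
# Sketch: trace the order in which awk runs its parts.
# The -v assignment is applied first, then BEGIN, then the per-line rules
# (in program order for each line), then END. NR keeps its last value in END.
trace_order() {
    printf 'l1\nl2\n' |
        awk -v tag=demo '
            BEGIN   { print tag, "BEGIN" }
            NR == 1 { print tag, "first line:", $0 }
                    { print tag, "line", NR }
            END     { print tag, "END, lines read:", NR }
        '
}

trace_order
# demo BEGIN
# demo first line: l1
# demo line 1
# demo line 2
# demo END, lines read: 2
```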
III. Awk arrays and syntax
1. Awk arrays
1.1 Array structure
people["police"]=110
people["doctor"]=120
[root@creditease awk]# awk 'BEGIN{word[0]="credit"; word[1]="easy"; print word[0],word[1]}'
credit easy
[root@creditease awk]# awk 'BEGIN{word[0]="credit"; word[1]="easy"; for(i in word)print word[i]}'
credit
easy
1.2 Array classification
Associative arrays: string subscripts. Indexed arrays: numeric subscripts (see 1.4).
1.3 Awk associative arrays
The text below (jia.txt) has a letter on the left and a number on the right. Add up the numbers that follow each letter and output the results in alphabetical order:
a 1
b 3
c 2
d 7
b 5
a 3
g 2
f 6
Use the array a[$1]=a[$1]+$2 (or the equivalent a[$1]+=$2):
[root@creditease awk]# awk '{a[$1]=a[$1]+$2}END{for(i in a)print i,a[i]}' jia.txt
a 4
b 8
c 2
d 7
f 6
g 2
Note: for(i in a) does not loop in the order of the input text. To sort the output, pipe it to sort.
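Because for(i in a) gives no particular order, this sketch of the sorted version pipes the END output through sort. The contents of jia.txt are reproduced inline with printf.

```shell
#!/bin/sh
# Sketch: sum the numbers per letter, then sort, since for (i in a)
# iterates in an unspecified order.
sum_by_letter() {
    printf '%s\n' 'a 1' 'b 3' 'c 2' 'd 7' 'b 5' 'a 3' 'g 2' 'f 6' |
        awk '{ a[$1] += $2 } END { for (i in a) print i, a[i] }' |
        sort
}

sum_by_letter
# a 4
# b 8
# c 2
# d 7
# f 6
# g 2
```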
1.4 Awk indexed arrays
Arrays with numeric subscripts. Use seq to generate the numbers 1-10 and display only the odd-numbered lines:
[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1; i<=NR; i+=2){print a[i]}}'
1
3
5
7
9
Use seq to generate the numbers 1-10 without displaying the last three lines:
[root@creditease awk]# seq 10|awk '{a[NR]=$0}END{for(i=1; i<=NR-3; i++){print a[i]}}'
1
2
3
4
5
6
7
1.5 Deduplication with awk arrays
a++ versus ++a
[root@creditease awk]# awk 'BEGIN{print a++}'
0
[root@creditease awk]# awk 'BEGIN{print ++a}'
1
[root@creditease awk]# awk 'BEGIN{a=1; b=a++; print a,b}'
2 1
[root@creditease awk]# awk 'BEGIN{a=1; b=++a; print a,b}'
2 2
b=a++ assigns first and then increments (b=a, then a=a+1). b=++a increments first and then assigns (a=a+1, then b=a).
Remove duplicate lines from the following text based on the second column:
[root@creditease awk]# cat qc.txt
2018/10/20 xiaoli 13373305025
2018/10/25 xiaowang 17712215986
2018/11/01 xiaoliu 18615517895
2018/11/12 xiaoli 13373305025
2018/11/19 xiaozhao 15512013263
2018/11/26 xiaoliu 18615517895
2018/12/01 xiaoma 16965564525
2018/12/09 xiaowang 17712215986
2018/11/24 xiaozhao 15512013263
Solution 1:
[root@creditease awk]# awk '!a[$2]++' qc.txt
2018/10/20 xiaoli 13373305025
2018/10/25 xiaowang 17712215986
2018/11/01 xiaoliu 18615517895
2018/11/19 xiaozhao 15512013263
2018/12/01 xiaoma 16965564525
Analysis: !a[$2]++ is a pattern (condition). With the action written out, the command is awk '!a[$2]++{print $0}' qc.txt. Because the ++ comes after a[$2], awk uses the current value first and then increments it. The first time a name appears, a[$2] is 0, so !a[$2]++ is true and the line is printed. For every later occurrence the value is at least 1, so the condition is false.
Solution 2:
[root@creditease awk]# awk '++a[$2]==1' qc.txt
2018/10/20 xiaoli 13373305025
2018/10/25 xiaowang 17712215986
2018/11/01 xiaoliu 18615517895
2018/11/19 xiaozhao 15512013263
2018/12/01 xiaoma 16965564525
Analysis: ++a[$2]==1 is the pattern (condition). Because the ++ comes before a[$2], awk increments the value first and then compares it with 1. The line is printed only when the result is 1, i.e. on the first occurrence.
Solution 3:
[root@creditease awk]# awk '{a[$2]=$0}END{for(i in a){print a[i]}}' qc.txt
2018/11/12 xiaoli 13373305025
2018/11/26 xiaoliu 18615517895
2018/12/01 xiaoma 16965564525
2018/12/09 xiaowang 17712215986
2018/11/24 xiaozhao 15512013263
Note: this method keeps the last occurrence of each name, because later lines overwrite earlier ones in a[$2].
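Solution 1 can be wrapped in a reusable sketch that takes the field number as a -v variable. The helper dedup_by_field and the sample lines are illustrative.

```shell
#!/bin/sh
# Sketch: keep only the first line for each distinct value of field f.
# dedup_by_field is a hypothetical helper; the field number is its argument.
dedup_by_field() {
    awk -v f="$1" '!seen[$f]++'
}

printf '%s\n' \
    '2018/10/20 xiaoli 13373305025' \
    '2018/10/25 xiaowang 17712215986' \
    '2018/11/12 xiaoli 13373305025' \
    '2018/12/09 xiaowang 17712215986' | dedup_by_field 2
# 2018/10/20 xiaoli 13373305025
# 2018/10/25 xiaowang 17712215986
```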
1.6 Processing multiple files with awk (arrays, NR, FNR)
Use awk to take the first column of file1.txt and the second column of file2.txt, and redirect the result to a new file, new.txt.
[root@creditease awk]# cat file1.txt
a b
c d
e f
g h
i j
[root@creditease awk]# cat file2.txt
1 2
3 4
5 6
7 8
9 10
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2}' file1.txt file2.txt
a 2
c 4
e 6
g 8
i 10
Analysis: NR==FNR is true only while awk reads the first file, so a[FNR]=$1 stores that file's first column by line number. NR!=FNR is true for the second file, so awk prints the stored value next to the second file's second column. Note: one line is printed per line of the second file. If the first file has more lines, its extra lines are ignored. If the second file has more lines, a[FNR] is empty for its extra lines.
To write the output to new.txt:
[root@creditease awk]# awk 'NR==FNR{a[FNR]=$1}NR!=FNR{print a[FNR],$2>"new.txt"}' file1.txt file2.txt
[root@creditease awk]# cat new.txt
a 2
c 4
e 6
g 8
i 10
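A common variant of the same pairing uses next in place of a second NR!=FNR pattern. In this sketch, temporary files hold the first three lines of file1.txt and file2.txt; pair_columns is an illustrative name.

```shell
#!/bin/sh
# Sketch: the NR==FNR pairing written with "next".
# While NR==FNR (first file), store column 1 and skip the remaining rules.
# Caveat: if the first file is empty, NR==FNR also holds for the second file.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'a b\nc d\ne f\n' > "$tmpdir/file1.txt"
printf '1 2\n3 4\n5 6\n' > "$tmpdir/file2.txt"

pair_columns() {
    awk 'NR == FNR { a[FNR] = $1; next } { print a[FNR], $2 }' "$1" "$2"
}

pair_columns "$tmpdir/file1.txt" "$tmpdir/file2.txt"
# a 2
# c 4
# e 6
```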
1.7 Analyzing a log file with awk: counting visits per website
[root@creditease awk]# cat url.txt
http://www.baidu.com
http://mp4.video.cn
http://www.qq.com
http://www.listeneasy.com
http://mp3.music.com
http://www.qq.com
http://www.qq.com
http://www.listeneasy.com
http://www.listeneasy.com
http://mp4.video.cn
http://mp3.music.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
http://www.baidu.com
[root@creditease awk]# awk -F "[/]+" '{h[$2]++}END{for(i in h) print i,h[i]}' url.txt
www.qq.com 3
www.baidu.com 5
mp4.video.cn 2
mp3.music.com 2
www.listeneasy.com 3
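Because the output of for(i in h) is unordered, this sketch ranks the hosts by visit count by piping the result through sort -k2,2nr. The helper count_hosts and the sample URLs are illustrative.

```shell
#!/bin/sh
# Sketch: count visits per host, then sort by count (highest first).
# Splitting on runs of "/" makes $2 the host part of http://host.
count_hosts() {
    awk -F '[/]+' '{ h[$2]++ } END { for (i in h) print i, h[i] }' |
        sort -k2,2nr
}

printf '%s\n' \
    'http://www.baidu.com' 'http://www.qq.com' 'http://www.baidu.com' \
    'http://mp3.music.com' 'http://www.baidu.com' 'http://www.qq.com' |
    count_hosts
# www.baidu.com 3
# www.qq.com 2
# mp3.music.com 1
```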
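2. Simple awk syntax
2.1 The sub and gsub functions
Replacement functions:
sub(r, s, target)   gsub(r, s, target)
r is the regular expression to match, s is the replacement string, and target is the string to modify. If target is omitted, $0 is used.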
[root@creditease awk]# cat sub.txt
ABC DEF AHI GKL$123
BAC DEF AHI GKL$213
CBA DEF GHI GKL$321
[root@creditease awk]# awk '{sub(/A/,"a"); print $0}' sub.txt
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
[root@creditease awk]# awk '{gsub(/A/,"a"); print $0}' sub.txt
aBC DEF aHI GKL$123
BaC DEF aHI GKL$213
CBa DEF GHI GKL$321
Note: sub replaces only the first match in the line, like sed 's###'. gsub replaces every match in the line, like sed 's###g'.
[root@creditease awk]# awk '{sub(/A/,"a",$1); print $0}' sub.txt
aBC DEF AHI GKL$123
BaC DEF AHI GKL$213
CBa DEF GHI GKL$321
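sub and gsub also return the number of replacements they made (0 or 1 for sub). This sketch uses that return value to count matches per line. The name count_A is illustrative.

```shell
#!/bin/sh
# Sketch: gsub returns how many replacements it made; with no target
# argument it edits $0 in place.
count_A() {
    awk '{ n = gsub(/A/, "a"); print n, $0 }'
}

printf '%s\n' 'ABC DEF AHI' 'CBA DEF GHI' | count_A
# 2 aBC DEF aHI
# 1 CBa DEF GHI
```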
Practice:
Given the following text (sub_hm.txt):
0001|20081223efskjfdj|EREADFASDLKJCV
0002|20081208djfksdaa|JDKFJALSDJFsddf
0003|20081208efskjfdj|EREADFASDLKJCV
0004|20081211djfksdaa1234|JDKFJALSDJFsddf
Using '|' as the separator, remove the digits in front of the letters in the second field and leave everything else unchanged. The output should be:
0001|efskjfdj|EREADFASDLKJCV
0002|djfksdaa|JDKFJALSDJFsddf
0003|efskjfdj|EREADFASDLKJCV
0004|djfksdaa1234|JDKFJALSDJFsddf
Methods:
awk -F '|' 'BEGIN{OFS="|"}{sub(/[0-9]+/,"",$2); print $0}' sub_hm.txt
awk -F '|' -v OFS="|" '{sub(/[0-9]+/,"",$2); print $0}' sub_hm.txt
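This sketch runs the exercise on two of its lines and shows why OFS matters. Changing $2 makes awk rebuild $0 with OFS, so without OFS="|" the separators turn into spaces. The name strip_digits is illustrative.

```shell
#!/bin/sh
# Sketch: remove the leading run of digits from field 2.
# sub() on $2 triggers a rebuild of $0 joined with OFS, hence OFS='|'.
strip_digits() {
    awk -F '|' -v OFS='|' '{ sub(/[0-9]+/, "", $2); print }'
}

printf '%s\n' '0001|20081223efskjfdj|EREADFASDLKJCV' \
              '0004|20081211djfksdaa1234|JDKFJALSDJFsddf' | strip_digits
# 0001|efskjfdj|EREADFASDLKJCV
# 0004|djfksdaa1234|JDKFJALSDJFsddf

# Without OFS, the rebuilt line is joined with the default OFS (a space):
printf '%s\n' '0001|20081223efskjfdj|EREADFASDLKJCV' |
    awk -F '|' '{ sub(/[0-9]+/, "", $2); print }'
# 0001 efskjfdj EREADFASDLKJCV
```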
2.2 Using if and else
Input (ifelse.txt):
AA
BC
AA
CB
CC
AA
Expected result:
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
1) [root@creditease awk]# awk '{if($0~/AA/){print $0" YES"}else{print $0" NO YES"}}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: this version uses if and else. If $0 matches AA, it prints $0" YES"; otherwise it prints $0" NO YES".
2) [root@creditease awk]# awk '$0~/AA/{print $0" YES"}$0!~/AA/{print $0" NO YES"}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: if $0 matches AA, print YES; otherwise print NO YES.
2.3 Using next
The same task as above, implemented with next.
next: stop processing the current line, skip all the code after it, and move on to the next line.
[root@creditease awk]# awk '$0~/AA/{print $0" YES"; next}{print $0" NO YES"}' ifelse.txt
AA YES
BC NO YES
AA YES
CB NO YES
CC NO YES
AA YES
Analysis: {print $0" NO YES"} has no pattern, so it runs for every line that reaches it. When the current line matches $0~/AA/, {print $0" YES"; next} runs. Because that action contains next, the action after it is skipped. When the line does not match $0~/AA/, the first action does not run and the action after it does. In short: if the pattern in front of next matches, the code after next is skipped; if it does not match, the following code runs.
2.4 printf (output without a newline) and next
printf: does not add a newline after printing.
In the following text (printf.txt), if Description: has nothing after it, merge it with the next line.
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:
Hello world!
Other2: don't care
Expected result:
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
1) [root@creditease awk]# awk '/^Desc.*:$/{printf $0}!/Desc.*:$/{print $0}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Analysis: lines matching /^Desc.*:$/ are printed with printf (no newline); lines that do not match are printed in full with print.
2) Using if and else:
[root@creditease awk]# awk '{if(/Des.*:$/){printf $0}else{print $0}}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
3) Using next:
[root@creditease awk]# awk '/Desc.*:$/{printf $0; next}{print $0}' printf.txt
Packages: Hello-1
Owner: me me me me
Other: who care?
Description:Hello world!
Other2: don't care
Note: this can be shortened to awk '/Desc.*:$/{printf $0; next}1' printf.txt. The pattern 1 is always true, and its default action is {print $0}.
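The printf examples above pass $0 as the format string. That works for this text, but it would misbehave if a line contained a % sign. This sketch of the next-based version uses printf "%s" instead. The function merge_description and the sample lines are illustrative.

```shell
#!/bin/sh
# Sketch: join a line ending in "Desc...:" with the following line.
# printf "%s" prints the line verbatim and without a newline; next skips
# the generic print for that line.
merge_description() {
    awk '/^Desc.*:$/ { printf "%s", $0; next } { print }'
}

printf '%s\n' 'Other: who care?' 'Description:' 'Hello world!' 'Rate: 100%' |
    merge_description
# Other: who care?
# Description:Hello world!
# Rate: 100%
```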
2.5 Counting duplicates and redirecting the results to different files
For the following text, count how many times each item appears. Then write the items that appear more than 2 times to gt2.txt, and the items that appear 2 times or fewer to le2.txt:
[root@creditease files]# cat qcjs.txt
aaa
bbb
ccc
aaa
ddd
bbb
rrr
ttt
ccc
eee
ddd
rrr
bbb
rrr
bbb
[root@creditease awk]# awk '{a[$1]++}END{for(i in a){if(a[i]>2){print i,a[i]>"gt2.txt"}else{print i,a[i]>"le2.txt"}}}' qcjs.txt
[root@creditease awk]# cat gt2.txt
rrr 3
bbb 4
[root@creditease awk]# cat le2.txt
aaa 2
ccc 2
eee 1
ttt 1
ddd 2
Note: you can redirect the output of print (or printf) inside the braces straight to a new file. Enclose the file name in double quotes, for example {print $1 >"xin.txt"}.
3. Precautions for awk
a) NR==FNR: true only while the first file is being read.
b) NR!=FNR: NR is not equal to FNR, i.e. the second (or a later) file is being read.
c) {a=1; a[NR]}: a variable and an array cannot share the same name in the same program.
d) You can redirect the output of print (or printf) inside the braces straight to a new file. Enclose the file name in double quotes, for example {print $1 >"xin.txt"}.
e) When a pattern (condition) evaluates to 0, its action is not executed. When it evaluates to a non-zero value, the action is executed.
Author: Qin Wei
Source: Creditease Institute of Technology