Preface

AWK is an interpreted programming language used for text processing. It takes its name from the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

  • Awk program structure
  • Run the AWK file script
  • Awk basic syntax
  • Built-in variables commonly used by AWK programs


Program structure

Awk command format:

  • awk 'BEGIN {awk-commands} /pattern/ {awk-commands} END {awk-commands}' fileName
  • A match pattern, if present, is written between slashes: /pattern/
  • Every block of awk-commands must be enclosed in braces { }
  • BEGIN block, BEGIN {awk-commands}: optional. It is executed only once, before any input is read, and is where variables can be initialized. BEGIN is an AWK keyword and must be uppercase
  • BODY block, /pattern/ {awk-commands}: its commands are executed once for each line of input; supplying a pattern restricts execution to the lines that match
  • END block, END {awk-commands}: optional. It is executed once at the end of the program, after all input has been read. END is an AWK keyword and must be uppercase
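As a minimal sketch of the three-block structure, the following counts the lines that match a pattern. The sample data, the pattern /Physics|Biology/ and the variable names are made up for illustration.

```shell
# Sample input (hypothetical): one student per line.
data='Amit Physics 80
Rahul Maths 90
Shyam Biology 87'

out=$(printf '%s\n' "$data" | awk '
BEGIN { count = 0 }                    # runs once, before any input is read
/Physics|Biology/ { count++ }          # runs for each line that matches the pattern
END { print "matched lines:", count }  # runs once, after the last line
')
printf '%s\n' "$out"
```

The BEGIN block is not strictly needed here, since an unset awk variable already counts as 0; it is kept only to show where initialization would go.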

Awk workflow

The BODY block parses the input line by line

Script command: awk '{[code statement 1]; [code statement 2]}' fileName. If no fileName or other input stream is given and a BODY block exists, awk reads standard input, so in a terminal it keeps waiting for lines until end-of-file (Ctrl-D) rather than terminating. Code statements end with a semicolon or a newline character

  • 1: Read a line of data into $0 and split it into columns, which are stored in $1, $2, … and so on
  • 2: Execute the code statements
  • 3: If there are more lines of data, repeat steps 1 and 2 until every line has been read
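The read, split, execute cycle can be made visible with a small sketch that prints $0, $1 and $2 for every input line. The two input lines are invented sample data fed through a pipe instead of a file.

```shell
out=$(printf 'csc world\nlwl hello\n' | awk '{
    # $0 holds the whole line; $1 and $2 hold its first two fields
    print "line=[" $0 "] first=[" $1 "] second=[" $2 "]"
}')
printf '%s\n' "$out"
```

The body runs once per line, so two input lines should produce two output lines.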

Run the AWK file script

  • By convention, awk script files use the .awk suffix
  • Use the -f option to run a script file: awk -f command.awk marks.txt
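A rough end-to-end sketch of the -f option: it writes a one-rule script and a small data file into a temporary directory and then runs them. The file contents are assumptions for illustration and do not reproduce the real marks.txt.

```shell
# Create a scratch directory for the hypothetical script and data file.
dir=$(mktemp -d)

cat > "$dir/command.awk" <<'EOF'
# print the first and third columns of every line
{ print $1, $3 }
EOF

printf 'Amit Physics 80\nShyam Biology 87\n' > "$dir/marks.txt"

# Run the script file against the data file.
out=$(awk -f "$dir/command.awk" "$dir/marks.txt")
printf '%s\n' "$out"

rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems once a script grows beyond a single line.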

Awk basic syntax

  • Awk variables do not need to be declared in advance and have no declared type; a value is treated as a number or a string depending on context
awk 'BEGIN{sum=1; print sum}'
1
  • Process control
# -------- pseudocode 1 --------
if ({condition}) {code logic...} else if ({condition}) {code logic...} else {code logic...}
# -------- pseudocode 2 --------
for ({initialize}; {condition}; {subsequent logic}) {code logic...}
# -------- pseudocode 3 --------
while ({condition}) {code logic...}
# -------- pseudocode 4 --------
do {code logic...} while ({condition})
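The four control structures can be tried out in a single BEGIN block, which needs no input file. This is only an illustrative sketch; the loop bounds and messages are arbitrary.

```shell
out=$(awk 'BEGIN {
    # for loop with if / else if / else
    for (i = 1; i <= 3; i++) {
        if (i == 1)      print i, "is one"
        else if (i == 2) print i, "is two"
        else             print i, "is something else"
    }
    # while loop: sum the numbers 1 to 4
    n = 1; sum = 0
    while (n <= 4) { sum += n; n++ }
    print "while sum:", sum
    # do-while loop: the body runs once even though the condition is false
    k = 10
    do { k++ } while (k < 5)
    print "do-while k:", k
}')
printf '%s\n' "$out"
```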
  • Operators, which are largely the same as in the Java programming language, except that ^ is exponentiation rather than XOR
Operator                    Description            Example
^                           Exponentiation         a = a ^ 2
- / +                       Unary minus / plus     a = -10; a = +a
condition ? expr1 : expr2   Ternary operator       max = (a > b) ? a : b
&& / ||                     Logical AND / OR       if (num >= 0 && num <= 7)
== / !=                     Equal / not equal      if (a == b)
awk 'BEGIN{sum=1; sum++; if(sum==2) print sum}'
2
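A short sketch exercising several of the operators from the table. The values of a and b are arbitrary.

```shell
out=$(awk 'BEGIN {
    a = 3; b = 7
    print "a ^ 2 =", a ^ 2                      # exponentiation
    max = (a > b) ? a : b                       # ternary operator
    print "max =", max
    if (a >= 0 && a <= 7) print "a is in 0..7"  # logical AND
    if (a != b) print "a and b differ"          # inequality
}')
printf '%s\n' "$out"
```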
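  • Arrays: AWK supports associative arrays, so array indexes can be strings as well as numbers. Delete an array element with the delete statement, for example delete arr[0]. The order in which for (i in arr) visits the elements is unspecified, so the output order below may differ between implementations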
$ awk 'BEGIN {arr["lwl"] = 1; arr["csc"] = 2; for (i in arr) printf "arr[%s] = %d\n", i, arr[i]}'
arr[lwl] = 1
arr[csc] = 2
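A common use of associative arrays is counting how often each key appears. The sketch below does that for invented input and then removes one element with delete. It reads specific keys rather than looping with for (i in arr), so the result does not depend on iteration order.

```shell
out=$(printf 'csc\nlwl\ncsc\n' | awk '
{ count[$1]++ }                        # string keys; a missing element counts as 0
END {
    print "csc:", count["csc"]
    print "lwl:", count["lwl"]
    delete count["lwl"]                # remove a single element
    if (!("lwl" in count)) print "lwl removed"
}')
printf '%s\n' "$out"
```

The in operator tests for membership without creating the element, which a plain count["lwl"] reference would do.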
  • String manipulation
# Strings are concatenated by writing them side by side, separated by a space; there is no explicit concatenation operator
$ awk 'BEGIN { str1 = "csc, "; str2 = "lwl"; str3 = str1 str2; print str3 }'
csc, lwl
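  • String-related built-in functions
index(str, sub)           # position of the first occurrence of sub in str (1-based), 0 if absent
length(str)               # length of str
match(str, regex)         # position where regex first matches in str, 0 if there is no match
split(str, arr, regex)    # split str into arr using regex as the separator; returns the number of elements
sub(regex, repl, string)  # replace the first match of regex in string with repl
substr(str, start, l)     # substring of str of length l starting at position start (1-based)
tolower(str)              # str converted to lowercase
toupper(str)              # str converted to uppercase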
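The functions listed above can be tried on a sample string. The string "Hello, AWK world" and the variable names are chosen only for this sketch.

```shell
out=$(awk 'BEGIN {
    str = "Hello, AWK world"
    print index(str, "AWK")            # position of the substring
    print length(str)                  # number of characters
    print substr(str, 8, 3)            # 3 characters starting at position 8
    print toupper(substr(str, 1, 5))   # functions can be nested
    n = split("a:b:c", parts, ":")     # fills parts[1], parts[2], parts[3]
    print n, parts[2]
    sub(/world/, "user", str)          # modifies str in place
    print str
}')
printf '%s\n' "$out"
```

Note that string positions in awk start at 1, not 0.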

Regular expression

  • Match operators: ~ and !~ mean "matches" and "does not match" respectively
$ awk '$0 !~ 9' marks.txt
1) Amit     Physics   80
3) Shyam    Biology   87
  • Wildcards and regular expressions
# contents of log.txt
1 csc world
2 lwl hello
# ---------- lines whose second column contains lwl ----------
$ awk '$2 ~ /lwl/ {print $2,$3}' log.txt
lwl hello
# ---------- lines that contain csc ----------
$ awk '/csc/ {print $0}' log.txt
1 csc world
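The ~ and !~ operators can be applied to individual fields and combined with && in one pattern. The sketch below assumes three invented lines laid out like the marks.txt output shown earlier; the real file is not reproduced here.

```shell
# Hypothetical sample data: index, name, subject, mark.
data='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87'

# Print names that start with A or S and whose mark does not start with 9.
out=$(printf '%s\n' "$data" | awk '$2 ~ /^[AS]/ && $4 !~ /^9/ { print $2 }')
printf '%s\n' "$out"
```

Anchoring with ^ keeps the test from matching the character anywhere else in the field.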

Built-in variables commonly used by AWK programs

Variable     Description
$n           The nth field of the current record; fields are separated by FS
$0           The complete input record
ARGC         Number of command-line arguments
ARGV         Array containing the command-line arguments
ENVIRON      Associative array of environment variables
ERRNO        Description of the last system error (gawk only)
FILENAME     Name of the current input file
FS           Input field separator (default is a single space, which splits on runs of spaces and tabs)
IGNORECASE   When nonzero, matching is case-insensitive (gawk only)
NF           Number of fields in the current record
NR           Number of records read so far, i.e. the line number, starting at 1
FNR          Like NR, but when there are multiple input files FNR is the line number within the current file
OFS          Output field separator (default is a space)
ORS          Output record separator (default is a newline character)
RLENGTH      Length of the substring matched by the match function
RS           Input record separator (default is a newline character)
RSTART       Start position of the substring matched by the match function
ARGIND       Index in ARGV of the file currently being processed (gawk only)
PROCINFO     Associative array containing process information such as the UID and process ID (gawk only)
  • ARGC and ARGV: the number of command-line arguments and the array that holds them; ARGV[0] is the name of the awk program itself
$ awk 'BEGIN { for (i = 0; i < ARGC; ++i) { printf "ARGV[%d] = %s\n", i, ARGV[i] } }' csc lwl
ARGV[0] = awk
ARGV[1] = csc
ARGV[2] = lwl
  • ENVIRON: associative array of environment variables, indexed by variable name
$ awk 'BEGIN { print ENVIRON["USER"] }'
csc
  • FILENAME: the name of the current input file
$ awk 'END {print FILENAME}' test.txt
test.txt
  • RSTART: the position at which the substring matched by the match function starts
$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
9
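Several of the built-in variables can be seen working together in one sketch: FS and OFS are set in BEGIN, NR and NF are printed per record, and RSTART and RLENGTH are read after a match call. The colon-separated input and the regex /T[a-z]+/ are assumptions made for this illustration.

```shell
out=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "-" }          # split input on colons, join output with dashes
{ print NR, NF, $1, $NF }              # record number, field count, first and last field
END {
    # match sets RSTART and RLENGTH for the leftmost match, here "Two"
    if (match("One Two Three", /T[a-z]+/)) print RSTART, RLENGTH
}')
printf '%s\n' "$out"
```

$NF refers to the last field, because NF holds the number of fields in the current record.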

Corrections are welcome
