This is the second article in my getting-started series

Writing code has something in common with writing articles, so turn your ideas into code/articles and enjoy the thought process.

Background

awk came up in the log analysis section of the previous article on Linux operations and maintenance scenarios, so this article fills in that gap before too much time passes.

awk is very convenient for text processing on Linux. Some of you may find Python easier for that, which is fair: I do a lot of text processing in Python myself. For simple jobs, though, awk combined with pipes is hard to beat for convenience.

This article introduces the concepts behind awk and walks through its basic usage with some common scenarios. I can't guarantee that everyone will enjoy working this way, but awk will definitely make you more efficient when processing text on Linux.

Introduction

What is awk?

awk is a programming language for processing text and data line by line on Linux. Compared with grep (searching) and sed (editing), awk focuses on slicing text into fields. Almost all Linux distributions now ship with awk.

In short, awk is a programming language that processes text data by slicing it line by line.

How awk works

An overview of what happens when awk runs (skip this if you are not interested).

The basic structure

The basic structure of an awk script looks like this:

awk [options] 'BEGIN{ commands } pattern{ commands } END{ commands }' fileName

Common options include:

-f progfile, --file=progfile: read the awk program from the file progfile
-F fs, --field-separator=fs: set the input field separator; the default is whitespace (spaces and tabs)
-v var=val, --assign=var=val: define a variable before the program starts running
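
As a rough illustration of these three options, here is a minimal sketch. The sample record, the variable name prefix and the temporary program file are all made up for the example.

```shell
# -F sets the input field separator; -v defines a variable before the program runs.
# The record "alice:42" and the variable name "prefix" are hypothetical.
opts_out=$(echo 'alice:42' | awk -F ':' -v prefix='user:' '{ print prefix $1, $2 }')
echo "$opts_out"

# -f reads the awk program from a file instead of the command line.
prog=$(mktemp)
echo '{ print $2 }' > "$prog"
file_out=$(echo 'alice:42' | awk -F ':' -f "$prog")
echo "$file_out"
rm -f "$prog"
```

The first command should print user:alice 42 and the second should print 42.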

Working principle

  1. awk reads the options and applies them, for example setting a specific field separator.
  2. It executes the BEGIN{ commands } block. This optional block runs before any line is read, so it is the place for things like variable initialization.
  3. It reads the input line by line and executes the pattern{ commands } block on every line that matches the pattern (every line if the pattern is omitted), repeating until the whole file has been processed.
  4. It executes the END{ commands } block at the end of the input stream, once every line has been read and processed. This is the place for things like printing the results of an analysis.
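
The steps above can be sketched with a small self-contained run. The three input numbers and the variable names total and result are invented for illustration.

```shell
# Three sample lines (invented for illustration) are piped into awk.
result=$(printf '3\n5\n7\n' | awk '
BEGIN  { total = 0; print "start" }   # step 2: runs once, before any input is read
$1 > 4 { total += $1 }                # step 3: runs for each line that matches the pattern
END    { print "total=" total }       # step 4: runs once, after the last line
')
echo "$result"
```

Only 5 and 7 match the pattern $1 > 4, so this should print start followed by total=12.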

Common usage

The structure above may look complicated. In practice, awk is mostly used as a one-line shell command, especially for log analysis, and the common form is:

awk '{commands}' filename

Here awk is the command itself, commands is the code that actually gets executed, and filename is the file to operate on (more than one file can be given).

Built-in variables

awk has many built-in variables. The common ones are:

$0: the current record (the entire line)
$1-$n: the nth field of the current record; fields are separated by FS
FS: the input field separator; the default is whitespace (spaces or tabs)
NF: the number of fields (columns) in the current record
NR: the number of records read so far, counted across all input files
FNR: the number of the current record within the current file; unlike NR, it restarts for each file
RS: the input record separator; the default is a newline
ORS: the output record separator; the default is a newline
FILENAME: the name of the current input file
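
To see how NR, FNR, NF and FILENAME differ, here is a small sketch that reads two throwaway files. The file names one.txt and two.txt and their contents are assumptions made for the example.

```shell
# Create two small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmpdir/one.txt"
printf 'f\n' > "$tmpdir/two.txt"

# For every line print: file name, overall line number, per-file line number,
# number of fields, and the last field ($NF).
vars_out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$vars_out"
rm -rf "$tmpdir"
```

NR keeps counting across both files, while FNR starts again at 1 for two.txt.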

For example, suppose we have a file a.txt with the following contents:

Xiao Ming:car:male:1234
Little Red:train:female:2234
Little Black:plane:male:3234
Little White:bike:male:4234
Little Green:walking:male:4234

Example 1

$ awk '{print $0}' a.txt

Explanation: a.txt is the text file we are working with. Inside the single quotes is a pair of curly braces containing the action applied to each line, print $0, which prints every line of the file.

Example 2

awk -F ':' '{print $1, $(NF-1)}' a.txt
Xiao Ming male
Little Red female
Little Black male
Little White male
Little Green male

Explanation: NF is the number of fields, so $(NF-1) is the second-to-last field. The command prints the first and second-to-last fields of each line.

Example 3

awk -F ':' '{print NR ": " $2}' a.txt
1: car
2: train
3: plane
4: bike
5: walking

Explanation: NR is the number of the line being processed. To print a literal string such as ": ", enclose it in double quotes.

Arrays

Arrays in awk are special: they are associative arrays whose subscripts are strings. Unlike in other languages, even when you use a number as a subscript, awk implicitly converts it to a string. This is a little like a map, so just remember that an array in awk is a map.

All arrays in awk are associative, i.e. indexed by string values

Array assignment

array[index]=value

Array traversal

You can use the for ... in ... syntax, where item is the index of the array element, that is, the array key

for (item in array)

Array inclusion judgment

You can use the in operator in the if branch to determine if an element exists

if (item in array)
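
A short sketch of how string subscripts and the in operator behave. The array name a and the keys used here are arbitrary choices for the example.

```shell
arr_out=$(awk 'BEGIN {
    a[1] = "one"                 # the numeric subscript 1 is stored as the string "1"
    print a["1"]                 # so this finds the same element
    print ("missing" in a)       # 0: the in operator only tests, it does not create the key
    if (a["oops"] == "") n = 0   # merely referencing a["oops"] creates an empty element
    print ("oops" in a)          # 1: the key now exists
    delete a["oops"]             # delete removes a single element
    print ("oops" in a)          # 0 again
}')
echo "$arr_out"
```

This is why the in operator is the safer way to test for a key: comparing a[key] with an empty string quietly adds the key to the array.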

Example

Again, take the text a.txt above

# awk -F ':' '{ a[$1] = $3 } END { print ("Xiao Ming" in a); for (i in a) printf "%s: %s\n", i, a[i] }' a.txt
# Output (the leading 1 means the key exists; the order of for ... in is not guaranteed)
1
Little White: male
Little Black: male
Little Red: female
Little Green: male
Xiao Ming: male

Statements

Common awk statements include printf, delete, break, continue, exit and next. printf can be called with its arguments in parentheses, as in the example below; break, continue and next take no arguments at all.

# echo 1 | awk '{printf("%s - %s\n", "juejin", "I love U")}'
juejin - I love U

Conditional statements

Like other languages, awk supports the if, if-else and if-else if forms.

# single command
if (condition) command
# multiple commands
if (condition) {
    command1;
    command2;
    ...
}

Again, a simple example: print the names of the people whose gender is female

# awk -F ':' '{ if ($3 == "female") print $1 }' a.txt
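
The if / else if / else chain can be sketched like this. The helper name classify and the status-code input are made up for the example.

```shell
# classify is a hypothetical helper: it labels each number read from stdin.
classify() {
    awk '{
        if ($1 >= 500)      print $1, "server error"
        else if ($1 >= 400) print $1, "client error"
        else                print $1, "ok"
    }'
}
printf '200\n404\n503\n' | classify
```

For the three sample values this should print 200 ok, 404 client error and 503 server error, one per line.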

Loop statements

awk supports for loops and while loops (both while and do ... while). They work much as they do in C, which most of us already know, so here is only a brief summary

The for loop

for (initialization; condition; increment) { commands... }

The while loop

while syntax:
while (condition) { commands... }
do-while syntax:
do { commands... } while (condition)

Since these are already familiar, I won't go through them in detail
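
For completeness, here is one brief sketch that uses both loop forms. The input numbers and the variable names sum, n and steps are arbitrary.

```shell
loop_out=$(echo '4 8 15' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)   # for loop: visit every field of the line
        sum += $i
    n = sum; steps = 0
    while (n > 1) {             # while loop: halve n until it is no greater than 1
        n = int(n / 2); steps++
    }
    print sum, steps
}')
echo "$loop_out"
```

The fields add up to 27, and halving 27 with integer truncation takes four steps to reach 1, so the output should be 27 4.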

Functions

awk has a rich set of functions that basically covers everyday needs. This section introduces three kinds: string functions, math functions and custom functions

String functions

Awk covers all the common string manipulation functions. Here I’ll briefly list the common ones, along with some simple examples

  • length([s]): returns the length of the string s. If s is not given, $0 is used by default
# echo "juejin" | awk '{print length();}'
6
  • index(str, target): returns the position at which target first appears in str. Positions are counted from 1, and 0 is returned if target is not found
# awk 'BEGIN {print index("juejin", "j")}'
1
# awk 'BEGIN {print index("juejin", "0")}'
0
  • match(s, ere): returns the starting position of the part of s that matches the regular expression ere, or 0 if there is no match. This function sets two built-in variables, RSTART and RLENGTH. RSTART is the same as the return value, and RLENGTH is the length of the matched substring, or -1 if there is no match
awk 'BEGIN {
print match("juejinjin", /jinjin/);
printf "Matched at: %d, Matched substr length: %d\n", RSTART, RLENGTH;
}'
4
Matched at: 4, Matched substr length: 6
  • tolower(s): converts the string to lowercase. For example,
# awk 'BEGIN {print tolower("JUEJIN"); }'
juejin
  • toupper(s): converts the string to uppercase. For example,
# awk 'BEGIN {print toupper("juejin"); }'
JUEJIN
  • substr(s, m, [n]): extracts a substring. Returns the substring of s that is n characters long and starts at position m (positions are counted from 1). If n is omitted or is greater than the number of remaining characters, the result runs to the end of the string. For example:
# awk 'BEGIN { print substr("juejin", 2, 4); }'
ueji

Math functions

Here are a few commonly used ones

sin(x): sine function;
cos(x): cosine function;
exp(x): exponential function, e raised to the power x;
log(x): natural logarithm (base e);
sqrt(x): square root function;
int(x): converts a value to an integer by truncation;
rand(): returns a random value n with 0 <= n < 1;
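
A quick sketch of a few of these functions. Note that srand, which seeds the random number generator, is not in the list above; it is used here with an arbitrary seed only so that runs are repeatable.

```shell
math_out=$(awk 'BEGIN {
    print int(3.9)           # truncates the fractional part
    print sqrt(16)
    print exp(0), log(1)
    srand(42)                # arbitrary seed, an assumption of this example
    r = rand()
    print (r >= 0 && r < 1)  # 1 if the random value is inside the expected range
}')
echo "$math_out"
```

The four output lines should be 3, 4, 1 0 and 1.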

Custom functions

In addition to the built-in functions described above, users can define their own functions as needed

function function_name(argument1, argument2, ...)
{
    function body
}

argument1, argument2, ... form the parameter list, separated by commas. Parameters are local variables that cannot be accessed outside the function, whereas any other variable assigned inside the function is global and can be accessed outside it

# echo line | awk '
function find_min(a, b) {
    c = a + b
    if (a < b)
        return a;
    return b;
}
{
    printf("pre c=%d\n", c)
    print find_min(1, 2);
    printf("after c=%d", c)
}'
pre c=0
1
after c=3
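
Because only parameters are local, a common awk idiom is to declare extra parameters that callers never pass and use them as local variables. The sketch below reworks find_min that way; the extra spacing before c is just a convention to mark it as a local.

```shell
local_out=$(echo line | awk '
# "c" is declared as an extra parameter, so it is local to find_min.
function find_min(a, b,    c) {
    c = a + b
    return (a < b) ? a : b
}
{
    print find_min(1, 2)
    print "c is [" c "]"     # the global c was never set, so it is empty
}')
echo "$local_out"
```

This should print 1 followed by c is [], showing that the assignment inside the function no longer leaks into the global scope.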

In addition to the above functions, another function that might be used is system, which is mainly used to execute external commands

# awk 'BEGIN {system("uname -m"); }'
x86_64

Usage scenarios

Having covered the basics, let's look at some practical uses. Concrete scenarios should help you understand faster

Statistics

Here are a few of the commands I use most often

Sum a column of numbers (here, the size column of ls -l)

ls -l *.go *.conf *.sh | awk '{sum+=$5} END {print sum}'

Count how much memory each user's processes occupy

ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
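
To try the same group-and-sum pattern without depending on a live process list, here is a sketch that feeds it a few fabricated rows in a ps-like layout (user in column 1, resident memory in column 6). The function name mem_by_user and the sample rows are assumptions for the example.

```shell
# mem_by_user is a hypothetical helper: it sums column 6 per value of column 1,
# skipping the header line, and sorts the result because for ... in order is unspecified.
mem_by_user() {
    awk 'NR != 1 { rss[$1] += $6 } END { for (u in rss) print u ", " rss[u] "KB" }' | sort
}

printf '%s\n' \
  'USER PID %CPU %MEM VSZ RSS TTY' \
  'root 1 0.0 0.1 1000 300 ?' \
  'alice 20 0.0 0.2 2000 500 pts/0' \
  'root 31 0.0 0.1 1000 200 ?' | mem_by_user
```

With these rows the output should be alice, 500KB and root, 500KB.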

Extract all IPv4 addresses from the output of ifconfig, excluding the lo (loopback) adapter

ifconfig | awk '/inet / && ! ($2 ~ /^127/){print $2}'

Log analysis

Suppose you have an Nginx service that writes an access log, access.log

Log format:  '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'

Count the number of requests from each IP address:

# awk '{a[$1]++}END{for(v in a)print v,a[v]}' access.log

Count the requests per IP address and show the top 10:

# awk '{a[$1]++}END{for(v in a)print v,a[v]|"sort -k2 -nr |head -10"}' access.log

List the IP addresses that made more than 100 requests:

# awk '{a[$1]++}END{for(v in a){if(a[v]>100)print v,a[v]}}' access.log

Top 10 most visited pages:

# awk '{a[$7]++}END{for(v in a)print v,a[v]|"sort -k2 -nr|head -n10"}' access.log

Count the requests per status code for each IP address:

# awk '{a[$1" "$9]++}END{for(v in a)print v,a[v]}' access.log

Count the 404 responses for each IP address:

# awk '{if($9~/404/)a[$1" "$9]++}END{for(v in a)print v,a[v]}' access.log
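
To experiment with these one-liners without a real server, here is a sketch that runs two of them against a tiny fabricated log in the format shown above. The four log lines, the IP addresses and the variable names per_ip and not_found are all invented for the example.

```shell
# Write a small sample log (fabricated data) to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0" "-"
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /missing HTTP/1.1" 404 128 "-" "curl/8.0" "-"
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET /missing HTTP/1.1" 404 128 "-" "curl/8.0" "-"
10.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0" "-"
EOF

# Requests per IP, busiest first ($1 is the client address).
per_ip=$(awk '{ hits[$1]++ } END { for (ip in hits) print ip, hits[ip] }' "$log" | sort -k2 -nr)

# 404 responses per IP ($9 is the status code when splitting on whitespace).
not_found=$(awk '$9 == 404 { miss[$1]++ } END { for (ip in miss) print ip, miss[ip] }' "$log" | sort)

echo "$per_ip"
echo "$not_found"
rm -f "$log"
```

With whitespace splitting, $7 is the request path and $9 is the status code because the quoted request "GET /path HTTP/1.1" occupies fields 6 to 8. Sorting outside awk is a design choice here: it keeps the awk program short and makes the output order predictable.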

Conclusion

This article covered the basic concepts and usage of awk, along with two common application scenarios. I hope that after reading it you can locate problems quickly when analyzing logs

Due to time constraints, I did not list many application scenarios. I will update this article when I have time or when I come across a particularly representative scenario at work


I am Wu Liu, a programmer who loves basketball and coding

Follow me and I will bring you more of what you want to see ❤