This is the 12th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge

awk

AWK is a language for processing text files. It treats the file as a sequence of records.

In general, each line of a file’s content is a record. Each line is divided into a series of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on.

AWK programs are made up of blocks of statements that deal with specific patterns. AWK can read one input line at a time. For each input line, the AWK interpreter determines if it conforms to any pattern that appears in the program and performs the corresponding action for that pattern.

- Alfred Aho, The A-Z of Programming Languages: AWK
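As a minimal sketch of the record, field, and pattern model described in the quote, the following feeds two invented input lines to awk. The names and numbers are made up for illustration only.

```shell
# Invented sample input: a name and an age on each line.
# Pattern: $2 > 26 (second field greater than 26).
# Action: print the first field of every line that matches.
matches=$(printf 'alice 30\nbob 25\n' | awk '$2 > 26 { print $1 }')
echo "$matches"
```

Only the first line satisfies the pattern, so this prints alice.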

The general syntax of the awk command

awk '{commands}'
  • commands is one or more awk commands
  • Common options:
    • -f file: read the awk program from the named file rather than from the command line;
    • -Fc: use the character c as the field separator instead of the default whitespace (one or more spaces or tabs).
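A small sketch of the -F option, assuming an invented comma-separated line as input:

```shell
# -F, tells awk to split each line on commas instead of whitespace.
# The input line is invented for illustration.
second=$(printf 'red,green,blue\n' | awk -F, '{ print $2 }')
echo "$second"
```

With the comma as separator, the second field is green.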

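Common usage:

  • print: with no arguments, prints the current line; because the action runs for every line, this outputs the file line by line.

print str1 str2 int1, int2 …

Items separated only by spaces are concatenated; items separated by commas are printed with a space between them.

A parenthesized list also works: print (A, B, C, …).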

$ who | awk '{ print }'   # This is equivalent to using "who" directly.
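To make the difference between space-separated and comma-separated print arguments concrete, here is a short sketch on an invented two-word input line:

```shell
# Space-separated items are concatenated with nothing in between.
joined=$(printf 'a b\n' | awk '{ print $1 $2 }')
# Comma-separated items are joined with the output field separator,
# which is a single space by default.
spaced=$(printf 'a b\n' | awk '{ print $1, $2 }')
echo "$joined"
echo "$spaced"
```

The first command prints ab and the second prints a b.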

Escape sequences such as \n and \t can be used inside strings:

$ ls -lF /boot | awk '{ print $5 "\t" $9}' | sort -rn | head -3

This command lists the size and name of every file in the /boot directory, size first and name second, separated by a tab. It then sorts the lines by size in descending order and keeps the first three, that is, the three largest files. (It relies on the $n field variables, which are explained below.)

Awk can also use the C-style printf():

printf("Total: %s\n", totalsize)    # note the "\n": unlike print, printf does not add a newline
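The printf call above can be tried in isolation. This sketch assumes an arbitrary sample value, 2048, standing in for a real total:

```shell
# printf takes a C-style format string; the explicit "\n" ends the line.
# 2048 is an arbitrary sample value, not a real file size.
line=$(echo 2048 | awk '{ totalsize = $1; printf("Total: %s\n", totalsize) }')
echo "$line"
```

This prints Total: 2048.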

$n: Field variables

In a file, or in the output of a Linux command, each line is split into fields by a delimiter, and each field gets its own identifier.

For example, the identifier for field 1 is $1, the identifier for field 2 is $2, and so on. There is also a special variable, $0, which represents the entire line (conceptually, everything the regular expression ^.*$ would match).

Awk makes a lot of use of such field identifiers. Here are some examples:

  1. Display the first field of each line of the who command's output, that is, the names of the users logged in to the Linux system:

    $ who | awk '{ print $1 }'
  2. Add some explanatory words to the previous command:

    $ who | awk '{ print "User  " $1 " is on terminal line " $2}'
  3. In the emp.data file, the second field is the employee's last name and the fourth field is the salary. Print the string "Employee" before the last name and "has salary" between the last name and the salary:

    $ awk '{ print "Employee  " $2 " has salary " $4}' emp.data
  4. All fields in /etc/passwd are separated by a colon (:). To find the login shell of particular users (for example, foo and bar), proceed as follows:

    • 1) Use grep to extract from /etc/passwd the lines that contain the target users foo and bar;
    • 2) Use awk with the colon as the field separator and list field 1 (user name) and field 7 (login shell);
    • 3) Add some descriptive text to the output to make it easier to read;

    The specific implementation is as follows:

    $ egrep 'foo|bar' /etc/passwd | awk -F: '{ print $1 " has " $7 " as login shell." }'
  5. Count how many users have each shell as their default login shell:

    $ awk -F: '{ print $7 }' /etc/passwd | sort | uniq -c
  6. List the users whose login shell lives under a bin directory (their line contains /bin/), together with that shell. Exclude the sync user, whose shell is /bin/sync, and sort the result:

    $ grep /bin/ /etc/passwd | awk -F: '{ print $1" " $7 }' | sed '/sync/d' | sort
  7. Put the machine each user logged in from at the front of each record. (In this who output format, the sixth and last field is the address of the machine the user logged in from; if it is empty, the login is local.)

    $ who | awk '{ print $6": "$0}'
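The /etc/passwd example can be reproduced without touching the real file. The sketch below uses two invented lines in /etc/passwd format; the users foo and bar and all their details are hypothetical.

```shell
# Two invented lines in /etc/passwd format (hypothetical users foo and bar).
sample='foo:x:1001:1001:Foo User:/home/foo:/bin/bash
bar:x:1002:1002:Bar User:/home/bar:/bin/zsh'

# Split on ":" and print field 1 (user name) and field 7 (login shell).
shells=$(printf '%s\n' "$sample" |
  awk -F: '{ print $1 " has " $7 " as login shell." }')
echo "$shells"
```

This prints one sentence per user, for example "foo has /bin/bash as login shell." for the first line.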

The NF and NR variables

NF and $NF

  • NF (without the $ sign): the number of fields in the current record.

  • $NF (with the $ sign): the last field of the current record, that is, field number NF.

e.g.

  1. List the number of fields (columns) in each line of the who command's output:

    $ who | awk '{ print NF }'
  2. List the last field of each line of the who command's output:

    $ who | awk '{ print $NF }'
  3. A more complex example:

    $ egrep 'bin|sbin' /etc/passwd | awk -F: '{ print $NF }' | sort | uniq -c | sort -n
    • egrep: extracts the lines containing bin or sbin from the /etc/passwd file
    • awk: prints the last field of each line, using the colon as the field separator
    • sort: sorts the resulting lines so that identical values are adjacent
    • uniq -c: merges adjacent identical lines and prefixes each with the number of times it occurred
    • sort -n: sorts the result numerically by that count
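A compact sketch of NF and $NF, run on two invented lines with different word counts:

```shell
# NF is the number of fields on the line; $NF is the last field itself.
counts=$(printf 'one two three\nfour five\n' | awk '{ print NF ": " $NF }')
echo "$counts"
```

The first line has three fields ending in three, the second has two fields ending in five, so the output is "3: three" followed by "2: five".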

NR

The variable NR holds the number of records (lines) read so far, that is, the number of the current line.

For example:

$ ls -l ~/Wolf | awk '{ print NR ": " $0 }'   # use NR to number each line
1: total 16
2: drwxrwxr-x 2 dog dog 4096 Jan 25 2009 boywolf
3: -rw-rw-r-- 1 dog dog 84 Dec 22 19:07 delete_disable
......
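The same numbering can be tried on any input. This sketch uses three invented lines instead of a directory listing:

```shell
# NR is the number of the line currently being processed.
numbered=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
echo "$numbered"
```

Each input line comes back prefixed with its line number, from "1: alpha" to "3: gamma".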

Calculations in awk

Awk supports C-style operators, if conditions, and for loops. Unlike in C, variables do not have to be declared before use; an unset variable behaves as 0 in arithmetic and as the empty string in string contexts.

e.g.

Get the total size of all files in the /boot directory (this prints the running value of totalsize after each line):

$ ls -lF /boot | awk '{ totalsize += $5; print totalsize }'

If we only need the final result and not the intermediate values, we can use tail to keep the last line:

$ ls -lF /boot | awk '{ totalsize += $5; print totalsize }' | tail -1

But instead of using the tail command, a better approach is to use the END keyword in the awk command.

The END keyword

END { statements } runs its statements once, after the last input line has been processed:

$ ls -lF /boot | awk '{ totalsize += $5} END { print totalsize }'
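To check the END behavior without depending on the contents of /boot, the sketch below feeds awk two invented lines laid out like ls -l output, with the size in the fifth column:

```shell
# Invented "ls -l"-style lines; the sizes (field 5) are 100 and 250.
total=$(printf '%s\n' \
  '-rw-r--r-- 1 root root 100 Jan 1 00:00 a' \
  '-rw-r--r-- 1 root root 250 Jan 1 00:00 b' |
  awk '{ totalsize += $5 } END { print totalsize }')
echo "$total"
```

The main block accumulates the sizes and the END block prints the sum once, giving 350.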

if conditions and for loops

Examples of using if:

$ awk  '{ if (length($4) == 3 ) print $0 }' emp.data | wc -l
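Since the contents of emp.data are not shown, here is a sketch of the same if test on three invented records, assuming the layout described earlier (last name in field 2, salary in field 4):

```shell
# Invented employee records; the real emp.data may look different.
three_digit=$(printf '%s\n' \
  '1 Smith Sales 900' \
  '2 Jones IT 1200' \
  '3 Brown HR 850' |
  awk '{ if (length($4) == 3) print $0 }')
echo "$three_digit"
```

Only the records whose salary has exactly three characters are printed, here the Smith and Brown lines.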

A for loop works just like if: simply write it in C style, for example for (i = 0; i < 10; i++).
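As one possible use of a for loop, the following sketch walks over every field of an invented line and adds the values up. The variable names s and i are arbitrary.

```shell
# Loop over fields 1..NF and accumulate their values in s.
sum=$(echo '1 2 3 4' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')
echo "$sum"
```

The four fields add up to 10.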

Putting commands in a file

As the earlier examples show, awk commands can become complex enough to amount to a small program, so it makes sense to put them in a file:

Write the commands in your favorite editor and save them in a file such as script1. You can then have awk read the commands from that file with -f:

$ ls -lF /boot | awk -f script1
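A self-contained sketch of the -f workflow follows. It assumes the script holds the totalsize program from earlier, stores it in a temporary file created with mktemp rather than in script1, and runs it on invented ls -l-style input.

```shell
# Write the awk program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
{ totalsize += $5 }
END { print totalsize }
EOF

# Run the program with -f on invented "ls -l"-style lines.
file_total=$(printf '%s\n' \
  '-rw-r--r-- 1 root root 100 Jan 1 00:00 a' \
  '-rw-r--r-- 1 root root 250 Jan 1 00:00 b' |
  awk -f "$script")
echo "$file_total"
rm -f "$script"
```

The result is the same as with the inline version of the program: 350.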