Let’s start using it.
Awk is a powerful text-parsing tool for Unix and Unix-like systems, but it is also considered a programming language because it has programmable functions that you can use for routine parsing tasks. You probably won't use AWK to develop your next GUI application, and it likely won't replace your default scripting language, but it is a powerful program for specific tasks.
These tasks can be surprisingly diverse. The best way to understand what problems AWK can solve is to learn AWK. You’ll be amazed at how AWK can help you get more done with less effort.
The basic syntax of AWK is:
awk [options] 'pattern {action}' file
First, create this sample file and save it as colours.txt.
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
The data is separated into columns by one or more spaces. Data that you want to analyze is usually organized in some way. It doesn't always have to be in space-separated columns, or even separated by commas or semicolons, but it usually has a predictable format, especially in log files or data dumps. You can use that format to help AWK extract and process the data you care about.
Print columns
In awk, the print statement displays what you specify. There are many predefined variables that you can use, but the most common are the columns (fields) of each line, which are referred to by number. Give it a try:
$ awk '{print $2}' colours.txt
color
red
yellow
red
purple
green
purple
brown
brown
yellow
Here, AWK displays the second column, represented by $2. This is relatively straightforward, so you might guess that print $1 shows the first column, print $3 shows the third column, and so on.
To display all columns, use $0.
The number after the dollar sign ($) is an expression, so $2 and $(1+1) mean the same thing.
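As a small sketch of these field expressions, the commands below write a three-line excerpt of the sample data to a temporary file and print it in three ways. The `mktemp` path and the shell variable names are assumptions made for illustration. `NF` is awk's built-in count of fields on the current line, so `$NF` refers to the last column.

```shell
# Write a short excerpt of the sample data to a temporary file
# (the file location is illustrative, not part of the article).
sample=$(mktemp)
printf '%s\n' 'name color amount' 'apple red 4' 'banana yellow 6' > "$sample"

# $0 is the whole line, so this prints the file unchanged.
whole=$(awk '{print $0}' "$sample")

# The expression in parentheses is evaluated first, so $(1+1) is $2.
second=$(awk '{print $(1+1)}' "$sample")

# NF holds the number of fields, so $NF is the last column.
last=$(awk '{print $NF}' "$sample")

printf '%s\n' "$whole" "$second" "$last"
rm -f "$sample"
```

Because the field number is an ordinary expression, it can also come from a variable or a calculation, which is useful when the column of interest is not known in advance.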
Select columns conditionally
The sample file you're using is very structured. It has one row that acts as a header, and the columns are directly related to each other. By defining conditions, you can limit what AWK returns when it reads this data. For example, to find the rows that match yellow in the second column and print the contents of the first column:
$ awk '$2=="yellow"{print $1}' colours.txt
banana
pineapple
Regular expressions also work. This expression matches values in $2 that contain a p, followed by one or more characters, followed by another p:
$ awk '$2 ~ /p.+p/ {print $0}' colours.txt
grape purple 10
plum purple 2
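The pattern above is not anchored, so it matches anywhere in the field. As a rough sketch of the difference, the commands below apply an unanchored and an anchored expression to the first column instead. The temporary file from `mktemp` and the variable names are assumptions for illustration; `^` is the standard regular-expression anchor for the start of the string.

```shell
# Recreate the sample data in a temporary file (path is illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Unanchored: a p, one or more characters, then another p, anywhere in $1.
unanchored=$(awk '$1 ~ /p.+p/ {print $1}' "$sample")

# Anchored with ^: the first column must start with p.
anchored=$(awk '$1 ~ /^p/ {print $1}' "$sample")

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -f "$sample"
```

With this data, the unanchored pattern selects only pineapple, while the anchored one selects plum, potato, and pineapple.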
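Numbers are interpreted naturally by AWK. For example, to print the rows with a value greater than 5 in the third column (the header row is printed too, because amount is not a number, so it is compared with 5 as a string and sorts after it):
$ awk '$3>5 {print $1, $2}' colours.txt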
name color
banana yellow
grape purple
apple green
potato brown
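If the header row is not wanted in a numeric filter like this one, two common approaches can be sketched as follows. Both rely on standard awk behavior: `NR` is the number of the current line, and adding 0 to a field forces it to be treated as a number. The temporary file and variable names are assumptions for illustration.

```shell
# Recreate the sample data in a temporary file (path is illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# NR is the current line number, so NR > 1 skips the header row.
skip_header=$(awk 'NR > 1 && $3 > 5 {print $1, $2}' "$sample")

# Adding 0 forces a numeric comparison; the text "amount" then counts as 0.
numeric=$(awk '$3 + 0 > 5 {print $1, $2}' "$sample")

printf '%s\n' "$skip_header"
rm -f "$sample"
```

Both commands select the same four data rows (banana, grape, the green apple, and potato) without the header.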
Field separator
By default, AWK uses whitespace (spaces and tabs) as the field separator. However, not all text files use whitespace to define fields. For example, create a file called colours.csv with the following:
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
Awk handles the data in exactly the same way, as long as you specify which character to use as the field separator in your command. Use the --field-separator (or simply -F) option to define the separator:
$ awk -F"," '$2=="yellow" {print $1}' colours.csv
banana
pineapple
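The separator can also be set inside the program itself. The sketch below assigns the standard `FS` (input field separator) and `OFS` (output field separator) variables in a `BEGIN` block, which runs before any input is read. The temporary file and the choice of a tab as the output separator are assumptions for illustration.

```shell
# Recreate the CSV sample in a temporary file (path is illustrative).
csv=$(mktemp)
cat > "$csv" <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# FS splits the input on commas; OFS joins the printed fields with a tab.
yellow=$(awk 'BEGIN { FS = ","; OFS = "\t" } $2 == "yellow" { print $1, $3 }' "$csv")

printf '%s\n' "$yellow"
rm -f "$csv"
```

Setting `FS` in the program keeps the separator next to the logic that depends on it, which can be convenient once a one-liner grows into a script.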
Save the output
With output redirection, you can write the results to a file. For example:
$ awk -F, '$3>5 {print $1, $2}' colours.csv > output.txt
This creates a file containing the results of the AWK query.
You can also split files into multiple files grouped by column data. For example, if you want to split colours.txt into multiple files based on the color displayed on each line, you can include redirection statements in awk to redirect each query:
$ awk '{print > ($2 ".txt")}' colours.txt
This will generate files with names such as yellow.txt and red.txt.
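Run on the full sample file, the command above also writes the header row to a file named color.txt. A variation that skips the header and writes into a scratch directory might look like the sketch below. The directory from `mktemp -d` and the `dir` variable passed with `-v` are assumptions for illustration; the parentheses around the file name keep the concatenation unambiguous across awk implementations.

```shell
# Work in a scratch directory so the generated files are easy to inspect
# (the directory location is illustrative).
workdir=$(mktemp -d)
cat > "$workdir/colours.txt" <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Skip the header (NR > 1) and write each row to <color>.txt in workdir.
awk -v dir="$workdir" 'NR > 1 { print > (dir "/" $2 ".txt") }' "$workdir/colours.txt"

# Collect the results before cleaning up.
files=$(ls "$workdir")
red_rows=$(cat "$workdir/red.txt")

printf '%s\n' "$files"
rm -rf "$workdir"
```

Each output file holds the complete rows for one color; red.txt, for example, contains the red apple and strawberry lines.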
In the next article, you’ll learn more about fields, records, and some of the powerful AWK variables.
This post is adapted from Hacker Public Radio, a community technology podcast.
Via: opensource.com/article/19/…
By Seth Kenlon (lujun9972
This article is originally compiled by LCTT and released in Linux China