Translator's preface:
The translator had only ever used grep, and only recently learned that sed and awk are more advanced tools. A search turned up the article "Difference Between grep, sed, and awk", which I found clear and easy to understand. So I decided to translate it and keep it for later reference. This is my first attempt at translating an article, so please go easy on me.
Body:
-
In this article, we introduce the command line tools grep, sed, and awk. In particular, we will study the functional differences between them.
-
Background
There are three handy tools for text processing in Linux: grep, sed, and awk. Although they are completely different tools, their functions seem to overlap in simple cases. For example, to find text matching a fixed pattern in a file and print the matches to standard output, any of the three will do.
However, beyond such simple exercises, grep is only good for simple text matching and printing. sed, on the other hand, provides additional text transformation commands, such as substitution, in addition to matching and printing text. Finally, awk is the most powerful of these tools: a scripting language that offers a lot of functionality the first two tools do not.
Before we begin, it is important to know that the purpose of this article is to make the distinction between these three tools clearer. Therefore, the examples covered here are only a small sample of what each tool can do, especially in the case of sed and awk.
-
For the sake of discussion, we define a text file log.txt:
Timestamp Category Message
1598843202 INFO Booting up system
1598843402 INFO Booting up critical service: Authorization
1598843502 INFO System booted successfully
1598853502 INFO User admin requested access for userlist
1598863888 ERROR User annonymous attempt to access protected resource without credentials
1598863891 INFO System health check status: passed
1598863901 ERROR Requested resource not found
1598864411 INFO User admin logged out
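To follow along, the sample file can be created from the shell. The following is one possible way to do it, assuming the file is written to the current directory under the name log.txt used throughout the article:

```shell
# Create log.txt in the current directory.
# Note: this overwrites any existing file named log.txt.
cat > log.txt <<'EOF'
Timestamp Category Message
1598843202 INFO Booting up system
1598843402 INFO Booting up critical service: Authorization
1598843502 INFO System booted successfully
1598853502 INFO User admin requested access for userlist
1598863888 ERROR User annonymous attempt to access protected resource without credentials
1598863891 INFO System health check status: passed
1598863901 ERROR Requested resource not found
1598864411 INFO User admin logged out
EOF

# Quick sanity check: count the lines that were written (header included).
grep -c '' log.txt
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the heredoc, so the log lines are written exactly as shown.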
-
grep
The grep command searches for lines that match a regular expression pattern and prints those matching lines to standard output. This is useful when we need a quick way to find out whether a particular pattern exists in a given input.
**4.1 Basic Syntax**
grep [OPTIONS] PATTERN [FILE...]
PATTERN is a regular expression that defines what we want to find in the contents of the files specified by the FILE argument. OPTIONS are optional flags that modify grep's behavior.
**4.2 Searching for Lines That Match a Regular Expression Pattern**
Suppose we want to extract ERROR events from log.txt. We can do this using grep:
$ grep "ERROR" log.txt
1598863888 ERROR User annonymous attempt to access protected resource without credentials
1598863901 ERROR Requested resource not found
Here grep scans the lines in log.txt and prints the lines containing ERROR to standard output.
We can use the -v flag to invert the match:
grep -v "INFO" log.txt
When we run the above command, grep prints every line in log.txt that does not contain INFO.
**4.4 Printing Lines Before or After a Match**
Sometimes we may want to print the lines before or after a match. To print the five lines after each matching line, we can use the -A (after) flag:
grep -A 5 ERROR log.txt
On the other hand, to print the five lines before each matching line, we can use the -B (before) flag:
grep -B 5 ERROR log.txt
Finally, the -C (context) flag allows us to print five lines before and five lines after each match:
grep -C 5 ERROR log.txt
-v: invert the match
-A: print lines after the match
-B: print lines before the match
-i: ignore case
-E: enable extended regular expressions (equivalent to egrep)
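The flags summarized above can be tried on a small input. The sketch below is only an illustration: the five sample lines and the temporary file are made up for this example and are not part of log.txt.

```shell
# Hypothetical sample data, written to a temporary file so nothing is overwritten.
sample=$(mktemp)
printf '%s\n' \
  '1 INFO Booting up system' \
  '2 error Disk almost full' \
  '3 INFO Retrying' \
  '4 WARN Low memory' \
  '5 INFO Done' > "$sample"

# -i: ignore case, so the lowercase "error" matches the pattern ERROR.
ignore_case=$(grep -i 'ERROR' "$sample")

# -E: extended regular expressions, here an alternation of two words.
extended=$(grep -E 'error|WARN' "$sample")

# -A 1: print one line of context after each match.
after=$(grep -A 1 'error' "$sample")

printf '%s\n' "-i:" "$ignore_case" "-E:" "$extended" "-A 1:" "$after"
rm -f "$sample"
```

Each result is captured in a variable first so that the three flags can be compared side by side.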
-
sed
The sed command is a stream editor that works on streams of characters. It is more powerful than grep because it offers more options for text processing, including substitution, the command sed is best known for.
**5.1 Basic Syntax**
The sed command has the following general syntax:
sed [OPTIONS] SCRIPT FILE...
OPTIONS are optional flags that modify sed's behavior. Next, the SCRIPT argument is the sed script, which is executed on each line of the files specified by the FILE argument.
**5.2 Script Structure**
A sed script has the following structure:
[addr]X[options]
Here, addr is a condition applied to the lines of the text file. It can be a fixed line number or a regular expression that is tested against the contents of a line before the line is processed. Next, the X character represents the sed command to execute. For example, the substitution command is represented by the single character s. Finally, options can be passed to the sed command to adjust its behavior.
First, let’s see how to use sed to replicate grep:
sed -n '/ERROR/ p' log.txt
By default, sed prints every line it scans to standard output. To disable this automatic printing, we use the -n flag. The script that follows the -n flag then looks for the regular expression ERROR in each line of log.txt. If there is a match, sed prints the line to standard output because we used the p command in our script. Finally, we pass log.txt, the name of the file we want sed to process, as the last argument.
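The effect of the -n flag is easiest to see by running the same script with and without it. This is a minimal sketch on three made-up lines rather than on log.txt:

```shell
# Hypothetical three-line input used only for this illustration.
input=$(printf '%s\n' 'a INFO ok' 'b ERROR bad' 'c INFO ok')

# With -n, only the lines explicitly printed by the p command appear.
with_n=$(printf '%s\n' "$input" | sed -n '/ERROR/p')

# Without -n, every line is printed once automatically,
# and the matching line is printed a second time by p.
without_n=$(printf '%s\n' "$input" | sed '/ERROR/p')

printf '%s\n' "with -n:" "$with_n" "without -n:" "$without_n"
```

Without -n the matching line shows up twice, which is why the grep-like form needs the flag.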
**5.4 Substitution with sed**
The sed substitution command has the following form:
's/pattern/replacement/'
When a line contains a match for pattern, sed replaces the first match on that line with replacement. For example, to replace the word ERROR in log.txt with the word CRITICAL, we could run:
sed 's/ERROR/CRITICAL/' log.txt
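One detail worth checking by hand is how many matches per line get replaced. The sketch below uses a made-up line containing the word twice, and compares the plain s command with the same command plus the g (global) option:

```shell
# Hypothetical input line with two occurrences of the pattern.
line='ERROR: retry after ERROR'

# Without options, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/ERROR/CRITICAL/')

# With the g option, every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/ERROR/CRITICAL/g')

printf '%s\n' "$first_only" "$every_match"
```

In log.txt each line contains ERROR at most once, so both forms give the same result there.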
If we want sed to save the changes to the file it is working on, we can use the -i flag with a suffix. Before making changes, sed creates a backup of the file and appends the suffix to the backup file name. For example, when we run:
sed -ibackup 's/ERROR/CRITICAL/' log.txt
Before sed applies the changes, log.txt is copied to a backup file named log.txtbackup.
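The backup behavior can be verified safely in a scratch directory. The sketch below assumes a sed that accepts -i with the suffix attached directly (GNU sed and BSD sed both do, although -i is not part of POSIX); the file name demo.txt and the suffix .bak are made up for this example.

```shell
# Work in a temporary directory so no real file is modified.
workdir=$(mktemp -d)
printf '%s\n' '1 ERROR first' '2 INFO second' > "$workdir/demo.txt"

# Edit the file in place; the original content is kept in demo.txt.bak.
sed -i.bak 's/ERROR/CRITICAL/' "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
backup=$(cat "$workdir/demo.txt.bak")

printf '%s\n' "edited file:" "$edited" "backup file:" "$backup"
rm -rf "$workdir"
```

Choosing a suffix that starts with a dot, such as .bak, makes the backup easier to recognize than a name like log.txtbackup.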
**5.6 Restricting to Specific Line Numbers**
We can restrict the sed command so that it only runs on specific line numbers, using addr from the script:
sed '3 s/ERROR/CRITICAL/' log.txt
This will only run the script on line 3 of log.txt. In addition, we can specify the line number range:
sed '3,5 s/ERROR/CRITICAL/' log.txt
In this case, sed will run the script on lines 3 through 5 of log.txt. Alternatively, we can specify boundaries using regular expression patterns:
sed -n '3,/ERROR/ p' log.txt
Here sed prints the lines of log.txt starting at line 3 and stops after the first following line that matches the pattern /ERROR/.
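The address forms described above can be compared on a short input where the line numbers are easy to follow. This is a sketch on five made-up lines, not on log.txt:

```shell
# Hypothetical five-line input; the prefix l1..l5 marks the line number.
input=$(printf '%s\n' 'l1 INFO' 'l2 INFO' 'l3 ERROR' 'l4 INFO' 'l5 ERROR')

# Single line number: the substitution runs on line 5 only.
one_line=$(printf '%s\n' "$input" | sed '5 s/ERROR/CRITICAL/')

# Number-to-pattern range: print from line 2 up to the first line matching ERROR.
to_match=$(printf '%s\n' "$input" | sed -n '2,/ERROR/ p')

printf '%s\n' "line 5 only:" "$one_line" "from line 2 to first ERROR:" "$to_match"
```

In the first result the ERROR on line 3 is left untouched, because the address limits the command to line 5.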
-
awk
awk is a fully fledged programming language comparable to Perl. Not only does it provide a large number of built-in functions for string, arithmetic, and time manipulation, but it also allows users to define their own functions, as in any regular scripting language. Let's look at some examples.
**6.1 Basic Syntax**
awk [options] script file
awk executes the script for each line in the file. The script has the following structure:
'(pattern){action}'
The pattern is a regular expression that is tested against each line of input. If the pattern matches, awk executes the script defined in action on that line. If no pattern is given, the action is performed on every line.
Let's see how awk emulates grep, just as we did with sed:
awk '/ERROR/{print $0}' log.txt
The code above finds the regular expression pattern ERROR in the log.txt file and prints the matching lines to standard output.
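Just as the pattern can be omitted, so can the action: a pattern on its own prints the matching line. The sketch below, on three made-up lines, compares the explicit form with the shorter one and with a negated pattern, which behaves like grep -v:

```shell
# Hypothetical three-line input used only for this illustration.
input=$(printf '%s\n' 'a INFO ok' 'b ERROR bad' 'c INFO ok')

# Explicit action: print the whole line ($0) when the pattern matches.
explicit=$(printf '%s\n' "$input" | awk '/ERROR/{print $0}')

# No action given: awk falls back to printing the matching line.
implicit=$(printf '%s\n' "$input" | awk '/ERROR/')

# Negated pattern: print the lines that do not match.
negated=$(printf '%s\n' "$input" | awk '!/ERROR/')

printf '%s\n' "$explicit" "$implicit" "$negated"
```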
Similarly, we can use awk's built-in function gsub to replace every ERROR with CRITICAL, as in the sed example:
awk '{gsub(/ERROR/, "CRITICAL")}{print}' log.txt
The gsub function takes a regular expression pattern and a replacement string as parameters. awk then prints the line to standard output.
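Two related details can be checked with a short experiment: gsub returns the number of replacements it made, and its sibling sub replaces only the first match. The input line below is made up for this sketch:

```shell
# Hypothetical input line with two occurrences of the pattern.
line='ERROR one, ERROR two'

# sub replaces only the first match on the line.
first_only=$(printf '%s\n' "$line" | awk '{sub(/ERROR/, "CRITICAL"); print}')

# gsub replaces every match and returns how many replacements were made.
all_with_count=$(printf '%s\n' "$line" |
  awk '{n = gsub(/ERROR/, "CRITICAL"); print n ": " $0}')

printf '%s\n' "$first_only" "$all_with_count"
```

The variable name n is arbitrary; it only holds the return value of gsub so that it can be printed next to the modified line.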
In awk, a BEGIN block is executed before any line of the file is processed. There is also an END block, which lets us define what should run after all the lines have been processed. Let's add a header and a footer to the text document using BEGIN and END blocks:
$ awk 'BEGIN {print "LOG SUMMARY\n--------------"} {print} END {print "--------------\nEND OF LOG SUMMARY"}' log.txt
LOG SUMMARY
--------------
Timestamp Category Message
1598843202 INFO Booting up system
1598843402 INFO Booting up critical service: Authorization
1598843502 INFO System booted successfully
1598853502 INFO User admin requested access for userlist
1598863888 ERROR User annonymous attempt to access protected resource without credentials
1598863891 INFO System health check status: passed
1598863901 ERROR Requested resource not found
1598864411 INFO User admin logged out
--------------
END OF LOG SUMMARY
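The order in which the three kinds of blocks run can be made visible with the built-in variable NR, which holds the number of lines read so far. This is a minimal sketch on three made-up lines:

```shell
# BEGIN runs once before any input, the middle block runs once per line,
# and END runs once after the last line, when NR equals the total line count.
report=$(printf '%s\n' 'alpha' 'beta' 'gamma' |
  awk 'BEGIN {print "start"} {print NR ": " $0} END {print "total lines: " NR}')

printf '%s\n' "$report"
```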
**6.5 Column Operations**
Processing documents with a row-and-column structure (CSV-style) is where awk really shines. For example, we can easily print the first and second columns of log.txt and skip the third:
awk '{print $1, $2}' log.txt
**6.6 Customizing Delimiters**
By default, awk uses whitespace as the field delimiter. If the text uses a different delimiter (for example, a comma), it can be specified with the -F flag:
awk -F "," '{print $1, $2}' log.txt
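Since log.txt itself is separated by spaces, the -F flag is easier to try on a comma-separated line. The line below is a made-up CSV version of the first log entry:

```shell
# Hypothetical comma-separated version of one log entry.
csv_line='1598843202,INFO,Booting up system'

# With a comma as the delimiter, $1 and $2 are the timestamp and the category.
first_two=$(printf '%s\n' "$csv_line" | awk -F ',' '{print $1, $2}')

# The third field keeps its internal spaces, because only commas split fields.
message=$(printf '%s\n' "$csv_line" | awk -F ',' '{print $3}')

printf '%s\n' "$first_two" "$message"
```

With the default whitespace delimiter, the same message would be split into several fields, one per word.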
**6.7 Arithmetic Operations**
awk's ability to perform arithmetic makes it easy to gather numeric information about text files. For example, let's count the number of ERROR events in log.txt:
awk '{count[$2]++} END {print count["ERROR"]}' log.txt
awk stores the number of occurrences of each distinct value in the Category column in the associative array count. The script then prints the number of ERROR entries at the end.
As a mature scripting language, awk also understands numeric values. This makes text processing easier when we need scripts to interpret values as numbers rather than plain strings. For example, if we want to print all log entries recorded after timestamp 1598863888, we can use the greater-than sign >:
$ awk '{ if (NR > 1 && $1 > 1598863888) {print $0} }' log.txt
1598863891 INFO System health check status: passed
1598863901 ERROR Requested resource not found
1598864411 INFO User admin logged out
From the output, we can see that this command prints only the log lines recorded after the specified timestamp. The NR > 1 condition skips the header line: its first field, Timestamp, is not a number, so awk would compare it as a string and print it as well.
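The difference between comparing values as numbers and comparing them as strings can be seen on a single made-up line. In this sketch, concatenating a field with an empty string is used as one way to force awk to treat it as text:

```shell
# Fields that look like numbers are compared numerically: 10 > 9 is true (1).
numeric=$(echo '10 9' | awk '{print ($1 > $2)}')

# Concatenating with "" turns both operands into strings, so they are
# compared character by character: "10" sorts before "9", giving false (0).
as_text=$(echo '10 9' | awk '{print (($1 "") > ($2 ""))}')

printf '%s\n' "numeric comparison: $numeric" "string comparison: $as_text"
```

The parentheses around each comparison matter: without them, awk would read > after print as output redirection to a file.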
-
In this article, we started with a basic introduction to grep, sed, and awk. Then, we showed the use of grep in simple text scanning and matching. Next, we saw that sed is more useful than grep for transforming text. Finally, we showed how awk can replicate the functionality of grep and sed, while also providing more features for advanced text processing.
Translator's afterword
In fact, it is easy to see that grep is like a ready-made, prepackaged tool: simple in function and easy to use. sed and awk, on the other hand, are more like relatively low-level APIs: highly flexible, but with a certain learning cost. Flexibility increases from grep to sed to awk, and which one to use depends on which one meets your needs.
So, can the translator provide more practical examples to illustrate this? Unfortunately not, at least not yet 😂. But that is the thing about tools: you do not know when one will come in handy until you have seen it, understood what it is basically for, and then run into a situation where you can use it, right?