The Three Musketeers of Linux Text Processing

grep: (Global search Regular Expression and Print out the line) a text filtering tool
sed: (stream editor) a text editor
awk: (named after the initials of its three inventors, Aho, Weinberger and Kernighan) implemented on Linux as gawk; a text report generator (formatted text output)

Regular expressions:

A pattern written with special characters and text characters. Some of these characters do not stand for their literal meaning but express control or wildcard functions; these are the metacharacters.

Metacharacters fall into two categories:

Basic regular expressions (BRE)

Extended regular expressions (ERE)

I. grep

Function: a text filtering tool. It checks the target text line by line against the user-specified pattern (the filter condition) and prints the lines that match.
Pattern: a filter condition written with regular expression metacharacters and text characters.
```
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
```

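1. Options


`-i`	ignore case
`-o`	only: print only the matched part of each line
`-v`	invert: print the lines that do not match
`-q`	quiet: print nothing; the exit status reports whether a match was found
`-E`	extended: use extended regular expressions
`-a`	all (`--text`): process binary files as if they were text

`-A NUM`	after: also print NUM lines after each match
`-B NUM`	before: also print NUM lines before each match
`-C NUM`	context: also print NUM lines before and after each match
`--color=auto`	highlight the matched text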
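
The options above can be tried on a small file. This is a minimal sketch assuming GNU grep; the sample lines are made up for illustration, and `-c` (count matching lines) is a real grep option that is not in the list above.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Root login ok' 'user alice login failed' 'ROOT logout' 'user bob login ok' > "$sample"

# -i ignores case; -c counts the matching lines instead of printing them.
root_count=$(grep -i -c 'root' "$sample")

# -o prints only the matched part of each line, one match per output line.
only_matched=$(grep -o 'login [a-z]*' "$sample")

# -v inverts the match: lines that do NOT contain "login".
inverted=$(grep -v 'login' "$sample")

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'alice' "$sample"; then found=yes; else found=no; fi

# -A 1 prints each matching line plus one line of trailing context.
with_context=$(grep -A 1 'alice' "$sample")

printf '%s\n' "$root_count" "$only_matched" "$inverted" "$found" "$with_context"
rm -f "$sample"
```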

2. Regular expression PATTERN

Basic regular expression metacharacters

2.1 Character Matching


`.`	Any single character
`[]`	Any single character within the specified set
`[^]`	Any single character outside the specified set
`[:digit:]`	Digits
`[:lower:]`	Lowercase letters
`[:upper:]`	Uppercase letters
`[:alpha:]`	Letters
`[:alnum:]`	Letters and digits
`[:punct:]`	Punctuation
`[:space:]`	Whitespace characters (spaces, tabs, newlines, etc.)

[root@localhost ~]# grep "r[[:alpha:]][[:alpha:]]t" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

Note:

[[:alpha:]]

The outer brackets match a single character from a set.

The inner brackets, [:alpha:], name the set: any alphabetic character.
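
The difference between the classes shows up when the same shape is matched with different sets. A small sketch, assuming GNU grep; the passwd-style lines are invented for the example.

```shell
# Hypothetical passwd-style sample lines.
lines=$(printf '%s\n' 'root:x:0:0' 'rat:x:5:5' 'r00t:x:9:9' 'reset:x:7:7')

# "r", two alphabetic characters, then "t": only "root" has that shape.
alpha_match=$(printf '%s\n' "$lines" | grep 'r[[:alpha:]][[:alpha:]]t')

# The same shape with two digits in the middle picks "r00t" instead.
digit_match=$(printf '%s\n' "$lines" | grep 'r[[:digit:]][[:digit:]]t')

# [^...] negates the set: "r" followed by anything that is not a lowercase letter.
negated=$(printf '%s\n' "$lines" | grep 'r[^[:lower:]]')

printf '%s\n' "$alpha_match" "$digit_match" "$negated"
```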

2.2 Number of matches

Placed after a character to limit how many times that character may occur. Matching is greedy by default (as many occurrences as possible). In basic regular expressions most of these metacharacters must be escaped with a backslash.

`*`	The preceding character any number of times: 0, 1, or many
`.*`	Any character, any length
`\?`	0 or 1 times: the preceding character is optional
`\+`	1 or more times: at least once
`\{m\}`	Exactly m times
`\{m,n\}`	At least m times and at most n times
`\{0,n\}`	At most n times
`\{m,\}`	At least m times
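
A quick way to see each quantifier is to run it against words that differ only in the number of repeated letters. The sketch below assumes GNU grep (`\?` and `\+` are GNU extensions to basic regular expressions) and uses `^` and `$` so that the whole line has to match; the word list is made up.

```shell
# Hypothetical word list: "g", some number of "o", then "d".
words=$(printf '%s\n' 'gd' 'god' 'good' 'goood' 'gooood')

# o* : zero or more "o", so every word matches.
star=$(printf '%s\n' "$words" | grep -c '^go*d$')

# o\? : zero or one "o".
optional=$(printf '%s\n' "$words" | grep '^go\?d$')

# o\+ : one or more "o", so only "gd" is excluded.
plus=$(printf '%s\n' "$words" | grep -c '^go\+d$')

# o\{2,3\} : two or three "o".
range=$(printf '%s\n' "$words" | grep '^go\{2,3\}d$')

# o\{3,\} : at least three "o".
at_least=$(printf '%s\n' "$words" | grep '^go\{3,\}d$')

printf '%s\n' "$star" "$optional" "$plus" "$range" "$at_least"
```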

2.3 Position anchors (beginning of line, end of line, beginning of word, end of word)

`^`	Anchors the beginning of a line
`$`	Anchors the end of a line
`^PATTERN$`	PATTERN must match the entire line
`^[[:space:]]*$`	An empty line, or a line containing only whitespace
`\<`	Anchors the beginning of a word
`\>`	Anchors the end of a word
`\<PATTERN\>`	PATTERN must match a whole word
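
The effect of the word anchors is easiest to see by comparing an anchored and an unanchored pattern on the same input. A minimal sketch, assuming GNU grep; the input lines are invented.

```shell
# Hypothetical input, including an empty line and a whitespace-only line.
text=$(printf '%s\n' 'id 7 ok' 'id 42 ok' 'id 123 ok' 'id 4567 ok' '' '   ' 'tail 99')

# Without word anchors, \{2,3\} also matches inside the 4-digit number.
unanchored=$(printf '%s\n' "$text" | grep -c '[[:digit:]]\{2,3\}')

# With \< and \> the digits must form a complete word of 2 or 3 digits.
anchored=$(printf '%s\n' "$text" | grep '\<[[:digit:]]\{2,3\}\>')

# ^[[:space:]]*$ matches empty lines and whitespace-only lines.
blank=$(printf '%s\n' "$text" | grep -c '^[[:space:]]*$')

# $ anchors the match to the end of the line.
at_end=$(printf '%s\n' "$text" | grep '99$')

printf '%s\n' "$unanchored" "$anchored" "$blank" "$at_end"
```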

Exercises:

1. grep -v "/bin/bash$" /etc/passwd

2. grep "\<[[:digit:]]\{2,3\}\>" /etc/passwd
# Note: anchor both the beginning and the end of the word; otherwise the match is not limited to two- or three-digit numbers.

3. grep "^[[:space:]]\{1,\}[^[:space:]]\{1,\}" /etc/grub2.cfg
  load_env
  set default="${next_entry}"
  set next_entry=
  save_env next_entry
  set boot_once=true
  set default="${saved_entry}"
  menuentry_id_option="id"

4. netstat -ant | grep "LISTEN[[:space:]]*$"
tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN
tcp6 0 0 :::3306 :::* LISTEN
tcp6 0 0 :::139 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::445 :::* LISTEN
tcp6 0 0 :::33060 :::* LISTEN

2.4 Grouping and backreferences (a backreference refers to the characters matched by an earlier pair of parentheses)


`\(\)`	Binds one or more characters together so that they are treated as a unit
The grep engine automatically saves what each group matched in numbered variables:
`\1`	The content matched by the first group, counting opening parentheses from the left
`\2`	The content matched by the second group
`\3`	The content matched by the third group
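
A backreference only matches if the text it refers to is repeated exactly. The sketch below illustrates this with basic regular expressions; the sample lines are hypothetical.

```shell
# Hypothetical sample: "name:path" pairs plus two short strings.
pairs=$(printf '%s\n' 'sync:/bin/sync' 'root:/bin/bash' 'halt:/sbin/halt' 'abab' 'abba')

# \([^:]*\) captures the name; \1 requires the path to end in that same name.
same_name=$(printf '%s\n' "$pairs" | grep '^\([^:]*\):.*/\1$')

# One group repeated once: "ab" followed by "ab".
repeated=$(printf '%s\n' "$pairs" | grep '^\(ab\)\1$')

# Two groups referenced in reverse order: "a", "b", then "b", "a".
mirrored=$(printf '%s\n' "$pairs" | grep '^\(a\)\(b\)\2\1$')

printf '%s\n' "$same_name" "$repeated" "$mirrored"
```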

3. egrep fgrep

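egrep: extended grep; patterns are extended regular expressions (same as grep -E)

fgrep: fast grep; the pattern is taken as a fixed string, not a regular expression (same as grep -F)

The three behaviors can be selected on grep itself with -G (basic, the default), -E and -F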
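
The same pattern text can mean three different things depending on the switch. A small sketch assuming GNU grep; the four input lines are made up.

```shell
# Hypothetical input lines.
data=$(printf '%s\n' 'a+b' 'aab' 'a|b' 'b')

# -G (basic, the default): an unescaped "+" is a literal plus sign.
bre=$(printf '%s\n' "$data" | grep -G 'a+b')

# -E (extended): "+" means "one or more of the preceding character".
ere=$(printf '%s\n' "$data" | grep -E 'a+b')

# -F (fixed string): "|" is just a character, not alternation.
fixed=$(printf '%s\n' "$data" | grep -F 'a|b')

# With -E the same text is an alternation: lines containing "a" or "b".
alternation=$(printf '%s\n' "$data" | grep -E -c 'a|b')

printf '%s\n' "$bre" "$ere" "$fixed" "$alternation"
```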

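Extended regular expression metacharacters

They are the same as the basic ones, except that `?`, `+`, `{m,n}`, the grouping parentheses `()` and the alternation `|` are written without a backslash.

Note: to match a literal parenthesis in an extended regular expression, escape it as `\(` or `\)`; the word anchors are still written `\<` and `\>`.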

Practice

1. grep -i "^s" /proc/meminfo
   grep -i "^[sS]" /proc/meminfo
   grep -E "^s|^S" /proc/meminfo
   grep -E "^(s|S)" /proc/meminfo

2. # Note the word-end anchor
   grep -E "^(root|mysql|samba)\>" /etc/passwd

3. grep -E "\<[[:alpha:]]*\>\(" /etc/rc.d/init.d/functions
   grep "\<[[:alpha:]]*\>(" /etc/rc.d/init.d/functions
   Corrected:
   grep -E "\<[_[:alnum:]]*\>\(\)" /etc/rc.d/init.d/functions

4. echo `pwd` | egrep -o "^\/\<[[:alpha:]]+\>"
   Corrected (note: anchor the match from the end of the line):
   echo /root/mysite/layouts/ | grep -E -o "[^/]+/?$"

5. ifconfig | grep -E '\<[1,2]{1}[0-9]{0,1}[0-9]{0,1}\>'
   Corrected (build the ranges from the ones digit up through the tens and hundreds: 1-9, 10-99, 100-199, 200-249, 250-255):
   ifconfig | grep -E '\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>'

6. # Ranges of the four octets: 1-255, 0-255, 0-255, 1-254
   ifconfig | grep -E -o '\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>\.\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-4])\>'

7. # Note the word-start and word-end anchors as well as the line-start and line-end anchors
   grep -E "^(\<[^:]+\>).*\1$" /etc/passwd

* Supplement: wc, cut, sort, uniq, diff

1. wc: word count


`-l`	Count lines
`-w`	Count words
`-c`	Count bytes

2. cut


`-d`	The field delimiter
`-f`	The fields (columns) to extract

3. sort

The sorting algorithm is awesome


`-t CHAR`	Field delimiter
`-k#`	The field to sort on
`-n`	Compare numerically
`-r`	Reverse the order
`-f`	Ignore case
`-u`	Keep only one copy of duplicate lines. Duplicate lines: adjacent and identical

4. uniq


`-c`	Prefix each line with the number of times it occurs
`-u`	Show only lines that are not repeated
`-d`	Show only lines that are repeated
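
These small tools are usually chained in a pipeline. The following sketch counts login shells in a made-up passwd-style sample (name:x:uid:shell); the data and variable names are assumptions for illustration. Because uniq only compares adjacent lines, the input is sorted first.

```shell
# Hypothetical passwd-style sample: name:x:uid:shell
passwd_sample=$(printf '%s\n' \
  'root:x:0:/bin/bash' 'daemon:x:1:/sbin/nologin' 'alice:x:1000:/bin/bash' \
  'bob:x:1001:/bin/zsh' 'mysql:x:27:/sbin/nologin' 'carol:x:1002:/bin/bash')

# wc -l counts the lines (tr removes padding that some wc versions add).
user_total=$(printf '%s\n' "$passwd_sample" | wc -l | tr -d ' ')

# cut extracts field 4; sort groups equal shells together; uniq -c counts them;
# sort -rn orders by count, highest first; sed strips the leading padding.
shell_counts=$(printf '%s\n' "$passwd_sample" | cut -d: -f4 | sort | uniq -c | sort -rn | sed 's/^ *//')

# sort -t: -k3 -n sorts numerically on the UID field; cut keeps the names.
by_uid=$(printf '%s\n' "$passwd_sample" | sort -t: -k3 -n | cut -d: -f1)

printf '%s\n' "$user_total" "$shell_counts" "$by_uid"
```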

5. diff

Compare the differences line by line

6. patch (forward and reverse)

diff oldfile newfile > patch_file

-u (diff) uses the unified format, which shows the context around each changed line. The default is three lines.

-R (patch) applies the patch in reverse, restoring the original file; without it the patch is applied forward

Practice

Extract the IP addresses from the output of the ifconfig command
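
One possible approach, sketched on canned text so that it runs without ifconfig installed. The sample output below is hypothetical and only imitates the usual `inet ADDRESS` layout; real output varies between systems.

```shell
# Hypothetical ifconfig-style output (not captured from a real host).
ifconfig_sample='eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.20  netmask 255.255.255.0  broadcast 192.168.1.255
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'

# grep -o keeps only "inet ADDRESS"; cut then keeps the second space-separated field.
ips=$(printf '%s\n' "$ifconfig_sample" | grep -o 'inet [0-9.]*' | cut -d' ' -f2)

printf '%s\n' "$ips"
```

This simple pattern does not validate the octet ranges; the stricter expressions from the grep practice section could be substituted for `[0-9.]*`.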

II. sed

sed [OPTION]... {script} [input-file]...

1. Options


`-n`	`quiet`	Do not automatically print the pattern space. Every input line is still read into the pattern space, but nothing is output by default; only what the editing commands in the script explicitly output is shown
`-e`		Apply several scripts to the input in one run (multi-point editing): `-e script -e script`
`-f`	`file`	Read the script from a file, one editing command per line
`-r`	`regular`	Use extended regular expressions
`-i[SUFFIX]`	`edit files in place (makes backup if SUFFIX supplied)`	Edit the original file directly; if SUFFIX is given, a backup named file name + SUFFIX is kept
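
A minimal sketch of these options, assuming GNU sed; the three-line file is made up and is created in a temporary location.

```shell
# Hypothetical three-line input file.
f=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$f"

# Without -n every line is printed automatically, so p prints "beta" twice.
doubled=$(sed '/beta/p' "$f")

# With -n only the lines printed explicitly by p appear.
only=$(sed -n '/beta/p' "$f")

# -e applies several editing scripts in one pass.
multi=$(sed -e 's/alpha/ALPHA/' -e 's/gamma/GAMMA/' "$f")

# -r enables extended regular expressions: groups and | need no backslash.
swapped=$(sed -r 's/(al|be)(pha|ta)/\2\1/' "$f")

# -i.bak edits the file in place and keeps the original as FILE.bak.
sed -i.bak 's/beta/BETA/' "$f"
edited=$(cat "$f")
backup=$(cat "$f.bak")

printf '%s\n' "$doubled" "$only" "$multi" "$swapped" "$edited" "$backup"
rm -f "$f" "$f.bak"
```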

Comparison with and without -n: with -n only the text produced by the script is shown; without it every input line is printed as well

[root@localhost layouts]# cat -n /etc/fstab 
     1
     2  #
     3  # /etc/fstab
     4  # Created by anaconda on Sun Nov  8 17:58:16 2020
     5  #
     6  # Accessible filesystems, by reference, are maintained under '/dev/disk'
     7  # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
     8  #
     9  UUID=09780b2e-66e0-4fc3-8d91-7d9ddd350bbb /                       ext4    defaults        1 1
    10  UUID=79e8f2c8-9f29-4248-9181-39209540d9a8 /boot                   xfs     defaults        0 0
    11  UUID=b822f2a7-37ba-4df5-9a1e-f82f9e2f9bfe swap                    swap    defaults        0 0
    
[root@localhost layouts]# sed -n '1~2a\123' /etc/fstab 
123
123
123
123
123
123
[root@localhost layouts]# sed '1~2a\123' /etc/fstab 

123
#
# /etc/fstab
123
# Created by anaconda on Sun Nov  8 17:58:16 2020
#
123
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
123
#
UUID=09780b2e-66e0-4fc3-8d91-7d9ddd350bbb /                       ext4    defaults        1 1
123
UUID=79e8f2c8-9f29-4248-9181-39209540d9a8 /boot                   xfs     defaults        0 0
UUID=b822f2a7-37ba-4df5-9a1e-f82f9e2f9bfe swap                    swap    defaults        0 0
123

2. Script

Form: an address immediately followed by an editing command (note that there is no separator between them)

Address delimiting

Empty address: the whole text is processed
Single address:


`#`	The line with that number
`/pattern/`	Every line matched by this pattern

Address range


`#,#`	[from line, to line]
`#,+#`	[from line, that line plus # more lines]
`#,/pattern/`	[from line, to the first line matched by the pattern]
`/pattern1/,/pattern2/`	[from the line matching pattern1, to the line matching pattern2]

Step


`1~2`	Start at line 1, step 2: all odd lines
`2~2`	Start at line 2, step 2: all even lines
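
Each address form can be checked by combining it with `-n` and the `p` command, so that only the selected lines are printed. The sketch assumes GNU sed (`#,+#` and `first~step` are GNU extensions); the six input lines are made up.

```shell
# Hypothetical six-line input.
input=$(printf '%s\n' one two three four five six)

single=$(printf '%s\n' "$input" | sed -n '2p')               # line 2 only
range=$(printf '%s\n' "$input" | sed -n '2,4p')              # lines 2 to 4
plus=$(printf '%s\n' "$input" | sed -n '2,+1p')              # line 2 and the next 1 line
to_pattern=$(printf '%s\n' "$input" | sed -n '4,/six/p')     # line 4 up to the line matching "six"
pat_to_pat=$(printf '%s\n' "$input" | sed -n '/two/,/four/p') # from "two" to "four"
odd=$(printf '%s\n' "$input" | sed -n '1~2p')                # lines 1, 3, 5
even=$(printf '%s\n' "$input" | sed -n '2~2p')               # lines 2, 4, 6

printf '%s\n' "$single" "$range" "$plus" "$to_pattern" "$pat_to_pat" "$odd" "$even"
```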

Editing commands


`d`	delete	Delete the matching lines
`p`	print	Print the contents of the pattern space
`a \text`	append	Append text after the matching line
`i \text`	insert	Insert text before the matching line
`c \text`	change	Replace the matching lines with text
`w /WritePath`	write	Save the lines matched in the pattern space to the given file
`r /FilePath`	read	Read the given file and append its contents after the matching line
`=`	line number	Print the line number of each matching line
`!`	negate	address`!`command applies the command to the lines that do not match the address, e.g. sed -n '2~2!=' test prints the odd line numbers (1, 3, ...)
`s/pattern/replacement/flags`	search & replace	Any character can serve as the delimiter, e.g. `s@@@` or `s###`. Flags: `g` replace every occurrence in the line; `w FILE` save the lines where a replacement succeeded to FILE; `p` print the lines where a replacement succeeded
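
A short sketch of the most common editing commands, assuming GNU sed (the one-line `a\text` form is a GNU convenience); the configuration-style input is invented.

```shell
# Hypothetical configuration-style input.
cfg=$(printf '%s\n' '# comment' 'host=old' 'port=80' 'host=old,old')

# d deletes the lines selected by the address.
deleted=$(printf '%s\n' "$cfg" | sed '/^#/d')

# s replaces only the first occurrence on each line unless the g flag is given.
first_only=$(printf '%s\n' "$cfg" | sed 's/old/new/')

# Any delimiter works; with g every occurrence on the line is replaced.
global=$(printf '%s\n' "$cfg" | sed 's@old@new@g')

# = prints the line number of each selected line.
numbers=$(printf '%s\n' "$cfg" | sed -n '/host/=')

# ! negates the address: print the lines that do NOT contain "host".
negated=$(printf '%s\n' "$cfg" | sed -n '/host/!p')

# a\text appends a line after each selected line.
appended=$(printf '%s\n' "$cfg" | sed '/^port/a\# end of port settings')

printf '%s\n' "$deleted" "$first_only" "$global" "$numbers" "$negated" "$appended"
```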

3. input-file

Several files can be given; they are processed one after another.

Practice

1. sed -r '/^[[:space:]]+/s/^[[:space:]]+//p' /etc/grub2.cfg
2. sed -r -n '/^#/s/^#[[:space:]]*//p' /etc/fstab
3. echo '/home/xcg/desktop' | sed 's@[^/]*\/\?$@@'

4. Advanced editing commands

pattern space, hold space
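
The notes stop at naming the two buffers, so the following is only an illustrative sketch. It uses the standard sed commands `h` (copy the pattern space into the hold space) and `G` (append the hold space to the pattern space), which are not listed in the command table of these notes.

```shell
# Hypothetical three-line input.
input=$(printf '%s\n' a b c)

# Reverse the order of the lines:
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (everything collected so far) into the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')

# G alone appends the (empty) hold space, which inserts a blank line after each line.
double_spaced=$(printf '%s\n' "$input" | sed 'G')

printf '%s\n' "$reversed" "$double_spaced"
```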

Imaginative:

III. awk

awk (1)              - pattern scanning and processing language

awk has been around since the 1970s; on Linux it is reimplemented as GNU awk (gawk).

A pattern scanning and processing language: awk is the interpreter of a scripting language, that is, a programming language in its own right.

1. Basic usage

awk [options] 'program' file ...

The program supports conditionals, loops, and variables.

The program:

PATTERN{ACTION STATEMENTS}

Statements are separated by semicolons
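
The PATTERN{ACTION} form can be seen in three variations: action only, pattern only, and both. A minimal sketch on made-up, whitespace-separated user records (name, UID, shell):

```shell
# Hypothetical records: name, UID, shell.
users=$(printf '%s\n' 'root 0 /bin/bash' 'daemon 1 /sbin/nologin' 'alice 1000 /bin/bash')

# No pattern: the action runs for every line.
names=$(printf '%s\n' "$users" | awk '{print $1}')

# Pattern without an action: the default action prints the whole line.
bash_lines=$(printf '%s\n' "$users" | awk '/bash$/')

# Pattern plus an action made of two statements separated by a semicolon.
summary=$(printf '%s\n' "$users" | awk '$2 >= 1000 {n++; print $1, "is user number", n}')

printf '%s\n' "$names" "$bash_lines" "$summary"
```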

2. Options


`-F`	The field separator
`-v`	Assign a variable, including the built-in ones	-v FS=':' (input field separator; defaults to whitespace) -v OFS=':' (output field separator; defaults to a single space)
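
A small sketch of both options on a single passwd-style line; the variable name `greeting` is an arbitrary example.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# -F: sets the input field separator; the output separator stays a space.
default_ofs=$(printf '%s\n' "$line" | awk -F: '{print $1, $7}')

# -v can set the built-in variables FS and OFS before the program starts.
custom_ofs=$(printf '%s\n' "$line" | awk -v FS=':' -v OFS='-' '{print $1, $3, $7}')

# -v also defines ordinary user variables.
greeted=$(printf '%s\n' "$line" | awk -F: -v greeting='hello' '{print greeting, $1}')

printf '%s\n' "$default_ofs" "$custom_ofs" "$greeted"
```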

3. program

3.1 print

3.2 Variables

3.2.1 Built-in variables


`FS`	field separator	Input field separator; the default is whitespace
`OFS`	output field separator	The default is a single space
`RS`	input record separator	The default is a newline; the input is split into records at each occurrence of the separator
`ORS`	output record separator	The default is a newline
`NF`	number of fields	Number of fields in the current record
	{print `$NF`} prints the last field
	{print `NF`} prints the number of fields
`NR`	number of records	The number of records (lines) read so far
`FNR`	file number of records	The record number within the current file; each file is counted separately
`FILENAME`		Current file name
`ARGC`		Number of command line arguments
`ARGV`		Array holding each argument given on the command line
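
The difference between NR and FNR only shows with more than one input file. The sketch below uses two small temporary files with invented content.

```shell
# Two hypothetical input files.
a=$(mktemp); b=$(mktemp)
printf '%s\n' 'one two three' 'four five' > "$a"
printf '%s\n' 'six' > "$b"

# NF is the number of fields; $NF is the content of the last field.
fields=$(awk '{print NF, $NF}' "$a")

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(awk '{print NR, FNR}' "$a" "$b")

# In END, NR holds the total number of records read.
total=$(awk 'END{print NR}' "$a" "$b")

# RS and ORS change what counts as a record on input and what ends one on output.
joined=$(printf 'a,b,c' | awk 'BEGIN{RS=","; ORS="|"} {print}')

printf '%s\n' "$fields" "$counters" "$total" "$joined"
rm -f "$a" "$b"
```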

3.2.2 User-defined variables

-v var=value

Variable names are case-sensitive

Variables can also be defined directly in the program

3.3 The printf command

printf FORMAT, item1, item2, ...

FORMAT must be given
printf does not add a newline; an explicit newline control character, \n, is required

FORMAT must contain a format specifier for each of the items that follow

Format characters


`%c`	Displays a character (a numeric argument is taken as its ASCII code)
`%d`, `%i`	Decimal integer
`%e`, `%E`	Number in scientific notation
`%f`	Floating point number
`%g`, `%G`	Number in scientific notation or floating point form, whichever is shorter
`%s`	String
`%u`	Unsigned integer
`%%`	The % character itself

Modifiers


`#[.#]`	The first `#` is a number that sets the display width; the second `#` sets the precision after the decimal point	%3.1f
`-`	Left-aligned (the default is right-aligned)	%-15s
`+`	Show the sign of numeric values	%+d
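
A sketch of width, precision, alignment and sign modifiers; the two input records are made up.

```shell
# Left-aligned name in 8 columns, integer in 5 columns, float in 6 columns with 2 decimals.
report=$(printf '%s\n' 'root 0 1.5' 'alice 1000 20.25' |
  awk '{printf "%-8s|%5d|%6.2f\n", $1, $2, $3}')

# %+d shows the sign, %c turns the code 65 into a character, %% prints a percent sign.
misc=$(awk 'BEGIN{printf "%+d %+d %c %d%%\n", 5, -5, 65, 100}')

printf '%s\n' "$report" "$misc"
```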

3.4 Operators

Arithmetic operators
```
+ - * / ^ %
-x
+x
```
String operator: there is no operator symbol; writing two strings next to each other concatenates them
Assignment operators:
```
= += -= *= /= ^= %=
++ --
```
Comparison operators
```
> >= < <= != ==
```
Pattern matching operators
```
~     matches
!~    does not match
```
Logical operators
```
&&
||
!
```

Function call

function_name(argu1, argu2, ...)

Conditional expression
```
condition ? if-true-expression : if-false-expression
```
Example: a user whose UID is greater than or equal to 1000 is a common user; otherwise a system user
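
The UID rule can be written with the conditional expression as follows. This is a sketch on invented passwd-style lines; on a real system the input would be /etc/passwd.

```shell
# Hypothetical passwd-style lines: name:x:uid:gid
accounts=$(printf '%s\n' 'root:x:0:0' 'mysql:x:27:27' 'alice:x:1000:1000')

# The third field is the UID; the conditional expression picks the label.
labelled=$(printf '%s\n' "$accounts" |
  awk -F: '{type = ($3 >= 1000) ? "common user" : "system user"; print $1, type}')

printf '%s\n' "$labelled"
```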

3.5 PATTERN

Similar to address delimiting in sed


`empty`	Empty pattern: matches every line
`/regular expression/`	Only lines matched by this pattern are processed
`relational expression`	A relational/comparison expression; the line is processed only if the result is true. True: the result is a non-zero value or a non-empty string
`line ranges`	Line range
	startline,endline: `/pattern1/,/pattern2/`. Giving the range as line numbers is not supported
`BEGIN/END`	`BEGIN{}` runs only once, before processing of the input text starts
	`END{}` runs only once, after all the text has been processed
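
Each pattern type can be tried on the same small input. The lines below are made up; the `NR>=2 && NR<=3` form is a common workaround (a relational expression on NR) for selecting lines by number, since the range pattern itself does not take line numbers.

```shell
# Hypothetical input.
log=$(printf '%s\n' 'start' 'a 1' 'b 22' 'stop' 'c 3')

# BEGIN runs before any input is read, END after all of it.
begin_end=$(printf '%s\n' "$log" | awk 'BEGIN{print "header"} END{print "rows:", NR}')

# Range pattern: from the line matching "start" to the line matching "stop".
ranged=$(printf '%s\n' "$log" | awk '/start/,/stop/')

# Selecting by line number with a relational expression on NR.
by_number=$(printf '%s\n' "$log" | awk 'NR>=2 && NR<=3')

# Relational expression on a field (only lines that have two fields).
relational=$(printf '%s\n' "$log" | awk 'NF == 2 && $2 > 2 {print $1}')

# Regular expression applied to a single field with the ~ operator.
regex_field=$(printf '%s\n' "$log" | awk '$1 ~ /^[ab]$/ {print $2}')

printf '%s\n' "$begin_end" "$ranged" "$by_number" "$relational" "$regex_field"
```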

3.6 Common Actions


`expression`	Expressions
`control statement`	Control statements such as if and while
`compound statements`	Compound (grouped) statements
`input statements`	Input statements
`output statements`	Output statements

3.7 Control Statements

if(condition) {statements}

if(condition) {statements} else {statements}

while(condition){statements}

do{statements} while (condition)

for(expr1; expr2; expr3){statements}

break

continue

delete array[index]

delete array

exit

{ statements }

if-else

Syntax: if(condition) {statements} else {statements}

while

Syntax: while(condition) {statements}

do-while

Syntax: do{statements} while (condition)

The loop body is executed at least once

for loop

Syntax: for(expr1; expr2; expr3){statements}

switch statement

Syntax: switch(expression){case VALUE1 or /REGEXP/: statement; case VALUE2 or /REGEXP2/: statement; ...}

break and continue

next

Ends processing of the current line early and moves on to the next line. It is similar to continue, but next works between lines while continue works within a line (inside a loop over its fields).
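
The statements above can be combined in one short program. A sketch on invented score records, where a comment line is skipped with next, the fields are summed with for, and if-else assigns a grade:

```shell
# Hypothetical input: a comment line, then a name followed by scores.
scores=$(printf '%s\n' '# name scores' 'alice 90 80 70' 'bob 50 40')

result=$(printf '%s\n' "$scores" | awk '
/^#/ { next }                        # skip comment lines entirely
{
    sum = 0
    for (i = 2; i <= NF; i++)        # loop over the score fields
        sum += $i
    avg = sum / (NF - 1)
    if (avg >= 60) { grade = "pass" } else { grade = "fail" }
    print $1, avg, grade
}')

# while tests the condition before each pass through the body.
countdown=$(awk 'BEGIN {
    i = 3
    while (i > 0) { printf "%d ", i; i-- }
    print "go"
}')

# do-while runs the body once even though the condition is false from the start.
at_least_once=$(awk 'BEGIN {
    n = 0
    do { n++ } while (n < 0)
    print n
}')

printf '%s\n' "$result" "$countdown" "$at_least_once"
```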
Arrays (commonly used for statistics)

Associative array: array[index-expression]

index-expression:
- Any string can be used. String literals must be enclosed in double quotes.
- If a referenced array element does not exist yet, awk creates it automatically and initializes its value to the empty string
  
  Typical use: counting accesses per IP address
  Practice:
  1. Count the number of occurrences of each file system type in the /etc/fstab file.
```
~]# awk '/^UUID/{fs[$3]++}END{for(i in fs) {print i,fs[i]}}' /etc/fstab
swap 1
ext4 1
xfs 1
```
  2. Count the number of occurrences of each word in a specified file
```
~]# cat word.txt
aaa bbb aaa ccc aaa eee bbb ccc
~]# awk '{for(i=1; i<=NF; i++) word[$i]++}END{for(i in word) {print i,word[i]}}' word.txt
aaa 3
ccc 2
eee 1
bbb 2
```

3.8 Functions

Built-in functions
Numeric	`rand()`	Returns a random number between 0 and 1. Unless srand() is called first, gawk produces the same sequence on every run
String handling	`sub(r,s,[t])`	Searches string `t` for the first match of pattern `r` and replaces it with `s`
	`gsub(r,s,[t])`	Global replacement: searches string `t` for matches of pattern `r` and replaces all of them with `s`
	`split(s,a[,r])`	Splits string `s` on the separator `r` and saves the pieces in the array `a`
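
A short sketch of the built-in functions on made-up strings. When the target `t` is omitted, sub and gsub work on the whole line ($0); the seed value 42 is arbitrary.

```shell
# sub replaces only the first match in the line.
first=$(printf '%s\n' 'a-b-c' | awk '{sub(/-/, "+"); print}')

# gsub replaces every match and returns the number of replacements.
every=$(printf '%s\n' 'a-b-c' | awk '{n = gsub(/-/, "+"); print n, $0}')

# With a third argument the replacement is limited to that field.
one_field=$(printf '%s\n' 'foo boo' | awk '{sub(/o/, "0", $2); print}')

# split returns the number of pieces; the array indexes start at 1.
pieces=$(printf '%s\n' '10.0.0.1:22' | awk '{n = split($0, part, ":"); print n, part[1], part[2]}')

# srand seeds the generator; rand returns a value x with 0 <= x < 1.
in_range=$(awk 'BEGIN{srand(42); x = rand(); print (x >= 0 && x < 1)}')

printf '%s\n' "$first" "$every" "$one_field" "$pieces" "$in_range"
```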

split usage example:
# Split the fourth column (the local address) on ":", count the listening TCP sockets per IP address, then format the output
~]# netstat -nlptu | awk '/^tcp\>/{split($4,ip,":"); count[ip[1]]++}END{for(i in count) {printf "%-10s\t%d\n",i,count[i]}}'
127.0.0.1       1
0.0.0.0         3
