Foreword

Lately I have needed to query a large log file. Opening it in vim or dumping it with cat makes the terminal hang, yet I still need to know how many lines match a given condition. Here are some commonly used query commands.

Common search commands

1. grep search

grep pattern filename | head      // show the first matches, starting from the top of the file
grep pattern filename | wc -l     // count how many lines match
cat filename | grep 'pattern$'    // match only where the pattern ends the line

2. Examples

(1) Count the lines that match a specific pattern

cat /data/weblogs/xxx.access.log | grep "GET /pixel.jpg?" | wc -l
4102386
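For a plain count, grep can do the counting itself. The sketch below builds a small throwaway log (its contents and the `log`, `count`, and `piped` names are made up for illustration) and shows that `grep -c` gives the same number as the `cat | grep | wc -l` pipeline. `-F` is added on the assumption that the search text should be taken literally rather than as a regular expression.

```shell
# Build a tiny hypothetical access log so the commands can be tried safely.
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [25/Nov/2019:15:01:02 +0800] "GET /pixel.jpg?id=1 HTTP/1.1" 200 43
1.2.3.5 - - [25/Nov/2019:15:07:45 +0800] "GET /index.html HTTP/1.1" 200 512
1.2.3.4 - - [25/Nov/2019:16:10:00 +0800] "GET /pixel.jpg?id=2 HTTP/1.1" 200 43
EOF

# -c prints the number of matching lines; -F treats the pattern as a fixed string.
count=$(grep -c -F 'GET /pixel.jpg?' "$log")
echo "grep -c:            $count"

# The same count through the cat | grep | wc pipeline.
piped=$(cat "$log" | grep -F 'GET /pixel.jpg?' | wc -l | tr -d ' ')
echo "cat | grep | wc -l: $piped"

rm -f "$log"
```

On the real file this would read `grep -c -F 'GET /pixel.jpg?' /data/weblogs/xxx.access.log`, which saves two extra processes per query.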

(2) Query with a partial regular expression

cat /data/weblogs/ | grep "25/Nov/2019:15:[0-5][0-9]" | wc -l
This counts the lines logged on 25/Nov/2019 between 15:00 and 15:59.
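One detail worth checking when writing the minute range: a bracket expression such as `[00-59]` matches a single character (0, anything from 0 to 5, or 9), not the two-digit numbers 00 to 59. On well-formed timestamps it happens to give the right count because every minute starts with 0 to 5, but `[0-5][0-9]` states the intent exactly. The sample lines and variable names below are invented for the demonstration.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
[25/Nov/2019:15:00:01 +0800] a
[25/Nov/2019:15:37:12 +0800] b
[25/Nov/2019:15:59:59 +0800] c
[25/Nov/2019:16:00:00 +0800] d
EOF

# [00-59] is ONE character from the set {0, 0..5, 9}: it only tests the first minute digit.
loose=$(grep -c '25/Nov/2019:15:[00-59]' "$log")

# [0-5][0-9] spells out both minute digits, 00 through 59.
strict=$(grep -c '25/Nov/2019:15:[0-5][0-9]:' "$log")

# A malformed minute such as "9x" still passes the loose pattern but not the strict one.
bad_loose=$(printf '25/Nov/2019:15:9x\n' | grep -c '25/Nov/2019:15:[00-59]') || true
bad_strict=$(printf '25/Nov/2019:15:9x\n' | grep -c '25/Nov/2019:15:[0-5][0-9]') || true

echo "loose=$loose strict=$strict bad_loose=$bad_loose bad_strict=$bad_strict"
rm -f "$log"
```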

(3) Chain several conditions with pipes to count the lines that satisfy all of them

cat /data/weblogs/xxx.log | grep "25/Nov/2019:15:[0-5][0-9]" | grep "GET /pixel.jpg?" | wc -l
120

Count the lines that match condition 1 or condition 2:

cat /data/weblogs/xxx.log | grep -E "25/Nov/2019:15:[0-5][0-9]|GET /pixel\.jpg\?" | wc -l
4098135

Short form: grep -E "exp1|exp2|exp3" filename | wc -l
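The difference between "both conditions" and "either condition" is easy to verify on a few lines. The sketch below uses an invented four-line log and hypothetical variable names. It also shows why `?` and `.` are best escaped under `-E`: in extended regular expressions `?` makes the preceding character optional, so an unescaped `GET /pixel.jpg?` also accepts `/pixel.jpeg`.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
[25/Nov/2019:15:01:02 +0800] "GET /pixel.jpg?id=1 HTTP/1.1"
[25/Nov/2019:15:07:45 +0800] "GET /index.html HTTP/1.1"
[25/Nov/2019:16:10:00 +0800] "GET /pixel.jpg?id=2 HTTP/1.1"
[25/Nov/2019:16:20:00 +0800] "GET /pixel.jpeg HTTP/1.1"
EOF

# OR: a line counts if either alternative matches.
either=$(grep -c -E '25/Nov/2019:15:[0-5][0-9]|GET /pixel\.jpg\?' "$log")

# AND: chain two greps so a line must pass both filters.
both=$(grep -E '25/Nov/2019:15:[0-5][0-9]' "$log" | grep -c -F 'GET /pixel.jpg?')

# Unescaped under -E, "g?" means "an optional g", so /pixel.jpeg slips through.
unescaped=$(grep -c -E 'GET /pixel.jpg?' "$log")

echo "either=$either both=$both unescaped=$unescaped"
rm -f "$log"
```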

3. grep is a fuzzy (substring) match

When grep is used to search for a port number, the results are unsatisfactory, because any line containing "80" anywhere is returned, as the following example shows:

netstat -anp | grep -i '80'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp 0 0 * LISTEN -
tcp 0 0 * LISTEN -
tcp 0 0 * LISTEN -
tcp 0 0 TIME_WAIT -

To query the usage of port 80, run the following command:

netstat -apn | awk '{split($4,arr,":"); if(arr[2] == "80") print $0}'

This finds the process on port 80 in a single step and is very easy to use.
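To see the difference without touching a live machine, the sketch below runs both approaches over a few invented netstat-style lines (the addresses, PIDs, and program names are hypothetical). It also makes one small variation on the awk command: comparing the last piece of the split instead of `arr[2]`, on the assumption that IPv6 listeners shown as `:::80` should be caught as well.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
tcp        0      0 0.0.0.0:80        0.0.0.0:*         LISTEN      1201/nginx
tcp        0      0 0.0.0.0:8080      0.0.0.0:*         LISTEN      1350/java
tcp        0      0 10.0.0.5:443      10.0.0.80:51234   ESTABLISHED 1201/nginx
tcp6       0      0 :::80             :::*              LISTEN      1201/nginx
EOF

# Substring match: port 8080 and the address 10.0.0.80 are hits too.
substring=$(grep -c '80' "$sample")

# Field match: split the local address on ":" and compare the last piece,
# so both "0.0.0.0:80" and ":::80" count, and nothing else does.
exact=$(awk '{ n = split($4, parts, ":"); if (parts[n] == "80") c++ } END { print c + 0 }' "$sample")

echo "substring=$substring exact=$exact"
rm -f "$sample"
```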

Searching for IP addresses in a file

1. Match IP addresses

grep -Eo '([^0-9]|\b)((1[0-9]{2}|2[0-4][0-9]|25[0-5]|[1-9][0-9]|[0-9])\.){3}(1[0-9][0-9]|2[0-4][0-9]|25[0-5]|[1-9][0-9]|[0-9])([^0-9]|\b)' xxx.log | sed -nr 's/([^0-9]|\b)(([0-9]{1,3}\.){3}[0-9]{1,3})([^0-9]|\b)/\2/p' | wc -l

31116275

2. Count the occurrences of each IP address

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" xxx.log | sort | uniq -c
2
4
8

Each output line starts with the number of occurrences, followed by the IP address.
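With many distinct addresses the `uniq -c` output is long, so it usually helps to sort it by count. The sketch below is one way to do that on an invented log. The file contents and variable names are hypothetical, and the simple `{1,3}` pattern is used only to keep the example short.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - "GET /a"
10.0.0.2 - - "GET /b"
10.0.0.1 - - "GET /c"
10.0.0.3 - - "GET /d"
10.0.0.1 - - "GET /e"
10.0.0.2 - - "GET /f"
EOF

# Extract addresses, count duplicates, then sort numerically with the largest count first.
ranking=$(grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}' "$log" | sort | uniq -c | sort -rn)
echo "$ranking"

# Keep only the busiest address, normalised to "count ip".
top=$(echo "$ranking" | head -n 1 | awk '{ print $1, $2 }')
echo "top: $top"

rm -f "$log"
```

`sort` has to run before `uniq -c` because `uniq` only merges adjacent duplicate lines.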

3. More accurate IP matching

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" xxx.log | wc -l

32929372

4. Fuzzy IP matching

grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" xxx.log | wc -l

32930309
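The gap between the counts comes from what each pattern accepts. The sketch below compares them on three invented lines. The fuzzy pattern takes any four dot-separated groups of one to three digits. A range-checked octet without boundaries can still match part of a longer number. Adding a non-digit boundary on both sides rejects both cases. The `octet` variable and sample text are assumptions made for illustration, not the expressions used above.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
client 192.168.1.10 connected
version 999.888.777.666 is not an address
client 10.0.0.256 rejected
EOF

# One octet, 0 to 255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Fuzzy: every dotted quad is extracted, valid or not.
fuzzy=$(grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}' "$sample" | wc -l | tr -d ' ')

# Range-checked but unbounded: 10.0.0.256 is cut down to a valid-looking prefix.
partial=$(printf '10.0.0.256\n' | grep -E -o "${octet}(\.${octet}){3}")

# Range-checked with a non-digit (or line edge) on both sides: counts matching lines.
strict=$(grep -E -c "(^|[^0-9.])${octet}(\.${octet}){3}([^0-9.]|\$)" "$sample")

echo "fuzzy=$fuzzy partial=$partial strict=$strict"
rm -f "$sample"
```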

5. Query IP addresses under multiple conditions: first filter the lines by the qualifying conditions, then count the IP addresses in what remains

cat xxx.log | grep "25/Nov/2019:15:[0-5][0-9]" | grep "GET /pixel.jpg?" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | wc -l
1110

None of these ways of checking IP addresses feels quite right. The log file keeps growing, so the counts differ from one run to the next, and the queries are slow, probably because the file is so large. I am recording them here anyway, since they will come in handy sooner or later.
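One possible way to get repeatable numbers from a file that is still being written is to fix the number of lines once and run every query against that prefix. The sketch below simulates a growing log with invented contents. The variable names are hypothetical, and it assumes the file is only appended to, not rotated or truncated, between queries. `LC_ALL=C` is included because byte-wise matching is often faster on large ASCII logs, though the gain depends on the grep version and locale.

```shell
log=$(mktemp)
printf '%s\n' 'a GET /pixel.jpg?1' 'b GET /index' 'c GET /pixel.jpg?2' > "$log"

# Freeze the current length once; lines appended later are ignored by the queries below.
lines=$(wc -l < "$log" | tr -d ' ')

# Simulate the log growing between two queries.
printf '%s\n' 'd GET /pixel.jpg?3' >> "$log"

# Both queries read only the first $lines lines, so they agree with each other.
first=$(head -n "$lines" "$log" | LC_ALL=C grep -c -F 'GET /pixel.jpg?')
second=$(head -n "$lines" "$log" | LC_ALL=C grep -c -F 'GET /pixel.jpg?')

# Querying the live file directly picks up the new line.
live=$(LC_ALL=C grep -c -F 'GET /pixel.jpg?' "$log")

echo "first=$first second=$second live=$live"
rm -f "$log"
```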

