I recently found a fun tool and can't wait to share it with you.
How do you check Linux logs? I usually combine classic commands such as tail, head, cat, sed, more, and less with text-processing tools such as AWK, which is highly efficient. But one thing gives me a headache: there are far too many command flags and rules to memorize.
Is there a general way to query logs, such as SQL, a language most programmers already know?
Today I'll share the tool q, which lets you query and aggregate text content by writing SQL. Let's see what magic this thing really holds.
q is a command-line tool that lets you run SQL statements directly against any file or command output, such as the result set of ps -ef.
The idea: treat text as a database table. Well, of course I understand that sentence, hahaha.
It treats an ordinary file or result set as a database table, supports almost all SQL constructs such as WHERE, GROUP BY, and JOINs, automatically detects column names and column types, supports cross-file joins (described later), and supports multiple encodings.
Installation is simple. On CentOS Linux, the following three steps are all it takes; on Windows, you just install an EXE.
```shell
wget https://github.com/harelba/q/releases/download/1.7.1/q-text-as-data-1.7.1-1.noarch.rpm  # download
sudo rpm -ivh q-text-as-data-1.7.1-1.noarch.rpm  # install
q --version  # verify the installation
```
Official documentation: harelba.github.io/q
Syntax

q supports all SQLite SQL syntax. The standard command-line format is q + flags + "SQL":

```shell
q <flags> "<SQL>"
q "SELECT * FROM myfile.log"
```
q works fine without extra flags, but flags make the output nicer, so here is a brief overview. Its flags fall into two kinds.
Input flags: operate on the file or result set being queried. For example, -H indicates that the input data contains a header line:

```shell
q -H "SELECT * FROM myfile.log"
```
In that case, column names are detected automatically and can be used in the query. If this flag is not given, columns are named cX automatically, starting from c1:
```shell
q "select c1, c2 from ..."
```

Output flags: control how the result set is printed. For example, -O prints the column names:

```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "select count(UID) from - where UID='root'"
104
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "select count(UID) from - where UID='root'"
count(UID)
104
```
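To make the header behavior concrete, here is a small sketch using only awk, in case q is not installed on your machine (the file header_demo.txt is a made-up fixture, not from the article): q -H consumes the first row as column names, while without -H every row is data addressed as c1, c2, and so on. The awk analogue of -H is simply skipping the header row.

```shell
# Hypothetical fixture standing in for real command output
printf 'UID PID\nroot 1\nwww 2\n' > header_demo.txt

# q -H would treat "UID PID" as column names; the awk analogue
# is to skip that first row explicitly and work on data rows only
awk 'NR > 1 { print $1 }' header_demo.txt
```

This prints only the data values of the first column, which is exactly what selecting the UID column with -H would give you.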
There are many more flags; interested readers can check the official site. Next we'll focus on using SQL to handle various log-query scenarios.
Let's look at a few common log-query scenarios and how to write the SQL for each.
1. Keyword query
Keyword retrieval is probably the most frequent operation in daily development, but here I don't think q has an advantage, because it has to target a specific column in the query.
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# q "select * from douyin.log where c9 like '% '"
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : to parse the URL: url=https%3A%2F%2Fv.douyin.com%2Fe9pqjbr%2F
```
By contrast, the grep command does full-text retrieval:
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# cat douyin.log | grep 'to parse the URL'
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : to parse the URL: url=https%3A%2F%2Fv.douyin.com%2Fe9pdhgp%2F
```
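The difference is easier to see on a tiny example. Here is a sketch with a made-up log file (level_demo.log and its two lines are hypothetical): grep matches anywhere in the line, while q's "where cN like ..." pins the match to one column, which awk can mimic with a field comparison.

```shell
# Tiny hypothetical log so the difference is visible
printf '2021-06-11 INFO parse ok\n2021-06-11 ERROR parse failed\n' > level_demo.log

# grep matches anywhere in the line (full-text retrieval)
grep -c ERROR level_demo.log

# q must target a column; the awk analogue pins the match to field 2
awk '$2 == "ERROR"' level_demo.log
```

So grep wins for "find this string anywhere", while the column-targeted style pays off once you need structured conditions.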
2. Fuzzy query
Fuzzy search uses like. If the text has a header row, filter by column name; if not, use the column numbers c1, c2, ..., cN.
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
23
24
25
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log where abc like '%2%'"
Warning: column count is one - did you provide the correct delimiter?
2
23
24
25
```
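For a single-column file like this, like '%2%' is just a substring match over the data rows, so a grep over the body gives the same answer. A sketch (fuzzy_demo.log is a hypothetical copy of the test.log above):

```shell
# Recreate the single-column fixture from the example above
printf 'abc\n2\n3\n4\n5\n23\n24\n25\n' > fuzzy_demo.log

# like '%2%' on one column == substring match on the data rows;
# tail -n +2 drops the "abc" header row first
tail -n +2 fuzzy_demo.log | grep 2
```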
3. Intersection and union
The UNION and UNION ALL operators are supported for combining the contents of multiple files.
The test.log and test1.log files contain the following.
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log
abc
2
3
4
5
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test1.log
abc
3
4
5
6
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log union select * from test1.log"
Warning: column count is one - did you provide the correct delimiter?
Warning: column count is one - did you provide the correct delimiter?
2
3
4
5
6
```
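If you want to sanity-check what UNION does here without q, the same result falls out of plain coreutils: UNION ALL is concatenation of the data rows, and UNION additionally removes duplicates, which sort -u does. A sketch (u1.log, u2.log, rows.txt are hypothetical scratch files):

```shell
# Recreate the two single-column files from the example
printf 'abc\n2\n3\n4\n5\n' > u1.log
printf 'abc\n3\n4\n5\n6\n' > u2.log

# UNION ALL ~ plain concatenation of the data rows (headers dropped)
tail -n +2 u1.log >  rows.txt
tail -n +2 u2.log >> rows.txt

# UNION ~ concatenation plus de-duplication
sort -n -u rows.txt
```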
4. Deduplication
For example, count the number of distinct uuid values in the ./clicks.csv file.
```shell
q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
```
5. Automatic detection of column types
Note: q understands whether each column is a number or a string, and decides accordingly whether to filter by numeric comparison or by string comparison. (The -t flag in the example tells q the input is tab-delimited.)
```shell
q -H -t "SELECT request_id,score FROM ./clicks WHERE score > 0.7 ORDER BY score DESC LIMIT 1"
```
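Why does number-vs-string detection matter? Because a string comparison on numbers goes wrong. A sketch with a made-up scores file (score_demo.txt, ids r1-r3 are hypothetical): with numeric comparison both 0.9 and 10 pass the > 0.7 filter, while a naive text comparison would drop 10, since the string "10" sorts before "0.7".

```shell
# Hypothetical scores file: id, score
printf 'id score\nr1 0.9\nr2 0.65\nr3 10\n' > score_demo.txt

# Numeric comparison, as q applies after detecting a number column
# ($2 + 0 forces awk to compare numerically)
awk 'NR > 1 && $2 + 0 > 0.7 { print $1 }' score_demo.txt

# A string comparison would be wrong here: "10" < "0.7" lexically,
# so r3 would be dropped by a naive text filter.
```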
6. Field operations
q can also do arithmetic on fields. For example, read system command output and compute, per user and group, the total size of files under /tmp:
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"
www www 8.86311340332
root root 0.207922935486
mysql mysql 4.76837158203e-06
```
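The GROUP BY ... SUM part of that query can be sketched in awk as well, which also shows what q is doing under the hood with columns c5-c7 of find -ls (user, group, size in bytes). The fixture below is hypothetical sample data, not real find output:

```shell
# Hypothetical "user group size-in-bytes" records, like find -ls columns 5-7
printf 'www www 1048576\nroot root 3145728\nwww www 1048576\n' > size_demo.txt

# GROUP BY user,group with SUM(size)/1024.0/1024, as in the q query above;
# sort makes the (unordered) awk array output deterministic
awk '{ total[$1 " " $2] += $3 }
     END { for (k in total) printf "%s %.2f\n", k, total[k] / 1024 / 1024 }' size_demo.txt | sort
```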
7. Data statistics
Count the top 3 UIDs by number of processes, sorted in descending order. This works together with system commands: query all processes first, then filter with SQL.
```shell
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
root 104
www 16
rabbitmq 4
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
UID cnt
root 110
www 16
rabbitmq 4
```
The difference between adding -O and omitting it is whether the query result header is displayed.
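For comparison, the classic pipeline for this GROUP BY ... ORDER BY ... LIMIT pattern uses awk, sort, and head. A sketch on a made-up UID column (uid_demo.txt and its six rows are hypothetical):

```shell
# Hypothetical UID column, as if extracted from ps -ef output
printf 'root\nroot\nwww\nroot\nwww\nrabbitmq\n' > uid_demo.txt

# GROUP BY UID ... ORDER BY cnt DESC LIMIT 3, in awk/sort/head form
awk '{ cnt[$1]++ } END { for (u in cnt) print u, cnt[u] }' uid_demo.txt \
  | sort -k2,2nr | head -3
```

Both styles work; the SQL version just keeps the whole computation in one readable statement.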
8. Cross-file joins
Typically, our log files are split by day into many fixed-size sub-files. Without a unified log collection server, searching for a keyword with no known error time window is like looking for a needle in a haystack.
It would be much easier if you could query across all the files at once. q supports joining files the way a database joins tables:
```shell
q -H "select * from douyin.log a join douyin-2021-06-18.0.log b on (a.c2=b.c3) where b.c1='root'"
```
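If you've never seen a cross-file join outside a database, coreutils has a primitive version too: the join command merges two sorted files on a shared key column. A sketch with two hypothetical two-column files (left.log, right.log and their contents are made up):

```shell
# Two hypothetical files sharing a user column
printf 'a1 root\na2 www\n' > left.log
printf 'root err9\nwww err7\n' > right.log

# q's JOIN ... ON (a.c2 = b.c1) corresponds to coreutils join,
# which requires both inputs sorted on the join key
sort -k2 left.log    > left.sorted
sort -k1,1 right.log > right.sorted
join -1 2 -2 1 left.sorted right.sorted
```

The q version is far more flexible, of course: no pre-sorting, arbitrary ON conditions, and a WHERE clause on top.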
Q: Isn’t it nice to write so much code in AWK? The purpose of this tool is not to replace existing tools, but to provide a more convenient way to log.
I use AWK too, and it really is powerful, but there is a learning cost: all those commands and matching rules take real effort to master. For a novice programmer with a little database experience, writing SQL is no problem, and picking up q is much easier.