Use the uniq command to compute union, intersection, and difference

Original: Coding Diary (WeChat official account: Codelogs). Sharing is welcome; please credit the source when reprinting.

uniq

uniq is a very useful command on Linux. As the name suggests, it removes duplicate lines. However, uniq only collapses adjacent duplicates, so the input must be sorted first, and it is usually paired with the sort command, as follows:

$ cat test.txt
c
a
a
b

$ sort test.txt | uniq
a
b
c

# sort -u can also remove duplicates
$ sort -u test.txt
a
b
c
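
To see why the sort step matters, note that uniq compares each line only with the line directly before it. The following is a minimal sketch; the sample data is inlined with printf rather than read from test.txt, and the variable names are illustrative only.

```shell
# uniq only collapses ADJACENT identical lines,
# so duplicates that are separated by other lines survive.
unsorted=$(printf 'a\nb\na\n' | uniq)

# Sorting first brings identical lines together, so uniq can remove them.
sorted=$(printf 'a\nb\na\n' | sort | uniq)

echo "without sort:"
echo "$unsorted"
echo "with sort:"
echo "$sorted"
```

Without the sort, both copies of a are kept because b sits between them.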

As the sort -u example shows, sort can deduplicate on its own, which makes uniq's original function less essential. The situation resembles wc: the command is named for word counting, yet the form used most often is wc -l, which counts lines. Likewise, uniq's derived features are far more useful than plain deduplication, as follows:

Group counting: uniq -c

$ sort test.txt | uniq -c
      2 a
      1 b
      1 c

The -c option prefixes each distinct line with the number of times it occurs. For example, it can count TCP connections by state:

$ netstat -nat|awk '/tcp/{print $NF}'|sort|uniq -c
      4 CLOSE_WAIT
      6 ESTABLISHED
      2 LAST_ACK
      2 LISTEN
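
A common follow-up is to rank the groups by frequency by piping the counts through sort -rn. The sketch below is only an illustration: it uses made-up connection states inlined in a variable instead of live netstat output, so it runs on any machine.

```shell
# Sample data: hypothetical connection states, one per line.
states='ESTABLISHED
LISTEN
ESTABLISHED
CLOSE_WAIT
ESTABLISHED
CLOSE_WAIT'

# Count each state, then sort numerically by count, largest first.
ranked=$(printf '%s\n' "$states" | sort | uniq -c | sort -rn)
echo "$ranked"

# The most frequent state is the second field of the first line.
top=$(echo "$ranked" | head -n 1 | awk '{print $2}')
echo "most common: $top"
```

The second sort is needed because uniq -c emits groups in the order of the sorted input, not in order of count.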

Union

$ cat test1.txt
c
a
b

$ cat test2.txt
c
b
d

$ cat test1.txt test2.txt |sort |uniq 
a
b
c
d

After the two files are concatenated and deduplicated, the result is their union.
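
Since sort accepts several input files and -u deduplicates, the same union can also be written without cat or uniq. A minimal sketch, assuming the two sample files are recreated in a scratch directory made with mktemp:

```shell
# Recreate the two sample files in a scratch directory.
dir=$(mktemp -d)
printf 'c\na\nb\n' > "$dir/test1.txt"
printf 'c\nb\nd\n' > "$dir/test2.txt"

# sort reads both files, merges them, and -u drops repeated lines.
union=$(sort -u "$dir/test1.txt" "$dir/test2.txt")
echo "$union"

rm -r "$dir"
```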

Intersection

$ cat test1.txt test2.txt | sort | uniq -d
b
c

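The -d option prints only lines that occur more than once. After the two files are merged, the repeated lines are exactly those present in both files, which is the intersection. This assumes that neither file contains duplicate lines of its own.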
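
The sketch below illustrates what happens when a file does contain internal duplicates, and one possible way to guard against it by deduplicating each file first. The file names f1.txt and f2.txt and their contents are hypothetical, chosen only for this illustration.

```shell
dir=$(mktemp -d)
printf 'a\na\nb\n' > "$dir/f1.txt"   # 'a' is repeated inside f1 itself
printf 'b\nc\n'    > "$dir/f2.txt"

# Naive merge: 'a' is reported although it exists only in f1.
naive=$(cat "$dir/f1.txt" "$dir/f2.txt" | sort | uniq -d)

# Deduplicate each file first, then merge and keep the repeated lines.
sort -u "$dir/f1.txt" > "$dir/f1.sorted"
sort -u "$dir/f2.txt" > "$dir/f2.sorted"
safe=$(cat "$dir/f1.sorted" "$dir/f2.sorted" | sort | uniq -d)

echo "naive:"; echo "$naive"
echo "safe:";  echo "$safe"

rm -r "$dir"
```

Only b is in both files, and only the second pipeline reports it alone.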

Difference

$ cat test1.txt test2.txt test2.txt| sort | uniq -u 
a

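The -u option prints only lines that occur exactly once. Because test2.txt is passed twice, every line in test2.txt occurs at least twice and is never printed, so only the lines that exist in test1.txt alone remain: test1.txt minus test2.txt.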
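
The same trick extends to the opposite difference and to the symmetric difference. This is a sketch under the assumption that neither file has internal duplicates; the sample files are recreated in a scratch directory, and the variable names are illustrative.

```shell
dir=$(mktemp -d)
f1="$dir/test1.txt"
f2="$dir/test2.txt"
printf 'c\na\nb\n' > "$f1"
printf 'c\nb\nd\n' > "$f2"

# test1 minus test2: pass test2 twice so none of its lines can be unique.
only1=$(cat "$f1" "$f2" "$f2" | sort | uniq -u)

# test2 minus test1: pass test1 twice instead.
only2=$(cat "$f2" "$f1" "$f1" | sort | uniq -u)

# Symmetric difference: pass each file once; lines in both occur twice and are dropped.
symdiff=$(cat "$f1" "$f2" | sort | uniq -u)

echo "test1 - test2: $only1"
echo "test2 - test1: $only2"
echo "symmetric difference:"; echo "$symdiff"

rm -r "$dir"
```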

comm

The comm command computes union, intersection, and difference more intuitively. Like uniq, comm requires its input to be sorted in advance, as follows:

$ comm <(sort -u test1.txt) <(sort -u test2.txt)
a
                b
                c
        d

<() is bash's process substitution syntax; it behaves like a temporary file whose contents are the output of the command. In the output above, the first column holds the lines found only in test1.txt (test1.txt minus test2.txt), the second column holds the lines found only in test2.txt (test2.txt minus test1.txt), and the third column holds the lines common to both (the intersection). To get only the intersection, use comm -1 -2 or comm -12: -1 and -2 suppress the first and second columns, and -3 suppresses the third. What about the union? Just delete the tab characters with tr:

$ comm <(sort -u test1.txt) <(sort -u test2.txt)|tr -d '\t'
a
b
c
d
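
The column-suppression flags described above can be sketched as follows. To keep the example runnable in a plain POSIX shell as well as bash, it writes the sorted data to scratch files instead of using <(); that substitution is a choice made for this illustration, not part of the original commands.

```shell
dir=$(mktemp -d)
# comm needs sorted input, so sort each sample set before saving it.
printf 'c\na\nb\n' | sort -u > "$dir/s1.txt"
printf 'c\nb\nd\n' | sort -u > "$dir/s2.txt"

# -12 hides columns 1 and 2, leaving the lines common to both files.
both=$(comm -12 "$dir/s1.txt" "$dir/s2.txt")

# -23 hides columns 2 and 3, leaving the lines only in the first file.
only1=$(comm -23 "$dir/s1.txt" "$dir/s2.txt")

# -13 hides columns 1 and 3, leaving the lines only in the second file.
only2=$(comm -13 "$dir/s1.txt" "$dir/s2.txt")

echo "intersection:"; echo "$both"
echo "test1 - test2: $only1"
echo "test2 - test1: $only2"

rm -r "$dir"
```

When columns are suppressed, comm also drops the leading tabs that would have marked them, so the remaining column is printed flush left.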
