Abstract: In this article, you will learn the most common flags used with the tar command, how to create and extract a tar archive, and how to create and extract a gzip-compressed tar archive.

This article is shared from the Huawei Cloud community post “Tar in Linux: Compress and extract files”, by Tiamo_T.

How does the Linux tar command work?

The tar command is used to create .tar, .tar.gz, .tgz, or .tar.bz2 files, which are usually called “tarballs”. The .tar.gz and .tgz extensions identify archives compressed with gzip to reduce file size. A file with the .tar.bz2 extension is compressed with bzip2.

Most Linux distributions ship with gzip, so tar can create gzip-compressed archives out of the box. As we’ll see in this article, this may not apply to other types of compression.

Let’s start with three examples of tar to familiarize ourselves with the most common flags.

Create an archive containing two files

Here is a basic example of tar, in which case we do not use compression:

tar -cf archive.tar testfile1 testfile2

This command creates an archive file named archive.tar that contains two files: testfile1 and testfile2.

Here are the meanings of the two flags:

-c (same as --create): creates a new archive

-f: specifies the archive file (in this case archive.tar)

The file command confirms that archive.tar is an archive:

[myuser@localhost]$ file archive.tar 
archive.tar: POSIX tar archive (GNU)
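To follow along safely, the setup can be sketched as below. The temporary directory and the file contents are assumptions made for illustration; only the tar line itself comes from the text above.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two small sample files (hypothetical contents, just for the demo).
echo "first file" > testfile1
echo "second file" > testfile2

# -c creates a new archive, -f names the archive file.
tar -cf archive.tar testfile1 testfile2

# tar leaves the original files in place next to the new archive.
ls
```

Creating an archive does not remove the files that were added to it, so the directory ends up holding both the archive and the originals.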

Another useful flag is -v, which prints the name of each file as tar processes it.

Let’s see how the output changes if we also pass the -v flag when creating the archive:

[myuser@localhost]$ tar -cfv archive.tar testfile1 testfile2
tar: archive.tar: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

We got an error.

This is because tar takes whatever comes immediately after the -f flag as the archive name, which in this case is v.

The result is an archive named v, which you can see in the ls output below:

[myuser@localhost]$ ls -al
total 20
drwxrwxr-x. 2 myuser mygroup  4096 Jul 17 09:42 .
drwxrwxrwt. 6 root   root     4096 Jul 17 09:38 ..
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile1
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile2
-rw-rw-r--. 1 myuser mygroup 10240 Jul 17 09:42 v
[myuser@localhost]$ file v
v: POSIX tar archive (GNU)

The “No such file or directory” error occurs because tar tries to create an archive called v containing three files: archive.tar, testfile1, and testfile2.

But archive.tar does not exist, so tar reports an error.

This shows how important the order of the tar flags is: -f must come last in the group, directly before the archive name.

Let’s swap the -f and -v flags in tar and try again:

[myuser@localhost]$ tar -cvf archive.tar testfile1 testfile2
testfile1
testfile2

All went well this time, and the verbose flag shows the names of the two files added to the archive we are creating.
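The flag-order pitfall can be reproduced in a scratch directory with a sketch like the one below. The temporary directory and the variable name `wrong_order` are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# Wrong order: -f takes the rest of the flag group ("v") as the archive name,
# and archive.tar is then treated as an input file that does not exist.
if tar -cfv archive.tar testfile1 testfile2 2>/dev/null; then
    wrong_order="succeeded"
else
    wrong_order="failed"
fi
echo "tar -cfv $wrong_order"

# Clean up the stray archive named "v" if tar created one.
if [ -f v ]; then
    echo "removing stray archive named v"
    rm v
fi

# Right order: -f is last in the group, directly before the archive name.
tar -cvf archive.tar testfile1 testfile2
```

Checking the exit status, rather than reading the error text, is the more dependable way for a script to notice this kind of mistake.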

Make sense?

List all the files in the tar archive in detail

To list all the files in the tar archive without extracting their contents, we’ll introduce a fourth flag:

-t: lists the contents of the archive

We can now put three flags together, -t, -v, and -f, to see the files in the archive we created earlier:

[myuser@localhost]$ tar -tvf archive.tar 
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2
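Listing is also handy in scripts. The sketch below, which assumes a scratch directory and the hypothetical variable name `member_count`, drops -v so that tar prints one member name per line, then counts the entries and checks for a specific file without extracting anything.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# -t without -v prints only the member names, one per line.
member_count=$(tar -tf archive.tar | wc -l)
echo "archive.tar holds $member_count entries"

# grep -x matches the whole line and -q suppresses output;
# the exit status tells us whether the name is present.
if tar -tf archive.tar | grep -qx "testfile1"; then
    echo "testfile1 is in the archive"
fi
```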

Should I use a dash with tar?

You may have noticed that in some cases a dash precedes the flags, but this is not always the case.

So let’s see whether the dash makes a difference.

First, let’s try running the same command without using the dash before the flag:

[myuser@localhost]$ tar tvf archive.tar
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2

The output is the same, which means no dashes are needed.

Just to give you an idea, you can run tar as follows and get the same output:

tar -t -v -f archive.tar 
tar -tvf archive.tar
tar tvf archive.tar
tar --list --verbose --file archive.tar

The last command uses the long option style, which many Linux commands support.

As you can see, the short flags are much quicker to type.
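One way to check that the different spellings really are equivalent is to capture each listing and compare them, as in this sketch. The scratch directory and variable names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# Capture the listing produced by each spelling of the same flags.
bundled=$(tar -tvf archive.tar)
no_dash=$(tar tvf archive.tar)
separate=$(tar -t -v -f archive.tar)
long_form=$(tar --list --verbose --file archive.tar)

if [ "$bundled" = "$no_dash" ] &&
   [ "$bundled" = "$separate" ] &&
   [ "$bundled" = "$long_form" ]; then
    echo "all four invocations print the same listing"
fi
```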

Extract all files from the archive

Let’s introduce an additional flag that extracts the contents of a tar archive: the -x flag.

To extract the contents of the archive we created earlier, we can use the following command:

[myuser@localhost]$ tar -xvf archive.tar
testfile1
testfile2
[myuser@localhost]$ ls -al
total 20
drwxrwxr-x 2 myuser mygroup    59 Feb 10 21:21 .
drwxr-xr-x 3 myuser mygroup    55 Feb 10 21:21 ..
-rw-rw-r-- 1 myuser mygroup 10240 Feb 10 21:17 archive.tar
-rw-rw-r-- 1 myuser mygroup    54 Feb 10 21:17 testfile1
-rw-rw-r-- 1 myuser mygroup    78 Feb 10 21:17 testfile2

As you can see, we use the -x flag to extract the contents of the archive, the -v flag for verbose output, and the -f flag to reference the archive file specified after it (archive.tar).

Note: As mentioned earlier, one dash before the whole group of flags is enough. We can also put a dash before each flag, and the output will be the same:

tar -x -v -f archive.tar
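A quick round trip shows that extraction gives back exactly what went in. This is a sketch under assumptions: the scratch directory, the `restore` directory name, and the file contents are made up for the demo.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "first file" > testfile1
echo "second file" > testfile2
tar -cf archive.tar testfile1 testfile2

# Extract in a separate, empty directory so the copies cannot be
# confused with the originals.
mkdir restore
cd restore
tar -xf ../archive.tar

# cmp is silent and exits 0 when two files are byte-for-byte identical.
if cmp testfile1 ../testfile1 && cmp testfile2 ../testfile2; then
    echo "extracted copies match the originals"
fi
cd ..
```

Extracting into an empty directory is a cautious habit, because tar overwrites existing files with the same names by default.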

There is also a way to extract individual files from the archive.

In this case, considering we only have two files in our archive, it doesn’t make much difference. However, if you have an archive of thousands of files and you only need one of them, it can make a huge difference.

A common case is a backup script that archives the log files of the past 30 days, when you only want to see the log file for a specific date.

To extract only testfile1 from archive.tar, you can use the following general syntax:

tar -xvf {archive_file} {path_to_file_to_extract}

In our specific case:

tar -xvf archive.tar testfile1
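The effect is easiest to see when the original files are gone. In this sketch (the scratch directory and file contents are assumptions), only the named member comes back:

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "first file" > testfile1
echo "second file" > testfile2
tar -cf archive.tar testfile1 testfile2

# Remove the originals so it is obvious which files extraction restores.
rm testfile1 testfile2

# Name the member after the archive to extract only that member.
tar -xf archive.tar testfile1

# The listing now shows the archive and testfile1, but not testfile2.
ls
```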

Let’s see what happens if I create a tar archive with two directories:

[myuser@localhost]$ ls -ltr
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir1
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir2

[myuser@localhost]$ tar -cvf archive.tar dir*
dir1/
dir1/testfile1
dir2/
dir2/testfile2

Note: I use the wildcard * to include in the archive any files or directories whose names begin with “dir”.

If I only wanted to extract testfile1, the command would be:

tar -xvf archive.tar dir1/testfile1

Extraction preserves the original directory structure, so I’ll get testfile1 in dir1:

[myuser@localhost]$ ls -al dir1/
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:36 .
drwxrwxr-x. 3 myuser mygroup 4096 Jul 17 10:36 ..
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:34 testfile1
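When the exact path inside the archive is not known, it can be looked up with a listing first. The sketch below assumes GNU tar's default behavior of matching member names literally; the scratch directory, the `restore` directory, and the variable `member` are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir1 dir2
echo "first file" > dir1/testfile1
echo "second file" > dir2/testfile2
tar -cf archive.tar dir*

# Look up the stored path first: the member is recorded with its directory,
# so the bare name "testfile1" would not match "dir1/testfile1".
member=$(tar -tf archive.tar | grep 'testfile1$')
echo "stored as: $member"

# Extract into an empty directory; dir1/ is recreated, dir2/ is not.
mkdir restore
cd restore
tar -xf ../archive.tar "$member"
cd ..
```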

Is everything clear?

Reduce the size of the tar file

gzip and bzip2 compression can be used to reduce the size of tar archives.

The tar flags that enable compression are:

  • -z for gzip compression: the long flag is --gzip

  • -j for bzip2 compression: the long flag is --bzip2

To create a gzip-compressed tar archive named archive.tar.gz with verbose output, we will use the following command (one of the most commonly used commands when creating a tar archive):

tar -czvf archive.tar.gz testfile1 testfile2

To extract its contents, we will use:

tar -xzvf archive.tar.gz

We could also use the .tgz extension instead of .tar.gz, and the result would be the same.
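To see the size reduction for yourself, a sketch like the following compares a plain archive with a gzip-compressed one. The repetitive sample data is an assumption chosen because it compresses well; real files will shrink by different amounts.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Generate highly repetitive sample data (hypothetical content for the demo).
i=0
while [ "$i" -lt 5000 ]; do
    echo "the same line over and over"
    i=$((i + 1))
done > testfile1
cp testfile1 testfile2

tar -cf archive.tar testfile1 testfile2
tar -czf archive.tar.gz testfile1 testfile2
tar -czf archive.tgz testfile1 testfile2

plain_size=$(wc -c < archive.tar)
gzip_size=$(wc -c < archive.tar.gz)
echo "uncompressed: $plain_size bytes, gzip: $gzip_size bytes"

# The extension is only a naming convention: both files list the same members.
tar -tzf archive.tar.gz
tar -tzf archive.tgz
```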

Now, let’s create an archive that uses bzip2 compression:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
/bin/sh: bzip2: command not found
tar: Child returned status 127
tar: Error is not recoverable: exiting now

The error “bzip2: command not found” indicates that tar is trying to compress using the bzip2 command, but the command cannot be found on our Linux system.

The solution is to install bzip2. The process depends on the Linux distribution you are using; in my case it is CentOS, which uses yum as the package manager.

Let’s install bzip2 using the following yum command:

yum install bzip2

I can confirm the existence of the bzip2 binary using the which command:

[myuser@localhost]$ which bzip2
/usr/bin/bzip2

Now, if I run tar again using bzip2 compression:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
[myuser@localhost]$ ls -al
total 16
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:45 .
drwxrwxrwt. 6 root     root     4096 Jul 17 10:53 ..
-rw-rw-r--. 1 myuser mygroup  136 Jul 17 10:54 archive.tar.bz2
-rw-rw-r--. 1 myuser mygroup  128 Jul 17 10:45 archive.tar.gz
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile1
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile2

Everything is fine!
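In a script, the missing-bzip2 failure can be avoided by checking for the program before calling tar. The sketch below is one possible approach; falling back to gzip is a design choice made for this example, and the variable `created` is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# command -v exits non-zero when the program is not found on the PATH.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 testfile1 testfile2
    created="archive.tar.bz2"
else
    # Fallback chosen for this sketch: use gzip instead of failing.
    echo "bzip2 not found, falling back to gzip" >&2
    tar -czf archive.tar.gz testfile1 testfile2
    created="archive.tar.gz"
fi
echo "created $created"
```

Whether a silent fallback is acceptable depends on who consumes the archive; a backup script might prefer to stop with an error instead.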

Out of curiosity, let’s see how the Linux file command describes the two archives (.tar.gz and .tar.bz2):

[myuser@localhost]$ file archive.tar.gz 
archive.tar.gz: gzip compressed data, last modified: Fri Jul 17 10:45:04 2020, from Unix, original size 10240
[myuser@localhost]$ file archive.tar.bz2 
archive.tar.bz2: bzip2 compressed data, block size = 900k

As you can see, the file command can distinguish between files generated using two different compression algorithms.
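The two formats can be told apart because each starts with its own signature bytes: gzip data begins with the bytes 1f 8b, and bzip2 data begins with the ASCII letters "BZh". The sketch below inspects those leading bytes directly; the scratch directory and variable names are assumptions, and the bzip2 part only runs if bzip2 is installed.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -czf archive.tar.gz testfile1 testfile2

# Print the first two bytes as hex and strip the whitespace od adds.
gzip_magic=$(head -c 2 archive.tar.gz | od -An -tx1 | tr -d ' \n')
echo "archive.tar.gz starts with: $gzip_magic"

# The bzip2 signature is plain ASCII, so it can be read directly.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 testfile1 testfile2
    bzip2_magic=$(head -c 3 archive.tar.bz2)
    echo "archive.tar.bz2 starts with: $bzip2_magic"
fi
```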

Conclusion

In this article, you learned about the most common flags used with the tar command, how to create and extract a tar archive, and how to create and extract a gzip-compressed tar archive.

Let’s review all the flags again:

  • -c: creates a new archive

  • -f: specifies the file name of the archive

  • -t: lists the contents of the archive

  • -v: lists processed files in detail

  • -x: extracts files from the archive

  • -z: uses gzip compression

  • -j: uses bzip2 compression
