Hello, I’m Liang Xu.

Creating, deleting, and modifying files are everyday operations on Linux systems. As you know, deleting a single file with the rm command is almost instantaneous. However, when the number of files is very large, the deletion can take a long time to complete.

Have you ever wondered how long it would take to delete half a million little files?

My goal in writing this article is to find the fastest way to delete a huge number of files in Linux. The plain rm command is simply too weak for the job!

We’ll start with some simple methods for deleting files, and then compare how fast the different methods can accomplish the task. Let’s see which one is the fastest.

1. Several methods of file deletion

The rm command is the most commonly used command for deleting files in Linux. You are probably already familiar with it, so let’s just briefly review a few examples of the rm command.

$ rm -f testfile

The -f option in the preceding command indicates that files are forcibly deleted without confirmation.

$ rm -rf testdirectory

This command will delete the directory named testdirectory and all of its contents (the -r option deletes files recursively).

To delete a directory, we have another command, rmdir, but it only deletes the directory if it is empty.

$ rmdir testdirectory

Now let’s look at some other different ways to delete files in Linux.

One of my favorite methods is to use the find command to locate the files and then delete them. The find command is a handy tool for searching for files by type, size, creation date, modification date, and many other criteria.

Let’s look at an example of the find command using -exec to call the rm command.

$ find /test -type f -exec rm {} \;

The above command will delete all files in the /test directory. First the find command looks for all the files in the directory, then it executes the rm command for each search result.

Let’s look at some different methods you can use with the find command to delete files.

$ find /test -mtime +7 -exec rm {} \;

In the example above, the find command will search for all the files in the /test directory that were modified more than seven days ago and then delete each of them.

$ find /test -size +7M -exec rm {} \;

In the example above, all files larger than 7M in the /test directory are searched and then deleted.

In all of the find examples listed above, the rm command is invoked once for every file found. For example, in the last command, if 50 files larger than 7 MB were found, rm would be launched 50 times to delete them. Starting a separate process for every file is what makes such an operation take so much longer.
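Incidentally, if you prefer to stay with -exec, you can cut down that overhead by terminating the -exec clause with + instead of \;. This makes find pass as many file names as possible to a single rm invocation (much like xargs does). A quick sketch, reusing the size-based example:

$ find /test -size +7M -exec rm {} +

Only the terminator changes, so rm is launched far fewer times.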

A better alternative to calling rm through -exec is find’s own -delete action. For example:

$ find /test -size +7M -delete

The effect is the same as the previous command.
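The -delete action also combines with any of the other find tests we saw earlier. As a small sketch (still assuming our example /test directory), the following would remove only .txt files modified more than seven days ago:

$ find /test -type f -name '*.txt' -mtime +7 -delete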

2. What is the fastest command to delete a large number of files?

Without further ado, let’s get straight to the test.

Start by creating half a million files with a simple bash for loop.

$ for i in $(seq 1 500000); do echo testing >> $i.txt; done

The above command creates 500,000 .txt files in the current working directory, with names ranging from 1.txt to 500000.txt. Each file contains only the word "testing", so each one is just a few bytes in size.
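Before deleting anything, it is worth a quick sanity check that the loop really produced half a million files; the following count should print 500000:

$ ls | wc -l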

After creating half a million files, we’ll try several ways to delete them to see which is the fastest way to delete huge files.

Round 1: the rm command

Let’s start with a simple rm command and use the time command for timing.

$ time rm -f *
-bash: /bin/rm: Argument list too long

real    0m11.126s
user    0m9.673s
sys     0m1.278s

We immediately hit the "Argument list too long" error, which means the deletion never actually ran: the shell expanded * into more arguments than the kernel allows on a single command line, so /bin/rm was never executed and the command simply gave up.

Don’t read anything into the timing here: since rm never did its job, the time output only tells us how long the command took to fail, not how long a successful deletion would take.
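As an aside, the usual workaround for the "Argument list too long" error is to avoid putting every file name on one command line in the first place, for example by streaming the names to rm through xargs. This is just a sketch for reference and is not one of the timed rounds below:

$ find . -maxdepth 1 -type f -print0 | xargs -0 rm -f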

Round 2: the find command with -exec

Now let’s use the find command with the -exec parameter we saw earlier.

$ time find ./ -type f -exec rm {} \;
real    14m51.735s
user    2m24.330s
sys     9m48.743s

From the output of the time command, it takes 14 minutes and 51 seconds to delete 500,000 files from a single directory this way. This is quite a long time, because a separate rm process is launched for every single file until all of them are deleted.

Round 3: the find command with -delete

Now let’s test the elapsed time by using the -delete option in the find command.

$ time find ./ -type f -delete

real    5m11.937s
user    0m1.259s
sys     0m28.441s

The deletion speed is greatly improved: only 5 minutes and 11 seconds! That is a big improvement when you need to delete a huge number of files in Linux.

Round 4: Perl

Now let’s take a look at how deleting files in Perl works and how fast it is compared to the other methods we’ve seen before.

$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'

real    1m0.488s
user    0m7.023s
sys     0m27.403s

As you can see, Perl deleted half a million files in that directory in about a minute, which is pretty fast compared to the other find and rm commands we’ve seen before!

However, if you are interested in using more complex options with Perl, you need to have some knowledge of Perl regular expressions.
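If the golfed one-liner above looks cryptic, here is a slightly more readable sketch (not timed) that does the same job while using a regular expression to restrict the deletion to our numbered .txt files:

$ perl -e 'unlink for grep { /^\d+\.txt$/ } <*>'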

Round 5: rsync

A lesser-known method for deleting a large number of files in a directory uses a tool we all know well: rsync, which is normally used to transfer and synchronize files between local and remote locations in Linux.

Now let’s see how to use rsync to delete all files in a folder. The trick is simply to synchronize an empty directory onto the target directory that holds the large number of files.

In our example, the /test directory (the target) has 500,000 files, and we create an empty directory called blanktest (the source). We will then use rsync’s --delete option, which removes from the target directory any files that do not exist in the source directory.

$ time rsync -a --delete blanktest/ test/

real    2m52.502s
user    0m2.772s
sys     0m32.649s

As you can see, the deletion took just 2 minutes and 52 seconds.

So rsync is better than find if you want to clear a directory containing millions of files.
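As a small follow-up, once rsync has finished, test/ no longer contains any files, so if you no longer need the two directories you can clean them up with the rmdir command we saw at the beginning (assuming test/ had no subdirectories):

$ rmdir blanktest test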

3. Summary

The following table summarizes how long it took to delete 500,000 files in Linux using the different methods, for your reference.

Command                     Time taken
rm                          Failed: unable to delete that many files at once
find with -exec             14 minutes 51 seconds
find with -delete           5 minutes 11 seconds
Perl                        About 1 minute
rsync with --delete         2 minutes 52 seconds


Finally, many friends have recently asked me for a Linux learning roadmap, so over the past month I used my spare time to put together an e-book based on my own experience. Whether you are preparing for interviews or simply improving yourself, I believe it will help you!

It is free for everyone; all I ask is that you give this article a thumbs-up!

Ebook | Linux development learning roadmap

I also hope some of you will join me in making this e-book even better!

Did you get something out of this article? If so, please like, comment, and share so that more people can see it.

Recommended reading:

  • Essential free resources for programmers advancing toward architect
  • Classic books every programmer must read (high-definition PDF)

Welcome to follow my blog, the Liangxu Linux Tutorial Network, which is packed with useful content!