Working with files in Python
Python has several built-in modules and functions for handling files. They are spread across modules such as os, os.path, shutil, and pathlib. This article covers the most common operations and methods for working with files in Python.
In this article you will learn how to:
- Get file attributes
- Create a directory
- File name pattern matching
- Traverse the directory tree
- Create temporary files and directories
- Delete files and directories
- Copy, move, and rename files and directories
- Create and unzip ZIP and TAR files
- Use the fileinput module to open multiple files
Read and write file data in Python
Using Python to read and write files is straightforward. To do this, you must first open the file in the appropriate mode. Here is an example of how to open a text file and read its contents.
with open('data.txt', 'r') as f:
    data = f.read()
    print('content: {}'.format(data))
open() takes a file name and a mode as its arguments; 'r' means open the file in read-only mode. If you want to write data to a file, pass 'w' instead:
with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)
In the above example, open() opens a file for reading or writing and returns a file handle (f in this case) that provides methods that can be used to read or write file data. Read Working With File I/O in Python for more information on how to read and write files.
Get directory list
Suppose your current working directory has a subdirectory called my_directory that contains the following contents:
.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt
Python's built-in os module has many useful functions for listing directory contents and filtering the results. To get a list of all the files and folders in a particular directory in the file system, you can use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. If you also want file and directory attributes (such as file size and modification date), os.scandir() is the preferred method.
Get the directory list using the legacy version of Python
import os
entries = os.listdir('my_directory')
os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument.
['file1.py', 'file2.csv', 'file3.txt', 'sub_dir', 'sub_dir_b', 'sub_dir_c']
A directory listing like this isn't easy to read; printing the output of the call to os.listdir() in a loop helps:
for entry in entries:
    print(entry)

"""
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
"""
Get the directory list using a modern version of Python
In modern versions of Python, os.scandir() and pathlib.Path can be used instead of os.listdir().
os.scandir() was introduced in Python 3.5 and is documented in PEP 471.
os.scandir() returns an iterator instead of a list.
import os
entries = os.scandir('my_directory')
print(entries)
# <posix.ScandirIterator at 0x105b4d4b0>
ScandirIterator points to all entries in the current directory. You can iterate over the contents of the iterator and print the file name.
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        print(entry.name)
os.scandir() is used here with the with statement because it supports the context management protocol. Using a context manager closes the iterator and frees the acquired resources automatically once the iterator is exhausted. Printing the file names in my_directory gives the same result as the os.listdir() example:
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
Another way to get a directory list is to use the pathlib module:
from pathlib import Path

entries = Path('my_directory')
for entry in entries.iterdir():
    print(entry.name)
pathlib.Path() returns either a PosixPath or a WindowsPath object, depending on the operating system.
pathlib.Path() objects have an .iterdir() method that creates an iterator over all the files and directories in the given directory. Each entry produced by .iterdir() contains information about the file or directory, such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a nice enhancement to Python that provides an object-oriented interface to the file system.
In the example above, you call pathlib.Path() and pass it a path argument. You then call .iterdir() to get a list of all the files and directories under my_directory.
pathlib provides a set of classes that cover most of the common operations on paths in a simple, object-oriented way. Using pathlib is more efficient than using the functions in os. Another advantage over os is that pathlib reduces the number of packages or modules you need to import to manipulate file system paths. For more information, read Python 3's pathlib Module: Taming the File System.
Running the code above yields the following result:
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way to get a directory listing, especially when you need information about file types and file attributes. pathlib.Path() provides much of the file- and path-handling functionality found in os and shutil, and its methods are more efficient than those modules. How to get file attributes quickly is discussed later in this article.
Function | Description |
---|---|
os.listdir() | Returns all files and folders in the directory as a list |
os.scandir() | Returns an iterator containing all objects in a directory that contains file property information |
pathlib.Path().iterdir() | Returns an iterator containing all objects in a directory that contains file property information |
These functions return a list of all the contents of a directory, including subdirectories. This may not always be what you want, and the next section shows you how to filter the results from the directory list.
List all files in a directory
This section shows how to print the names of the files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and list only files from a directory listing produced by os.listdir(), use os.path:
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    # Check whether the entry is a file using os.path.isfile
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)
os.listdir() is called here to return a list of everything in the given path, and then os.path.isfile() is used to filter the list so that only files, not directories, are shown. The code produces the following output:
file1.py
file2.csv
file3.txt
An easier way to list all the files in a directory is to use os.scandir() or pathlib.Path():
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)
Using os.scandir() is clearer and easier to understand than using os.listdir(). entry.is_file() is called on each entry of the ScandirIterator and returns True if the entry is a file. The output of the code above is as follows:
file1.py
file3.txt
file2.csv
Next, here is how to use pathlib.Path() to list the files in a directory:
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)
.is_file() is called on each item produced by.iterdir(). The output is the same as above:
file1.py
file3.txt
file2.csv
The code above can be more concise if you combine the for loop and if statements into a single generator expression. Dan Bader’s article on generator expressions is recommended.
The updated version is as follows:
from pathlib import Path

basepath = Path('my_directory')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)
The result is the same as before. This section has shown that filtering files or directories with os.scandir() and pathlib.Path() is more intuitive, and the resulting code looks cleaner, than doing it with os.listdir() and os.path.
Listing subdirectories
If you want to list subdirectories instead of files, use one of the methods below. Here is how to use os.listdir() and os.path:
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)
Manipulating the file system this way becomes cumbersome when you have to call os.path.join() repeatedly. Running this code on my computer produces the following output:
sub_dir
sub_dir_b
sub_dir_c
Here's how to use os.scandir():
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)
As in the file-listing example, .is_dir() is called here on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True and the directory's name is printed. The output is the same as above:
sub_dir_c
sub_dir_b
sub_dir
Here's how to use pathlib.Path():
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)
.is_dir() is called on each entry returned by the .iterdir() iterator to check whether it is a file or a directory. If the entry is a directory, its name is printed, and the output is the same as in the previous example:
sub_dir_c
sub_dir_b
sub_dir
Get file attributes
Python makes it easy to get file attributes such as file size and modification time. This can be done with os.stat(), os.scandir(), or pathlib.Path.
os.scandir() and pathlib.Path() return directory listings whose entries already carry file attribute information. This can be more efficient than listing the files with os.listdir() and then calling os.stat() on each one.
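os.stat() itself can also be called directly on a path string and returns an os.stat_result object. Here is a minimal sketch; the file name assumes the my_directory example above:
import os

# Query a single file directly; returns an os.stat_result object
info = os.stat('my_directory/file1.py')
print(info.st_size)   # file size in bytes
print(info.st_mtime)  # last modification time as a timestamp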
The following example shows how to get the last modification time of the files in my_directory, output as timestamps:
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        info = entry.stat()
        print(info.st_mtime)

"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
os.scandir() returns a ScandirIterator object. Each item in a ScandirIterator has a .stat() method that gets information about the file or directory it points to. .stat() provides information such as file size and last modification time. In the example above, the code prints the st_mtime attribute, which is the time the contents of the file were last modified.
The pathlib module has methods to get file information with the same result:
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    info = entry.stat()
    print(info.st_mtime)

"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
The code above loops over the iterator returned by .iterdir() and gets the file attributes by calling .stat() on each entry. The st_mtime attribute is a floating-point value representing a timestamp. To make the value returned by st_mtime easier to read, you can write a helper function that converts it to a datetime object:
import datetime
from pathlib import Path

def timestamp2datetime(timestamp, convert_to_local=True, utc=8, is_remove_ms=True):
    """
    Convert a UNIX timestamp to a datetime object.
    :param timestamp: the timestamp
    :param convert_to_local: whether to convert to local time
    :param utc: time zone offset, e.g. UTC+8
    :param is_remove_ms: whether to drop the milliseconds
    :return: a datetime object
    """
    if is_remove_ms:
        timestamp = int(timestamp)
    dt = datetime.datetime.utcfromtimestamp(timestamp)
    if convert_to_local:
        dt = dt + datetime.timedelta(hours=utc)
    return dt

def convert_date(timestamp, format='%Y-%m-%d %H:%M:%S'):
    dt = timestamp2datetime(timestamp)
    return dt.strftime(format)

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        info = entry.stat()
        print('{} last modified {}'.format(entry.name, convert_date(info.st_mtime)))
This gets the list of files in my_directory along with their attributes, then calls convert_date() so that the modification time is displayed in a human-readable way. convert_date() uses .strftime() to convert the datetime object to a string.
Output from the above code:
file1.py last modified 2019-01-24 09:04:39
file2.csv last modified 2019-01-24 09:04:39
file3.txt last modified 2019-01-24 09:04:39
The syntax for converting dates and times to strings can be confusing. For more information, consult the official documentation. Another good resource is strftime.org.
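As a quick illustration, .strftime() builds the string from format codes such as %Y for the year, %m for the month, and %d for the day; a minimal sketch:
import datetime

dt = datetime.datetime(2019, 1, 24, 9, 4, 39)
# %Y-%m-%d %H:%M:%S is the same format string used by convert_date() above
print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2019-01-24 09:04:39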
Create a directory
Sooner or later, your program will need to create directories to store data in. os and pathlib include functions for creating directories. We will look at the following:
Method | Description |
---|---|
os.mkdir() | Create a single subdirectory |
os.makedirs() | Create multiple directories, including intermediate directories |
pathlib.Path.mkdir() | Create a single directory or multiple directories |
Creating a single directory
To create a single directory, pass the directory path as an argument to os.mkdir():
import os
os.mkdir('example_directory')
If the directory already exists, os.mkdir() raises a FileExistsError exception. Alternatively, you can create the directory using pathlib:
from pathlib import Path
p = Path('example_directory')
p.mkdir()
If the path already exists, mkdir() raises FileExistsError:
FileExistsError: [Errno 17] File exists: 'example_directory'
To avoid raising an error like this, catch the error when it occurs and let your users know:
from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as e:
    print(e)
Alternatively, you can ignore the FileExistsError exception by passing exist_ok=True to .mkdir():
from pathlib import Path
p = Path('example_directory')
p.mkdir(exist_ok=True)
If the directory already exists, no error is raised.
Create multiple directories
os.makedirs() is similar to os.mkdir(). The difference is that os.makedirs() can not only create individual directories, but can also recursively create directory trees. In other words, it can create any intermediate folders needed to make sure the full path exists.
os.makedirs() is similar to running mkdir -p in bash. For example, to create a group of directories like 2018/10/05, you can do the following:
import os
os.makedirs('2018/10/05', mode=0o770)
The code above creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, which also grants permissions to other users. See the documentation for more details on file permissions and how the mode is applied.
Run the tree command to confirm the permissions we applied:
$ tree -p -i .
.
[drwxrwx---] 2018
[drwxrwx---] 10
[drwxrwx---] 05
The code above prints the directory tree of the current directory. tree is usually used to list the contents of a directory in a tree structure. Passing it the -p and -i arguments prints the directory names and their file permissions in a vertical list: -p outputs the file permissions, and -i makes tree produce a vertical list without indentation.
As you can see, all the directories have 770 permissions. Another way to create multiple directories is to use pathlib.Path.mkdir():
from pathlib import Path
p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True)
Passing parents=True to Path.mkdir() makes it create the 05 directory and any parent directories needed to make the path valid.
By default, os.makedirs() and pathlib.Path.mkdir() raise OSError if the target directory already exists. This behavior can be overridden (starting with Python 3.2) by passing exist_ok=True as a keyword argument when calling each function.
Running the above code results in something like the following:
.
└── 2018
    └── 10
        └── 05
I prefer to use pathlib when creating directories, because I can use the same function method to create one or more directories.
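os.makedirs() accepts the same exist_ok keyword argument (from Python 3.2 onward); a minimal sketch:
import os

# No error is raised if 2018/10/05 already exists
os.makedirs('2018/10/05', exist_ok=True)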
File name pattern matching
After using one of the above methods to get a list of files in a directory, you might want to search for files that match a particular pattern.
Here are the methods and functions you can use:
- The string methods .startswith() and .endswith()
- fnmatch.fnmatch()
- glob.glob()
- pathlib.Path.glob()
These methods and functions are discussed below. The examples in this section are run inside a directory named some_directory with the following structure:
.
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
├── sub_dir
│   ├── file1.py
│   └── file2.py
└── tests.py
If you are using a Bash shell, you can create the above directory structure with the following commands:
mkdir some_directory
cd some_directory
mkdir sub_dir
touch sub_dir/file1.py sub_dir/file2.py
touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py
This creates the some_directory directory and enters it, then creates sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line uses brace expansion to create all the other files. To learn more about shell expansion, read here.
Using string methods
Python has several built-in methods for modifying and manipulating strings. Two of them, .startswith() and .endswith(), are very useful when matching file names. To use them, first get a directory listing and then iterate over it:
import os

for f_name in os.listdir('some_directory'):
    if f_name.endswith('.txt'):
        print(f_name)
The code above gets all the files in some_directory, iterates over them, and uses .endswith() to print only the names that end in .txt. Running the code on my computer produces the following output:
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
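.startswith() works the same way; here is a minimal sketch that lists only the files whose names begin with data:
import os

for f_name in os.listdir('some_directory'):
    if f_name.startswith('data'):
        print(f_name)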
Simple file name pattern matching using fnmatch
The matching abilities of string methods are limited. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), which supports wildcards such as * and ?. For example, to use fnmatch to find all .txt files in a directory, you can do the following:
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, '*.txt'):
        print(f_name)
This iterates over the list of files in some_directory and uses fnmatch.fnmatch() to perform a wildcard search for files with the .txt extension.
More advanced pattern matching
Suppose you want to find .txt files that match a particular pattern. For example, you might want the .txt files that contain the word data, a number between a set of underscores, and the word backup in their file name, such as data_01_backup, data_02_backup, or data_03_backup.
You can use fnmatch.fnmatch() like this:
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, 'data_*_backup.txt'):
        print(f_name)
Here, only the names of files that match the data_*_backup.txt pattern are printed. The * in the pattern matches any number of characters, so running this code finds all text files whose names start with data and end with backup.txt, as the following output shows:
data_01_backup.txt
data_02_backup.txt
data_03_backup.txt
File name pattern matching using glob
Another useful pattern-matching module is glob.
.glob() in the glob module works much like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files that begin with a dot (.) as special files.
UNIX and related systems use wildcards such as ? and * in file lists to indicate full pattern matches.
For example, in a UNIX shell, mv *.py python_files moves all files with the .py extension from the current directory to python_files. The * is a wildcard representing any number of characters, and *.py is the whole pattern. This shell feature is not available on Windows, but the glob module adds the same capability to Python, so it can be used in Windows programs as well.
Here is a query for all Python code files in the current directory using the glob module:
import glob
print(glob.glob('*.py'))
glob.glob('*.py') searches the current directory for files with the .py extension and returns them as a list. glob also supports shell-style wildcards for matching:
import glob
for name in glob.glob('*[0-9]*.txt'):
print(name)
This will find all text files (.txt) with numbers in their filenames:
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
glob also makes it easy to search subdirectories recursively:
import glob
for name in glob.iglob('**/*.py', recursive=True):
print(name)
This example uses glob.iglob() to search for all .py files in the current directory and its subdirectories. Passing recursive=True makes .iglob() search the current directory and all subdirectories. glob.glob() and glob.iglob() differ in that .iglob() returns an iterator instead of a list.
Running the above code results in the following:
admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py
pathlib contains similar methods for flexibly retrieving file listings. The following example shows how to use Path.glob() to list the file types that start with the letter p:
from pathlib import Path

p = Path('.')
for name in p.glob('*.p*'):
    print(name)
Calling p.glob('*.p*') returns a generator object pointing to all files in the current directory whose extension starts with the letter p.
Path.glob() is similar to the glob.glob() discussed above. As you can see, pathlib blends many of the best features of the os, os.path, and glob modules into a single module, which makes it a pleasure to use.
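pathlib can also search recursively, which was not shown above; a small sketch using .rglob(), which is equivalent to calling .glob() with a '**/' prefix:
from pathlib import Path

# Recursively find every .py file under the current directory
for name in Path('.').rglob('*.py'):
    print(name)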
To recap, here’s the list of features we introduced in this section:
Function | Description |
---|---|
startswith() | Tests whether a string starts with a particular pattern, returning True or False |
endswith() | Tests whether a string ends in a particular pattern, returning True or False |
fnmatch.fnmatch(filename, pattern) | Tests whether the file name matches this pattern, returning True or False |
glob.glob() | Returns a list of filenames that match the pattern |
pathlib.Path.glob() | Returns a generator object that matches the pattern |
Traverse directories and process files
A common programming task is walking a directory tree and processing the files it contains. Let's explore how to do that with the built-in Python function os.walk(). os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we will work with the following directory tree:
.
├── folder_1
│   ├── file1.py
│   ├── file2.py
│   └── file3.py
├── folder_2
│   ├── file4.py
│   ├── file5.py
│   └── file6.py
├── test1.txt
└── test2.txt
Here is an example of using os.walk() to list all the files and directories in a directory tree.
os.walk() traverses the directory from top to bottom by default:
import os

for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
os.walk() returns three values on each iteration of the loop:
- The name of the current folder
- A list of the subfolders in the current folder
- A list of the files in the current folder
In each iteration, it prints out the names of the subdirectories and files it finds:
Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
To traverse the directory tree from the bottom up, pass the topdown=False keyword argument to os.walk():
for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
Passing topdown=False makes os.walk() print the files it finds in the subdirectories first:
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt
As you can see, the program lists the contents of the subdirectories before the contents of the root directory. This is useful when you want to recursively delete files and directories; you will learn how to do that in the sections below. By default, os.walk() does not walk into directories that are symbolic links. You can override that behavior with the followlinks=True argument.
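A minimal sketch of the followlinks argument; it only changes the result if the tree actually contains directory symlinks:
import os

# Also descend into directories that are reached through symbolic links
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    print(f'Found directory: {dirpath}')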
Create temporary files and directories
Python provides the tempfile module for conveniently creating temporary files and directories.
tempfile can be used to open and store temporary data in a file or directory while your program is running. tempfile deletes these temporary files once your program is done with them.
Now, let’s see how to create a temporary file:
from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello World!')

# Go back to the beginning and read the data from the file
fp.seek(0)
data = fp.read()
print(data)

# Close the file, after which it will be deleted
fp.close()
The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object using TemporaryFile(), passing it the mode in which you want to open the file. This creates and opens a file that can be used as a temporary storage area.
In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a name, because it will be destroyed after the script finishes running.
After writing to the file, you can read from it and close it when you're done processing it. Once the file is closed, it is deleted from the file system. If you need a named temporary file, use tempfile.NamedTemporaryFile().
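A minimal sketch of a named temporary file; unlike TemporaryFile, it is guaranteed to have a visible name in the file system while it is open:
from tempfile import NamedTemporaryFile

with NamedTemporaryFile('w+t') as fp:
    print('The temporary file is named', fp.name)
    fp.write('Hello World!')
# The named file is deleted when the with block exits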
Temporary files and directories created with tempfile are stored in a special system directory for temporary files. Python searches a standard list of directories to find one in which the calling user can create files.
On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. If none of these exists, tempfile stores the temporary files and directories in the current directory.
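You can check which directory will be used on your system with tempfile.gettempdir():
import tempfile

# Prints the directory where temporary files will be created, e.g. /tmp
print(tempfile.gettempdir())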
.TemporaryFile() is also a context manager, so it can be used with the with statement. Using the context manager automatically closes and deletes files after reading them:
with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# The temporary file has now been closed and deleted
This creates a temporary file and reads data from it. Once the contents of the file are read, the temporary file is closed and deleted from the file system.
tempfile can also be used to create temporary directories. Let's look at how to do that with tempfile.TemporaryDirectory():
import tempfile
import os

tmp = ''
with tempfile.TemporaryDirectory() as tmpdir:
    print('Created temporary directory ', tmpdir)
    tmp = tmpdir
    print(os.path.exists(tmpdir))

print(tmp)
print(os.path.exists(tmp))
Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing that directory. In the example above, the directory is created with a context manager, and its name is stored in the tmpdir variable. The third line prints the name of the temporary directory, and os.path.exists(tmpdir) confirms that the directory was actually created in the file system.
After the context manager goes out of context, the temporary directory is deleted and os.path.exists(tmpdir) returns False, which means the directory was successfully removed.
Delete files and directories
You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.
Delete files in Python
To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().
os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:
import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)
Deleting a file using os.unlink() is similar to using os.remove():
import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)
Calling .unlink() or .remove() on a file deletes the file from the file system. These two functions raise OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you're deleting is actually a file and only delete it if it is, or use exception handling to handle the OSError:
import os

data_file = 'home/data.txt'

# Delete it if it is a file
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')
os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.
The following example shows how to use exception handling to handle errors when deleting files:
import os

data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')
The code above attempts to delete the file before checking its type. If data_file is not actually a file, the OSError that is raised is handled in the except clause and an error message is printed to the console. The error message is formatted using Python f-strings.
Finally, you can also delete files using pathlib.Path.unlink():
from pathlib import Path

data_file = Path('home/data.txt')
try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')
This creates a Path object named data_file that points to a file. Calling .unlink() on data_file deletes home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it; if the user does not have permission to delete the file, a PermissionError is raised.
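If you want to handle that case explicitly, a minimal sketch:
from pathlib import Path

data_file = Path('home/data.txt')
try:
    data_file.unlink()
except PermissionError as e:
    # The user running the script is not allowed to delete this file
    print(f'Error: {data_file} : {e.strerror}')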
Delete the directory
The standard library provides the following functions for deleting directories:
- os.rmdir()
- pathlib.Path.rmdir()
- shutil.rmtree()
To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These functions only work when the directory being deleted is empty; if it is not empty, OSError is raised. Here is how to delete a folder:
import os

trash_dir = 'my_documents/bad_dir'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
trash_dir has now been deleted by os.rmdir(). If the directory is not empty, an error message is printed to the screen:
Traceback (most recent call last):
File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'
Similarly, you can use pathlib to delete directories:
from pathlib import Path

trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
A Path object is created to point to the directory to be deleted. If the directory is empty, call the.rmdir() method of the Path object to remove it.
Delete the entire directory tree
To delete non-empty directories and entire directory trees, Python provides shutil.rmtree():
import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
When shutil.rmtree() is called, everything in trash_dir is deleted. In some cases, you may want to recursively delete empty folders. You can do this by combining os.walk() with one of the methods discussed above:
import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass
This walks the directory tree and tries to delete each directory it finds. If a directory is not empty, OSError is raised and that directory is skipped. The following table lists the functions covered in this section:
Function | Description |
---|---|
os.remove() | Delete a single file, but not a directory |
os.unlink() | As with os.remove(), the function removes individual files |
pathlib.Path.unlink() | Delete a single file, but not a directory |
os.rmdir() | Delete an empty directory |
pathlib.Path.rmdir() | Delete an empty directory |
shutil.rmtree() | Delete the entire directory tree, which can be used to delete non-empty directories |
Copy, move, and rename files and directories
Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removing files and directories. In this section, you will learn how to move and copy files and directories.
Copy the file
shutil offers several functions for copying files. The most commonly used ones are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:
import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)
shutil.copy() is comparable to the cp command on UNIX-based systems. shutil.copy(src, dst) copies the file src to the location specified by dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, src is copied into that directory. shutil.copy() copies only the file's contents and permissions; other metadata, such as the file's creation and modification times, is not preserved.
To preserve all file metadata when copying, use shutil.copy2():
import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)
Using .copy2() preserves details about the file such as the last access time, permission bits, last modification time, and flags.
Copy directory
While shutil.copy() copies only a single file, shutil.copytree() copies an entire directory and everything in it. shutil.copytree(src, dst) takes two arguments: the source directory and the destination directory into which the files and folders will be copied.
Here is an example of how to copy the contents of a folder to another location:
import shutil

dst = shutil.copytree('data_1', 'data1_backup')
print(dst)  # data1_backup
In this example, .copytree() copies the contents of data_1 to the new location data1_backup and returns the path of the destination directory. The destination directory must not already exist; it will be created, along with any missing parent directories. shutil.copytree() is a good way to back up files.
Move files and directories
To move a file or directory to another location, use shutil.move(src, dst).
src is the file or directory to move and dst is the destination:
import shutil

dst = shutil.move('dir_1/', 'backup/')
print(dst)  # 'backup'
If backup/ exists, shutil.move(‘dir_1/’, ‘backup/’) moves dir_1/ to backup/. If backup/ does not exist, dir_1/ will be renamed backup.
Rename files and directories
Python includes os.rename(src, dst) for renaming files and directories:
import os

os.rename('first.zip', 'first_01.zip')
This renames first.zip to first_01.zip. If the destination path points to a directory, OSError is raised.
Another way to rename a file or directory is to use .rename() from the pathlib module:
from pathlib import Path
data_file = Path('data_01.txt')
data_file.rename('data.txt')
To rename a file using pathlib, first create a pathlib.Path() object that contains the path of the file you want to rename. The next step is to call .rename() on that Path object, passing the new name for the file or directory.
Archiving
Archiving is a convenient way to package several files into a single file. The two most common archive types are ZIP and TAR. The Python programs you write can create archive files, read them, and extract data from them. In this section you will learn how to read and write both archive formats.
Reading ZIP files
The zipfile module is a low-level module that is part of the Python standard library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is create a ZipFile object. ZipFile objects are similar to the file objects created using open(). ZipFile is also a context manager and therefore supports the with statement:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass
Here we create a ZipFile object, passing in the name of the ZIP file and opening it in read mode. Once the ZIP file is open, information about the archive can be accessed through the functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data, which contains a total of five files and one subdirectory:
.
├── sub_dir/
│   ├── bar.py
│   └── foo.py
├── file1.py
├── file2.py
└── file3.py
To get the list of files in the archive, call .namelist() on the ZipFile object:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    zipobj.namelist()
This generates a list of files:
['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
.namelist() returns a list of the names of the files and directories in the archive. To retrieve information about a particular file in the archive, use .getinfo():
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.file_size)
This will print:
15277
.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, pass its path as an argument to .getinfo(). Using .getinfo(), you can retrieve information about archive members such as the date a file was last modified, its compressed size, and its full file name. Accessing .file_size retrieves the file's original size in bytes.
The following example shows how to retrieve more detailed information about archived files in the Python REPL. Assuming the zipfile module has been imported, bar_info is the same object created in the previous example:
>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'
bar_info contains details about bar.py, such as its compressed size and its full path.
The first line shows how to retrieve the file's last modification date. The next line shows how to get the file's size after compression. The last line shows the full path of bar.py inside the archive.
ZipFile supports the context manager protocol, which is why you can use it with the with statement. When this is done, the ZipFile object is automatically closed. Attempting to open or extract a file from a closed ZipFile object results in an error.
Extracting ZIP files
The zipfile module lets you extract one or more files from a ZIP archive via .extract() and .extractall().
By default, these methods extract files to the current directory. They both take an optional path parameter that lets you specify a different directory to extract the files into. If that directory does not exist, it is created automatically. To extract files from the archive, do the following:
>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']

>>> data_zip = zipfile.ZipFile('data.zip', 'r')

>>> # Extract a single file to the current directory
>>> data_zip.extract('file1.py')
'/home/test/dir1/zip_extract/file1.py'

>>> os.listdir('.')
['file1.py', 'data.zip']

>>> # Extract all files into the specified directory
>>> data_zip.extractall(path='extract_dir/')

>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']

>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']

>>> data_zip.close()
The third line of code calls os.listdir(), which shows that the current directory contains only one file, data.zip.
Next, data.zip is opened in read mode and .extract() is called to extract file1.py from it. .extract() returns the full path of the extracted file. Since no path was given, .extract() extracts file1.py into the current directory.
The next listing shows that the current directory now contains the extracted file in addition to the original archive. The example then shows how to extract the entire archive into a specified directory: .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.
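.extract() accepts the same optional path parameter; a minimal sketch, where single_file_dir/ is a hypothetical target directory that is created if it does not exist:
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> data_zip.extract('file2.py', path='single_file_dir/')
'single_file_dir/file2.py'
>>> data_zip.close()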
Extract data from an encrypted archive
zipfile supports extracting password-protected ZIP files. To extract a password-protected ZIP file, pass the password to the .extract() or .extractall() method as an argument:
>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract data from the encrypted archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')
The secret.zip archive is opened in read mode. The password is supplied to .extractall() as a bytes object, and the archive's contents are extracted to extract_dir. Because of the with statement, the archive is closed automatically after the extraction completes.
Create a new archive file
To create a new ZIP archive, open a ZipFile object in write mode ('w') and add the files you want to archive:
>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)
In this example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement ends, new_zip is closed. Opening a ZIP file in write mode erases the archive's contents and creates a new archive.
To add a file to an existing archive file, open the ZipFile object in append mode and add the file:
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')
This opens the new.zip archive created in the previous example in append mode. Opening the ZipFile object in append mode lets you add new files to the ZIP archive without deleting its current contents. After the files are added, the with statement goes out of context and closes the ZIP file.
Open the TAR archive file
TAR files are uncompressed file archives, like ZIP. They can be compressed using gzip, bzip2, and LZMA. The TarFile class allows reading and writing TAR archives.
The following is read from the archive:
import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())
tarfile objects open like most file-like objects: they have an open() function that takes a mode determining how the file is opened.
Use the 'r', 'w', or 'a' mode to open an uncompressed TAR file for reading, writing, or appending, respectively. To open a compressed TAR file, pass a mode argument to tarfile.open() in the form filemode[:compression]. The following table lists the modes in which TAR files can be opened:
Mode | Behavior |
---|---|
r | Open the archive in uncompressed read mode |
r:gz | Open the archive in gzip compressed read mode |
r:bz2 | Open the archive in bzip2 compressed read mode |
w | Opens the archive in uncompressed write mode |
w:gz | Open the archive in gzip compressed write mode |
w:xz | Open the archive in LZMA compressed write mode |
a | Open the archive in uncompressed append mode |
.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files inside it, use .getnames():
>>> import tarfile
>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']
This returns the names of the contents in the archive as a list.
Note: To show you how to use the different tarfile object methods, the TAR file in the example is opened and closed manually in an interactive REPL session.
By interacting with the TAR file in this way, you can see the output of running each command. In general, you might want to use a context manager to open file-like objects.
In addition, metadata for each entry in the archive can be accessed using special attributes:
>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
Modified: Sat Nov 1 09:09:51 2018
Size : 402 bytes
README.md
Modified: Sat Nov 3 07:29:40 2018
Size : 5426 bytes
app.py
Modified: Sat Nov 3 07:29:13 2018
Size : 6218 bytes
In this example, we loop over the list of files returned by .getmembers() and print each file's attributes. .getmembers() returns objects whose attributes, such as the name, size, and last modification time of each file in the archive, can be accessed programmatically. After reading or writing to the archive, it must be closed to free system resources.
Extract files from the TAR archive
In this section, you will learn how to extract files from TAR archives using the following methods:
.extract()
.extractfile()
.extractall()
To extract a single file from a TAR archive, use .extract(), passing in the file name:
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']
The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that README.md was successfully extracted into the current directory. To extract everything from the archive, use .extractall():
>>> tar.extractall(path="extracted/")
.extractall() has an optional path parameter to specify where the extracted files should go. Here, the archive is extracted into the extracted directory. The following commands show that the archive was extracted successfully:
$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
│   ├── app.py
│   ├── CONTRIBUTING.rst
│   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md
To extract a file object for reading or writing, use .extractfile(), which takes a file name or a TarInfo object as an argument. .extractfile() returns a file-like object that can be read and used:
>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()
Open archives should always be closed after they have been read or written. To close the archive, call .close() on the archive file handle, or use the with statement when creating the tarfile object so that the archive is closed automatically when you're done. This frees system resources and writes any changes you made to the archive to the file system.
Create a new TAR archive
To create a new TAR archive, you can do this:
>>> import tarfile

>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
First, you create a list of files to add to the archive so you don’t have to manually add each file.
The next line opens a new archive named packages.tar in write mode using the with context manager. Opening an archive in write mode ('w') lets you write new files to it. Any existing files in the archive are deleted and a new archive is created.
Once the archive is created and populated, the With context manager automatically closes it and saves it to the file system. The last three lines open the archive you just created and print out the names of the files it contains.
To add a new file to an existing archive, open the archive in append mode ('a'):
>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar
Opening an archive in Append mode allows you to add new files to it without deleting existing ones.
Use compressed archive
tarfile can also read and write TAR archives compressed with gzip, bzip2, and LZMA. To read or write a compressed archive, use tarfile.open(), passing the appropriate mode for the compression type.
For example, to read or write data in a gzip-compressed TAR archive, use the 'r:gz' or 'w:gz' mode, respectively:
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py
The 'w:gz' mode opens a gzip-compressed archive for writing and 'r:gz' opens one for reading. A compressed archive cannot be opened in append mode; to add files to a compressed archive, you have to create a new archive.
An easier way to create an archive
The Python standard library also supports creating TAR and ZIP archives using the high-level functions in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower-level tarfile and zipfile modules.
Create an archive with shutil.make_archive()
shutil.make_archive() takes at least two arguments: the name of the archive and an archive format.
By default, it compresses all files in the current directory into the archive format specified in the format parameter. You can pass in the optional root_dir argument to compress files in different directories. .make_archive() supports zip, tar, bztar, and gztar archive formats.
Here's how to create a TAR archive using shutil:
import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')
This copies everything in data/ and creates an archive named backup.tar on the file system, returning its name. To extract the archive, call .unpack_archive():
shutil.unpack_archive('backup.tar', 'extract_dir/')
Calling .unpack_archive() with the archive name and a destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.
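For example, a minimal sketch of the same backup using the zip format:
import shutil

# Creates data/backup.zip from everything under data/
shutil.make_archive('data/backup', 'zip', 'data/')

# Unpack it into a directory of your choice
shutil.unpack_archive('data/backup.zip', 'extract_dir/')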
Read multiple files
Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module allows you to loop over the contents of one or more text files quickly and easily. Here's the typical way fileinput is used:
import fileinput

for line in fileinput.input():
    process(line)
fileinput gets its input from the command-line arguments passed to sys.argv by default.
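You can also pass an explicit list of files instead of relying on sys.argv; a minimal sketch, using the sample files shown below:
import fileinput

# Read several files one after another as a single stream of lines
with fileinput.input(files=('bacon.txt', 'cupcake.txt')) as f:
    for line in f:
        print(line, end='')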
Loop over multiple files with fileinput
Let's use fileinput to build a crude version of cat, a common UNIX tool. The cat tool reads files sequentially and writes them to standard output. When given more than one file in its command-line arguments, cat concatenates the text files and displays the result in the terminal:
# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print('-> ' + line, end='')
print()
There are two text files in the current directory. Running this command produces the following output:
$ python3 fileinput-example.py bacon.txt cupcake.txt
--- Reading bacon.txt ---
-> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
-> irure cillum drumstick elit.
-> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
-> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
-> Tri-tip doner kevin cillum ham veniam cow hamburger.
-> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
-> Ball tip dolor do magna laboris nisi pancetta nostrud doner.
--- Reading cupcake.txt ---
-> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
-> Topping muffin cotton candy.
-> Gummies macaroon jujubes jelly beans marzipan.
fileinput lets you retrieve more information about each line, such as whether it is the first line (.isfirstline()), the line number (.lineno()), and the file name (.filename()). You can read more about it here.
Conclusion
You now know how to use Python to perform the most common operations on files and file groups. You have learned to use different built-in modules to read, find, and manipulate files.
You can now do this in Python:
- Get directory contents and file attributes
- Create directories and directory trees
- Match file names against patterns
- Create temporary files and directories
- Move, rename, copy, and delete files or directories
- Read and extract data from different types of archive files
- Use fileinput to read multiple files at once
Follow the WeChat official account <Code and Art> to read more high-quality technical articles from abroad.