Working with files in Python
Python has several built-in modules and functions for handling files. They are spread across modules such as os, os.path, shutil, and pathlib. This article covers the most common operations and methods for working with files in Python.
In this article you will learn how to:
- Get file attributes
- Create a directory
- File name pattern matching
- Traverse the directory tree
- Create temporary files and directories
- Delete files and directories
- Copy, move, and rename files and directories
- Create and unzip ZIP and TAR files
- Use the fileinput module to open multiple files
Read and write file data in Python
Using Python to read and write files is straightforward. To do this, you must first open the file in the appropriate mode. Here is an example of how to open a text file and read its contents.
with open('data.txt', 'r') as f:
    data = f.read()
    print('content: {}'.format(data))
open() takes a file name and a mode as its arguments; 'r' means open the file in read-only mode. If you want to write data to a file, pass 'w' instead:
with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)
In the above example, open() opens a file for reading or writing and returns a file handle (f in this case) that provides methods that can be used to read or write file data. Read Working With File I/O in Python for more information on how to read and write files.
Get directory list
Suppose your current working directory has a subdirectory called my_directory that contains the following contents:
.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt
Python's built-in os module has many useful functions for listing directory contents and filtering the results. To get a list of all the files and folders in a particular directory in the file system, you can use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. If you also want file and directory attributes (such as file size and modification date), os.scandir() is the preferred method.
Get the directory list using the legacy version of Python
import os
entries = os.listdir('my_directory')
os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument.
['file1.py', 'file2.csv', 'file3.txt', 'sub_dir', 'sub_dir_b', 'sub_dir_c']
A directory listing like this isn't easy to read; printing the output of the call to os.listdir() in a loop helps:
for entry in entries:
    print(entry)

"""
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
"""
Get the directory list using a modern version of Python
In modern versions of Python, os.scandir() and pathlib.Path can be used instead of os.listdir().
os.scandir() was introduced in Python 3.5 and is documented in PEP 471.
os.scandir() returns an iterator instead of a list.
import os
entries = os.scandir('my_directory')
print(entries)
# <posix.ScandirIterator at 0x105b4d4b0>
ScandirIterator points to all entries in the current directory. You can iterate over the contents of the iterator and print the file name.
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        print(entry.name)
os.scandir() is used here with the with statement because it supports the context management protocol. Using a context manager closes the iterator and frees the acquired resources automatically once the iterator is exhausted. Printing the file names in my_directory gives the same result as the os.listdir() example:
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
Another way to get a directory list is to use the pathlib module:
from pathlib import Path

entries = Path('my_directory')
for entry in entries.iterdir():
    print(entry.name)
pathlib.Path() returns either a PosixPath or a WindowsPath object, depending on the operating system.
pathlib.Path() objects have an .iterdir() method that creates an iterator over all the files and directories in the given directory. Each entry produced by .iterdir() contains information about the file or directory, such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a nice enhancement to Python that provides an object-oriented interface to the file system.
In the example above, you call pathlib.Path() and pass it a path argument. You then call .iterdir() to get a list of all the files and directories under my_directory.
pathlib provides a set of classes that cover most of the common operations on paths in a simple, object-oriented way. Using pathlib is more efficient than using the functions in os. Another advantage over os is that pathlib reduces the number of packages or modules you need to import to manipulate file system paths. For more information, read Python 3's pathlib Module: Taming the File System.
Running the code above yields the following result:
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way to get a directory listing, especially when you need information about file types and file attributes. pathlib.Path() provides much of the file- and path-handling functionality found in os and shutil, and its methods are more efficient than those modules. How to get file attributes quickly is discussed later in this article.
Function | Description |
---|---|
os.listdir() | Returns all files and folders in the directory as a list |
os.scandir() | Returns an iterator containing all objects in a directory that contains file property information |
pathlib.Path().iterdir() | Returns an iterator containing all objects in a directory that contains file property information |
These functions return a list of all the contents of a directory, including subdirectories. This may not always be what you want, and the next section shows you how to filter the results from the directory list.
List all files in a directory
This section shows how to print the names of the files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and list only files from a directory listing produced by os.listdir(), use os.path:
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    # Check whether the entry is a file using os.path.isfile
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)
os.listdir() is called here to return a list of everything in the given path, and then os.path.isfile() is used to filter the list so that only files, not directories, are shown. The code produces the following output:
file1.py
file2.csv
file3.txt
An easier way to list all the files in a directory is to use os.scandir() or pathlib.Path():
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)
Using os.scandir() is clearer and easier to understand than using os.listdir(). entry.is_file() is called on each entry of the ScandirIterator and returns True if the entry is a file. The output of the code above is as follows:
file1.py
file3.txt
file2.csv
Next, here is how to use pathlib.Path() to list the files in a directory:
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)
.is_file() is called on each item produced by.iterdir(). The output is the same as above:
file1.py
file3.txt
file2.csv
The code above can be more concise if you combine the for loop and if statements into a single generator expression. Dan Bader’s article on generator expressions is recommended.
The updated version is as follows:
from pathlib import Path

basepath = Path('my_directory')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)
The result is the same as before. This section has shown that filtering files or directories with os.scandir() and pathlib.Path() is more intuitive, and the resulting code looks cleaner, than doing it with os.listdir() and os.path.
Listing subdirectories
If you want to list subdirectories instead of files, use one of the methods below. Here is how to use os.listdir() and os.path:
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)
Manipulating the file system this way becomes cumbersome when you have to call os.path.join() repeatedly. Running this code on my computer produces the following output:
sub_dir
sub_dir_b
sub_dir_c
Here's how to use os.scandir():
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)
As in the file-listing example, .is_dir() is called here on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True and the directory's name is printed. The output is the same as above:
sub_dir_c
sub_dir_b
sub_dir
Here's how to use pathlib.Path():
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)
.is_dir() is called on each entry returned by the .iterdir() iterator to check whether it is a file or a directory. If the entry is a directory, its name is printed, and the output is the same as in the previous example:
sub_dir_c
sub_dir_b
sub_dir
Get file attributes
Python makes it easy to get file attributes such as file size and modification time. This can be done with os.stat(), os.scandir(), or pathlib.Path.
os.scandir() and pathlib.Path() return directory listings whose entries already carry file attribute information. This can be more efficient than listing the files with os.listdir() and then calling os.stat() on each one.
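os.stat() itself can also be called directly on a path string and returns an os.stat_result object. Here is a minimal sketch; the file name assumes the my_directory example above:
import os

# Query a single file directly; returns an os.stat_result object
info = os.stat('my_directory/file1.py')
print(info.st_size)   # file size in bytes
print(info.st_mtime)  # last modification time as a timestamp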
The following example shows how to get the last modification time of the files in my_directory, output as timestamps:
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        info = entry.stat()
        print(info.st_mtime)

"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
os.scandir() returns a ScandirIterator object. Each item in a ScandirIterator has a .stat() method that gets information about the file or directory it points to. .stat() provides information such as file size and last modification time. In the example above, the code prints the st_mtime attribute, which is the time the contents of the file were last modified.
The pathlib module has methods to get file information with the same result:
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    info = entry.stat()
    print(info.st_mtime)

"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
The code above loops over the iterator returned by .iterdir() and gets the file attributes by calling .stat() on each entry. The st_mtime attribute is a floating-point value representing a timestamp. To make the value returned by st_mtime easier to read, you can write a helper function that converts it to a datetime object:
import datetime
from pathlib import Path

def timestamp2datetime(timestamp, convert_to_local=True, utc=8, is_remove_ms=True):
    """
    Convert a UNIX timestamp to a datetime object.
    :param timestamp: the timestamp
    :param convert_to_local: whether to convert to local time
    :param utc: time zone offset, e.g. UTC+8
    :param is_remove_ms: whether to drop the milliseconds
    :return: a datetime object
    """
    if is_remove_ms:
        timestamp = int(timestamp)
    dt = datetime.datetime.utcfromtimestamp(timestamp)
    if convert_to_local:
        dt = dt + datetime.timedelta(hours=utc)
    return dt

def convert_date(timestamp, format='%Y-%m-%d %H:%M:%S'):
    dt = timestamp2datetime(timestamp)
    return dt.strftime(format)

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        info = entry.stat()
        print('{} last modified {}'.format(entry.name, convert_date(info.st_mtime)))
This gets the list of files in my_directory along with their attributes, then calls convert_date() so that the modification time is displayed in a human-readable way. convert_date() uses .strftime() to convert the datetime object to a string.
Output from the above code:
file1.py last modified 2019-01-24 09:04:39
file2.csv last modified 2019-01-24 09:04:39
file3.txt last modified 2019-01-24 09:04:39
The syntax for converting dates and times to strings can be confusing. For more information, consult the official documentation. Another good resource is strftime.org.
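As a quick illustration, .strftime() builds the string from format codes such as %Y for the year, %m for the month, and %d for the day; a minimal sketch:
import datetime

dt = datetime.datetime(2019, 1, 24, 9, 4, 39)
# %Y-%m-%d %H:%M:%S is the same format string used by convert_date() above
print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2019-01-24 09:04:39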
Create a directory
Sooner or later, your program will need to create directories to store data in. os and pathlib include functions for creating directories. We will look at the following:
Method | Description |
---|---|
os.mkdir() | Create a single subdirectory |
os.makedirs() | Create multiple directories, including intermediate directories |
pathlib.Path.mkdir() | Create a single directory or multiple directories |
Creating a single directory
To create a single directory, pass the directory path as an argument to os.mkdir():
import os
os.mkdir('example_directory')
If the directory already exists, os.mkdir() raises a FileExistsError exception. Alternatively, you can create the directory using pathlib:
from pathlib import Path
p = Path('example_directory')
p.mkdir()
If the path already exists, mkdir() raises FileExistsError:
FileExistsError: [Errno 17] File exists: 'example_directory'
To avoid raising an error like this, catch the error when it occurs and let your users know:
from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as e:
    print(e)
Alternatively, you can ignore the FileExistsError exception by passing exist_ok=True to .mkdir():
from pathlib import Path
p = Path('example_directory')
p.mkdir(exist_ok=True)
If the directory already exists, no error is raised.
Create multiple directories
os.makedirs() is similar to os.mkdir(). The difference is that os.makedirs() can not only create individual directories, but can also recursively create directory trees. In other words, it can create any intermediate folders needed to make sure the full path exists.
os.makedirs() is similar to running mkdir -p in bash. For example, to create a group of directories like 2018/10/05, you can do the following:
import os
os.makedirs('2018/10/05', mode=0o770)
The code above creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, which also grants permissions to other users. See the documentation for more details on file permissions and how the mode is applied.
Run the tree command to confirm the permissions we applied:
$ tree -p -i .
.
[drwxrwx---] 2018
[drwxrwx---] 10
[drwxrwx---] 05
The code above prints the directory tree of the current directory. tree is usually used to list the contents of a directory in a tree structure. Passing it the -p and -i arguments prints the directory names and their file permissions in a vertical list: -p outputs the file permissions, and -i makes tree produce a vertical list without indentation.
As you can see, all the directories have 770 permissions. Another way to create multiple directories is to use pathlib.Path.mkdir():
from pathlib import Path
p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True)
Passing parents=True to Path.mkdir() makes it create the 05 directory and any parent directories needed to make the path valid.
By default, os.makedirs() and pathlib.Path.mkdir() raise OSError if the target directory already exists. This behavior can be overridden (starting with Python 3.2) by passing exist_ok=True as a keyword argument when calling each function.
Running the above code results in something like the following:
.
└── 2018
    └── 10
        └── 05
I prefer to use pathlib when creating directories, because I can use the same function method to create one or more directories.
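os.makedirs() accepts the same exist_ok keyword argument (from Python 3.2 onward); a minimal sketch:
import os

# No error is raised if 2018/10/05 already exists
os.makedirs('2018/10/05', exist_ok=True)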
File name pattern matching
After using one of the above methods to get a list of files in a directory, you might want to search for files that match a particular pattern.
Here are the methods and functions you can use:
- The string methods .startswith() and .endswith()
- fnmatch.fnmatch()
- glob.glob()
- pathlib.Path.glob()
These methods and functions are discussed below. The examples in this section are run inside a directory named some_directory with the following structure:
.
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
├── sub_dir
│   ├── file1.py
│   └── file2.py
└── tests.py
If you are using a Bash shell, you can create the above directory structure with the following commands:
mkdir some_directory
cd some_directory
mkdir sub_dir
touch sub_dir/file1.py sub_dir/file2.py
touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py
This creates the some_directory directory and enters it, then creates sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line uses brace expansion to create all the other files. To learn more about shell expansion, read here.
Using string methods
Python has several built-in methods for modifying and manipulating strings. Two of them, .startswith() and .endswith(), are very useful when matching file names. To use them, first get a directory listing and then iterate over it:
import os

for f_name in os.listdir('some_directory'):
    if f_name.endswith('.txt'):
        print(f_name)
The code above gets all the files in some_directory, iterates over them, and uses .endswith() to print only the names that end in .txt. Running the code on my computer produces the following output:
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
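.startswith() works the same way; here is a minimal sketch that lists only the files whose names begin with data:
import os

for f_name in os.listdir('some_directory'):
    if f_name.startswith('data'):
        print(f_name)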
Simple file name pattern matching using fnmatch
The matching abilities of string methods are limited. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), which supports wildcards such as * and ?. For example, to use fnmatch to find all .txt files in a directory, you can do the following:
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, '*.txt'):
        print(f_name)
This iterates over the list of files in some_directory and uses fnmatch.fnmatch() to perform a wildcard search for files with the .txt extension.
More advanced pattern matching
Suppose you want to find .txt files that match a particular pattern. For example, you might want the .txt files that contain the word data, a number between a set of underscores, and the word backup in their file name, such as data_01_backup, data_02_backup, or data_03_backup.
You can use fnmatch.fnmatch() like this:
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, 'data_*_backup.txt'):
        print(f_name)
Here, only the names of files that match the data_*_backup.txt pattern are printed. The * in the pattern matches any number of characters, so running this code finds all text files whose names start with data and end with backup.txt, as the following output shows:
data_01_backup.txt
data_02_backup.txt
data_03_backup.txt
File name pattern matching using glob
Another useful pattern-matching module is glob.
.glob() in the glob module works much like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files that begin with a dot (.) as special files.
UNIX and related systems use wildcards such as ? and * in file lists to indicate full pattern matches.
For example, in a UNIX shell, mv *.py python_files moves all files with the .py extension from the current directory to python_files. The * is a wildcard representing any number of characters, and *.py is the whole pattern. This shell feature is not available on Windows, but the glob module adds the same capability to Python, so it can be used in Windows programs as well.
Here is a query for all Python code files in the current directory using the glob module:
import glob
print(glob.glob('*.py'))
glob.glob('*.py') searches the current directory for files with the .py extension and returns them as a list. glob also supports shell-style wildcards for matching:
import glob
for name in glob.glob('*[0-9]*.txt'):
print(name)
This will find all text files (.txt) with numbers in their filenames:
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
glob also makes it easy to search subdirectories recursively:
import glob
for name in glob.iglob('**/*.py', recursive=True):
print(name)
This example uses glob.iglob() to search for all .py files in the current directory and its subdirectories. Passing recursive=True makes .iglob() search the current directory and all subdirectories. glob.glob() and glob.iglob() differ in that .iglob() returns an iterator instead of a list.
Running the above code results in the following:
admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py
pathlib contains similar methods for flexibly retrieving file listings. The following example shows how to use Path.glob() to list the file types that start with the letter p:
from pathlib import Path

p = Path('.')
for name in p.glob('*.p*'):
    print(name)
Calling p.glob('*.p*') returns a generator object pointing to all files in the current directory whose extension starts with the letter p.
Path.glob() is similar to the glob.glob() discussed above. As you can see, pathlib blends many of the best features of the os, os.path, and glob modules into a single module, which makes it a pleasure to use.
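pathlib can also search recursively, which was not shown above; a small sketch using .rglob(), which is equivalent to calling .glob() with a '**/' prefix:
from pathlib import Path

# Recursively find every .py file under the current directory
for name in Path('.').rglob('*.py'):
    print(name)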
To recap, here’s the list of features we introduced in this section:
Function | Description |
---|---|
startswith() | Tests whether a string starts with a particular pattern, returning True or False |
endswith() | Tests whether a string ends in a particular pattern, returning True or False |
fnmatch.fnmatch(filename, pattern) | Tests whether the file name matches this pattern, returning True or False |
glob.glob() | Returns a list of filenames that match the pattern |
pathlib.Path.glob() | Returns a generator object that matches the pattern |
Traverse directories and process files
A common programming task is walking a directory tree and processing the files it contains. Let's explore how to do that with the built-in Python function os.walk(). os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we will work with the following directory tree:
.
├── folder_1
│   ├── file1.py
│   ├── file2.py
│   └── file3.py
├── folder_2
│   ├── file4.py
│   ├── file5.py
│   └── file6.py
├── test1.txt
└── test2.txt
Here is an example of using os.walk() to list all the files and directories in a directory tree.
os.walk() traverses the directory from top to bottom by default:
import os

for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
os.walk() returns three values on each iteration of the loop:
- The name of the current folder
- A list of the subfolders in the current folder
- A list of the files in the current folder
In each iteration, it prints out the names of the subdirectories and files it finds:
Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
To traverse the directory tree from the bottom up, pass the topdown=False keyword argument to os.walk():
for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
Passing topdown=False makes os.walk() print the files it finds in the subdirectories first:
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt
As you can see, the program lists the contents of the subdirectories before the contents of the root directory. This is useful when you want to recursively delete files and directories; you will learn how to do that in the sections below. By default, os.walk() does not walk into directories that are symbolic links. You can override that behavior with the followlinks=True argument.
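A minimal sketch of the followlinks argument; it only changes the result if the tree actually contains directory symlinks:
import os

# Also descend into directories that are reached through symbolic links
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    print(f'Found directory: {dirpath}')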
Create temporary files and directories
Python provides the tempfile module for conveniently creating temporary files and directories.
tempfile can be used to open and store temporary data in a file or directory while your program is running. tempfile deletes these temporary files once your program is done with them.
Now, let’s see how to create a temporary file:
from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello World!')

# Go back to the beginning and read the data from the file
fp.seek(0)
data = fp.read()
print(data)

# Close the file, after which it will be deleted
fp.close()
The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object using TemporaryFile(), passing it the mode in which you want to open the file. This creates and opens a file that can be used as a temporary storage area.
In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a name, because it will be destroyed after the script finishes running.
After writing to the file, you can read from it and close it when you're done processing it. Once the file is closed, it is deleted from the file system. If you need a named temporary file, use tempfile.NamedTemporaryFile().
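A minimal sketch of a named temporary file; unlike TemporaryFile, it is guaranteed to have a visible name in the file system while it is open:
from tempfile import NamedTemporaryFile

with NamedTemporaryFile('w+t') as fp:
    print('The temporary file is named', fp.name)
    fp.write('Hello World!')
# The named file is deleted when the with block exits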
Temporary files and directories created with tempfile are stored in a special system directory for temporary files. Python searches a standard list of directories to find one in which the calling user can create files.
On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. If none of these exists, tempfile stores the temporary files and directories in the current directory.
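You can check which directory will be used on your system with tempfile.gettempdir():
import tempfile

# Prints the directory where temporary files will be created, e.g. /tmp
print(tempfile.gettempdir())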
.TemporaryFile() is also a context manager, so it can be used with the with statement. Using the context manager automatically closes and deletes files after reading them:
with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# The temporary file has now been closed and deleted
This creates a temporary file and reads data from it. Once the contents of the file are read, the temporary file is closed and deleted from the file system.
tempfile can also be used to create temporary directories. Let's look at how to do that with tempfile.TemporaryDirectory():
import tempfile
import os

tmp = ''
with tempfile.TemporaryDirectory() as tmpdir:
    print('Created temporary directory ', tmpdir)
    tmp = tmpdir
    print(os.path.exists(tmpdir))

print(tmp)
print(os.path.exists(tmp))
Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing that directory. In the example above, the directory is created with a context manager, and its name is stored in the tmpdir variable. The third line prints the name of the temporary directory, and os.path.exists(tmpdir) confirms that the directory was actually created in the file system.
After the context manager goes out of context, the temporary directory is deleted and os.path.exists(tmpdir) returns False, which means the directory was successfully removed.
Delete files and directories
You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.
Delete files in Python
To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().
os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:
import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)
Deleting a file using os.unlink() is similar to using os.remove():
import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)
Calling .unlink() or .remove() on a file deletes the file from the file system. These two functions raise OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you're deleting is actually a file and only delete it if it is, or use exception handling to handle the OSError:
import os

data_file = 'home/data.txt'

# Delete it if it is a file
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')
os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.
The following example shows how to use exception handling to handle errors when deleting files:
import os

data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')
The code above attempts to delete the file before checking its type. If data_file is not actually a file, the OSError that is raised is handled in the except clause and an error message is printed to the console. The error message is formatted using Python f-strings.
Finally, you can also delete files using pathlib.Path.unlink():
from pathlib import Path

data_file = Path('home/data.txt')
try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')
This creates a Path object named data_file that points to a file. Calling .unlink() on data_file deletes home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it; if the user does not have permission to delete the file, a PermissionError is raised.
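If you want to handle that case explicitly, a minimal sketch:
from pathlib import Path

data_file = Path('home/data.txt')
try:
    data_file.unlink()
except PermissionError as e:
    # The user running the script is not allowed to delete this file
    print(f'Error: {data_file} : {e.strerror}')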
Delete the directory
The standard library provides the following functions for deleting directories:
- os.rmdir()
- pathlib.Path.rmdir()
- shutil.rmtree()
To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These functions only work when the directory being deleted is empty; if it is not empty, OSError is raised. Here is how to delete a folder:
import os

trash_dir = 'my_documents/bad_dir'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
trash_dir has now been deleted by os.rmdir(). If the directory is not empty, an error message is printed to the screen:
Traceback (most recent call last):
File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'
Similarly, you can use pathlib to delete directories:
from pathlib import Path

trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
A Path object is created to point to the directory to be deleted. If the directory is empty, call the.rmdir() method of the Path object to remove it.
Delete the entire directory tree
To delete non-empty directories and entire directory trees, Python provides shutil.rmtree():
import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
When shutil.rmtree() is called, everything in trash_dir is deleted. In some cases, you may want to recursively delete empty folders. You can do this by combining os.walk() with one of the methods discussed above:
import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass
This walks the directory tree and tries to delete each directory it finds. If a directory is not empty, OSError is raised and that directory is skipped. The following table lists the functions covered in this section:
Function | Description |
---|---|
os.remove() | Delete a single file, but not a directory |
os.unlink() | As with os.remove(), the function removes individual files |
pathlib.Path.unlink() | Delete a single file, but not a directory |
os.rmdir() | Delete an empty directory |
pathlib.Path.rmdir() | Delete an empty directory |
shutil.rmtree() | Delete the entire directory tree, which can be used to delete non-empty directories |
Copy, move, and rename files and directories
Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removing files and directories. In this section, you will learn how to move and copy files and directories.
Copy the file
shutil offers several functions for copying files. The most commonly used ones are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:
import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)
shutil.copy() is comparable to the cp command on UNIX-based systems. shutil.copy(src, dst) copies the file src to the location specified by dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, src is copied into that directory. shutil.copy() copies only the file's contents and permissions; other metadata, such as the file's creation and modification times, is not preserved.
To preserve all file metadata when copying, use shutil.copy2():
import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)
Using .copy2() preserves details about the file such as the last access time, permission bits, last modification time, and flags.
Copy directory
While shutil.copy() copies only a single file, shutil.copytree() copies an entire directory and everything in it. shutil.copytree(src, dst) takes two arguments: the source directory and the destination directory into which the files and folders will be copied.
Here is an example of how to copy the contents of a folder to another location:
import shutil

dst = shutil.copytree('data_1', 'data1_backup')
print(dst)  # data1_backup
In this example, .copytree() copies the contents of data_1 to the new location data1_backup and returns the path of the destination directory. The destination directory must not already exist; it will be created, along with any missing parent directories. shutil.copytree() is a good way to back up files.
Move files and directories
To move a file or directory to another location, use shutil.move(src, dst).
src is the file or directory to move and dst is the destination:
import shutil

dst = shutil.move('dir_1/', 'backup/')
print(dst)  # 'backup'
If backup/ exists, shutil.move(‘dir_1/’, ‘backup/’) moves dir_1/ to backup/. If backup/ does not exist, dir_1/ will be renamed backup.
Rename files and directories
Python includes os.rename(src, dst) for renaming files and directories:
import os

os.rename('first.zip', 'first_01.zip')
This renames first.zip to first_01.zip. If the destination path points to a directory, OSError is raised.
Another way to rename a file or directory is to use .rename() from the pathlib module:
from pathlib import Path
data_file = Path('data_01.txt')
data_file.rename('data.txt')
To rename a file using pathlib, first create a pathlib.Path() object that contains the path of the file you want to rename. The next step is to call .rename() on that Path object, passing the new name for the file or directory.
Archiving
Archiving is a convenient way to package several files into a single file. The two most common archive types are ZIP and TAR. The Python programs you write can create archive files, read them, and extract data from them. In this section you will learn how to read and write both archive formats.
Reading ZIP files
The zipfile module is a low-level module that is part of the Python standard library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is create a ZipFile object. ZipFile objects are similar to the file objects created using open(). ZipFile is also a context manager and therefore supports the with statement:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass
Here we create a ZipFile object, passing in the name of the ZIP file and opening it in read mode. Once the ZIP file is open, information about the archive can be accessed through the functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data, which contains a total of five files and one subdirectory:
.
├── sub_dir/
│   ├── bar.py
│   └── foo.py
├── file1.py
├── file2.py
└── file3.py
To get the list of files in the archive, call .namelist() on the ZipFile object:
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    zipobj.namelist()
This generates a list of files:
['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
.namelist() returns a list of the names of the files and directories in the archive. To retrieve information about a particular file in the archive, use .getinfo():
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.file_size)
This will print:
15277
.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, pass its path as an argument to .getinfo(). Using .getinfo(), you can retrieve information about archive members such as the date a file was last modified, its compressed size, and its full file name. Accessing .file_size retrieves the file's original size in bytes.
The following example shows how to retrieve more detailed information about archived files in the Python REPL. Assuming the zipfile module has been imported, bar_info is the same object created in the previous example:
>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'
bar_info contains details about bar.py, such as its compressed size and its full path.
The first line shows how to retrieve the file's last modification date. The next line shows how to get the file's size after compression. The last line shows the full path of bar.py inside the archive.
ZipFile supports the context manager protocol, which is why you can use it with the with statement. When this is done, the ZipFile object is automatically closed. Attempting to open or extract a file from a closed ZipFile object results in an error.
Extracting ZIP files
The zipfile module lets you extract one or more files from a ZIP archive via .extract() and .extractall().
By default, these methods extract files to the current directory. They both take an optional path parameter that lets you specify a different directory to extract the files into. If that directory does not exist, it is created automatically. To extract files from the archive, do the following:
>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']

>>> data_zip = zipfile.ZipFile('data.zip', 'r')

>>> # Extract a single file to the current directory
>>> data_zip.extract('file1.py')
'/home/test/dir1/zip_extract/file1.py'

>>> os.listdir('.')
['file1.py', 'data.zip']

>>> # Extract all files into the specified directory
>>> data_zip.extractall(path='extract_dir/')

>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']

>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']

>>> data_zip.close()
The third line of code calls os.listdir(), which shows that the current directory contains only one file, data.zip.
Next, data.zip is opened in read mode and .extract() is called to extract file1.py from it. .extract() returns the full path of the extracted file. Since no path was given, .extract() extracts file1.py into the current directory.
The next listing shows that the current directory now contains the extracted file in addition to the original archive. The example then shows how to extract the entire archive into a specified directory: .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.
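.extract() accepts the same optional path parameter; a minimal sketch, where single_file_dir/ is a hypothetical target directory that is created if it does not exist:
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> data_zip.extract('file2.py', path='single_file_dir/')
'single_file_dir/file2.py'
>>> data_zip.close()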
Extract data from an encrypted archive
zipfile supports extracting password-protected ZIP files. To extract a password-protected ZIP file, pass the password to the .extract() or .extractall() method as an argument:
>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract data from the encrypted archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')
The secret.zip archive is opened in read mode. The password is supplied to .extractall() as a bytes object, and the archive's contents are extracted to extract_dir. Because of the with statement, the archive is closed automatically after the extraction completes.
Create a new archive file
To create a new ZIP archive, open a ZipFile object in write mode ('w') and add the files you want to archive:
>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)
In this example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement ends, new_zip is closed. Opening a ZIP file in write mode erases the archive's contents and creates a new archive.
To add a file to an existing archive file, open the ZipFile object in append mode and add the file:
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')
This opens the new.zip archive created in the previous example in append mode. Opening the ZipFile object in append mode lets you add new files to the ZIP archive without deleting its current contents. After the files are added, the with statement goes out of context and closes the ZIP file.
Open the TAR archive file
TAR files are uncompressed file archives, like ZIP. They can be compressed using gzip, bzip2, and LZMA. The TarFile class allows reading and writing TAR archives.
The following is read from the archive:
import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())
tarfile objects open like most file-like objects: they have an open() function that takes a mode determining how the file is opened.
Use the 'r', 'w', or 'a' mode to open an uncompressed TAR file for reading, writing, or appending, respectively. To open a compressed TAR file, pass a mode argument to tarfile.open() in the form filemode[:compression]. The following table lists the modes in which TAR files can be opened:
Mode | Behavior |
---|---|
r | Open the archive in uncompressed read mode |
r:gz | Open the archive in gzip compressed read mode |
r:bz2 | Open the archive in bzip2 compressed read mode |
w | Opens the archive in uncompressed write mode |
w:gz | Open the archive in gzip compressed write mode |
w:xz | Open the archive in LZMA compressed write mode |
a | Open the archive in uncompressed append mode |
.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files inside it, use .getnames():
>>> import tarfile
>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']
This returns the names of the contents in the archive as a list.
Note: To show you how to use the different tarfile object methods, the TAR file in the example is opened and closed manually in an interactive REPL session.
By interacting with the TAR file in this way, you can see the output of running each command. In general, you might want to use a context manager to open file-like objects.
In addition, metadata for each entry in the archive can be accessed using special attributes:
>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
Modified: Sat Nov 1 09:09:51 2018
Size : 402 bytes
README.md
Modified: Sat Nov 3 07:29:40 2018
Size : 5426 bytes
app.py
Modified: Sat Nov 3 07:29:13 2018
Size : 6218 bytes
In this example, we loop over the list of files returned by .getmembers() and print each file's attributes. .getmembers() returns objects whose attributes, such as the name, size, and last modification time of each file in the archive, can be accessed programmatically. After reading or writing to the archive, it must be closed to free system resources.
Extract files from the TAR archive
In this section, you will learn how to extract files from TAR archives using the following methods:
.extract()
.extractfile()
.extractall()
To extract a single file from a TAR archive, use .extract(), passing in the file name:
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']
The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that README.md was successfully extracted into the current directory. To extract everything from the archive, use .extractall():
>>> tar.extractall(path="extracted/")
.extractall() has an optional path parameter to specify where the extracted files should go. Here, the archive is extracted into the extracted directory. The following commands show that the archive was extracted successfully:
$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
│   ├── app.py
│   ├── CONTRIBUTING.rst
│   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md
To extract a file object for reading or writing, use .extractfile(), which takes a file name or a TarInfo object as an argument. .extractfile() returns a file-like object that can be read and used:
>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()
Open archives should always be closed after they have been read or written. To close the archive, call .close() on the archive file handle, or use the with statement when creating the tarfile object so that the archive is closed automatically when you're done. This frees system resources and writes any changes you made to the archive to the file system.
Create a new TAR archive
To create a new TAR archive, you can do this:
>>> import tarfile

>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
First, you create a list of files to add to the archive so you don’t have to manually add each file.
The next line opens a new archive named packages.tar in write mode using the with context manager. Opening an archive in write mode ('w') lets you write new files to it. Any existing files in the archive are deleted and a new archive is created.
Once the archive is created and populated, the With context manager automatically closes it and saves it to the file system. The last three lines open the archive you just created and print out the names of the files it contains.
To add a new file to an existing archive, open the archive in append mode ('a'):
>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar
Opening an archive in Append mode allows you to add new files to it without deleting existing ones.
Use compressed archive
tarfile can also read and write TAR archives compressed with gzip, bzip2, and LZMA. To read or write a compressed archive, use tarfile.open(), passing the appropriate mode for the compression type.
For example, to read or write data in a gzip-compressed TAR archive, use the 'r:gz' or 'w:gz' mode, respectively:
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py
The 'w:gz' mode opens a gzip-compressed archive for writing and 'r:gz' opens one for reading. A compressed archive cannot be opened in append mode; to add files to a compressed archive, you have to create a new archive.
An easier way to create an archive
The Python standard library also supports creating TAR and ZIP archives using the high-level functions in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower-level tarfile and zipfile modules.
Create an archive with shutil.make_archive()
shutil.make_archive() takes at least two arguments: the name of the archive and an archive format.
By default, it compresses all files in the current directory into the archive format specified in the format parameter. You can pass in the optional root_dir argument to compress files in different directories. .make_archive() supports zip, tar, bztar, and gztar archive formats.
Here's how to create a TAR archive using shutil:
import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')
This copies everything in data/ and creates an archive named backup.tar on the file system, returning its name. To extract the archive, call .unpack_archive():
shutil.unpack_archive('backup.tar', 'extract_dir/')
Calling .unpack_archive() with the archive name and a destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.
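For example, a minimal sketch of the same backup using the zip format:
import shutil

# Creates data/backup.zip from everything under data/
shutil.make_archive('data/backup', 'zip', 'data/')

# Unpack it into a directory of your choice
shutil.unpack_archive('data/backup.zip', 'extract_dir/')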
Read multiple files
Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module allows you to loop over the contents of one or more text files quickly and easily. Here's the typical way fileinput is used:
import fileinput

for line in fileinput.input():
    process(line)
fileinput gets its input from the command-line arguments passed to sys.argv by default.
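You can also pass an explicit list of files instead of relying on sys.argv; a minimal sketch, using the sample files shown below:
import fileinput

# Read several files one after another as a single stream of lines
with fileinput.input(files=('bacon.txt', 'cupcake.txt')) as f:
    for line in f:
        print(line, end='')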
Loop over multiple files with fileinput
Let's use fileinput to build a crude version of cat, a common UNIX tool. The cat tool reads files sequentially and writes them to standard output. When given more than one file in its command-line arguments, cat concatenates the text files and displays the result in the terminal:
# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print('-> ' + line, end='')
print()
There are two text files in the current directory. Running this command produces the following output:
$ python3 fileinput-example.py bacon.txt cupcake.txt
--- Reading bacon.txt ---
-> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
-> irure cillum drumstick elit.
-> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
-> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
-> Tri-tip doner kevin cillum ham veniam cow hamburger.
-> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
-> Ball tip dolor do magna laboris nisi pancetta nostrud doner.
--- Reading cupcake.txt ---
-> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
-> Topping muffin cotton candy.
-> Gummies macaroon jujubes jelly beans marzipan.
fileinput lets you retrieve more information about each line, such as whether it is the first line (.isfirstline()), the line number (.lineno()), and the file name (.filename()). You can read more about it here.
Conclusion
You now know how to use Python to perform the most common operations on files and file groups. You have learned to use different built-in modules to read, find, and manipulate files.
You can now do this in Python:
- Get directory contents and file attributes
- Create directories and directory trees
- Match file names against patterns
- Create temporary files and directories
- Move, rename, copy, and delete files or directories
- Read and extract data from different types of archive files
- Use fileinput to read multiple files at once
Follow the WeChat official account <Code and Art> to read more high-quality technical articles from abroad.