Column address: a Python module per week

The pathlib module provides an object-oriented API for parsing, building, testing, and otherwise working with file names and paths, instead of low-level string operations.

Building paths

To create a new path relative to an existing path value, extend the path with the / operator. The argument to the operator can be a string or another path object.

import pathlib

usr = pathlib.PurePosixPath('/usr')
print(usr)	# /usr

usr_local = usr / 'local'
print(usr_local)	# /usr/local

usr_share = usr / pathlib.PurePosixPath('share')
print(usr_share)	# /usr/share

root = usr / '..'
print(root)	# /usr/..

etc = root / '/etc/'
print(etc)	# /etc

As the root example shows, the operator combines the path values as they are given and does not normalize the result when it contains the parent directory reference "..". However, if a segment begins with the path separator, it is interpreted as a new "root" reference, in the same way as os.path.join(). Extra path separators are removed from the middle of the path value, as the etc example shows.
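The joining rules described above can be checked side by side with posixpath.join(). This is a minimal sketch; the variable names are illustrative only and not part of the original examples.

```python
import pathlib
import posixpath

usr = pathlib.PurePosixPath('/usr')

# A segment that starts with the separator replaces everything before it,
# both with the / operator and with posixpath.join().
joined_with_operator = usr / '/etc'
joined_with_os_path = posixpath.join('/usr', '/etc')
print(joined_with_operator)   # /etc
print(joined_with_os_path)    # /etc

# Redundant separators are collapsed, but '..' is kept exactly as given.
messy = pathlib.PurePosixPath('/usr//local/') / '..' / 'share'
print(messy)                  # /usr/local/../share
```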

The concrete path classes include a resolve() method, which normalizes a path by consulting the file system for directories and symbolic links and producing the absolute path that the name refers to.

import pathlib

usr_local = pathlib.Path('/usr/local')
share = usr_local / '..' / 'share'
print(share.resolve())	# /usr/share

Here the relative path is converted to the absolute path /usr/share. If the input path includes symbolic links, those are expanded as well, so the resolved path refers directly to the target.
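Because resolve() consults the real file system, its result depends on what exists on disk. The sketch below recreates a small usr/local and usr/share layout inside a temporary directory (an assumption made so the example does not depend on the host system) and shows the ".." reference disappearing after resolution.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temporary directory itself first, in case it is a symlink.
    base = pathlib.Path(tmp).resolve()
    (base / 'usr' / 'local').mkdir(parents=True)
    (base / 'usr' / 'share').mkdir()

    unresolved = base / 'usr' / 'local' / '..' / 'share'
    resolved = unresolved.resolve()

    dotdot_before = '..' in unresolved.parts
    dotdot_after = '..' in resolved.parts
    points_at_share = resolved == base / 'usr' / 'share'

print(dotdot_before)    # True
print(dotdot_after)     # False
print(points_at_share)  # True
```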

To build paths when the segments are not known in advance, use joinpath(), passing each path segment as a separate argument.

import pathlib

root = pathlib.PurePosixPath('/')
subdirs = ['usr', 'local']
usr_local = root.joinpath(*subdirs)
print(usr_local)	# /usr/local

As with the / operator, calling joinpath() creates a new instance.

Given an existing path object, it is easy to build a new one with minor differences, such as referring to a different file in the same directory. Use with_name() to create a new path that replaces the file name portion, and with_suffix() to create a new path that replaces the file extension.

import pathlib

ind = pathlib.PurePosixPath('source/pathlib/index.rst')
print(ind)	# source/pathlib/index.rst

py = ind.with_name('pathlib_from_existing.py')
print(py)	# source/pathlib/pathlib_from_existing.py

pyc = py.with_suffix('.pyc')
print(pyc)	# source/pathlib/pathlib_from_existing.pyc

Both methods return a new object, leaving the original path unchanged.
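The fact that the original object is left alone can be seen directly. This is a small sketch with made-up file names; it also shows that passing an empty string to with_suffix() drops the extension.

```python
import pathlib

original = pathlib.PurePosixPath('source/pathlib/index.rst')
renamed = original.with_name('notes.txt')   # hypothetical file name
no_suffix = original.with_suffix('')        # an empty suffix removes the extension

print(original)             # source/pathlib/index.rst
print(renamed)              # source/pathlib/notes.txt
print(no_suffix)            # source/pathlib/index
print(renamed is original)  # False
```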

Parsing paths

Path objects have methods and properties for extracting partial values from the name. For example, the parts property produces a sequence of path segments, split on the path separator.

import pathlib

p = pathlib.PurePosixPath('/usr/local')
print(p.parts)	# ('/', 'usr', 'local')

The sequence is a tuple, reflecting the immutability of the path instance.

There are two ways to navigate "up" the file system hierarchy from a given path object. The parent property refers to a new path instance for the directory containing the path, the same value returned by os.path.dirname(). The parents property is an iterable that produces parent directory references, continually going "up" the path hierarchy until it reaches the root.

import pathlib

p = pathlib.PurePosixPath('/usr/local/lib')

print('parent: {}'.format(p.parent))

print('\nhierarchy:')
for up in p.parents:
    print(up)
    
# output
# parent: /usr/local
# 
# hierarchy:
# /usr/local
# /usr
# /

The example iterates through the parents property and prints the member values.
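Besides iteration, the parents sequence also supports len() and indexing with non-negative integers. A short sketch, reusing the same path as above:

```python
import pathlib

p = pathlib.PurePosixPath('/usr/local/lib')

# Index 0 is the immediate parent, the same value as p.parent.
first = p.parents[0]
all_parents = [str(up) for up in p.parents]

print(first == p.parent)   # True
print(len(p.parents))      # 3
print(all_parents)         # ['/usr/local', '/usr', '/']
```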

Other parts of the path can be accessed through properties of the path object. The name property holds the last part of the path, after the final path separator (the same value that os.path.basename() produces). The suffix property holds the value after the extension separator, and the stem property holds the part of the name before the suffix.

import pathlib

p = pathlib.PurePosixPath('./source/pathlib/pathlib_name.py')
print('path : {}'.format(p))	# path : source/pathlib/pathlib_name.py
print('name : {}'.format(p.name))	# name : pathlib_name.py
print('suffix: {}'.format(p.suffix))	# suffix: .py
print('stem : {}'.format(p.stem))	# stem : pathlib_name

Although the suffix and stem values are similar to those produced by os.path.splitext(), they are based only on the value of name and not on the full path.
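The difference from splitext() can be illustrated with a name that has more than one extension. The paths below are hypothetical; the suffixes property, which lists every extension, is included for comparison.

```python
import pathlib
import posixpath

p = pathlib.PurePosixPath('/data.v2/archive.tar.gz')

print(p.suffix)     # .gz
print(p.suffixes)   # ['.tar', '.gz']
print(p.stem)       # archive.tar

# splitext() returns the whole path up to the extension, directory included.
split = posixpath.splitext(str(p))
print(split)        # ('/data.v2/archive.tar', '.gz')

# The dot in the directory name is ignored because only the name is examined.
no_ext = pathlib.PurePosixPath('/data.v2/README')
print(repr(no_ext.suffix))  # ''
```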

Creating a concrete path

Instances of the concrete Path class can be created from strings that refer to the name (or potential name) of a file, directory, or symbolic link on the file system. The class also provides several convenience methods for building instances from commonly used locations that change, such as the current working directory and the user's home directory.

import pathlib

home = pathlib.Path.home()
print('home: ', home)	# home: /Users/dhellmann

cwd = pathlib.Path.cwd()
print('cwd : ', cwd)	# cwd : /Users/dhellmann/PyMOTW

Directory content

There are three methods for accessing directory listings to discover the names of files available on the file system. iterdir() is a generator that yields a new Path instance for each item in the containing directory.

import pathlib

p = pathlib.Path('.')

for f in p.iterdir():
    print(f)
    
# output
# example_link
# index.rst
# pathlib_chmod.py
# pathlib_convenience.py
# pathlib_from_existing.py
# pathlib_glob.py
# pathlib_iterdir.py
# pathlib_joinpath.py
# pathlib_mkdir.py
# pathlib_name.py

If the Path does not refer to a directory, iterdir() raises NotADirectoryError.
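The error case can be reproduced safely in a temporary directory. In this sketch the file name is made up, and the listing is wrapped in list() so the exception surfaces inside the try block whether the directory is read lazily or eagerly.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain_file = pathlib.Path(tmp) / 'plain.txt'
    plain_file.write_text('not a directory', encoding='utf-8')

    try:
        entries = list(plain_file.iterdir())
    except NotADirectoryError:
        # Reached because plain.txt is a regular file.
        entries = None
        print('not a directory:', plain_file.name)
```

Checking is_dir() before calling iterdir() is an alternative when an exception is not wanted.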

Use glob() to find only the files that match a pattern.

import pathlib

p = pathlib.Path('..')

for f in p.glob('*.rst'):
    print(f)
    
# output
# ../about.rst
# ../algorithm_tools.rst
# ../book.rst
# ../compression.rst
# ../concurrency.rst
# ../cryptographic.rst
# ../data_structures.rst
# ../dates.rst
# ../dev_tools.rst
# ../email.rst

The glob processor supports recursive scanning, either with the pattern prefix ** or by calling rglob() instead of glob().

import pathlib

p = pathlib.Path('..')

for f in p.rglob('pathlib_*.py'):
    print(f)
    
# output
# ../pathlib/pathlib_chmod.py
# ../pathlib/pathlib_convenience.py
# ../pathlib/pathlib_from_existing.py
# ../pathlib/pathlib_glob.py
# ../pathlib/pathlib_iterdir.py
# ../pathlib/pathlib_joinpath.py
# ../pathlib/pathlib_mkdir.py
# ../pathlib/pathlib_name.py
# ../pathlib/pathlib_operator.py
# ../pathlib/pathlib_ownership.py
# ../pathlib/pathlib_parents.py

Because this example starts from the parent directory, a recursive search is needed to find the example files matching pathlib_*.py.
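The two recursive forms can be compared on a small, throwaway directory tree. The layout below is invented for the illustration; results are converted to relative POSIX-style strings and sorted so they are easy to compare.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    top = pathlib.Path(tmp)
    (top / 'pkg' / 'sub').mkdir(parents=True)
    for rel in ['a.py', 'pkg/b.py', 'pkg/sub/c.py', 'pkg/notes.rst']:
        (top / rel).touch()

    def names(paths):
        # Relative, sorted names make the three listings comparable.
        return sorted(f.relative_to(top).as_posix() for f in paths)

    shallow = names(top.glob('*.py'))        # top level only
    via_prefix = names(top.glob('**/*.py'))  # recursive, using the ** prefix
    via_rglob = names(top.rglob('*.py'))     # recursive, using rglob()

print(shallow)                  # ['a.py']
print(via_prefix)               # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
print(via_rglob == via_prefix)  # True
```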

Read and write files

Each Path instance contains methods for processing the contents of the files it references. To read content, use read_bytes() or read_text(). To write to a file, use write_bytes() or write_text().

To open the file and keep the file handle, use the path's open() method instead of passing the name to the built-in open() function.

import pathlib

f = pathlib.Path('example.txt')

f.write_bytes('This is the content'.encode('utf-8'))

with f.open('r', encoding='utf-8') as handle:
    print('read from open(): {!r}'.format(handle.read()))

print('read_text(): {!r}'.format(f.read_text('utf-8')))

# output
# read from open(): 'This is the content'
# read_text(): 'This is the content'

Manipulate directories and symbolic links

import pathlib

p = pathlib.Path('example_dir')

print('Creating {}'.format(p))
p.mkdir()

# output
# Creating example_dir
# Traceback (most recent call last):
#   File "pathlib_mkdir.py", line 16, in <module>
#     p.mkdir()
#   File ".../lib/python3.6/pathlib.py", line 1226, in mkdir
#     self._accessor.mkdir(self, mode)
#   File ".../lib/python3.6/pathlib.py", line 387, in wrapped
#     return strfunc(str(pathobj), *args)
# FileExistsError: [Errno 17] File exists: 'example_dir'

If the path already exists, mkdir() raises FileExistsError.
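When an existing directory should not be treated as an error, mkdir() accepts the exist_ok and parents keyword arguments. The sketch below works inside a temporary directory, and the nested directory names are illustrative.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = pathlib.Path(tmp) / 'a' / 'b' / 'example_dir'

    # parents=True creates the missing intermediate directories;
    # exist_ok=True makes a repeated call harmless.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

    # Without exist_ok, the second call fails as described above.
    try:
        nested.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(created)  # True
print(raised)   # True
```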

Use symlink_to() to create a symbolic link. The link is named after the path's value and refers to the name given as the argument to symlink_to().

import pathlib

p = pathlib.Path('example_link')

p.symlink_to('index.rst')

print(p)	# example_link
print(p.resolve().name)	# index.rst

This example creates a symbolic link, then uses resolve() to read the link to find what it points to, and prints the name.
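A self-contained variant of the same idea is sketched below. It assumes a platform and account that allow creating symbolic links, and uses os.readlink() to show the raw link target next to the resolved path.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp).resolve()
    target = base / 'index.rst'
    target.write_text('content', encoding='utf-8')

    link = base / 'example_link'
    # A relative target is interpreted relative to the link's own directory.
    link.symlink_to('index.rst')

    is_link = link.is_symlink()
    raw_target = os.readlink(str(link))
    resolved_name = link.resolve().name
    same_file = link.resolve() == target

print(is_link)        # True
print(raw_target)     # index.rst
print(resolved_name)  # index.rst
print(same_file)      # True
```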

File types

This example creates several files of different types and tests them, along with a few device-specific files available on the local operating system.

import itertools
import os
import pathlib

root = pathlib.Path('test_files')

# Clean up from previous runs.
if root.exists():
    for f in root.iterdir():
        f.unlink()
else:
    root.mkdir()

# Create test files
(root / 'file').write_text('This is a regular file', encoding='utf-8')
(root / 'symlink').symlink_to('file')
os.mkfifo(str(root / 'fifo'))

# Check the file types
to_scan = itertools.chain(
    root.iterdir(),
    [pathlib.Path('/dev/disk0'), pathlib.Path('/dev/console')],
)
hfmt = '{:18s}' + ('  {:>5}' * 6)
print(hfmt.format('Name', 'File', 'Dir', 'Link', 'FIFO', 'Block', 'Character'))
print()

fmt = '{:20s} ' + ('{!r:>5}  ' * 6)
for f in to_scan:
    print(fmt.format(
        str(f),
        f.is_file(),
        f.is_dir(),
        f.is_symlink(),
        f.is_fifo(),
        f.is_block_device(),
        f.is_char_device(),
    ))
    
# output
# Name                 File    Dir   Link   FIFO  Block  Character
#
# test_files/fifo      False  False  False   True  False  False
# test_files/file       True  False  False  False  False  False
# test_files/symlink    True  False   True  False  False  False
# /dev/disk0           False  False  False  False   True  False
# /dev/console         False  False  False  False  False   True

Each of the methods, is_dir(), is_file(), is_symlink(), is_socket(), is_fifo(), is_block_device(), and is_char_device(), takes no arguments.
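These tests return False rather than raising when the path does not exist, which matters for dangling symbolic links. A minimal sketch, assuming symbolic links can be created on the platform; the file names are made up.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    missing = base / 'missing'
    dangling = base / 'dangling'
    dangling.symlink_to('missing')   # the link target does not exist

    checks = {
        'missing.exists': missing.exists(),
        'missing.is_file': missing.is_file(),
        'dangling.is_symlink': dangling.is_symlink(),
        # exists() and is_file() follow the link, so they report False.
        'dangling.exists': dangling.exists(),
        'dangling.is_file': dangling.is_file(),
    }

for label, value in checks.items():
    print(label, value)
```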

File attributes

Detailed information about a file is available from the stat() and lstat() methods. Use lstat() to check the status of something that might be a symbolic link, since it reports on the link itself rather than its target. These methods produce the same results as os.stat() and os.lstat().

# pathlib_stat.py 

import pathlib
import sys
import time

if len(sys.argv) == 1:
    filename = __file__
else:
    filename = sys.argv[1]

p = pathlib.Path(filename)
stat_info = p.stat()

print('{}:'.format(filename))
print(' Size:', stat_info.st_size)
print(' Permissions:', oct(stat_info.st_mode))
print(' Owner:', stat_info.st_uid)
print(' Device:', stat_info.st_dev)
print(' Created :', time.ctime(stat_info.st_ctime))
print(' Last modified:', time.ctime(stat_info.st_mtime))
print(' Last accessed:', time.ctime(stat_info.st_atime))

# output
# $ python3 pathlib_stat.py
# 
# pathlib_stat.py:
# Size: 607
# Permissions: 0o100644
# Owner: 527
# Device: 16777220
# Created : Thu Dec 29 12:38:23 2016
# Last modified: Thu Dec 29 12:38:23 2016
# Last accessed: Sun Mar 18 16:21:41 2018
# 
# $ python3 pathlib_stat.py index.rst
# 
# index.rst:
# Size: 19569
# Permissions: 0o100644
# Owner: 527
# Device: 16777220
# Created : Sun Mar 18 16:11:31 2018
# Last modified: Sun Mar 18 16:11:31 2018
# Last accessed: Sun Mar 18 16:21:40 2018

The output will vary depending on how the example code was installed. Try passing different file names on the command line to pathlib_stat.py.
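The difference between stat() and lstat() only shows up for symbolic links. The sketch below builds a 100-byte file and a link to it in a temporary directory (names are illustrative, and symlink support is assumed), then inspects the mode bits with helpers from the stat module.

```python
import pathlib
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    target = base / 'data.txt'
    target.write_bytes(b'x' * 100)
    link = base / 'link'
    link.symlink_to('data.txt')

    followed = link.stat()       # describes the target file
    not_followed = link.lstat()  # describes the link itself

    size_via_stat = followed.st_size
    stat_is_regular = stat.S_ISREG(followed.st_mode)
    lstat_is_link = stat.S_ISLNK(not_followed.st_mode)

print(size_via_stat)    # 100
print(stat_is_regular)  # True
print(lstat_is_link)    # True
```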

For simpler access to information about the owner of a file, use owner() and group().

import pathlib

p = pathlib.Path(__file__)

print('{} is owned by {}/{}'.format(p, p.owner(), p.group()))

# output
# pathlib_ownership.py is owned by dhellmann/dhellmann

The touch() method works like the Unix touch command: it creates the file if it does not exist, or updates the modification time of an existing file.

# pathlib_touch.py 

import pathlib
import time

p = pathlib.Path('touched')
if p.exists():
    print('already exists')
else:
    print('creating new')

p.touch()
start = p.stat()

time.sleep(1)

p.touch()
end = p.stat()

print('Start:', time.ctime(start.st_mtime))
print('End :', time.ctime(end.st_mtime))

# output
# $ python3 pathlib_touch.py
# 
# creating new
# Start: Sun Mar 18 16:21:41 2018
# End : Sun Mar 18 16:21:42 2018
# 
# $ python3 pathlib_touch.py
# 
# already exists
# Start: Sun Mar 18 16:21:42 2018
# End : Sun Mar 18 16:21:43 2018

Running this example more than once updates the existing file on subsequent runs.
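If updating an existing file is not wanted, touch() can be told to fail instead by passing exist_ok=False. A short sketch in a temporary directory:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    marker = pathlib.Path(tmp) / 'touched'

    marker.touch(exist_ok=False)   # the file is new, so this succeeds
    try:
        marker.touch(exist_ok=False)
        second_call_failed = False
    except FileExistsError:
        # The file already exists, so the second call is rejected.
        second_call_failed = True

print(second_call_failed)  # True
```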

Permissions

On Unix-like systems, file permissions can be changed with chmod(), passing the mode as an integer. Mode values can be constructed using constants defined in the stat module. This example toggles the user's execute permission bit.

import os
import pathlib
import stat

# Create a fresh test file.
f = pathlib.Path('pathlib_chmod_example.txt')
if f.exists():
    f.unlink()
f.write_text('contents')

# Determine what permissions are already set using stat.
existing_permissions = stat.S_IMODE(f.stat().st_mode)
print('Before: {:o}'.format(existing_permissions))	# Before: 644

# Decide which way to toggle them.
# Test the user execute bit itself (os.X_OK is a flag for os.access(), not a mode bit).
if not (existing_permissions & stat.S_IXUSR):
    print('Adding execute permission')	# Adding execute permission
    new_permissions = existing_permissions | stat.S_IXUSR
else:
    print('Removing execute permission')
    # use xor to remove the user execute permission
    new_permissions = existing_permissions ^ stat.S_IXUSR

# Make the change and show the new value.
f.chmod(new_permissions)
after_permissions = stat.S_IMODE(f.stat().st_mode)
print('After: {:o}'.format(after_permissions))	# After: 744

Deleting

There are two methods for removing things from the file system, depending on the type. To remove an empty directory, use rmdir().

import pathlib

p = pathlib.Path('example_dir')

print('Removing {}'.format(p))
p.rmdir()

# output
# Removing example_dir
# Traceback (most recent call last):
#   File "pathlib_rmdir.py", line 16, in <module>
#     p.rmdir()
#   File ".../lib/python3.6/pathlib.py", line 1270, in rmdir
#     self._accessor.rmdir(self)
#   File ".../lib/python3.6/pathlib.py", line 387, in wrapped
#     return strfunc(str(pathobj), *args)
# FileNotFoundError: [Errno 2] No such file or directory: 'example_dir'

A FileNotFoundError exception is raised if the directory does not exist. It is also an error to try to remove a directory that is not empty.
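The non-empty case can be sketched as follows. The example catches the general OSError, since the specific error code depends on the platform, and then empties the directory so that rmdir() succeeds. File names are hypothetical.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    d = pathlib.Path(tmp) / 'example_dir'
    d.mkdir()
    (d / 'child.txt').touch()

    try:
        d.rmdir()
        refused = False
    except OSError:
        # The directory still contains child.txt.
        refused = True

    # Remove the contents first, then the directory itself.
    for child in list(d.iterdir()):
        child.unlink()
    d.rmdir()
    gone = not d.exists()

print(refused)  # True
print(gone)     # True
```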

For files, symbolic links, and most other path types, use unlink().

import pathlib

p = pathlib.Path('touched')

p.touch()

print('exists before removing:', p.exists())	# exists before removing: True

p.unlink()

print('exists after removing:', p.exists())	# exists after removing: False

The user must have permission to delete files, symbolic links, sockets, or other file system objects.
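Like rmdir(), unlink() raises FileNotFoundError when the path does not exist. A minimal sketch of handling that case, run inside a temporary directory:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    p = pathlib.Path(tmp) / 'touched'

    try:
        p.unlink()               # nothing to remove yet
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    p.touch()
    p.unlink()                   # succeeds now that the file exists
    exists_after = p.exists()

print(missing_raised)  # True
print(exists_after)    # False
```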

Related documents:

Pymotw.com/3/pathlib/i…