Pathlib is more straightforward and more Pythonic than the usual os.path. But it is not just about simplifying things; it is about something bigger.

An overview of pathlib

Pathlib is a module in the Python standard library, described in the Python documentation as “object-oriented filesystem paths”. It provides classes that represent filesystem paths, with semantics appropriate for different operating systems. The path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations.

Sound a little convoluted? It is, since this is a literal translation of the documentation, but that doesn’t stop us from liking it. Let’s look at a couple of examples.

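An example

Manipulating paths with the Path class from the Python 3 standard library module pathlib is easier than with the functions in os.path.

Getting the current file path

With the os module, there are two common ways to get the current path directly: os.path.dirname(__file__) returns the directory that contains the current file, and os.getcwd() returns the current working directory: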

import os

value1 = os.path.dirname(__file__)
value2 = os.getcwd()
print(value1)
print(value2)

How does pathlib get the current file path?

The official documentation recommends Path.cwd() as the counterpart of os.getcwd().

Let’s give it a try:

import pathlib

value1 = pathlib.Path.cwd()
print(value1)
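Note that Path.cwd() mirrors os.getcwd(), not os.path.dirname(__file__). A minimal sketch of a pathlib counterpart to the latter might look like this; the helper name script_dir is made up for illustration and is not part of pathlib:

```python
from pathlib import Path


def script_dir(file_path: str) -> Path:
    """Return the absolute directory containing file_path (hypothetical helper)."""
    # resolve() makes the path absolute and follows symlinks;
    # .parent then drops the final component (the file name).
    return Path(file_path).resolve().parent


# Inside a script you would typically call script_dir(__file__).
# Here a made-up relative path is used so the snippet runs anywhere.
print(script_dir("some_dir/coder.py"))
```

In a real script the result is stable no matter which directory the script is launched from, which is usually what you want when locating files that sit next to the code.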

How it is implemented

As documented, it returns the same path as os.getcwd(). Let’s take a look at the source code (in PyCharm, Ctrl + left-click on a name jumps to its definition).

The docstring of Path.cwd() reads: “Return a new path pointing to the current working directory (as returned by os.getcwd()).”
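To check that relationship yourself, here is a small sketch that compares the two calls. The variable names are arbitrary:

```python
import os
from pathlib import Path

cwd_as_str = os.getcwd()   # a plain string
cwd_as_path = Path.cwd()   # a Path object (PosixPath or WindowsPath)

print(type(cwd_as_str).__name__, type(cwd_as_path).__name__)

# Wrapping the string in Path gives an object equal to Path.cwd().
print(cwd_as_path == Path(cwd_as_str))
```

The location is the same in both cases; the difference is that Path.cwd() hands back an object with path methods attached instead of a bare string.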

It doesn’t look like anything special, so why did the Python developers go out of their way to introduce it?

Other wrappers

As documented, pathlib wraps a number of functions from os and os.path, for example:

# Correspondence
os.path.expanduser() --> pathlib.Path.home()
os.path.expanduser() --> pathlib.Path.expanduser()
os.stat() --> pathlib.Path.stat()
os.chmod() --> pathlib.Path.chmod()
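As a rough illustration of one of these pairs, the sketch below calls os.stat() and Path.stat() on the same temporary file. The file name demo.txt is made up:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt"   # made-up file name
    target.write_text("hello")

    # Function style: pass the path in as an argument.
    size_via_os = os.stat(target).st_size
    # Object style: ask the path itself.
    size_via_pathlib = target.stat().st_size

    same_size = size_via_os == size_via_pathlib

print(size_via_os, size_via_pathlib, same_size)  # 5 5 True
```

Both calls return the same kind of result; only the calling style changes.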

For details, see the correspondence table in the official documentation.

A few more examples

The example above doesn’t demonstrate much, but it gives us an idea of what pathlib is made of. Now let’s get a feel for how convenient it is.

Getting the grandparent directory

That is, the parent of the parent of the current directory:

import os

print(os.path.dirname(os.path.dirname(os.getcwd())))

If implemented with pathlib:

import pathlib

print(pathlib.Path.cwd().parent.parent)


Chaining .parent does the job. Isn’t this more Pythonic? The code reads like English.

If you only need the parent directory, use .parent once:

import pathlib

print(pathlib.Path.cwd().parent)


You can also go one level further up, to the great-grandparent:

import pathlib

print(pathlib.Path.cwd().parent.parent.parent)

Isn’t chaining .parent much more convenient than nesting os.path.dirname() calls, as the os module requires?
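When the chain gets long, the parents attribute offers indexed access to the same ancestors. A short sketch, using a made-up POSIX path so that the output is the same on every machine:

```python
from pathlib import PurePosixPath

# A made-up path, used only for illustration.
start = PurePosixPath("/home/user/project/src")

print(start.parent.parent)   # /home/user
print(start.parents[1])      # /home/user (index 0 is the direct parent)

# All ancestors, nearest first.
ancestors = [str(p) for p in start.parents]
print(ancestors)  # ['/home/user/project', '/home/user', '/home', '/']
```

Indexing into parents avoids counting repeated .parent calls when you need to go several levels up.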

Joining paths

If you want to build a path starting from the grandparent directory with the os module, you need this long line of code:

import os

print(os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "Attention", "Wechat Official Account", "[attacking]", "Coder 】"))


With pathlib, it gets much more pleasant:

import pathlib

parts = ["Attention", "Wechat Official Account", "[attacking]", "Coder 】"]
print(pathlib.Path.cwd().parent.parent.joinpath(*parts))


You can also change the starting directory by adding or removing .parent.
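Besides joinpath(), path objects also support the / operator for joining. The following sketch compares the two spellings on a made-up base directory and made-up segment names:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/srv/data")            # made-up base directory
parts = ["reports", "2024", "summary.txt"]   # made-up segments

joined_with_method = base.joinpath(*parts)
joined_with_slash = base / "reports" / "2024" / "summary.txt"

print(joined_with_slash)                         # /srv/data/reports/2024/summary.txt
print(joined_with_method == joined_with_slash)   # True
```

joinpath() is handy when the segments already live in a list; the / operator tends to read better when they are written out by hand.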

PurePath

Most of what we did above went through the Path class in pathlib. The module also has a class called PurePath.

Pure path objects provide path-handling operations without actually accessing the file system. There are three ways to access these classes, which the documentation also calls flavours.

The description above comes from the official documentation, but it sounds a bit convoluted, so let’s look at it through examples.

PurePath.match

Let’s check whether the path of the current file matches the pattern ‘*.py’:

import pathlib

print(pathlib.PurePath(__file__).match('*.py'))

The file we wrote the code in, coder.py, matches the pattern, so the output is True.

Why use this as an example? Thinking further: if pathlib.PurePath(...) can be followed by .match(), it must be an object, not a path string. To test this idea, change the code:
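How match() treats relative and absolute patterns is easy to get wrong, so here is a small sketch. The directory names are made up; only the file name coder.py comes from the example above:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/project/coder.py")  # made-up location

results = {
    "*.py": path.match("*.py"),                  # relative pattern, matched from the right
    "project/*.py": path.match("project/*.py"),
    "user/*.py": path.match("user/*.py"),        # "user" is not the direct parent
    "/*.py": path.match("/*.py"),                # absolute pattern must match the whole path
}
print(results)
# {'*.py': True, 'project/*.py': True, 'user/*.py': False, '/*.py': False}
```

A relative pattern only has to match the tail of the path, which is why ‘*.py’ succeeds regardless of the directory.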

import pathlib
import os


os_path = os.path.dirname(__file__)
pure_path = pathlib.PurePath(__file__)
print(os_path, type(os_path))
print(pure_path, type(pure_path))
print(pathlib.PurePath(__file__).match('*.py'))


On the author’s machine, the output shows that os_path is a plain str, while pure_path, which points at coder.py, is a PurePosixPath object.

This is a bit of a mystery: what exactly is PurePosixPath?

PurePath has two subclasses, PurePosixPath and PureWindowsPath. But don’t worry: neither class is tied to a particular operating system. You can instantiate all of them regardless of which system you are running, because they do not provide any operations that make system calls.

“They do not provide any operations that make system calls.” What does that mean? This is getting deeper and deeper.
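A concrete sketch may help here. Both pure flavours below can be created on any operating system, because nothing touches the disk. The paths are made up:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Both lines run on any operating system: no system calls are involved.
win = PureWindowsPath(r"C:\Users\demo\notes.txt")   # made-up Windows path
posix = PurePosixPath("/home/demo/notes.txt")       # made-up POSIX path

print(win.drive)        # C:
print(win.name)         # notes.txt
print(win.parent)       # C:\Users\demo
print(win.as_posix())   # C:/Users/demo/notes.txt
print(posix.parent)     # /home/demo

# Pure paths have no I/O methods such as exists() or open().
print(hasattr(win, "exists"))   # False
```

Everything above is string computation on the path; whether the file exists is never checked.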

The documentation gives this description:

Pure paths are useful in some special cases; for example: If you want to manipulate Windows paths on a Unix machine (or vice versa). You cannot instantiate a WindowsPath when running on Unix, but you can instantiate PureWindowsPath. You want to make sure that your code only manipulates paths without actually accessing the OS. In this case, instantiating one of the pure classes may be useful since those simply don’t have any OS-accessing operations.


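The documentation also includes a class inheritance diagram: PurePath is the base class; PurePosixPath, PureWindowsPath and Path inherit from it; PosixPath inherits from both PurePosixPath and Path, and WindowsPath from both PureWindowsPath and Path.

If that is not entirely clear yet, never mind; keep reading.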
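The class relationships in pathlib can also be checked in code. This sketch only uses issubclass() on the public classes and does not instantiate anything platform-specific:

```python
import os
from pathlib import (
    Path,
    PosixPath,
    PurePath,
    PurePosixPath,
    PureWindowsPath,
    WindowsPath,
)

# Pure classes: computation only.
pure_ok = issubclass(PurePosixPath, PurePath) and issubclass(PureWindowsPath, PurePath)

# Concrete classes inherit from the pure ones and add I/O.
path_ok = issubclass(Path, PurePath)
posix_ok = issubclass(PosixPath, Path) and issubclass(PosixPath, PurePosixPath)
windows_ok = issubclass(WindowsPath, Path) and issubclass(WindowsPath, PureWindowsPath)

# Path() picks the concrete class for the platform the code runs on.
expected = WindowsPath if os.name == "nt" else PosixPath
platform_ok = type(Path()) is expected

print(pure_ok, path_ok, posix_ok, windows_ok, platform_ok)  # True True True True True
```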

Correspondence with the os module

As the examples above show, pathlib not only wraps the common functions in os.path but also integrates other functionality of the os module, such as creating directories with Path.mkdir().

If you’re worried about forgetting the pathlib equivalent of an os function, that’s okay: the documentation lists them in a correspondence table.
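For instance, creating nested directories might be sketched as follows. A temporary directory is used so nothing is left behind, and the directory names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b" / "c"   # made-up directory names

    # parents=True creates missing intermediate directories (like os.makedirs);
    # exist_ok=True makes a repeated call harmless.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)

    created = nested.is_dir()

print(created)  # True
```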

Basic usage

Path.iterdir() # Iterate over the files and subdirectories of a directory

Path.is_dir() # Check whether the path is a directory

Path.glob() # Match files in a directory against a pattern (returns a generator)

Path.resolve() # Return the absolute path, resolving symlinks

Path.exists() # Check whether the path exists

Path.open() # Open the file (supports the with statement)

Path.unlink() # Delete a file or symbolic link (use Path.rmdir() for an empty directory)
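The methods listed above can be tried together in a throwaway directory. A minimal sketch, with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.py").write_text("print('a')\n")
    (root / "b.txt").write_text("b\n")

    names = sorted(p.name for p in root.iterdir())            # files and directories
    py_files = sorted(p.name for p in root.glob("*.py"))      # pattern match
    dirs = sorted(p.name for p in root.iterdir() if p.is_dir())

    # open() works as a context manager, just like the built-in open().
    with (root / "b.txt").open() as fh:
        first_line = fh.readline().strip()

    (root / "b.txt").unlink()                                  # delete the file
    still_there = (root / "b.txt").exists()

print(names)        # ['a.py', 'b.txt', 'sub']
print(py_files)     # ['a.py']
print(dirs)         # ['sub']
print(first_line)   # b
print(still_there)  # False
```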

Basic attributes

Path.parts # The components of the path, as a tuple

Path.drive # The drive name, if any

Path.root # The root of the path, if any

Path.anchor # The concatenation of drive and root

Path.parents # An immutable sequence of all ancestor directories
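To see what these attributes return, here is a sketch with one POSIX-style and one Windows-style path. Both paths are made up, and pure classes are used so the output does not depend on the machine:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/usr/local/bin/python3")
win = PureWindowsPath(r"C:\Program Files\Python\python.exe")

print(posix.parts)         # ('/', 'usr', 'local', 'bin', 'python3')
print(repr(posix.drive))   # '' (POSIX paths have no drive)
print(repr(posix.root))    # '/'
print(repr(posix.anchor))  # '/'

print(win.parts)           # ('C:\\', 'Program Files', 'Python', 'python.exe')
print(repr(win.drive))     # 'C:'
print(repr(win.root))      # '\\'
print(repr(win.anchor))    # 'C:\\'

print(str(posix.parents[0]))  # /usr/local/bin
```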

Change the path

Path.with_name() # Return a new path with the final component (the name) replaced

Path.with_suffix() # Return a new path with the suffix replaced
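Both methods return a new path and leave the original untouched. A brief sketch on a made-up file name:

```python
from pathlib import PurePosixPath

original = PurePosixPath("/data/report.tar.gz")   # made-up path

renamed = original.with_name("summary.txt")
new_suffix = original.with_suffix(".zip")   # only the last suffix changes
no_suffix = original.with_suffix("")        # an empty suffix removes it

print(renamed)      # /data/summary.txt
print(new_suffix)   # /data/report.tar.zip
print(no_suffix)    # /data/report.tar
print(original)     # /data/report.tar.gz
```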

Joining paths

Path.joinpath() # Join one or more path segments onto the path

Path.relative_to() # Compute the path relative to another path
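These two methods are roughly inverse operations, as the sketch below suggests. The directory layout is made up:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/srv/app")                       # made-up base directory
target = base.joinpath("static", "css", "site.css")    # made-up segments

relative = target.relative_to(base)

print(target)     # /srv/app/static/css/site.css
print(relative)   # static/css/site.css

# relative_to() raises ValueError when the path is not inside the other one.
try:
    target.relative_to("/etc")
    outside_raises = False
except ValueError:
    outside_raises = True

print(outside_raises)  # True
```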

Testing paths

Path.match() # Test whether the path matches a glob pattern

Path.is_dir() # Whether the path is a directory

Path.is_file() # Whether the path is a regular file

Path.is_absolute() # Whether the path is absolute

Path.is_reserved() # Whether the path is reserved (relevant on Windows)

Path.exists() # Whether the path actually exists
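Some of these tests are pure computation, while others have to look at the disk. A sketch that shows both kinds, using a temporary directory and made-up names:

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure checks: no file system access needed.
absolute = PurePosixPath("/etc/hosts").is_absolute()
relative = PurePosixPath("etc/hosts").is_absolute()

# Concrete checks: these query the file system.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    note = folder / "note.txt"
    note.write_text("hi")
    missing = folder / "missing.txt"

    checks = {
        "folder.is_dir": folder.is_dir(),
        "note.is_file": note.is_file(),
        "note.is_dir": note.is_dir(),
        "missing.exists": missing.exists(),
    }

print(absolute, relative)  # True False
print(checks)
# {'folder.is_dir': True, 'note.is_file': True, 'note.is_dir': False, 'missing.exists': False}
```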

Other methods

Path.cwd() # Return a Path object for the current working directory

Path.home() # Return a Path object for the current user’s home directory

Path.stat() # Return information about the path, like os.stat()

Path.chmod() # Change the permissions of the path, like os.chmod()

Path.expanduser() # Expand ~ and return the full Path object

Path.mkdir() # Create a directory

Path.rename() # Rename the path

Path.rglob() # Recursively match a pattern in all subdirectories
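The difference between glob() and rglob(), along with rename(), can be sketched in a temporary directory tree. All file and directory names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "top.py").write_text("")
    (root / "pkg" / "mod.py").write_text("")
    (root / "pkg" / "sub" / "deep.py").write_text("")

    # glob("*.py") only looks in root; rglob("*.py") also descends into subdirectories.
    shallow = sorted(p.name for p in root.glob("*.py"))
    deep = sorted(p.name for p in root.rglob("*.py"))

    # rename() moves the file to the new path.
    old = root / "top.py"
    new = root / "main.py"
    old.rename(new)
    renamed = new.exists() and not old.exists()

print(shallow)   # ['top.py']
print(deep)      # ['deep.py', 'mod.py', 'top.py']
print(renamed)   # True
```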

Pathlib review

From the above examples, we should have a general idea of pathlib. Let’s review the official definition of the Pathlib library:

This module offers classes representing filesystem paths with semantics appropriate for different operating systems. Path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations.


Now revisit the class diagram in the documentation with a fresh understanding of pathlib.

If you’ve never used this module before, or are just not sure which class is right for your task, Path is probably what you need. It instantiates a concrete path for the platform on which the code runs.

Summary: pathlib does not simply wrap the functions of the os module; it is designed to work across different operating systems and defines a set of classes for each kind of system. For example, you may want to manipulate Windows paths on a Unix machine. WindowsPath cannot be instantiated there, but PureWindowsPath can, so the pure path classes give you an interface for doing exactly that (and vice versa).