Python Import System?

Every Python developer knows the import keyword: whether we use the standard library or a third-party library, we bring it in with import XXX. Import is a foundation of Python’s architecture, and precisely because it is so basic, it is easy to overlook how important it is. Imagine an interviewer asks you to talk about the “Python Import System”. Could you really talk for an hour?


You actually can. To get started, let’s see what the “Python Import System” looks like.

1. Basic Concepts

1. What can be imported? Basic concepts in Python

Before introducing the import system, let’s look at what can be imported in Python.

This question seems easy to answer: in Python everything is an object, and any object bound to a name in a module can be imported. Among these objects, the two concepts we use most often are Module and Package. Although they look like two different concepts, at the C level both are instances of the PyModuleObject structure, whose type is PyModule_Type, and both are represented as a <class 'module'> object in Python.

// Objects/moduleobject.c

PyTypeObject PyModule_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "module",                                   /* tp_name */
    sizeof(PyModuleObject),                     /* tp_basicsize */
    // ...
};
// <class 'module'> in Python corresponds to the underlying PyModule_Type
// The imported module object corresponds to the underlying PyModuleObject

Let’s take a look at what happens when you import modules and packages into Python

import os
import pandas

print(os)  # <module 'os' from 'C:\\python38\\lib\\os.py'>
print(pandas)  # <module 'pandas' from 'C:\\python38\\lib\\site-packages\\pandas\\__init__.py'>

print(type(os))  # <class 'module'>
print(type(pandas))  # <class 'module'>

As the results show, modules and packages are the same kind of object in Python: both are a PyModuleObject. The language itself does not draw a sharp line between them, but for convenience in development we usually distinguish them as follows:

1.1 Module

In Python, ordinary *.py files, compiled *.pyc files (and the older optimized *.pyo files), and the *.pyd and *.pyw files found on Windows are the smallest carriers of Python code. Each such standalone file is called a module.

1.2 Package

A collection of such modules is called a package. The most familiar kind is a directory that contains __init__.py: we create a new directory A, add an __init__.py to it, and can then import package A. However, this is only one kind of package, the “Regular Package”. As described in PEP 420 (Implicit Namespace Packages), Python 3.3 introduced a second kind called “Namespace Packages”. It differs from a regular package as follows:

Key difference: a directory without __init__.py is identified as a namespace package.
Consequently there is no usable __file__ attribute (it is absent or None, depending on the Python version), because for a regular package __file__ holds the path of its __init__.py.
__path__ is no longer a list. It is a read-only iterable that is recomputed automatically when the parent path (or sys.path for a top-level package) changes, so the next import attempt inside the package searches for its portions again.
__loader__ may hold a different type of object, because namespace packages can be loaded by different kinds of loaders.
The subpackages of a package can come from different directories, ZIP files and so on, wherever Python’s find_spec can locate them. This again relates to how import works, which is covered later.

One more note on the benefit of “Namespace Packages”: the point is not simply to allow importing packages that lack __init__.py, but to use Python’s import mechanism to maintain a virtual namespace, so that subpackages living in different directories can be organized, used selectively, and managed under one large namespace.

So let’s say we have this structure

└── project
    ├── foo-package
    │   └── spam
    │       └── blah.py
    └── bar-package
        └── spam
            └── grok.py

In both directories, the namespace spam is common. There is no __init__.py file in either directory.

Let’s see what happens if we add both foo-package and bar-package to the Python module path and try to import it

>>> import sys
>>> sys.path.extend(['foo-package', 'bar-package'])
>>> import spam.blah  # imports normally
>>> import spam.grok  # also imports normally
>>>

The two separate package directories are merged, and you can import spam.blah, spam.grok, or any portion added later. Together they form a working namespace package.

Take a closer look at PEP 420 (Implicit Namespace Packages) for more details.
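The merge described above can be reproduced end to end. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical package name nsdemo_spam (chosen only so it does not clash with an installed package); it is an illustration, not code taken from PEP 420.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "nsdemo_spam"  # hypothetical name, used only for this demo

with tempfile.TemporaryDirectory() as root:
    added_paths = []
    # Two separate directories, each holding one portion of the same package.
    for parent, module in (("foo-package", "blah"), ("bar-package", "grok")):
        portion = Path(root, parent, PACKAGE)
        portion.mkdir(parents=True)  # note: no __init__.py is created
        (portion / f"{module}.py").write_text(f"NAME = {module!r}\n")
        added_paths.append(str(Path(root, parent)))

    sys.path.extend(added_paths)
    importlib.invalidate_caches()
    try:
        blah = importlib.import_module(f"{PACKAGE}.blah")
        grok = importlib.import_module(f"{PACKAGE}.grok")
        package = sys.modules[PACKAGE]
        portions = list(package.__path__)  # one entry per contributing directory
        package_file = getattr(package, "__file__", None)
    finally:
        # Undo the sys.path change so the demo leaves no trace.
        for path in added_paths:
            sys.path.remove(path)

print(len(portions), package_file, blah.NAME, grok.NAME)
```

Both submodules import even though neither directory contains __init__.py, and the package’s __path__ lists one portion per directory.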

2. How to import? Absolute import/relative import

Now that we know what can be imported, what are the ways to import? The import mechanisms of Python 2.x and Python 3.x differ significantly because of two changes:

PEP 420 (Implicit Namespace Packages): “Namespace Packages”, introduced in Python 3.3
PEP 328 (Imports: Multi-Line and Absolute/Relative): absolute and relative imports

We have already covered the first point, so now we focus on the second: absolute imports versus relative imports.

In Python 2, a bare import inside a package was first tried as an implicit relative import. PEP 328 made absolute import available from Python 2.5 through from __future__ import absolute_import, and it is the default in Python 3. How should we understand these two mechanisms?

First of all, both absolute and relative imports need a reference point, otherwise “absolute” and “relative” would be meaningless. The reference point of an absolute import is the project root (more precisely, a directory on sys.path), while that of a relative import is the location of the current module. Let’s use an example to explain the two mechanisms:

First we create a directory structure like this:

└── project
    ├── package1
    │   ├── module1.py
    │   └── module2.py
    │       └── Function Fx
    └── package2
        ├── __init__.py
        │   └── Class Cx
        ├── module3.py
        ├── module4.py
        └── subpackage1
            └── module5.py
                └── Function Fy

2.1 Absolute Import

An absolute import requires the full import path of each package or module, starting from the top-level folder

For example, if we want to import related classes or functions

from package1 import module1
from package1.module2 import Fx
from package2 import Cx
from package2.subpackage1.module5 import Fy

Advantages:

Clear code hierarchy: the full path of everything imported is visible at a glance, which makes it easy to find where a name comes from.
No dependence on relative location: a standalone .py file can be executed directly without worrying about broken references caused by where it sits.

Disadvantages:

Hardcoded top-level package name: refactoring becomes laborious, because every file has to be checked and its hardcoded paths repaired (an IDE can do this quickly). Moving the code is also awkward, both for you and for anyone who uses it: since every path is written from the root directory, the imports break when the project is embedded as a submodule of another project. (Packaging the project properly solves this.)
Import paths become too long: when the project hierarchy is deep, importing from the root gets complicated. You need to know the logic and package structure of the whole project, and writing long paths repeatedly becomes tedious when there are many imports.

2.2 Relative Import

With a relative import, we state where the resource is relative to the current location.

There are two kinds of relative import: implicit and explicit. For example, to reference module4 from package2/module3.py, we can write

# package2/module3.py

import module4  # implicit relative import (Python 2 only)
from . import module4  # explicit relative import
from package2 import module4  # absolute import

To import class Cx and function Fy in package2/module3.py, write

# package2/module3.py
import Cx  # implicit relative import (Python 2 style, shown only for contrast)
from . import Cx  # explicit relative import
from .subpackage1.module5 import Fy

In the code, a single . denotes the directory of the current file, .. denotes its parent directory, and three or four dots continue upward in the same way. An implicit relative import merely leaves the current directory unstated, but that is easy to confuse with an absolute import, so PEP 328 abandoned it in favor of the explicit form, in line with “Explicit is better than implicit”.
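The dot notation can be checked without creating any files. This small sketch uses importlib.util.resolve_name from the standard library, which only does the name arithmetic and imports nothing; the package names are the ones from the example layout and need not exist on disk.

```python
from importlib.util import resolve_name

# One leading dot: the current package.
same_level = resolve_name(".module4", package="package2")
# A dotted tail after the leading dot descends into a subpackage.
nested = resolve_name(".subpackage1.module5", package="package2")
# Two leading dots: the parent of the current package.
parent_level = resolve_name("..module3", package="package2.subpackage1")

print(same_level)    # package2.module4
print(nested)        # package2.subpackage1.module5
print(parent_level)  # package2.module3
```

Each relative name is turned into an absolute one by combining it with the package of the importing module, which is why that package must be known.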

Advantage:

Concise references: compared with an absolute import, a resource can be imported by its position relative to the current package, without knowing the overall project structure or writing a long absolute path. For example, from a.b.c.d.e.f.e import e1 can become from . import e1

Disadvantage:

Harder to use correctly: because a relative path depends on where the importing module sits, it runs into more problems than an absolute path, as in the following cases

Suppose we change the project structure to this

└── project
    ├── run.py
    ├── package1
    │   ├── module1.py
    │   └── module2.py
    │       └── Function Fx
    └── package2
        ├── __init__.py
        │   └── Class Cx
        ├── module3.py
        ├── module4.py
        └── subpackage1
            └── module5.py
                └── Function Fy

2.2.1 The Top-Level Package Problem

In the entry script run.py, we import like this

from package2 import module3
from package2.subpackage1 import module5

Then change module3.py and module5.py as follows

# package2/module3

from ..package1 import module2

# package2/subpackage1/module5

from .. import module3
def Fy():
    ...

Running python run.py now raises this error

Traceback (most recent call last):
  File "run.py", line 1, in <module>
    from package2 import module3
  File "G:\company_project\config\package2\module3.py", line 1, in <module>
    from ..package1 import module2
ValueError: attempted relative import beyond top-level package

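The reason is that when run.py is executed as the entry script, its directory becomes the import root, so package1 and package2, which sit at the same level as run.py, are each treated as top-level packages. The relative import in module3 tries to climb above package2 to reach package1, that is, it crosses top-level packages, which raises this error.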
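The same failure can be triggered with name resolution alone. A minimal sketch, again using importlib.util.resolve_name; the helper try_resolve is hypothetical and exists only to capture the exception, whose type differs between Python versions (ValueError before 3.9, ImportError from 3.9 on).

```python
from importlib.util import resolve_name


def try_resolve(name, package):
    """Return the resolved name, or a description of the error raised."""
    try:
        return resolve_name(name, package)
    except (ImportError, ValueError) as exc:  # ValueError before Python 3.9
        return f"{type(exc).__name__}: {exc}"


# Two dots from package2.subpackage1 land on package2: still inside the tree.
inside = try_resolve("..module3", "package2.subpackage1")
# Two dots from package2 would have to climb above the top-level package.
beyond = try_resolve("..package1", "package2")

print(inside)
print(beyond)
```

The second call fails because package2 has no parent package to climb into, which is exactly the situation module3.py is in when run.py is the entry point.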

2.2.2 Missing Parent Package

Now change the import in the entry script run.py to this

from .package1 import module2

Running python run.py now raises this error

Traceback (most recent call last):
  File "run.py", line 1, in <module>
    from .package1 import module2
ImportError: attempted relative import with no known parent package

Why is that? PEP 328 explains:

Relative imports use a module’s __name__ attribute to determine that module’s position in the package hierarchy. If the module’s name does not contain any package information (e.g. it is set to '__main__') then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.

In other words, relative imports locate the target from the values of __name__ and __package__. When a file is run directly as a script, these variables carry no package information: __name__ is '__main__' and __package__ is None, so the interpreter does not know which package the module belongs to. In that case a relative import treats the module as a top-level module, regardless of its actual location on the file system.

Let’s look at run.py’s __name__ and __package__ values

print(__name__, __package__)
# from .package1 import module2

The result is

__main__ None

As we can see, the interpreter treats run.py as a top-level module with no information about a parent package (__name__ is '__main__' and __package__ is None), so it raises the “no known parent package” error.
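To see how the way a file is launched changes these two variables, the sketch below runs the same file once as a script and once with python -m. It assumes a temporary directory and a hypothetical package demo_pkg with a module show.py; neither comes from the project above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    package_dir = Path(root, "demo_pkg")  # hypothetical package name
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    script = package_dir / "show.py"
    script.write_text("print(__name__, __package__)\n")

    def run(*args):
        """Run the current interpreter with args from the temp directory."""
        result = subprocess.run(
            [sys.executable, *args],
            cwd=root, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    as_script = run(str(script))            # like: python demo_pkg/show.py
    as_module = run("-m", "demo_pkg.show")  # like: python -m demo_pkg.show

print(as_script)
print(as_module)
```

Run as a script, the file has no package information; run with -m, __package__ is filled in, which is why relative imports inside a package work when the module is started that way.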

3. Import Conventions

Consistent code style matters in team development, so once we know how to import packages and modules, we should also know the conventions for writing imports.

3.1 Writing imports

The official style guide, PEP 8, gives these instructions for writing imports:

Imports should usually be on separate lines

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path)

Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.

Imports should be grouped in the following order:

Standard library imports.

Related third party imports.

Local application/library specific imports.

You should put a blank line between each group of imports.

And from PEP 328:

Rationale for Parentheses

Instead, it should be possible to use Python’s standard grouping mechanism (parentheses) to write the import statement:

Following these rules, here is an example

# one import per line
# recommended
import os
import sys
# importing several names from the same module on one line is fine
from subprocess import Popen, PIPE
# not recommended
import os, sys

# imports go at the top of the file
# comments...
import os
a = 1

# import only what you need; do not pollute the local namespace
# recommended
from sys import copyright
# not recommended
from sys import *

# import order
import sys  # standard library

import flask  # third-party library

import my  # local library

# use Python's standard parenthesized grouping
# recommended
from sys import (copyright, path, modules)
# not recommended
from sys import copyright, path, \
    modules

Although these rules are not mandatory, engineers generally find consistently formatted code more pleasant to read and maintain.

3.2 Project Structure

The pros and cons of the two import styles were analyzed above. How, then, should a real project keep its imports consistent? The official recommendation is absolute import, but we should still evaluate it critically. I recommend looking at how well-regarded open source libraries such as Tornado and FastAPI do it

# fastapi\applications.py

from fastapi import routing
from fastapi.concurrency import AsyncExitStack
from fastapi.encoders import DictIntStrAny, SetIntStr
from fastapi.exception_handlers import (
    http_exception_handler,
    request_validation_exception_handler,
)
from fastapi.exceptions import RequestValidationError
from fastapi.logger import logger
from fastapi.openapi.docs import (
    get_redoc_html,
    get_swagger_ui_html,
    get_swagger_ui_oauth2_redirect_html,
)

# tornado\gen.py

from tornado.concurrent import (
    Future,
    is_future,
    chain_future,
    future_set_exc_info,
    future_add_done_callback,
    future_set_result_unless_cancelled,
)
from tornado.ioloop import IOLoop
from tornado.log import app_log
from tornado.util import TimeoutError

As you can see, many open source libraries lean toward absolute imports. Their package hierarchies are also fairly shallow, so they do not suffer from deep nesting. Some will ask how to handle multi-level nesting in very large projects. That has to be analyzed case by case: whether to split a large project into smaller sub-projects or to combine absolute and relative imports depends on the actual scenario. For small projects like the two above, “absolute import + packaging” makes the source easier to read and the project easier to use and port, and is the approach I recommend most.

2. The Core Import Mechanism

The first part covered basic concepts, most of which come up in daily development. But questions such as “How does import find a package?”, “How does import load a package?”, and “What is the underlying processing flow of import?” are ones many developers rarely touch. Without understanding the core logic of the import system, it is hard to use it well, let alone modify or extend it, and this also affects how we design large system architectures later. In this section we walk through the core import mechanism, following the source code.

1. What does the import keyword do? Clarifying the logic of the import keyword

The import keyword is what most developers know best, so let’s start there. The first step is the official documentation: Chapter 5 of the Python Language Reference, “The import system”, which opens with the following

Python code in one module gains access to the code in another module by the process of importing it. The import statement is the most common way of invoking the import machinery, but it is not the only way. Functions such as importlib.import_module() and built-in __import__() can also be used to invoke the import machinery.

The import statement combines two operations; it searches for the named module, then it binds the results of that search to a name in the local scope. The search operation of the import statement is defined as a call to the __import__() function, with the appropriate arguments. The return value of __import__() is used to perform the name binding operation of the import statement. See the import statement for the exact details of that name binding operation.

A direct call to __import__() performs only the module search and, if found, the module creation operation. While certain side-effects may occur, such as the importing of parent packages, and the updating of various caches (including sys.modules), only the import statement performs a name binding operation.

When an import statement is executed, the standard builtin __import__() function is called. Other mechanisms for invoking the import system (such as importlib.import_module()) may choose to bypass __import__() and use their own solutions to implement import semantics.

We can get the following information from this introduction:

First of all, there are three ways to import a module, and for a simple top-level module all three give the same result

# import keyword
import os
print(os) # <module 'os' from 'C:\\python38\\lib\\os.py'>

# importlib.import_module from the standard library
from importlib import import_module
os = import_module("os")
print(os) # <module 'os' from 'C:\\python38\\lib\\os.py'>

# __import__ built-in function
os = __import__("os")
print(os) # <module 'os' from 'C:\\python38\\lib\\os.py'>

Executing an import statement triggers two operations: search and bind. The search locates the module and, if necessary, loads it; it is defined as a call to the __import__ function. The return value is then bound to a name in the current scope. In other words, the import keyword calls the built-in __import__ function. Note that __import__ itself only performs the search (and module creation), although it has side effects such as importing parent packages and updating sys.modules; it never binds a name. Finally, __import__ is what the import statement uses, whereas importlib.import_module() may bypass __import__ and implement the import semantics in its own way.
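The three entry points stop agreeing as soon as the name contains a dot. A short sketch using only standard library names: it compares what __import__ and importlib.import_module return for "os.path", and checks the sys.modules side effect the documentation mentions.

```python
import importlib
import os
import sys

via_builtin = __import__("os.path")                 # returns the top-level module
via_importlib = importlib.import_module("os.path")  # returns the named submodule

print(via_builtin is os)                         # True
print(via_importlib is os.path)                  # True
print(via_importlib is sys.modules["os.path"])   # True
```

This is one reason the __import__ docstring recommends importlib.import_module() for programmatic imports: it returns the module you asked for rather than its top-level parent.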

The documentation gives us a rough picture of the import keyword, but for the details we have to read the source. Let’s look at what import calls underneath in the simplest possible way. First create a file that, to avoid interference from anything else, contains a single line of code

# test.py

import os

Since import is a Python keyword, we cannot jump to its source in an IDE. Instead, we can look at its bytecode directly. In Python, bytecode is inspected with the dis module, either from the command line with -m dis or by calling it after import dis. To keep the bytecode clean, let’s use the command line

python -m dis test.py

We get the following result

  1           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (os)
              6 STORE_NAME               0 (os)
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

The bytecode for import os is short. Ignore the two LOAD_CONST instructions at the start for now; they push 0 and None, which we return to below. The two instructions that take os as their argument are IMPORT_NAME and STORE_NAME. IMPORT_NAME imports the os module, and STORE_NAME then stores the imported module under the name os in the current scope. When we later call something on os, the interpreter finds the module through that name. This is how the Python interpreter executes the import bytecode
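The same inspection can be done from code rather than the command line. This is a minimal sketch with the dis module; the helper opnames is hypothetical, and the exact instruction list differs between Python versions, so only the import-related opcodes are examined.

```python
import dis


def opnames(source):
    """Compile source as a module and list the opcode names it produces."""
    code = compile(source, "<demo>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


plain = opnames("import os")
from_form = opnames("from os import path")

# IMPORT_NAME performs the import; STORE_NAME then binds the result.
print([name for name in plain if name in ("IMPORT_NAME", "STORE_NAME")])
# The from-form adds IMPORT_FROM to pull the attribute off the module.
print("IMPORT_FROM" in from_form)
```

Both forms go through IMPORT_NAME, so the C code examined next is the common entry point for every import statement.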

Let’s focus on the IMPORT_NAME instruction and look at its implementation. CPython’s instructions are all implemented in ceval.c, so that is where we track down IMPORT_NAME.

PS: Everyone has their own way of reading C code; I use Understand.

ceval.c contains a large switch statement that dispatches on the instruction, and the case for IMPORT_NAME is as follows:

// ceval.c

case TARGET(IMPORT_NAME): {
    // Get the module name
    PyObject *name = GETITEM(names, oparg);
    // These pops correspond to the two LOAD_CONST instructions seen earlier:
    // None becomes fromlist and 0 becomes level.
    // Both values are explained below.
    PyObject *fromlist = POP();
    PyObject *level = TOP();
    // Will hold the imported module object (a PyModuleObject *)
    PyObject *res;
    // The key call: import_name does the import and returns the module in res
    res = import_name(tstate, f, name, fromlist, level);
    Py_DECREF(level);
    Py_DECREF(fromlist);
    // Replace the top of the stack with the module
    SET_TOP(res);
    if (res == NULL)
        goto error;
    DISPATCH();
}

Next we focus on the function import_name, which takes tstate, f, name, fromlist, and level. Keep these parameters in mind

// ceval.c
// call site
res = import_name(tstate, f, name, fromlist, level);

static PyObject *
import_name(PyThreadState *tstate, PyFrameObject *f,
            PyObject *name, PyObject *fromlist, PyObject *level)
{
    _Py_IDENTIFIER(__import__);
    PyObject *import_func, *res;
    PyObject* stack[5];
    // Look up the __import__ function in builtins; this confirms the documentation:
    // underneath, the import statement calls the built-in __import__
    import_func = _PyDict_GetItemIdWithError(f->f_builtins, &PyId___import__);
    // NULL means the lookup failed; if no other error is pending,
    // an ImportError("__import__ not found") is raised
    if (import_func == NULL) {
        if (!_PyErr_Occurred(tstate)) {
            _PyErr_SetString(tstate, PyExc_ImportError, "__import__ not found");
        }
        return NULL;
    }

    /* Fast path for not overloaded __import__. */
    // Check whether __import__ has been overridden; tstate comes from the argument
    if (import_func == tstate->interp->import_func) {
        // Native path of import, used when __import__ has not been overridden
        int ilevel = _PyLong_AsInt(level);
        if (ilevel == -1 && _PyErr_Occurred(tstate)) {
            return NULL;
        }
        // Not overridden: call PyImport_ImportModuleLevelObject directly
        res = PyImport_ImportModuleLevelObject(
                        name,
                        f->f_globals,
                        f->f_locals == NULL ? Py_None : f->f_locals,
                        fromlist,
                        ilevel);
        return res;
    }
	
    Py_INCREF(import_func);
    // Overridden __import__ path: build the argument array
    stack[0] = name;
    stack[1] = f->f_globals;
    stack[2] = f->f_locals == NULL ? Py_None : f->f_locals;
    stack[3] = fromlist;
    stack[4] = level;
    // Call __import__
    res = _PyObject_FastCall(import_func, stack, 5);
    Py_DECREF(import_func);
    return res;
}

As the code shows, import_name has two paths; that is, the import keyword does not always go through a full call to the __import__ function object:

The native import path of the import keyword
The import path through the __import__ built-in function
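The second path can be observed from Python by replacing builtins.__import__, which makes the identity check in import_name fail. This is a sketch under that assumption; tracing_import and calls are hypothetical names, and the original function is restored afterwards.

```python
import builtins

calls = []  # (name, fromlist, level) for every import that goes through the wrapper
original_import = builtins.__import__


def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Record the arguments, then delegate to the real __import__."""
    calls.append((name, tuple(fromlist or ()), level))
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = tracing_import  # import statements now take the slow path
try:
    import json
    from os import path
finally:
    builtins.__import__ = original_import  # always restore the built-in

print(("json", (), 0) in calls)
print(("os", ("path",), 0) in calls)
```

The recorded tuples also show the fromlist and level values that the two LOAD_CONST instructions pushed: nothing and 0 for a plain import, and the requested names for the from-form.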

Note that both paths pass on the fromlist and level arguments that we have not yet explained. We can understand these two arguments from the signature and docstring of the __import__ function

def __import__(name, globals=None, locals=None, fromlist=(), level=0):
    """
    __import__(name, globals=None, locals=None, fromlist=(), level=0) -> module

    Import a module. Because this function is meant for use by the Python
    interpreter and not for general use, it is better to use
    importlib.import_module() to programmatically import a module.

    The globals argument is only used to determine the context;
    they are not modified.  The locals argument is unused.  The fromlist
    should be a list of names to emulate ``from name import ...'', or an
    empty list to emulate ``import name''.
    When importing a module from a package, note that __import__('A.B', ...)
    returns package A when fromlist is empty, but its submodule B when
    fromlist is not empty.  The level argument is used to determine whether to
    perform absolute or relative imports: 0 is absolute, while a positive number
    is the number of parent directories to search relative to the current module.
    """
    pass

For the fromlist argument: if it is empty, the top-level module is returned; if it is non-empty, the named submodule is returned

m1 = __import__("os.path")
print(m1)  # <module 'os' from 'C:\\python38\\lib\\os.py'>

m2 = __import__("os.path", fromlist=[""])
print(m2)  # <module 'ntpath' from 'C:\\python38\\lib\\ntpath.py'>

The other parameter, level, selects absolute or relative import: 0 means an absolute import, and a positive integer is the number of parent directories to search relative to the current module. It is normally not passed explicitly.

With these two parameters explained, we move on, starting with the native import path of the import keyword

1.1 Native import path of the import keyword

First look at the native path, focusing on PyImport_ImportModuleLevelObject

// Python/import.c
// call site
res = PyImport_ImportModuleLevelObject(
                        name,
                        f->f_globals,
                        f->f_locals == NULL ? Py_None : f->f_locals,
                        fromlist,
                        ilevel)

PyObject *
PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals,
                                 PyObject *locals, PyObject *fromlist,
                                 int level)
{
    _Py_IDENTIFIER(_handle_fromlist);
    PyObject *abs_name = NULL;
    PyObject *final_mod = NULL;
    PyObject *mod = NULL;
    PyObject *package = NULL;
    PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
    int has_from;
	
    // Non-null check
    if (name == NULL) {
        PyErr_SetString(PyExc_ValueError, "Empty module name");
        goto error;
    }

    // The name must be a str (PyUnicodeObject)
    if (!PyUnicode_Check(name)) {
        PyErr_SetString(PyExc_TypeError, "module name must be a string");
        goto error;
    }
    
    // Level cannot be less than 0
    if (level < 0) {
        PyErr_SetString(PyExc_ValueError, "level must be >= 0");
        goto error;
    }
	
    // level > 0: relative import
    if (level > 0) {
        // Resolve the relative name to an absolute module name
        abs_name = resolve_name(name, globals, level);
        if (abs_name == NULL)
            goto error;
    }
    else {
        // level == 0: absolute import
        if (PyUnicode_GET_LENGTH(name) == 0) {
            PyErr_SetString(PyExc_ValueError, "Empty module name");
            goto error;
        }
        // Assign name directly to the abs_name as this is an absolute import
        abs_name = name;
        Py_INCREF(abs_name);
    }

    // Call PyImport_GetModule to look the module up in sys.modules.
    // If it is not there, it is loaded from disk further down and then
    // stored in sys.modules, so the next import finds it there directly.
    // More on that later.
    mod = PyImport_GetModule(abs_name);
    // ...
    if (mod == NULL && PyErr_Occurred()) {
        goto error;
    }
    // ...
    if (mod != NULL && mod != Py_None) {
        _Py_IDENTIFIER(__spec__);
        _Py_IDENTIFIER(_lock_unlock_module);
        PyObject *spec;

        /* Optimization: only call _bootstrap._lock_unlock_module() if
           __spec__._initializing is true.
           NOTE: because of this, initializing must be set *before*
           stuffing the new module in sys.modules.
         */
        spec = _PyObject_GetAttrId(mod, &PyId___spec__);
        if (_PyModuleSpec_IsInitializing(spec)) {
            PyObject *value = _PyObject_CallMethodIdObjArgs(interp->importlib,
                                            &PyId__lock_unlock_module, abs_name,
                                            NULL);
            if (value == NULL) {
                Py_DECREF(spec);
                goto error;
            }
            Py_DECREF(value);
        }
        Py_XDECREF(spec);
    }
    else {
        Py_XDECREF(mod);
        // The key part: find and load the module
        mod = import_find_and_load(abs_name);
        if (mod == NULL) {
            goto error;
        }
    }
    // ...
    else {
        // Call the private _handle_fromlist function in the importlib package to get the module
        final_mod = _PyObject_CallMethodIdObjArgs(interp->importlib,
                                                  &PyId__handle_fromlist, mod,
                                                  fromlist, interp->import_func,
                                                  NULL);
    }

  error:
    Py_XDECREF(abs_name);
    Py_XDECREF(mod);
    Py_XDECREF(package);
    if (final_mod == NULL)
        remove_importlib_frames();
    return final_mod;    
}

Many parts of PyImport_ImportModuleLevelObject call into the importlib package, so importlib clearly plays a significant role in the import system. Let’s look at import_find_and_load next

// Python/import.c

static PyObject *
import_find_and_load(PyObject *abs_name)
{
    _Py_IDENTIFIER(_find_and_load);
    PyObject *mod = NULL;
    PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
    int import_time = interp->config.import_time;
    static int import_level;
    static _PyTime_t accumulated;

    _PyTime_t t1 = 0, accumulated_copy = accumulated;

    PyObject *sys_path = PySys_GetObject("path");
    PyObject *sys_meta_path = PySys_GetObject("meta_path");
    PyObject *sys_path_hooks = PySys_GetObject("path_hooks");
    if (PySys_Audit("import", "OOOOO",
                    abs_name, Py_None, sys_path ? sys_path : Py_None,
                    sys_meta_path ? sys_meta_path : Py_None,
                    sys_path_hooks ? sys_path_hooks : Py_None) < 0) {
        return NULL;
    }


    /* XOptions is initialized after first some imports.
     * So we can't have negative cache before completed initialization.
     * Anyway, importlib._find_and_load is much slower than
     * _PyDict_GetItemIdWithError().
     */
    if (import_time) {
        static int header = 1;
        if (header) {
            fputs("import time: self [us] | cumulative | imported package\n", stderr);
            header = 0;
        }

        import_level++;
        t1 = _PyTime_GetPerfCounter();
        accumulated = 0;
    }

    if (PyDTrace_IMPORT_FIND_LOAD_START_ENABLED())
        PyDTrace_IMPORT_FIND_LOAD_START(PyUnicode_AsUTF8(abs_name));
    // Call importlib's private function _find_and_load
    mod = _PyObject_CallMethodIdObjArgs(interp->importlib,
                                        &PyId__find_and_load, abs_name,
                                        interp->import_func, NULL);

    // ...

    return mod;
}

As you can see, the import logic ends up back in the importlib package, that is, back in Python code. Before following it there, note that there is another path on the C side that we haven't looked at yet.
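The PySys_Audit("import", ...) call in import_find_and_load can be observed from Python. The following is a minimal sketch, assuming Python 3.8 or later (where sys.addaudithook exists); the hook name log_import_events and the choice of colorsys as the test module are only for illustration.

```python
import sys

imported = []

def log_import_events(event, args):
    # The "import" event carries the arguments passed to PySys_Audit:
    # (module name, filename, sys.path, sys.meta_path, sys.path_hooks).
    if event == "import":
        imported.append(args[0])

# Note: an audit hook cannot be removed again once it is installed.
sys.addaudithook(log_import_events)

sys.modules.pop("colorsys", None)  # make sure the next import is not a cache hit
import colorsys                    # not cached: import_find_and_load runs and audits
import colorsys                    # found in sys.modules: import_find_and_load is skipped

print("audit events for colorsys:", imported.count("colorsys"))
```

The second import statement never reaches import_find_and_load, because PyImport_GetModule already finds the module in sys.modules.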

1.2 The `__import__` built-in function import path

Remember the other path out of import_name that we saw at the beginning? Since the native path has now returned to Python code, it is likely that the __import__ path joins the native path at some node and then calls the same Python code. Let's look at the source of the __import__ function. Because __import__ is a built-in function, its code lives in Python/bltinmodule.c.

// Python/bltinmodule.c

static PyObject *
builtin___import__(PyObject *self, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"name", "globals", "locals", "fromlist", "level", 0};
    PyObject *name, *globals = NULL, *locals = NULL, *fromlist = NULL;
    int level = 0;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "U|OOOi:__import__",
                    kwlist, &name, &globals, &locals, &fromlist, &level))
        return NULL;
    return PyImport_ImportModuleLevelObject(name, globals, locals,
                                            fromlist, level);
}

Clearly, __import__ also ends up calling PyImport_ImportModuleLevelObject. So we can understand it this way: when the __import__ function has not been overridden, the native path skips the built-in function and goes straight to PyImport_ImportModuleLevelObject, which is where both paths meet.
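Both entry points can be watched from Python. The sketch below temporarily replaces builtins.__import__ with a tracing wrapper (tracing_import is a made-up name) to show the arguments that the import statement passes, and then compares __import__ with importlib.import_module for a dotted name. It is an illustration of the behaviour, not how CPython is implemented.

```python
import builtins
import importlib

seen = []
original_import = builtins.__import__

def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record what the import statement passes in, then delegate to the real function.
    seen.append((name, tuple(fromlist or ()), level))
    return original_import(name, globals, locals, fromlist, level)

builtins.__import__ = tracing_import
try:
    import json              # arrives as name="json", fromlist=None, level=0
    from os import path      # arrives as name="os", fromlist=("path",), level=0
finally:
    builtins.__import__ = original_import   # always restore the built-in

# Without a fromlist, __import__ returns the top-level package of a dotted name;
# with a non-empty fromlist it returns the named package itself.
top = __import__("xml.dom")
sub = __import__("xml.dom", fromlist=["minidom"])
same = importlib.import_module("xml.dom")

print(seen[0], top.__name__, sub.__name__, same is sub)
```

The fromlist case is the branch in PyImport_ImportModuleLevelObject that calls _handle_fromlist. For importing by name in application code, importlib.import_module is usually the more convenient of the two because it always returns the module that was named.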

To summarize the C call flow so far: the import statement (through import_name) and the __import__ built-in both reach PyImport_ImportModuleLevelObject, which calls import_find_and_load, which in turn calls _find_and_load in importlib. As the name suggests, _find_and_load covers two steps, finding and loading, and these correspond to the concepts of Finder and Loader respectively. We will explain the concepts in detail as we read the source, so let's look at the source first.

# importlib/_bootstrap.py

_NEEDS_LOADING = object()

def _find_and_load(name, import_):
    """Find and load the module."""
    # Acquire the per-module lock (needed for multithreading)
    with _ModuleLockManager(name):
        # Look in sys.modules again; call _find_and_load_unlocked if the module is not there
        module = sys.modules.get(name, _NEEDS_LOADING)
        if module is _NEEDS_LOADING:
            # name is the absolute (fully qualified) module name
            return _find_and_load_unlocked(name, import_)

    if module is None:
        message = ('import of {} halted; '
                   'None in sys.modules'.format(name))
        raise ModuleNotFoundError(message, name=name)
    _lock_unlock_module(name)
    return module

def _find_and_load_unlocked(name, import_):
    path = None
    # Split the parent package name off the absolute module name
    parent = name.rpartition('.')[0]
    if parent:
        if parent not in sys.modules:
            _call_with_frames_removed(import_, parent)
        # Crazy side-effects!
        # Look again
        if name in sys.modules:
            return sys.modules[name]
        parent_module = sys.modules[parent]
        try:
            # Take the __path__ attribute of the parent module
            path = parent_module.__path__
        except AttributeError:
            msg = (_ERR_MSG + '; {!r} is not a package').format(name, parent)
            raise ModuleNotFoundError(msg, name=name) from None
    # This is where the search (find) step happens
    spec = _find_spec(name, path)
    if spec is None:
        raise ModuleNotFoundError(_ERR_MSG.format(name), name=name)
    else:
        # This is where the load step happens
        module = _load_unlocked(spec)
    if parent:
        # Set the module as an attribute on its parent.
        parent_module = sys.modules[parent]
        setattr(parent_module, name.rpartition('.')[2], module)
    return module

As the code shows, sys.modules is checked once more before anything else. The path variable holds the __path__ attribute of the parent package, that is, the list of locations in which to search for the submodule; if there is no parent, path is None (a default is filled in later). The main function for finding a module is _find_spec, and the main function for loading it is _load_unlocked. That completes the preparation phase; next come the find and load phases.
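The effects of _find_and_load_unlocked are easy to check from the outside. The sketch below uses standard-library packages only; json.decoder.missing is a deliberately nonexistent name chosen for illustration.

```python
import importlib
import sys

# Importing a dotted name imports every parent first and then binds the
# child as an attribute on its parent (the setattr call at the end of
# _find_and_load_unlocked).
minidom = importlib.import_module("xml.dom.minidom")
parents_loaded = "xml" in sys.modules and "xml.dom" in sys.modules
bound_on_parent = sys.modules["xml.dom"].minidom is minidom

# json.decoder is a plain module, so it has no __path__ attribute and
# cannot act as the parent of a submodule.
try:
    importlib.import_module("json.decoder.missing")
except ModuleNotFoundError as exc:
    message = str(exc)

print(parents_loaded, bound_on_parent)
print(message)
```

The error message comes from the AttributeError branch in the source above, where the parent module turns out not to be a package.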

2. How is a module found? A deep dive into the Finder

To trace the finder, we start with the _find_spec function:

# importlib/_bootstrap.py

def _find_spec(name, path, target=None):
    """Find a module's spec."""
    # get meta_path
    meta_path = sys.meta_path
    if meta_path is None:
        # PyImport_Cleanup() is running or has been called.
        raise ImportError("sys.meta_path is None, Python is likely "
                          "shutting down")

    if not meta_path:
        _warnings.warn('sys.meta_path is empty', ImportWarning)

    # We check sys.modules here for the reload case. While a passed-in
    # target will usually indicate a reload there is no guarantee, whereas
    # sys.modules provides one.
    # If the module is already in sys.modules, this is a reload
    is_reload = name in sys.modules
    # Walk through the finders in meta_path
    for finder in meta_path:
        with _ImportLockContext():
            try:
                # Look up the finder's find_spec method
                find_spec = finder.find_spec
            except AttributeError:
                # If there is no find_spec attribute, fall back to _find_spec_legacy
                spec = _find_spec_legacy(finder, name, path)
                if spec is None:
                    continue
            else:
                # Find the spec using find_spec
                spec = find_spec(name, path, target)
        if spec is not None:
            # The parent import may have already imported this module.
            if not is_reload and name in sys.modules:
                module = sys.modules[name]
                try:
                    __spec__ = module.__spec__
                except AttributeError:
                    # We use the found spec since that is the one that
                    # we would have used if the parent module hadn't
                    # beaten us to the punch.
                    return spec
                else:
                    if __spec__ is None:
                        return spec
                    else:
                        return __spec__
            else:
                return spec
    else:
        return None

The code mentions a new concept: sys.meta_path, the meta path of the import system. Let's print it and see what it is:

>>> import sys
>>> sys.meta_path
[
    <class '_frozen_importlib.BuiltinImporter'>,
    <class '_frozen_importlib.FrozenImporter'>,
    <class '_frozen_importlib_external.PathFinder'>
]

It turns out to be a list of finders, and _find_spec calls the find_spec method of each finder in turn. So let's pick one of these classes and see what its find_spec does, for example the first one, <class '_frozen_importlib.BuiltinImporter'>.
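The traversal itself can be reproduced by hand. The sketch below is a simplified stand-in for the loop in _find_spec (no locking, no reload handling, and legacy finders without find_spec are simply skipped); answering_finder is a made-up helper name.

```python
import sys

def answering_finder(name):
    """Return (finder, spec) for the first entry of sys.meta_path that yields a spec."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue  # legacy finder: ignored in this sketch
        spec = find_spec(name, None)
        if spec is not None:
            return finder, spec
    return None, None

# A built-in module, a frozen module and an ordinary package on sys.path.
for name in ("sys", "_frozen_importlib", "json"):
    finder, spec = answering_finder(name)
    print(f"{name:18} -> {finder!r} (origin: {spec.origin})")
```

On a standard CPython installation the three names are expected to be answered by three different finders, in the order in which they appear in sys.meta_path.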

2.1 Standard Finder

# importlib/_bootstrap.py

class BuiltinImporter:

    """Meta path import for built-in modules.

    All methods are either class or static methods to avoid the need to
    instantiate the class.

    """

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        if path is not None:
            return None
        # Check whether it is a built-in module
        if _imp.is_builtin(fullname):
            # BuiltinImporter is both a finder and a loader, so it passes itself (cls) as the loader to spec_from_loader
            return spec_from_loader(fullname, cls, origin='built-in')
        else:
            return None

def spec_from_loader(name, loader, *, origin=None, is_package=None):
    """Return a module spec based on various loader methods."""
    # Build the ModuleSpec according to which methods the loader provides
    if hasattr(loader, 'get_filename'):
        if _bootstrap_external is None:
            raise NotImplementedError
        spec_from_file_location = _bootstrap_external.spec_from_file_location

        if is_package is None:
            return spec_from_file_location(name, loader=loader)
        search = [] if is_package else None
        return spec_from_file_location(name, loader=loader,
                                       submodule_search_locations=search)

    if is_package is None:
        if hasattr(loader, 'is_package'):
            try:
                is_package = loader.is_package(name)
            except ImportError:
                is_package = None  # aka, undefined
        else:
            # the default
            is_package = False
    # Finally return the ModuleSpec object
    return ModuleSpec(name, loader, origin=origin, is_package=is_package)

The find_spec method of <class '_frozen_importlib.BuiltinImporter'> ultimately returns a ModuleSpec object. The other standard finder, <class '_frozen_importlib.FrozenImporter'>, works the same way; the two differ only in how they recognise a module: _imp.is_builtin(fullname) versus _imp.is_frozen(fullname).
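To see what ends up in a ModuleSpec, the public counterpart importlib.util.spec_from_loader can be called directly. In the sketch below, InMemoryLoader is a hypothetical loader written only for this illustration; it has no get_filename method, so the is_package branch of spec_from_loader is the one that runs.

```python
import importlib.machinery
import importlib.util

class InMemoryLoader:
    """Hypothetical loader, used only to show what lands in the ModuleSpec."""

    def __init__(self, package=False):
        self._package = package

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        module.LOADED = True

    def is_package(self, fullname):
        return self._package

module_spec = importlib.util.spec_from_loader(
    "demo.leaf", InMemoryLoader(), origin="in-memory")
package_spec = importlib.util.spec_from_loader(
    "demo", InMemoryLoader(package=True), origin="in-memory")

# The built-in finder produces the same kind of object, with itself as loader.
builtin_spec = importlib.machinery.BuiltinImporter.find_spec("sys")

print(module_spec.name, module_spec.parent, module_spec.submodule_search_locations)
print(package_spec.name, package_spec.submodule_search_locations)
print(builtin_spec.origin, builtin_spec.loader)
```

A spec for a package gets a list in submodule_search_locations (empty here), while a spec for a plain module gets None; this is the value that later becomes the module's __path__.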

One extra point: is_frozen is implemented in Python/import.c.

// Python/import.c

/*[clinic input]
_imp.is_frozen

    name: unicode
    /

Returns True if the module name corresponds to a frozen module.
[clinic start generated code]*/

static PyObject *
_imp_is_frozen_impl(PyObject *module, PyObject *name)
/*[clinic end generated code: output=01f408f5ec0f2577 input=7301dbca1897d66b]*/
{
    const struct _frozen *p;

    p = find_frozen(name);
    return PyBool_FromLong((long) (p == NULL ? 0 : p->size));
}

/* Frozen modules */

static const struct _frozen *
find_frozen(PyObject *name)
{
    const struct _frozen *p;

    if (name == NULL)
        return NULL;

    // Loop over the table of built-in frozen modules and compare names
    for (p = PyImport_FrozenModules; ; p++) {
        if (p->name == NULL)
            return NULL;
        if (_PyUnicode_EqualToASCIIString(name, p->name))
            break;
    }
    return p;
}

So what is a frozen module? The details are in the Freeze section of the Python Wiki. In short, freezing creates a portable version of a Python script that ships with its own built-in interpreter (essentially a binary executable), so it can run on a machine that does not have Python installed.

Another thing you can see in the _find_spec code is the AttributeError handler around finder.find_spec: when a finder has no find_spec method, _find_spec_legacy is called instead, which in turn calls find_module.

# importlib/_bootstrap.py

            try:
                find_spec = finder.find_spec
            except AttributeError:
                # If there is no find_spec attribute, call _find_spec_legacy
                spec = _find_spec_legacy(finder, name, path)
                if spec is None:
                    continue
            else:
                spec = find_spec(name, path, target)

def _find_spec_legacy(finder, name, path):
    # This would be a good place for a DeprecationWarning if
    # we ended up going that route.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    return spec_from_loader(name, loader)

What is the relationship between find_spec and find_module?

PEP 451, "A ModuleSpec Type for the Import System", introduced find_spec in Python 3.4 as the replacement for find_module. For backward compatibility the old method is still tried as a fallback, hence the AttributeError handling shown above. The relevant part of the PEP reads:

Finders are still responsible for identifying, and typically creating, the loader that should be used to load a module. That loader will now be stored in the module spec returned by find_spec() rather than returned directly. As is currently the case without the PEP, if a loader would be costly to create, that loader can be designed to defer the cost until later.

MetaPathFinder.find_spec(name, path=None, target=None)

PathEntryFinder.find_spec(name, target=None)

Finders must return ModuleSpec objects when find_spec() is called. This new method replaces find_module() and find_loader() (in the PathEntryFinder case). If a loader does not have find_spec(), find_module() and find_loader() are used instead, for backward-compatibility.

Adding yet another similar method to loaders is a case of practicality. find_module() could be changed to return specs instead of loaders. This is tempting because the import APIs have suffered enough, especially considering PathEntryFinder.find_loader() was just added in Python 3.3. However, the extra complexity and a less-than-explicit method name aren't worth it.

2.2 Extended Finder

In addition to the two standard finders, there is a third, extended finder: the <class '_frozen_importlib_external.PathFinder'> class.

# importlib/_bootstrap_external.py

class PathFinder:

    """Meta path finder for sys.path and package __path__ attributes."""
    # meta path finder for sys.path and the package's __path__ attribute
    @classmethod
    def _path_hooks(cls, path):
        """Search sys.path_hooks for a finder for 'path'."""
        if sys.path_hooks is not None and not sys.path_hooks:
            _warnings.warn('sys.path_hooks is empty', ImportWarning)
        # Search for hook function call paths from sys.path_hooks list
        for hook in sys.path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None

    @classmethod
    def _path_importer_cache(cls, path):
        """Get the finder for the path entry from sys.path_importer_cache.

        If the path entry is not in the cache, find the appropriate finder
        and cache it. If no finder is available, store None.

        """
        if path == '':
            try:
                path = _os.getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        # As the docstring of PathFinder.find_spec says, the search is based on
        # sys.path_hooks and sys.path_importer_cache.
        # sys.path_hooks is a new concept that we look at below;
        # sys.path_importer_cache is a cache of finders, so focus on the _path_hooks method.
        try:
            finder = sys.path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            sys.path_importer_cache[path] = finder
        return finder

    @classmethod
    def _get_spec(cls, fullname, path, target=None):
        """Find the loader or namespace_path for this module/package name."""
        # If this ends up being a namespace package, namespace_path is
        # the list of paths that will become its __path__
        namespace_path = []
        # Search every entry in the path list, which is either sys.path or the package's __path__
        for entry in path:
            if not isinstance(entry, (str, bytes)):
                continue
            # Get the finder for this path entry
            finder = cls._path_importer_cache(entry)
            if finder is not None:
                # Once a finder is found, the process is the same as before
                if hasattr(finder, 'find_spec'):
                    # If the finder has a find_spec method, call it, just as with the default finders
                    spec = finder.find_spec(fullname, target)
                else:
                    spec = cls._legacy_get_spec(fullname, finder)
                if spec is None:
                    continue
                if spec.loader is not None:
                    return spec
                portions = spec.submodule_search_locations
                if portions is None:
                    raise ImportError('spec missing loader')
                # This is possibly part of a namespace package.
                # Remember these path entries (if any) for when we
                # create a namespace package, and continue iterating
                # on path.
                namespace_path.extend(portions)
        else:
            # Return a spec that carries namespace_path; this ModuleSpec is what creates a namespace package
            spec = _bootstrap.ModuleSpec(fullname, None)
            spec.submodule_search_locations = namespace_path
            return spec

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        """Try to find a spec for 'fullname' on sys.path or 'path'.

        The search is based on sys.path_hooks and sys.path_importer_cache.
        """
        # sys.path is used by default if there is no path
        if path is None:
            path = sys.path
        # Call the private class method _get_spec to get the spec
        spec = cls._get_spec(fullname, path, target)
        if spec is None:
            return None
        elif spec.loader is None:
            # No loader: this may be (part of) a namespace package
            namespace_path = spec.submodule_search_locations
            if namespace_path:
                # We found at least one namespace path. Return a spec which
                # can create the namespace package.
                spec.origin = None
                spec.submodule_search_locations = _NamespacePath(fullname, namespace_path, cls._get_spec)
                return spec
            else:
                return None
        else:
            return spec

Let's start with the new concept that appeared in the code, sys.path_hooks. Its value looks like this:

>>> sys.path_hooks
[
    <class 'zipimport.zipimporter'>,
    <function FileFinder.path_hook.<locals>.path_hook_for_FileFinder at 0x000001A014AB4708>
]

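From the code and the comments we added, the logic can be deduced roughly as follows (assuming the search path is sys.path, this is the first load, and no cache is involved): for each entry in the path list, the hooks in sys.path_hooks are tried in order until one of them returns a finder for that entry; the finder is stored in sys.path_importer_cache, and its find_spec method is then called to look for the module.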
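The lookup logic described here can be sketched in plain Python. This is a simplified illustration and not the real PathFinder: find_spec_on_path is a made-up name, and namespace packages, the empty-string path entry and legacy finders are all left out.

```python
import sys
import tempfile
from pathlib import Path

def find_spec_on_path(fullname, path):
    """Simplified sketch of PathFinder._get_spec plus _path_importer_cache."""
    for entry in path:
        if not isinstance(entry, str):
            continue
        try:
            finder = sys.path_importer_cache[entry]
        except KeyError:
            finder = None
            # Ask each hook in turn whether it can handle this path entry.
            for hook in sys.path_hooks:
                try:
                    finder = hook(entry)
                    break
                except ImportError:
                    continue
            sys.path_importer_cache[entry] = finder
        if finder is None:
            continue
        spec = finder.find_spec(fullname)
        if spec is not None and spec.loader is not None:
            return spec
    return None

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "hello_mod.py").write_text("GREETING = 'hi'\n")
    spec = find_spec_on_path("hello_mod", [tmp])
    finder_type = type(sys.path_importer_cache[tmp]).__name__
    sys.path_importer_cache.pop(tmp, None)  # do not leave the temp dir in the cache

print(finder_type, spec.origin)
```

For an ordinary directory the zip hook is expected to refuse the entry with an ImportError, so the finder that ends up in the cache comes from the FileFinder hook.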

Of the two hooks, <class 'zipimport.zipimporter'> returns ModuleSpec objects whose loader loads from zip files, while the function path_hook_for_FileFinder returns a FileFinder, whose logic is in the FileFinder class:

# importlib/_bootstrap_external.py

class FileFinder:

    """File-based finder.

    Interactions with the file system are cached for performance, being
    refreshed when the directory the finder is handling has been modified.

    """

    def _get_spec(self, loader_class, fullname, path, smsl, target):
        loader = loader_class(fullname, path)
        return spec_from_file_location(fullname, path, loader=loader,
                                       submodule_search_locations=smsl)

    def find_spec(self, fullname, target=None):
        """Try to find a spec for the specified module.

        Returns the matching spec, or None if not found.
        """
        # (Excerpt: the code that computes tail_module, cache, cache_module,
        # is_namespace and base_path is omitted here.)
        # Check for a file w/ a proper suffix exists.
        for suffix, loader_class in self._loaders:
            full_path = _path_join(self.path, tail_module + suffix)
            _bootstrap._verbose_message('trying {}', full_path, verbosity=2)
            if cache_module + suffix in cache:
                if _path_isfile(full_path):
                    return self._get_spec(loader_class, fullname, full_path,
                                          None, target)
        if is_namespace:
            _bootstrap._verbose_message('possible namespace for {}', base_path)
            spec = _bootstrap.ModuleSpec(fullname, None)
            spec.submodule_search_locations = [base_path]
            return spec
        return None

def spec_from_file_location(name, location=None, *, loader=None,
                            submodule_search_locations=_POPULATE):
    """Return a module spec based on a file location.

    To indicate that the module is a package, set
    submodule_search_locations to a list of directory paths.  An
    empty list is sufficient, though its not otherwise useful to the
    import system.

    The loader must take a spec as its only __init__() arg.

    """
    spec = _bootstrap.ModuleSpec(name, loader, origin=location)
    spec._set_fileattr = True

    # Pick a loader if one wasn't provided.
    if loader is None:
        for loader_class, suffixes in _get_supported_file_loaders():
            if location.endswith(tuple(suffixes)):
                loader = loader_class(name, location)
                spec.loader = loader
                break
        else:
            return None

    # Set submodule_search_paths appropriately.
    if submodule_search_locations is _POPULATE:
        # Check the loader.
        if hasattr(loader, 'is_package'):
            try:
                is_package = loader.is_package(name)
            except ImportError:
                pass
            else:
                if is_package:
                    spec.submodule_search_locations = []
    else:
        spec.submodule_search_locations = submodule_search_locations
    if spec.submodule_search_locations == []:
        if location:
            dirname = _path_split(location)[0]
            spec.submodule_search_locations.append(dirname)

    return spec

_get_supported_file_loaders returns the loader types that can be used, each paired with the file suffixes it handles:

# importlib/_bootstrap_external.py

def _get_supported_file_loaders():
    """Returns a list of file-based module loaders.

    Each item is a tuple (loader, suffixes).
    """
    extensions = ExtensionFileLoader, _imp.extension_suffixes()
    source = SourceFileLoader, SOURCE_SUFFIXES
    bytecode = SourcelessFileLoader, BYTECODE_SUFFIXES
    return [extensions, source, bytecode]

So the file-based loaders are ExtensionFileLoader, SourceFileLoader and SourcelessFileLoader. With that, we have the general logic of the finder.
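The suffix tables and spec_from_file_location are also reachable through the public importlib.machinery and importlib.util modules. The sketch below builds a throwaway package (demo_pkg and leaf are made-up names) and shows which loader is chosen from the file suffix and how a package differs from a plain module.

```python
import importlib.machinery as machinery
import importlib.util
import tempfile
from pathlib import Path

# The suffix lists that _get_supported_file_loaders pairs with each loader.
print("source suffixes:   ", machinery.SOURCE_SUFFIXES)
print("bytecode suffixes: ", machinery.BYTECODE_SUFFIXES)
print("extension suffixes:", machinery.EXTENSION_SUFFIXES)

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp, "demo_pkg")
    package_dir.mkdir()
    init_file = package_dir / "__init__.py"
    init_file.write_text("")
    module_file = package_dir / "leaf.py"
    module_file.write_text("X = 1\n")

    # No loader is passed, so one is picked from the file suffix.
    package_spec = importlib.util.spec_from_file_location("demo_pkg", str(init_file))
    module_spec = importlib.util.spec_from_file_location("demo_pkg.leaf", str(module_file))

print(type(package_spec.loader).__name__, package_spec.submodule_search_locations)
print(type(module_spec.loader).__name__, module_spec.submodule_search_locations)
```

Because the loader's is_package answers true for an __init__ file, the package spec gets the containing directory as its single search location, which matches the last branch of spec_from_file_location above.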

3. How is a module loaded? A deep dive into the Loader

To trace the loader, we start with the _load_unlocked function:

# importlib/_bootstrap.py

def module_from_spec(spec):
    """Create a module based on the provided spec."""
    # Create a module object and copy the spec's attributes onto it (see _init_module_attrs)
    # Typically loaders will not implement create_module().
    module = None
    if hasattr(spec.loader, 'create_module'):
        # If create_module() returns `None` then it means default
        # module creation should be used.
        module = spec.loader.create_module(spec)
    elif hasattr(spec.loader, 'exec_module'):
        raise ImportError('loaders that define exec_module() '
                          'must also define create_module()')
    if module is None:
        module = _new_module(spec.name)
    _init_module_attrs(spec, module)
    return module

def _load_backward_compatible(spec):
    # (issue19713) Once BuiltinImporter and ExtensionFileLoader
    # have exec_module() implemented, we can add a deprecation
    # warning here.
    try:
        spec.loader.load_module(spec.name)
    except:
        if spec.name in sys.modules:
            module = sys.modules.pop(spec.name)
            sys.modules[spec.name] = module
        raise

def _load_unlocked(spec):
    # At this point we have a concrete ModuleSpec object; _load_unlocked loads the module it describes
    # A helper for direct use by the import system.
    if spec.loader is not None:
        # Not a namespace package.
        # Check whether the spec's loader has an exec_module method
        if not hasattr(spec.loader, 'exec_module'):
            # If not, fall back to the legacy load_module method
            return _load_backward_compatible(spec)
    # Create the module object
    module = module_from_spec(spec)
    module = module_from_spec(spec)

    # This must be done before putting the module in sys.modules
    # (otherwise an optimization shortcut in import.c becomes
    # wrong).
    spec._initializing = True
    try:
        # Register the module in sys.modules first, although it has not been executed yet
        sys.modules[spec.name] = module
        try:
            if spec.loader is None:
                if spec.submodule_search_locations is None:
                    raise ImportError('missing loader', name=spec.name)
                # A namespace package so do nothing.
            else:
                # Call the loader's own exec_module to actually execute the module; each loader implements it differently, just as each finder has its own find_spec
                spec.loader.exec_module(module)
        except:
            try:
                # failure
                del sys.modules[spec.name]
            except KeyError:
                pass
            raise
        # Move the module to the end of sys.modules.
        # We don't ensure that the import-related module attributes get
        # set in the sys.modules replacement case. Such modules are on
        # their own.
        module = sys.modules.pop(spec.name)
        sys.modules[spec.name] = module
        _verbose_message('import {!r} # {!r}', spec.name, spec.loader)
    finally:
        spec._initializing = False

    return module

As you can see, the loading logic is clearer than the search logic. As with find_module and find_spec on the finder side, load_module is the legacy method and exec_module is the officially recommended way to load a module since Python 3.4. What exec_module actually does depends on the loader:

ExtensionFileLoader: calls the built-in function _imp.create_dynamic(). In _PyImport_LoadDynamicModuleWithSpec() we can see that it ultimately calls dlopen/LoadLibrary to load the dynamic library and then executes PyInit_modulename.
SourcelessFileLoader: reads the *.pyc file, takes the content after the first 16 bytes, calls marshal.loads() to convert it into a code object, and then calls the built-in function exec to run this code object in the module object's __dict__.
SourceFileLoader: the logic is similar, but it first calls the compiler to convert the source code into a code object.
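The SourcelessFileLoader steps can be imitated by hand. This is a rough sketch of the idea rather than the loader's real code: it assumes Python 3.7 or later, where the .pyc header is 16 bytes long, it skips the header validation that the real loader performs, and greet is a made-up module name.

```python
import marshal
import py_compile
import tempfile
import types
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp, "greet.py")
    source_file.write_text("MESSAGE = 'hello from pyc'\n")
    # Compile to an explicit .pyc path; doraise=True turns compile errors into exceptions.
    pyc_path = py_compile.compile(
        str(source_file), cfile=str(Path(tmp, "greet.pyc")), doraise=True)
    data = Path(pyc_path).read_bytes()

# Skip the 16-byte header (magic number, flags, and source mtime/size or hash),
# then unmarshal the rest into a code object.
code = marshal.loads(data[16:])

# Execute the code object in a fresh module's __dict__, as exec_module would.
module = types.ModuleType("greet")
exec(code, module.__dict__)

print(type(code).__name__, module.MESSAGE)
```

SourceFileLoader differs only in where the code object comes from: it would obtain it by compiling the source text, for example with compile(source, filename, "exec"), before executing it in the same way.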

Finally, once the module object has been loaded it is cached in sys.modules, so the next import can take it directly from there. This is why we saw sys.modules being checked over and over again in the code above.
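The whole load phase, including the caching, can be replayed with the public importlib.util helpers. The sketch below mirrors the main steps of _load_unlocked in a simplified form (no locking, no _initializing flag); manual_mod is a made-up module name.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp, "manual_mod.py")
    source_file.write_text("VALUE = 21 * 2\n")

    spec = importlib.util.spec_from_file_location("manual_mod", str(source_file))
    module = importlib.util.module_from_spec(spec)  # create the module, set __spec__ etc.
    sys.modules[spec.name] = module                 # register before executing
    try:
        spec.loader.exec_module(module)             # run the module's code
    except BaseException:
        sys.modules.pop(spec.name, None)            # undo the registration on failure
        raise

# The source file is gone by now, yet the import statement still succeeds,
# because it is answered from the sys.modules cache.
import manual_mod

print(manual_mod is module, manual_mod.VALUE)
```

Registering the module before executing it is what allows circular imports to see a partially initialised module instead of starting a second load.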

3. Import tricks

With the core import process understood, it becomes easy to see how the various import tricks described online work.

1. Dynamically modify the module search path

This trick changes the find phase of import. In the find phase, the three finders look for the module in the entries of a path list. If we want to change where modules are searched for at runtime, there are two approaches:

Change the search path: add directories to the default sys.path to widen the search scope.
Change sys.meta_path: add a custom finder that defines whatever search behaviour we want; loading modules from a remote location works in a similar way.
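The second approach can be sketched with a finder that serves modules from an in-memory dictionary. The dictionary stands in for whatever remote storage a real implementation would use; SOURCES, DictFinder and virtual_mod are hypothetical names, and a production version would need error handling and caching that are not shown here.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for a remote source store: module name -> source text.
SOURCES = {"virtual_mod": "def double(x):\n    return 2 * x\n"}

class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one class, like BuiltinImporter."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in SOURCES:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self, origin="in-memory")

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        code = compile(SOURCES[module.__name__], "<in-memory>", "exec")
        exec(code, module.__dict__)

finder = DictFinder()
sys.meta_path.append(finder)  # appended, so the standard finders are tried first
try:
    import virtual_mod
    result = virtual_mod.double(21)
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("virtual_mod", None)

print(result)
```

Appending the finder keeps the normal search order intact; inserting it at position 0 instead would let it shadow built-in and installed modules, which is rarely what is wanted.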

2. import hook

By hooking into sys.meta_path or sys.path_hooks, we can change the default loading behaviour of import.
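A sys.path_hooks hook works one level lower than a meta path finder: it is asked about individual path entries. The sketch below registers a hook for one special, made-up entry string, so that PathFinder hands lookups for that entry to our own path entry finder. MAGIC_ENTRY, MEMORY_SOURCES, MemoryLoader, MemoryEntryFinder and hooked_mod are all hypothetical names.

```python
import importlib.util
import sys

MAGIC_ENTRY = "memory://demo"  # hypothetical sys.path entry handled by our hook
MEMORY_SOURCES = {"hooked_mod": "NAME = 'loaded via path hook'\n"}

class MemoryLoader:
    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        exec(MEMORY_SOURCES[module.__name__], module.__dict__)

class MemoryEntryFinder:
    """Path entry finder; the class itself is used as the path hook."""

    def __init__(self, entry):
        # A hook signals "not my kind of path entry" by raising ImportError.
        if entry != MAGIC_ENTRY:
            raise ImportError("not a memory:// entry")

    def find_spec(self, fullname, target=None):
        if fullname in MEMORY_SOURCES:
            return importlib.util.spec_from_loader(
                fullname, MemoryLoader(), origin=MAGIC_ENTRY)
        return None

sys.path_hooks.append(MemoryEntryFinder)
sys.path.append(MAGIC_ENTRY)
try:
    import hooked_mod
    loaded_name = hooked_mod.NAME
    cached_finder = sys.path_importer_cache.get(MAGIC_ENTRY)
finally:
    # Undo every global change made for the demonstration.
    sys.path.remove(MAGIC_ENTRY)
    sys.path_hooks.remove(MemoryEntryFinder)
    sys.path_importer_cache.pop(MAGIC_ENTRY, None)
    sys.modules.pop("hooked_mod", None)

print(loaded_name, type(cached_finder).__name__)
```

The existing hooks are expected to reject the made-up entry with an ImportError, so the new hook is the one whose finder lands in sys.path_importer_cache.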

3. Plug-in system

To implement a simple plug-in system in Python, the core problems are dynamically importing plug-ins and updating them. For dynamic import, the __import__ built-in function or importlib.import_module can import a module given its name. The other aspect is updating a plug-in. As we saw above, when a module object is loaded by exec_module, its code object is executed and the module is stored in sys.modules. To update a module, we can't simply delete its key from sys.modules and import it again, because a reference to the old module object may be held somewhere else, and the two module objects would then be inconsistent. Instead, we should use importlib.reload(), which reuses the same module object and simply re-initializes its contents by re-running the module's code, that is, the code object mentioned above.
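A minimal sketch of this plug-in workflow follows. The plug-in name demo_plugin and its VERSION attribute are made up for the example, and bytecode writing is switched off temporarily so that the rewritten source is certain to be recompiled even when both writes happen within the same second.

```python
import importlib
import sys
import tempfile
from pathlib import Path

previous_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # avoid a stale .pyc being reused after the quick rewrite

with tempfile.TemporaryDirectory() as plugin_dir:
    plugin_file = Path(plugin_dir, "demo_plugin.py")
    plugin_file.write_text("VERSION = 1\n")
    sys.path.insert(0, plugin_dir)
    try:
        importlib.invalidate_caches()  # recommended after creating modules at runtime
        plugin = importlib.import_module("demo_plugin")  # import by name
        held_reference = plugin                          # e.g. kept by a plug-in registry
        first = plugin.VERSION

        plugin_file.write_text("VERSION = 2  # updated plug-in\n")
        reloaded = importlib.reload(plugin)              # re-runs the code in the same object

        second = held_reference.VERSION
        same_object = reloaded is held_reference
    finally:
        sys.path.remove(plugin_dir)
        sys.path_importer_cache.pop(plugin_dir, None)
        sys.modules.pop("demo_plugin", None)
        sys.dont_write_bytecode = previous_flag

print(first, second, same_object)
```

Because reload re-executes the code inside the existing module object, every holder of the old reference sees the new contents. Names that were copied out earlier with from demo_plugin import VERSION would still point at the old values, which is a known limitation of reloading.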