• Current State of Python Packaging – 2019
  • Stefano Borini
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: EmilyQiRabbit
  • Proofreader: TokenJan, IT-Rosalyn

The state of Python packaging (written in 2019)

In this article, I’ll try to walk you through the intricacies of Python packaging. Over the past two months, I’ve spent most of my free evenings gathering as much information as possible about the problems involved, the current solutions, and what’s still missing.

Python’s vague terminology is the first source of confusion. In programming-related contexts, the word “package” means an installable component (a library, for example). Not so in Python, where the correct term for an installable component is “distribution.” However, hardly anyone actually uses the term “distribution” unless they have to (notably in official documentation and Python Enhancement Proposals). It’s a poor choice of term anyway, since “distribution” more commonly refers to a flavor of Linux.

Keep this caveat in mind: what Python calls a package is not really a package in the usual sense, but a distribution. I’ll still call it a package, though, and the process packaging.

I don’t want to spend much time reading. Can you give me the short version? How should I manage Python packages in 2019?

I assume you are a programmer who wants to start developing a Python package. Here are the steps:

  • Use Poetry to create your development environment, and specify your project’s direct dependencies with strict (pinned) versions. This ensures that your development and test environments are always reproducible.
  • Create a pyproject.toml file and use Poetry as the build back end to create source and binary distributions.
  • Next, specify your package’s abstract dependencies. Use the lowest versions you are sure work, so that you don’t create pointless version conflicts with other packages. A minimal sketch of the result follows this list.
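For illustration, here is a rough sketch of what such a pyproject.toml might look like with Poetry as the back end (all names and versions are placeholders):

```toml
# pyproject.toml (hypothetical)
[tool.poetry]
name = "mypackage"
version = "0.1.0"
description = "An example package"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
requests = ">=2.20"     # abstract dependency: lowest version known to work

[tool.poetry.dev-dependencies]
pytest = "^4.0"         # needed only for development and testing

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```

With this in place, poetry install creates the environment and a poetry.lock file pinning everything, and poetry build produces both the source and binary distributions in dist/.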

If you really want to stick with the old method based on setuptools:

  • Create a setup.py file, specify all the abstract dependencies in install_requires, and use the lowest versions of those dependencies that you know work.
  • Create a requirements.txt file listing your strict, concrete (that is, version-pinned) direct dependencies. You will use this file to generate your actual working environment.
  • Create a virtual environment with python -m venv, activate it, and install the dependencies into it with pip install -r requirements.txt. Use this environment for development.
  • If you need dependencies only for testing (quite likely), create a dev-requirements.txt file and install those as well.
  • If you need to freeze the complete environment (recommended), run pip freeze > requirements-freeze.txt and use that file to recreate the environment in the future. A sketch of this workflow follows the list.
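As a rough sketch of that workflow (the package name, dependency, and versions are placeholders):

```python
# setup.py (hypothetical) -- abstract dependencies, lowest working versions
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests>=2.20"],  # loose lower bound, not a hard pin
)
```

```
python -m venv .venv
source .venv/bin/activate                # on Windows: .venv\Scripts\activate
pip install -r requirements.txt          # concrete, pinned direct dependencies
pip install -r dev-requirements.txt      # test-only dependencies
pip freeze > requirements-freeze.txt     # freeze the full environment
```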

I have plenty of time. Please explain it all to me.

I’m going to start by saying what the problems are, and there are a lot of them.

Suppose I want to create a “project” in Python: it might be a standalone program or a library. Developing and using this project involves the following “roles”:

  • Developer: The person or team responsible for writing code.
  • CI: the automated testing process for the project.
  • Build: the automatic or semi-automatic process that takes our git repository and turns it into something others can install and use.
  • End user: the person or team that ultimately uses the project. If the project is a library, the end users may be other developers; if it’s an application, the end user might be the general public; or the project might be a web service, with a cloud microservice as its end user. There are many possibilities; you get the idea.

The goal is to keep all of these roles happy, but each has a different workflow and different requirements, and sometimes those requirements overlap. Problems also arise when a project changes, releases a new version, or retires an old one, and almost all code depends on other code to do its job. A project is bound to have dependencies, and those dependencies change over time, may or may not be required, and may operate at a very low level, so we have to consider that they may not be portable across operating systems, or even within the same operating system. It’s already very complicated.

To make matters worse, your direct dependencies have dependencies of their own. What if your package depends directly on A and B, and both depend on C? Which version of C should you install? What happens if A strictly requires version 2 of C while B strictly requires version 1?

To keep this chaos somewhat under control, people designed packaging: a way to bundle code so that it can be reused, installed, and versioned, and to attach descriptive metadata to it, for example: “built for 64-bit Windows,” or “only works on macOS,” or “requires version X or later to run.”

Okay, now I know the problem. So what’s the solution?

The first step is to define a deliverable entity that bundles a specific release of a specific piece of software. This deliverable is what we call a package (or distribution, in Python parlance). You can deliver it in two ways:

  • Source code: The source code is packaged as a zip or tar.gz file and compiled by the user.
  • Binaries: You compile the code, then publish the compiled content, which the user can use directly without additional steps.

Both can be useful, and in general it’s a good idea to offer both. Of course, we need tools that do the packaging properly, and in particular that accomplish the following tasks:

  • Create deliverable packages (that is, the build mentioned above)
  • Publish the package somewhere so others can get it
  • Download and install the package
  • Handle dependencies. What if package A needs package B to run? What if A needs B only when A is used in a certain way? What if A needs B only when installed on Windows?
  • Define environments. As mentioned earlier, even a small piece of software often requires many dependencies to run, and those dependencies are best kept separate from the dependencies of other software. This should hold both while you develop and while you run the software.

Can you be more specific? What do I have to do before I write code?

Of course. Before you write code, you usually do the following:

  1. Create a Python environment independent of the system Python. This lets you work on multiple projects simultaneously; without it, the contents of project A and project B may get mixed up.
  2. If you want to specify your project’s dependencies, keep in mind that there are two ways to do so: the abstract way, where you simply name the dependency (e.g. numpy), and the concrete way, where you also specify the version number (e.g. numpy 1.1.0). Why this distinction exists will be explained later. To create a working development environment, specify concrete dependencies.
  3. Now that you’ve done what you need to do, you can start developing.

Do I need to use any tools to do this?

It’s hard to say, because there are so many tools and they are constantly changing. One option is to create an isolated Python “virtual environment” using Python’s built-in venv, and then use pip (also shipped with Python) to install the dependencies into it. Typing and installing them one by one is cumbersome, so people usually write their concrete dependencies (with hard-coded version numbers) into a file and tell pip: “read this file and install everything listed in it,” and pip does it. That file is conventionally called requirements.txt, and you may have seen it in other projects.
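A hypothetical requirements.txt with concrete, pinned direct dependencies might look like this:

```
# requirements.txt (hypothetical) -- exact versions for a reproducible setup
requests==2.22.0
click==7.0
```

You would then run pip install -r requirements.txt inside the activated virtual environment.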

Okay, but what exactly is pip?

pip is a program that downloads and installs packages. If those packages have dependencies of their own, pip installs those sub-dependencies as well.

How does pip do it?

It looks the package up by name and version on the remote service PyPI, downloads it, and installs it. If the package is a binary, it just installs it; if it is source code, pip compiles it first and then installs it. But pip does more than that: since the package may itself have dependencies, pip fetches and installs those too.

Why do you say the requirements.txt method is just “one option”?

This approach becomes long-winded and complex as the project grows. You have to manually manage the versions of your direct dependencies for different platforms. For example, if you need one package on Windows and a different one on Linux or some other system, you end up maintaining multiple files, such as win-requirements.txt, linux-requirements.txt, and so on.

You must also consider that some dependencies are needed for your software to run, while others are needed only to run the tests. The latter are dependencies for developers and the CI machinery, but not for anyone else using your software, so at that point they are no longer dependencies of the project itself. You therefore need yet another file, dev-requirements.txt.
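Such a file might look like this (pip allows one requirements file to include another with -r; the package names here are placeholders):

```
# dev-requirements.txt (hypothetical)
-r requirements.txt   # pull in the runtime dependencies as well
pytest==4.6.3
flake8==3.7.8
```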

The bigger problem is that requirements.txt may specify only your direct dependencies, but in practice you want to pin everything needed to create your environment. Why? Suppose you install your direct dependency A, which in turn relies on C version 1.1. Then one day C releases version 1.2, and from that point on, whenever you create your environment, pip will download C 1.2, which may be broken. Suddenly your tests fail and you don’t know why.

So you want requirements.txt to list both your direct dependencies and their sub-dependencies. But then you can no longer tell the two apart in the file, so when a dependency breaks and you want to debug it, you first have to work out which entries are sub-dependencies, and of what, and…

Now you get it. It’s a mess, and you don’t want to deal with it.

Another problem you face is that pip decides which version to install in a rather primitive way. It can paint itself into a corner and hand you a broken environment or an error. Remember the example: packages A and B both depend on C. So you need a smarter process, in which pip is essentially used only to download packages at a given version, while the decision about which versions to install is left to another program that has a global view and can make more informed choices.

Like what? Give me an example.

One example is Pipenv. It ties together venv, pip, and a bunch of other hacks: you give it your list of direct dependencies, and it does its best to untangle the mess described above and deliver you a working environment. Poetry is another example. The two are often compared, and some of the debate is driven by personal and management factors, but most people lean toward Poetry.

Some companies, such as Continuum and Enthought, have their own environment managers (conda and edm). These largely sidestep the complexity of compiled dependencies on different platforms, which we won’t go into here. Suffice it to say that if you need packages that are compiled or that depend on compiled libraries, as is common in scientific computing, you are better off using their systems to manage your environments. It will save you a lot of trouble, because that is exactly what they are good at.

So which should I use, Pipenv or Poetry?

As I said, opinion leans slightly toward Poetry. I’ve tried both, and Poetry works better for me; it is the more coherent and polished solution.

OK, so the bottom line is: use Poetry to set up the environment, install the dependencies into it, and start programming.

That’s right. But I haven’t talked about building yet. That is, once you have the code, how do you create a release?

Yeah, so is that where setup.py, setuptools, and distutils come in?

You could say that, but not exactly. Originally, when you wanted to create a source or binary distribution, you used a standard library module called distutils. You did so through a Python script called setup.py, which performs the magic and produces the deliverable you can hand to others. The script can be named anything you like, but setup.py is the standard name, and other tools (such as the widely used pip) will only look for a file with that name. If pip cannot find a built version of a dependency, it downloads the source code and builds it, simply by running setup.py and hoping for the best.

But distutils didn’t work well, so alternatives emerged that could do far more. Despite the warts, the chaos, and the long road it took to get here, setuptools is better and practically everyone uses it. setuptools still uses a setup.py file, giving the impression that nothing has changed, and building works the same way as before.
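For example, a setuptools-based project is traditionally built like this (the bdist_wheel command requires the wheel package to be installed):

```
python setup.py sdist          # source distribution, e.g. dist/mypackage-0.1.0.tar.gz
python setup.py bdist_wheel    # binary distribution, e.g. dist/mypackage-0.1.0-py3-none-any.whl
```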

Why can we only hope for the best?

Because there is no guarantee that running setup.py will succeed in building the package. setup.py is just a Python script, and it may have dependencies of its own, but you have nowhere to declare those dependencies or to trace them when something goes wrong. It’s a chicken-and-egg problem.

But setuptools.setup() has a setup_requires option

That option is a trap, and it solves practically nothing. It’s still a chicken-and-egg problem. PEP 518 discusses this at length and concludes that it’s rubbish. Don’t use it.

So are setuptools and setup.py the way to build releases?

It used to be the only way. Now it isn’t necessarily, though it sometimes still is; it depends on what you’re releasing. As things stand, nobody wants setuptools to be the only tool that gets to decide how packages are built. The root of the problem goes a little deeper and is rather technical, but if you’re curious, have a look at PEP 518. The most important part I mentioned above: if pip has to build a dependency it downloaded, how does it decide which version of which tool to download and use to run the setup script? Yes, it can assume setuptools is required, but that is just an assumption. setuptools may not be in your environment, so how should pip decide? And more to the point, why setuptools rather than some other tool?

The upshot is that anyone who wants to write their own build tool should be able to do so. You therefore need a separate configuration file in which to declare which build system you use and which dependencies are needed to build your project.

Using pyproject.toml?

Correct. More specifically, a section inside it where you define the “back end” used to build your package. If you want to use a different build back end, pip can accommodate that. If the section is absent, pip assumes you are using distutils or setuptools, so it falls back to the setup.py file and executes it, hoping for the best.

Will setup.py ever go away? It is how setuptools (and distutils before it) describes the build. Other tools may use other means, probably by relying on sections of their own in pyproject.toml.

Meanwhile, you can finally declare the dependencies needed to perform the build in pyproject.toml, which eliminates the chicken-and-egg problem described above.
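A minimal build-system section might look like this (shown here for setuptools; Poetry declares its own back end instead):

```toml
# pyproject.toml -- the PEP 518 build-system section
[build-system]
requires = ["setuptools>=40.8.0", "wheel"]   # dependencies needed to build
build-backend = "setuptools.build_meta"      # the back end pip should invoke
```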

Why the TOML format? I’ve never even heard of it. Why not JSON, INI, or YAML?

Standard JSON does not allow comments, and people genuinely need comments to convey information about what’s in the file. You can break the rules, but then it’s no longer JSON. Besides, JSON is rather unfriendly to humans and not much fun to write.

INI is not a standard at all, and it has many limitations.

YAML is a potential security risk for your project.

So TOML makes sense. But couldn’t they have put setuptools in the standard library?

Maybe, but the standard library’s release cycle is really long. The slow pace of distutils updates is precisely what spurred the adoption and rise of setuptools. And setuptools can’t be guaranteed to satisfy every need; some packages have special requirements.

OK, so am I right that I should use Poetry to create my working environment, and build packages with either setup.py and setuptools, or pyproject.toml?

If you want to use setuptools, you’ll need setup.py, but the problem you may run into is that your users will also need setuptools installed to build your package.

So what other tools can I use besides setuptools?

Flit, for example, or Poetry.

Isn’t Poetry the tool for installing dependencies?

Yes, but it can also build packages. Pipenv cannot.

By the way, if I use setup.py, why do I have to specify dependencies there too? What do the dependencies in my setup.py have to do with those in Pipenv, Poetry, or requirements.txt?

The ones in setup.py are the abstract dependencies needed to run the package, which pip uses when deciding what versions to download and install. You should keep the version constraints loose here, because if you don’t… remember when I said that both A and B depend on C? What happens if A says “I want exactly C 1.2.1” and B says “I want exactly C 1.2.2”?

pip has no other choice when it builds a source distribution it has downloaded. pip does not look at the requirements you wrote in requirements.txt; it simply runs setup.py, which uses setuptools, which in turn invokes pip again to resolve the abstract dependencies into concrete, installable ones.
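To make the distinction concrete, a hypothetical contrast:

```
# Abstract, in setup.py: a loose lower bound that lets pip negotiate versions
install_requires=["numpy>=1.13"]

# Concrete, in requirements.txt: an exact pin that reproduces one known-good environment
numpy==1.16.4
```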

What about eggs, easy_install, .egg-info directories, distribute, virtualenv (not the same as venv), zc.buildout, bento?

Ignore them. They are either legacy tools or offshoots of other tools, or attempts that lead nowhere.

What about wheels?

Remember what I said earlier? pip needs to know what it is downloading from PyPI so that it fetches the right version for your platform and operating system. A wheel is a file that contains the package to install, together with well-specified metadata that pip uses to determine its dependencies and sub-dependencies.

A wheel’s file name contains tags that act as metadata (see PEP 425), recording, for instance, which implementation it was compiled for (such as CPython), which version, which ABI, and so on. The tags in the file name follow a standard layout, and specific words in them have specific meanings.

Remember: wheels are what you build for binary distribution.
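For instance, a real wheel file name decomposes like this:

```
numpy-1.16.4-cp37-cp37m-manylinux1_x86_64.whl
# name - version - python tag - ABI tag - platform tag
# (CPython 3.7, cp37m ABI, Linux x86-64 under the manylinux1 baseline)
```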

What about .pyz?

You can ignore it; strictly speaking it isn’t about packaging. It can be useful in other ways, though, and if you want more detail, see PEP 441.

What about PyInstaller?

PyInstaller is a different topic entirely. You see, the trouble with the word “packaging” is that it doesn’t pin down what you actually mean. So far, we have discussed:

  1. Create an environment where libraries can be developed
  2. Build your project into a format that others can use

But these mostly apply to libraries. Shipping applications is a different matter. When you package a library, you know it will become part of some larger body of code; when you package an application, that application is the larger body of code.

Also, if you want to give an application to people, it should be targeted at their platform. For example, you may want to provide an executable with an icon, and that artifact will differ between Windows, macOS, and Linux.

PyInstaller is the tool to reach for when you want to create a standalone executable application: it produces the finished application that lands on the user’s desktop. Packaging, as discussed above, is about managing the web of dependencies, libraries, and tools you need in order to build that application, which you may or may not then create with PyInstaller.
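As a quick sketch of the kind of invocation involved (the script name is a placeholder):

```
pip install pyinstaller
pyinstaller --onefile myapp.py   # bundles the script and its dependencies into dist/myapp
```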

Note, however, that this approach assumes your application is simple and self-contained. If it needs to do more complicated things at install time, such as modifying the Windows registry, you need a proper, more sophisticated installer, such as NSIS. I’m not aware of an NSIS-like tool in the Python world. In any case, NSIS knows nothing about what you are deploying: you can build the executable with PyInstaller and then deploy it with NSIS, having it perform the additional steps, such as registry or file system modifications, that make the application work.

OK, but how do I install a project whose sources I already have? With python setup.py?

No. Use pip install ., because that command guarantees you can uninstall the package later, and it is generally better behaved. pip then checks pyproject.toml and runs the build behind the scenes. If pip does not find a pyproject.toml file, it falls back to the old way and runs setup.py to attempt the build.
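In other words, from the project’s root directory (the package name is a placeholder):

```
pip install .            # builds and installs the project
pip uninstall mypackage  # cleanly removes it later
```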

I like this article a lot, but some things are still unclear to me

You can open an issue. If I know the answer, I’ll give it to you right away; if I don’t, I’ll do some research and get back to you as soon as possible. My goal with this article is for people to finally make sense of Python packaging.

Are there any links where I can learn more?

Of course, see:

  • Sedimental.org/the_packagi…
