• Better Performance by Optimizing Gunicorn Config
  • By Omar Rayward
  • Translation: The Gold Project
  • Permalink: github.com/xitu/gold-m…
  • Translator: shixi-li

Practical advice on how to configure Gunicorn

In summary: increase the number of workers and/or cores for CPU-bound applications; use "pseudo-threads" for I/O-bound applications.

Gunicorn is a Python WSGI HTTP server. It usually sits between a reverse proxy (such as Nginx) or a load balancer (such as AWS ELB) and a web application (such as Django or Flask).
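All of the commands below run an application referred to as main:app, that is, a module named main exposing a WSGI callable named app. A minimal sketch of such a module using Flask (the file name main.py and the route are illustrative assumptions, not part of the original article):

# main.py -- a minimal Flask application exposing the WSGI callable "app",
# started with: gunicorn main:app
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from a Gunicorn worker!"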

Gunicorn architecture

Gunicorn implements a UNIX pre-fork web server.

Ok, so what does that mean?

  • Gunicorn starts a single master process that gets forked; the resulting child processes are the workers.
  • The role of the master process is to make sure the number of workers stays the same as the number defined in the settings. If any worker dies, the master starts another one by forking itself again.
  • The role of the workers is to handle HTTP requests.
  • The pre in pre-fork means that the master creates the workers before handling any HTTP request.
  • The operating system kernel handles the load balancing between worker processes (see the sketch below).
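
A rough, purely illustrative sketch of the pre-fork idea (this is not Gunicorn's actual code; it only shows a master process forking workers that all accept() on the same listening socket, leaving the load balancing to the kernel):

# prefork_sketch.py -- illustrative only, not Gunicorn internals
import os
import socket

NUM_WORKERS = 3

# The master creates the listening socket *before* forking.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen(128)

for _ in range(NUM_WORKERS):
    if os.fork() == 0:  # child process == worker
        while True:
            # Every worker blocks on the same socket; the kernel decides
            # which worker receives each incoming connection.
            conn, _addr = listener.accept()
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
            conn.close()

# The master only supervises its children (a real master would also
# fork a replacement whenever a worker dies).
for _ in range(NUM_WORKERS):
    os.wait()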

To improve performance when using Gunicorn, we must keep in mind three types of concurrency.

First type of concurrency (Workers mode, aka UNIX process mode)

Each worker is a UNIX process that loads a Python application. No memory is shared between workers.

The recommended number of workers is (2*CPU)+1.

For a dual-core machine (2 CPUs), 5 is the recommended number of workers.

gunicorn --workers=5 main:app
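The same setting can also live in a configuration file instead of the command line. A minimal sketch, assuming a file named gunicorn.conf.py (the name Gunicorn looks for by default in recent versions) and using multiprocessing.cpu_count() to encode the (2*CPU)+1 rule:

# gunicorn.conf.py -- start with: gunicorn -c gunicorn.conf.py main:app
import multiprocessing

# (2 * CPU) + 1 workers, e.g. 5 on a dual-core machine.
workers = multiprocessing.cpu_count() * 2 + 1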

Second type of concurrency (multi-threading)

Gunicorn also allows each worker to have multiple threads. In this scenario, the Python application is loaded once by each worker, and each thread generated by the same worker shares the same memory space.

To use multi-threading in Gunicorn, we use the threads setting. Whenever we use threads, the worker class is set to gthread:

gunicorn --workers=5 --threads=2 main:app

The previous command is equivalent to:

gunicorn --workers=5 --threads=2 --worker-class=gthread main:app

The maximum number of concurrent requests in this example is workers * threads, which is 10.

When using both workers and threads, the recommended maximum total concurrency (workers * threads) is still (2*CPU)+1.

So if we are using a quad-core (4 CPU) machine and we want to use workers and multi-threaded mode, we can use 3 workers and 3 threads to get a maximum of 9 concurrent requests.

gunicorn --workers=3 --threads=3 main:app
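The same workers-plus-threads setup as a configuration-file sketch (setting threads already implies the gthread worker class, as noted above; spelling it out is only for clarity):

# gunicorn.conf.py -- workers and threads for a quad-core machine
workers = 3
threads = 3
# Equivalent to passing --worker-class=gthread on the command line.
worker_class = "gthread"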

Third type of concurrency ("pseudo-threads")

There are Python libraries such as gevent and asyncio that enable concurrency in Python. They are based on coroutines, which are "pseudo-threads".

Gunicorn allows you to use these asynchronous Python libraries by setting the corresponding worker class.

This setting works for gevent on a single-core machine:

gunicorn --worker-class=gevent --worker-connections=1000 --workers=3 main:app

worker-connections is a setting specific to the gevent worker class.

(2*CPU)+1 is still the recommended number of workers. Since we only have one core, we will use three workers.

In this case, the maximum number of concurrent requests is 3000 (3 workers * 1000 connections per worker).
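
The same gevent setup as a configuration-file sketch (worker_connections only applies to the asynchronous worker classes, and the gevent package must be installed for this worker class to be available):

# gunicorn.conf.py -- "pseudo-threads" via the gevent worker class
worker_class = "gevent"
worker_connections = 1000  # maximum simultaneous clients per worker
workers = 3  # (2 * CPU) + 1 on a single-core machine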

Concurrency vs. parallelism

  • Concurrency is when two or more tasks are in progress at the same time, which may mean that only one of them is actually being worked on at any given moment while the others are paused.
  • Parallelism is when two or more tasks are executing at exactly the same time.

In Python, threads and "pseudo-threads" are a means of concurrency, but not parallelism; workers, on the other hand, provide both concurrency and parallelism. A small sketch illustrating the difference follows.
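
A purely illustrative sketch of why this matters (the function and numbers below are assumptions, not from the original article): because of the GIL, running a CPU-bound function in threads takes roughly as long as running it sequentially, while separate processes, like Gunicorn workers, can actually run in parallel.

# gil_demo.py -- illustrative timing, not a rigorous benchmark
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n=5_000_000):
    # A pure-Python loop holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i
    return total

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(cpu_bound, [5_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "threads (concurrent, not parallel)")
    timed(ProcessPoolExecutor, "processes (parallel)")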

That’s great in theory, but how do I use it in a program?

A practical case

We want to optimize application performance by tweaking the Gunicorn settings.

  1. If the application is I/O bound, the best performance usually comes from using "pseudo-threads" (gevent or asyncio). As we have learned, Gunicorn supports this programming paradigm by setting the appropriate worker class and adjusting the number of workers to (2*CPU)+1.
  2. If the application is CPU bound, it does not matter how many concurrent requests it handles; the only thing that matters is the number of parallel requests. Because of Python's GIL, threads and "pseudo-threads" cannot run in parallel. The only way to achieve parallelism is to increase the number of workers up to the suggested (2*CPU)+1, understanding that the maximum number of parallel requests is the number of cores.
  3. If the application's memory footprint is a concern, using threads and the corresponding gthread worker class yields better performance, because the application is loaded only once per worker and every thread running on the same worker shares some memory; this comes at the cost of some extra CPU consumption.
  4. If you don't know what to choose, start with the simplest configuration: set workers to (2*CPU)+1 and don't use threads. From that point on, you have a baseline for trial and error. If the bottleneck is memory, start introducing threads. If the bottleneck is I/O, consider a different Python programming paradigm. If the bottleneck is CPU, consider adding more cores and adjusting the number of workers. A configuration sketch of this starting point follows the list.
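
Putting the decision list together, a hedged sketch of a starting configuration (the commented-out alternatives are the knobs to turn depending on where the bottleneck shows up; the values are only examples):

# gunicorn.conf.py -- a baseline to measure against, then adjust
import multiprocessing

# Baseline: (2 * CPU) + 1 sync workers, no threads.
workers = multiprocessing.cpu_count() * 2 + 1

# Memory bottleneck? Trade workers for threads (gthread worker class):
# workers = multiprocessing.cpu_count()
# threads = 2
# worker_class = "gthread"

# I/O bottleneck? Switch to "pseudo-threads":
# worker_class = "gevent"
# worker_connections = 1000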

Building the system

We software developers often assume that every performance bottleneck can be solved by optimizing the application code, but this is not always the case.

Sometimes tweaking the HTTP server settings, adding more computing resources, or redesigning the application with a different programming paradigm is what actually improves application performance.

Building the system, in this case, means understanding the types of computing resources (processes, threads, and "pseudo-threads") that we can apply flexibly to deploy a high-performance application.

By understanding, architecting, and implementing the right technical solution, we avoid falling into the trap of trying to improve performance only by optimizing the application code.

References

  1. Gunicorn is a port of the Ruby Unicorn project. Unicorn's design outline helps clarify some of the most basic concepts, and the Gunicorn architecture documentation further cements them.
  2. An opinionated blog post that presents some of the key Unix features very well.
  3. A Stack Overflow answer about the pre-fork web server model.
  4. Some more references on how to fine-tune Gunicorn.


