This is the second day of my participation in the August More Text Challenge. For details, see: August More Text Challenge.

Hello, I’m Milo, a test fan!

If you'd like to talk with the author after reading, click "read the original article" to find the comment area at the bottom and leave me a message!

Welcome to follow my WeChat official account: test development pit goods.

(If there are any mistakes in this article, please message me on the official account so I can correct or delete them and avoid misleading anyone!)

Review

Last time we went through the whole process and the data constructor; today we're going to introduce aiohttp.

Before that, let's briefly go over some background knowledge:

Python's asynchronous programming

To be honest, I struggled while writing this article, because I don't fully understand it myself. Although I've read a lot of articles, writing asynchronous code still feels like a bit of a struggle.

You can search for articles on asynchronous programming yourself; to be frank, I can't explain it all that clearly.

Reference:

  • Asynchronous I/O: www.liaoxuefeng.com/wiki/101695…

Here’s a vivid example

Think of buying breakfast to take home. In the morning I want a bowl of noodle soup, a cup of soy milk, and two fried pancakes. Unfortunately, although the three stalls are right next to each other, each one has its own queue.

The Python synchronous scenario:

  1. I buy the soy milk and wait for it to be made.
  2. I buy the noodle soup and wait for it to be made.
  3. I buy the fried pancakes and wait for them to be made.

These are synchronous tasks: I have to finish one before I can start the next. The total elapsed time is the sum of the three steps.

Python asyncio looks something like this:

  1. I first tell the noodle-soup stall owner I want a bowl of noodle soup. The owner may be cooking or there may be a queue, but my request is already being handled.
  2. Then I tell the soy-milk stall owner I want a cup of soy milk.
  3. Finally, I go to the pancake stall and tell the owner I want two fried pancakes.

Whenever one of them is ready, I get notified and go pick it up. I'm still only one person, but I'm much more productive. The total elapsed time is just the cost of switching within a single thread plus the time of the slowest stall.

Regular multithreaded scenario:

  1. My brother goes to buy the soy milk for me and waits until it's ready.
  2. My sister goes to buy the noodle soup for me and waits until it's ready.
  3. I buy the fried pancakes myself and wait until they're ready.

It's like sending three people to do three things. The tasks are independent of each other, and the total time spent is that of the slowest one.

So let's focus on what asyncio actually does. It has an event_loop, which you can think of as something that manages the events within a thread and handles the switching for us.

Take the example above: I tell the owner I want soy milk, and while the owner is busy making it, I should give up the resource and switch to the next event (buying the noodle soup). When the soy milk is finally done, how do I know to go get it? It's the event_loop that notifies me, so I can go back and pick up my soy milk.

We don't need to know how the switching works yet; we just need to tell the event_loop that buying soy milk is an asynchronous method, so that while we wait for the owner to make it, the resources can be handed over to other events.
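To make the analogy concrete, here is a minimal sketch of the three stalls as coroutines. The stall names and wait times are made up for illustration, and asyncio.sleep stands in for "waiting for the owner":

import asyncio
import time


async def buy(name, seconds):
    # await hands control back to the event_loop while this "stall" is busy
    await asyncio.sleep(seconds)
    print(f"{name} is ready")


async def breakfast():
    start = time.time()
    # the event_loop runs all three "purchases" concurrently in one thread
    await asyncio.gather(
        buy("soy milk", 3),
        buy("noodle soup", 5),
        buy("fried pancakes", 2),
    )
    print("Total time:", time.time() - start)  # roughly 5s, the slowest stall


asyncio.run(breakfast())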

Get familiar with async and await

When we add the async keyword to a method, it becomes an asynchronous method. The await keyword can only be used inside async methods; it means giving up control of the current thread and letting the thread execute something else.

In general, if we just write our own methods, adding async to the method and await when calling it doesn't achieve anything by itself. For example, requests is full of synchronous methods, so wrapping them in async and await makes no difference, which is why we need an asynchronous HTTP request library.

In the example above, it's as if I go to the pancake stall and the owner says, "don't leave, it's almost ready." So I just stand there, wait to buy the pancakes, and only then move on to the next stall.

I don’t know if that makes sense.
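To put that in code, here's a small sketch (the URL and the delay are just placeholders): calling a synchronous library like requests inside an async method still blocks the event loop, while awaiting a real coroutine lets the loop switch away.

import asyncio

import requests


async def fake_async(url):
    # requests.get is an ordinary blocking call, so the event loop
    # cannot switch to other tasks while this line is running
    return requests.get(url).status_code


async def real_async(seconds):
    # await yields control back to the event loop until the sleep finishes
    await asyncio.sleep(seconds)
    return seconds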

aiohttp

aiohttp is an asynchronous HTTP framework; you can roughly think of it as an asynchronous version of requests. There are plenty of libraries like this, such as aiofiles for asynchronous file I/O, and there will be more and more asynchronous libraries in the future.
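For a taste of what such a library looks like, here's a minimal aiofiles sketch (the file name is just a placeholder):

import asyncio

import aiofiles


async def read_file(path):
    # aiofiles mirrors the built-in open(), but reads without blocking the event loop
    async with aiofiles.open(path, mode="r", encoding="utf-8") as f:
        return await f.read()


print(asyncio.run(read_file("demo.txt")))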

Why do we bother? Is requests not good enough anymore? Not exactly. Python has long been criticized for slow execution, something I experienced first-hand at my previous company.

A problem from back then

There was a requirement to pull all the data for our company's services from someone else's interface and write it into our local database. There were something like thousands of apps, and syncing them in one go was very slow. The pseudocode looked roughly like this:

import requests

# cursor is a database cursor obtained elsewhere; "xxx" stands for the real URL/SQL
for i in range(100):
    r = requests.get("xxx")  # blocks until the response comes back
    data = r.json()
    cursor.execute("insert into xxx....")

The main reason it's slow is that the HTTP round trip takes time, and while we're waiting for one response, none of the other tasks are running.

It's like having 100 acres of land with only one person farming it; waiting for him to finish would take forever. Some people will say: why not use multithreading, so you can put more people on it, say 100 people all at once?

I'd love to, but the GIL won't really allow it. I didn't try it myself, but the outcome is predictable. If you're interested, I may run such a test in the future.

So people ask: well, how did you actually do it? I rewrote that piece of code in Go and used a few dozen goroutines to get it done. It was roughly hundreds of times faster, and the sync finished in no time.
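With what we're covering today, the same job could also be written concurrently in Python with aiohttp. Here's a rough sketch, where the URL, the SQL, and the insert_row helper are placeholders rather than the code I actually ran:

import asyncio

import aiohttp


async def sync_one(session):
    # all requests share one session; await lets other fetches run while this one waits
    async with session.get("xxx") as resp:
        data = await resp.json()
        insert_row(data)  # placeholder for the real database write


async def sync_all():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(sync_one(session) for _ in range(100)))


asyncio.run(sync_all())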

Try aiohttp

Thinking back to the problem above, I also remembered that I'll need to run test cases in batches later on. So I gritted my teeth and decided to give aiohttp a try.

Before that, I had only heard of it by name, but had no actual experience.

Relevant documentation: docs.aiohttp.org/en/stable/

Retrofit the previous httpclient.py

Let’s create a new file called asynchttpClient.py.

  • Write the constructor

It's almost the same as before, except there's an extra timeout step, because aiohttp's timeout is not an int/float but an aiohttp.ClientTimeout object.
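For example (the 30-second value is only an illustration):

import aiohttp

# aiohttp expects a ClientTimeout object rather than a plain number
timeout = aiohttp.ClientTimeout(total=30)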

  • Write the get_cookie method
def get_cookie(self, session):
    # pull the cookies for self.url out of the session's cookie_jar
    cookies = session.cookie_jar.filter_cookies(self.url)
    return {k: v.value for k, v in cookies.items()}

Get the cookie from the cookie_jar in the session.

  • Write the method to get the resp

  • Write the collect method

    In essence, this method assembles some of the requested data.

  • Write the invoke method

    (The name is borrowed; I've seen invoke used in many RPC frameworks.)

The first thing invoke does is get a session with async with, and note that I carry the cookies along here. The rest is no different from before, except that the request time is measured separately. A rough sketch of what the whole class might look like follows below.
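This is only a minimal sketch of what such an AsyncRequest class might look like; the method signatures, field names, and the shape of the returned data are my assumptions, not the project's actual code:

import time

import aiohttp


class AsyncRequest:
    def __init__(self, url, timeout=15, **kwargs):
        self.url = url
        self.kwargs = kwargs
        # aiohttp wants a ClientTimeout object, not a plain int/float
        self.timeout = aiohttp.ClientTimeout(total=timeout)

    def get_cookie(self, session):
        cookies = session.cookie_jar.filter_cookies(self.url)
        return {k: v.value for k, v in cookies.items()}

    async def get_resp(self, session, resp):
        # assemble status, body and cookies from the response
        text = await resp.text()
        return resp.status, text, self.get_cookie(session)

    @staticmethod
    def collect(status, body, cookies, elapsed):
        # package the pieces of the request into one dict
        return {"status": status, "body": body, "cookies": cookies, "cost": elapsed}

    async def invoke(self, method):
        start = time.time()
        async with aiohttp.ClientSession() as session:
            async with session.request(method, self.url, timeout=self.timeout,
                                        **self.kwargs) as resp:
                status, body, cookies = await self.get_resp(session, resp)
        return self.collect(status, body, cookies, time.time() - start)

Calling it would then look roughly like: result = await AsyncRequest("https://example.com").invoke("GET").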

How to use

First look at our HTTP test interface: change the method to async, change the request part to AsyncRequest, and await what invoke returns.
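In other words, the calling code changes roughly like this; the names below (Request for the old synchronous client, the http_request wrapper) are assumptions for illustration only:

# before (synchronous), where Request stands for the old synchronous client:
# def http_request(url, method, **kwargs):
#     return Request(url, **kwargs).invoke(method)


# after (asynchronous), using the AsyncRequest sketched above:
async def http_request(url, method, **kwargs):
    return await AsyncRequest(url, **kwargs).invoke(method)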

Let's compare it with requests

Small test code

import asyncio
import time

import aiohttp
import requests

url = "https://www.baidu.com"


async def fetchBaidu():
    async with aiohttp.ClientSession() as session:
        resp = await session.get(url)
        text = await resp.text(encoding='utf-8')
        print(text.split("\n")[0])


async def main():
    start = time.time()
    # fire 200 requests concurrently with aiohttp
    await asyncio.gather(*(fetchBaidu() for _ in range(200)))
    print("Take time :", time.time() - start)


def main2():
    start = time.time()
    # synchronous baseline: 200 requests one after another with requests
    session = requests.Session()
    for i in range(200):
        r = session.get(url)
        print(r.text.split("\n")[0])
    print("Take time :", time.time() - start)


if __name__ == "__main__":
    asyncio.run(main())
    # main2()


Running 200 requests, this is the speed of aiohttp:

After being hit with that many requests, Baidu got a little unhappy.
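If the target service gets unhappy about that many simultaneous requests, one common remedy (not part of the script above, just a suggestion) is to cap the concurrency with an asyncio.Semaphore:

import asyncio


async def main_limited():
    # allow at most 20 requests in flight at once (20 is an arbitrary example value)
    sem = asyncio.Semaphore(20)

    async def fetch_with_limit():
        async with sem:
            await fetchBaidu()  # reuse the coroutine from the script above

    await asyncio.gather(*(fetch_with_limit() for _ in range(200)))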

That's all for today. Feel free to take this script and play around with it, and see for yourself just how powerful aiohttp is!