This is the 22nd day of my participation in the First Challenge 2022.

Today I kept reading Cui's book. So far I have studied the requests and urllib libraries and the basic usage of regular expressions; today I'm learning HTTPX, a request library that supports the HTTP/2.0 protocol.

An example

https://spa16.scrape.center/ should be Cui's own site, built as a sample for the book. I highly recommend this latest crawler book. So how do we know the site is using HTTP/2.0, and is Requests really unable to access it?

Let's address the first question: HTTP/2.0

Open your browser's developer tools (Inspect Element) and look under the Network tab.

Look at the Protocol column: every request shows h2, which means the site enforces HTTP/2.0. For comparison, let's visit a common site, such as Juejin.

There you can see both h2 and http/1.1. OK, that settles the first question. Now let's see whether the Requests library can request the site.

import requests

r = requests.get("https://spa16.scrape.center/")
print(r.status_code)
print(r.text)

What we get is an error message. I won't paste all of it; run the code above and you will see it. RemoteDisconnected("Remote end closed connection without
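To make the failure easier to read than a full traceback, the request can be wrapped in a try/except. This is only a minimal sketch: the helper name `try_fetch` and the 10-second timeout are my own assumptions, not something from the book. Requests reports a dropped connection as `requests.exceptions.ConnectionError`.

```python
import requests


def try_fetch(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: describe what happened when requesting url."""
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # The server closing the connection ends up here.
        return f"connection failed: {type(exc).__name__}"
    return f"status {r.status_code}"


# Example (needs network access):
# print(try_fetch("https://spa16.scrape.center/"))
```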

Solving the problem

Install the HTTPX library

Run pip3 install httpx. Note that an HTTPX installed this way still cannot access h2; the HTTP/2 extra is needed as well:

pip3 install 'httpx[http2]'

OK, at this point the installation is done.

Basic usage

Happily, the API is much the same as that of the Requests library, so let's tackle the problem of accessing that site.

import httpx

r = httpx.get("https://spa16.scrape.center/")

print(r.text)

Results obtained:

And there's the problem: solve one problem and another pops up. The HTTPX library does not enable HTTP/2 by default; it has to be declared explicitly, as follows:

import httpx

# Declare a client with HTTP/2 support enabled
client = httpx.Client(http2=True)
# Use the declared client object to make the request
r = client.get("https://spa16.scrape.center/")

print(r.text)

The new question is: what is the client object?

As far as I understand, it is very similar to the Session object in Requests: session means a session (a conversation), client means the client.

The Client object

This is the official way to write it

import httpx

with httpx.Client() as client:
    r = client.get("https://www.httpbin.org/get")
    print(r)

For more information: www.python-httpx.org/advanced/

OK, today I got to page 78. Setting a small goal: reach page 150 this week. 2022-2-12