This is the 22nd day of my participation in the First Challenge 2022.
Continuing with Teacher Cui's book: the earlier chapters covered the requests and urllib libraries and the basic use of regular expressions. Today I'm learning HTTPX, a library that supports the HTTP/2.0 protocol.
Example
https://spa16.scrape.center/ appears to be Teacher Cui's own example site, built as a sample for his latest crawler book, which I highly recommend. So how do we know this site uses HTTP/2.0, and can Requests access it?
Let's start with the first question: HTTP/2.0.
Open your browser's developer tools and look under the Network tab. In the Protocol column, every request shows h2: the site enforces HTTP/2.0. For comparison, open a common site such as Juejin, where you can see both h2 and http/1.1. That answers the first question. Now let's see whether the Requests library can request the site:
import requests
r = requests.get("https://spa16.scrape.center/")
print(r.status_code)
print(r.text)
This raises an error. I won't paste the full traceback here; just run the code above to see it. The key part is: RemoteDisconnected("Remote end closed connection without"
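Besides the browser's Network tab, you can also ask a server from Python which protocol it prefers. Over HTTPS, HTTP/2 is negotiated during the TLS handshake through ALPN: the client offers "h2" and "http/1.1" and the server picks one. The sketch below uses only the standard library `ssl` and `socket` modules; the function names `make_alpn_context` and `negotiated_protocol` are my own, chosen for illustration.

```python
import socket
import ssl
from typing import Optional


def make_alpn_context() -> ssl.SSLContext:
    """TLS context (with certificate checks) that offers h2 and HTTP/1.1 via ALPN."""
    context = ssl.create_default_context()
    # Order expresses our preference; the server makes the final choice
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the protocol the server selected, e.g. "h2"."""
    context = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # None means the server did not select a protocol via ALPN
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol("spa16.scrape.center")` needs network access; if the site prefers HTTP/2 as the browser suggests, it should return "h2".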
Solving the problem
Install the HTTPX library:
pip3 install httpx
Note that with this plain install, HTTPX still cannot access h2 sites. Install it with the http2 extra instead:
pip3 install 'httpx[http2]'
OK, at this point the installation is done.
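To confirm the extra actually got installed, you can check whether the `h2` package, which `httpx[http2]` pulls in, is importable. A small standard-library sketch; the function name `http2_available` is just for illustration:

```python
import importlib.util


def http2_available() -> bool:
    """Return True if the "h2" package (installed by httpx[http2]) can be imported."""
    # find_spec returns None when the package is not installed
    return importlib.util.find_spec("h2") is not None


print("HTTP/2 support installed:", http2_available())
```

If this prints False, creating `httpx.Client(http2=True)` should fail with an ImportError pointing you to `httpx[http2]`.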
Basic usage
Happily, the API is almost the same as that of the Requests library, so let's go back to the problem of accessing that site:
import httpx
r = httpx.get("https://spa16.scrape.center/")
print(r.text)
Result: it still fails. That's how it goes: solve one problem and another pops up. HTTPX does not enable HTTP/2 by default, so you have to declare it when creating a client:
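import httpx

# Enable HTTP/2 when declaring the client
client = httpx.Client(http2=True)
# Send the request through the declared client object
r = client.get("https://spa16.scrape.center/")
print(r.http_version)  # protocol actually used, e.g. "HTTP/2"
print(r.text)
client.close()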
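The next question: what is this client object?
As far as I understand, it is very similar to the Session object in Requests. A Session keeps one "conversation" going, and a Client does the same job: settings such as headers are shared, and connections are reused across the requests it sends.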
The client object
This is the officially recommended way to write it, using a with block so the client is closed automatically:
import httpx

with httpx.Client() as client:
    r = client.get("https://www.httpbin.org/get")
    print(r)
For more information, see www.python-httpx.org/advanced/
OK, today I reached page 78. My small goal is to reach page 150 this week. 2022-2-12