Python has a reputation for making it easy to fetch web pages, and it is the urllib and Requests modules that provide this productivity.
Introduction to urllib
urllib.request provides the urlopen function to retrieve a page, and supports different protocols, basic authentication, cookies, and proxies. In Python 2 there were two versions, urllib and urllib2: urllib2 could accept Request objects, while urllib could only accept URLs; urllib provided the urlencode function to encode the parameters of a GET request, and urllib2 had no corresponding function. The library raises URLError and HTTPError (found in urllib.error in Python 3) to signal client and server exceptions.
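A quick sketch of that exception handling in Python 3 (both exception classes live in urllib.error; the URL here is only a placeholder):

import urllib.request
import urllib.error

try:
    response = urllib.request.urlopen('http://www.example.com/missing')
except urllib.error.HTTPError as e:
    # The server answered, but with an error status code (4xx/5xx)
    print('HTTP error:', e.code, e.reason)
except urllib.error.URLError as e:
    # The request never reached a server (DNS failure, refused connection, ...)
    print('URL error:', e.reason)
else:
    print(response.read().decode())

Note that HTTPError is a subclass of URLError, so it must be caught first.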
Introduction to Requests
Requests is an easy-to-use HTTP library written in Python. It lets us make HTTP requests with simple parameters, rather than having to build everything up ourselves as with urllib. It also automatically decodes response bodies to Unicode and has rich error handling. Its features include (a short sketch of a few of them follows the list):
- International Domains and URLs
- Keep-Alive & Connection Pooling
- Sessions with Cookie Persistence
- Browser-style SSL Verification
- Basic/Digest Authentication
- Elegant Key/Value Cookies
- Automatic Decompression
- Unicode Response Bodies
- Multipart File Uploads
- Connection Timeouts
- .netrc support
- Python 2.6 — 3.4
- Thread-safe
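A minimal sketch exercising a few of these features: a Session (keep-alive, connection pooling, and cookie persistence) plus a connection timeout. httpbin.org is just a convenient echo service for testing, not part of the original examples.

import requests

session = requests.Session()  # reuses connections and stores cookies across requests
session.headers.update({'User-Agent': 'demo-client/1.0'})

# Cookies set by the first response are kept on the session and resent automatically
session.get('http://httpbin.org/cookies/set?demo=1', timeout=5)
resp = session.get('http://httpbin.org/cookies', timeout=5)
print(resp.status_code)
print(resp.text)  # body already decoded to Unicode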
Here is some sample code in Python 3.6.0
Request a single page directly without parameters
import urllib.request
from urllib.request import Request, urlopen
# import urllib2  # Python 2 only; merged into urllib.request in Python 3
import requests
Use urllib
response = urllib.request.urlopen('http://www.baidu.com')
# read() returns the raw bytes from the server; decode() converts them to a str
print(response.read().decode())
Send the same GET request with requests
# the requests module
resp = requests.get('http://www.baidu.com')
print(resp)
print(resp.text)
HTTP is based on requests and responses, and urllib.request provides a Request object to represent a request, so the code above can also be written like this:
req = urllib.request.Request('http://www.baidu.com')
with urllib.request.urlopen(req) as response:
    print(response.read())
Header information can be added through the Request object:
req = urllib.request.Request('http://www.baidu.com')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with urllib.request.urlopen(req) as response:
    print(response.read())
Alternatively, pass the headers directly to the Request constructor.
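A sketch of the same request using the headers argument of the Request constructor (same URL and User-Agent as above):

import urllib.request

headers = {'User-Agent': 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25'}
req = urllib.request.Request('http://www.baidu.com', headers=headers)
with urllib.request.urlopen(req) as response:
    print(response.read())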
GET request with parameters
A request with parameters is essentially the same as the example above: the query string can be assembled into the URL before the request is made. This example uses Tencent's stock API, which accepts different stock codes and dates and returns the price and trading information of the corresponding stock at the corresponding time.
Access the interface with parameters
tencent_api = "http://qt.gtimg.cn/q=sh601939"
response = urllib.request.urlopen(tencent_api)
# read() returns the raw bytes from the server (call decode() to get a str)
print(response.read())
resp = requests.get(tencent_api)
print(resp)
print(resp.text)
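Neither call above actually encodes anything; the parameter is baked into the URL string. When parameters come from a dict, urllib.parse.urlencode (or the params argument in requests) can build the query string. A sketch using httpbin.org as a neutral test endpoint, since the Tencent API takes its parameter in the path rather than as a standard ?key=value query:

import urllib.parse
import urllib.request
import requests

params = {'q': 'sh601939'}

# urllib: urlencode builds the query string, which we append to the URL ourselves
url = 'http://httpbin.org/get?' + urllib.parse.urlencode(params)
print(urllib.request.urlopen(url).read().decode())

# requests: pass params= and the query string is built and appended for us
resp = requests.get('http://httpbin.org/get', params=params)
print(resp.url)  # full URL including the encoded query string
print(resp.text)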
Send a POST request
urllib doesn’t have separate functions to distinguish GET from POST requests; the method is determined by whether a data argument is passed to the Request object.
import urllib.parse
import urllib.request
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')  # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
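For comparison, a sketch of the same POST with requests: requests.post takes the dict directly, form-encodes it, and sets the Content-Type header itself. The URL is the same placeholder as above.

import requests

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}

# requests form-encodes the dict and sends it as the request body
resp = requests.post(url, data=values)
print(resp.status_code)
print(resp.text)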