The requests module
- Simulating web surfing through a browser
The installation
pip install requests
The process
- Specify the URL
- Initiate the request
- Get the response data
- Persist the data to storage
Case 1: Crawl the Sogou home page
import requests
url = "https://www.sogou.com/"
response = requests.get(url=url)
# text returns the response data as a string
page_text = response.text
print(page_text)
with open("./sougou2021.html", "w", encoding="utf-8") as fp:
    fp.write(page_text)
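Note that response.text is just response.content (the raw bytes) decoded with response.encoding; if the saved page comes out garbled, setting response.encoding explicitly (e.g. to 'utf-8') before reading .text usually fixes it. A minimal offline sketch of that bytes-to-text step:

```python
# response.text is response.content decoded with response.encoding.
# Simulated here with plain bytes, so no network request is needed.
raw_bytes = "搜狗 Sogou".encode("utf-8")   # stands in for response.content
page_text = raw_bytes.decode("utf-8")      # stands in for response.text
print(page_text)
```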
Case 2: Build a simple web collector
UA disguise (spoofing the User-Agent header)
import requests

key = input('enter a key word:')
# Parameter dynamic
params = {'query':key}
# UA camouflage
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
url = 'https://www.sogou.com/web'
response = requests.get(url=url,params=params,headers=headers)
page_text = response.text
fileName = key+'.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
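requests appends the params dict to the URL as a query string before sending. The same encoding can be reproduced offline with the standard library, as a sketch of what happens under the hood:

```python
from urllib.parse import urlencode

# requests turns params={'query': ...} into a ?query=... query string
base_url = 'https://www.sogou.com/web'
params = {'query': 'python'}
full_url = base_url + '?' + urlencode(params)
print(full_url)  # https://www.sogou.com/web?query=python
```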
Case 3: Get Douban movies
The data is loaded dynamically
Get: the URL, request method, request parameters, and request header information
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
url = 'https://movie.douban.com/j/chart/top_list'
for i in range(1, 30):
    params = {
        'type': i,
        'interval_id': '100:90',
        'action': '',
        'start': '0',
        'limit': '20',
    }
    json_data = requests.get(url=url, headers=headers, params=params).json()
    print(json_data)
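The .json() call parses the response body as JSON into Python objects, equivalent to calling json.loads(response.text). An offline sketch with a sample body shaped like such an API response (the field names here are illustrative, not the real Douban schema):

```python
import json

# A sample body shaped like a JSON API response (illustrative fields).
body = '[{"title": "Movie A", "score": "9.0"}, {"title": "Movie B", "score": "8.7"}]'
json_data = json.loads(body)   # what response.json() does internally
for movie in json_data:
    print(movie["title"], movie["score"])
```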
Case 4: POST request operation
A POST request passes data instead of params (the form data for this page).
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
for pageNum in range(1, 6):
    data = {
        'cname': '',
        'pid': '',
        'keyword': 'Shanghai',
        'pageIndex': pageNum,
        'pageSize': '10',
    }
    json_data = requests.post(url=url, headers=headers, data=data).json()['Table1']
    for dic in json_data:
        print(dic['addressDetail'])
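The difference between data= and json= matters for the next case: data sends a form-encoded body (application/x-www-form-urlencoded), while json serializes the dict into a JSON body. A standard-library sketch of the two encodings requests produces:

```python
import json
from urllib.parse import urlencode

payload = {'keyword': 'Shanghai', 'pageIndex': 1}

form_body = urlencode(payload)    # what data=payload sends
json_body = json.dumps(payload)   # what json=payload sends

print(form_body)  # keyword=Shanghai&pageIndex=1
print(json_body)  # {"keyword": "Shanghai", "pageIndex": 1}
```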
Case 5: Crawl Honor store data
The request payload is in JSON format, so pass it with json instead of data
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
# Obtain the store IDs in batch
main_url = 'https://openapi.vmall.com/mcp/offlineshop/getShopList'
data = {"portal": 2, "lang": "zh-CN", "country": "CN", "brand": 1, "province": "Beijing", "city": "Beijing", "pageNo": 1, "pageSize": 20}
# Extract the list of shop dicts from the response; each one holds the shop id we need
json_data = requests.post(main_url,headers=headers,json=data).json()['shopInfos']
url = 'https://openapi.vmall.com/mcp/offlineshop/getShopById'
for dic in json_data:
    shop_id = dic['id']
    params = {
        'portal': '2',
        'version': '10',
        'country': 'CN',
        'shopId': shop_id,
        'lang': 'zh-CN',
    }
    shop_detail = requests.get(url, headers=headers, params=params).json()
    print(shop_detail)
The response is a dictionary whose shopInfos key maps to a list, and each list element is itself a dictionary => a dictionary nesting a list nesting dictionaries.
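That nesting (dictionary → list → dictionaries) can be walked with ordinary indexing; a toy structure mimicking the shape (the id/name values are made up for illustration):

```python
# Toy data mimicking the shopInfos shape: dict -> list -> dicts
json_data = {
    "shopInfos": [
        {"id": 101, "name": "Shop A"},
        {"id": 102, "name": "Shop B"},
    ]
}
# Index into the top-level dict, then iterate the list of shop dicts
for dic in json_data["shopInfos"]:
    print(dic["id"], dic["name"])
```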