Preface
My WeChat official account, JavaCodes, has more quality posts. Welcome to follow!!
App data packet capture analysis
Open the Douguo Food app (豆果美食)
Get the corresponding JSON data
The corresponding code
url = "https://api.douguo.net/recipe/flatcatalogs"
data = {
"client": "4,"."_vs": "0",
}
count = 0
response = handle_request(url, data)
# Convert to JSON
index_response_dict = json.loads(response.text)
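The handle_request helper used above is defined in the complete code at the end of this post; it is just a thin wrapper around requests.post that attaches the headers captured from the app:
def handle_request(url, data):
    # POST the form data with the app's captured headers
    response = requests.post(url=url, headers=headers, data=data)
    return response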
Parsing the response with an online JSON parsing site, we can see that it contains the data we need.
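If you would rather not paste responses into an online site, the same inspection can be done locally; a minimal sketch using the json module already imported above:
# Pretty-print the parsed response to inspect its structure locally
print(json.dumps(index_response_dict, indent=2, ensure_ascii=False))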
Then we search for braised pork in brown sauce (红烧肉) 😁 and find that there are three ways to sort the results.
We can find three corresponding HTTPS requests in Fiddler
They look identical on the surface, but all three are POST requests, so the difference is in the parameters. In practice, I found that the three sort options correspond to three different values of the order field.
Looking at the specific JSON data returned, you can see the one-to-one correspondence.
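To illustrate, all three searches can be reproduced by varying only the order field in the POST body. The value "3" is the one used in the complete code below; the other two values here are hypothetical placeholders, so read the real ones from your own Fiddler capture:
search_url = "https://api.douguo.net/recipe/v2/search/0/20"
base_data = {"client": "4", "keyword": "红烧肉", "_vs": "400"}
for order in ("0", "2", "3"):  # hypothetical values; check Fiddler for the real ones
    data = dict(base_data, order=order)
    response = handle_request(search_url, data)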
Corresponding code
caipu_list_url = "https://api.douguo.net/recipe/v2/search/0/20"
caipu_list_response = handle_request(url=caipu_list_url, data=data)
caipu_list_response_dict = json.loads(caipu_list_response.text)
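The 0/20 at the end of the search path looks like an offset and a page size. Assuming that reading (it is not confirmed by the capture), later pages could be fetched by stepping the offset:
# Fetch the first three pages, assuming /<offset>/<page_size> semantics
for offset in range(0, 60, 20):
    page_url = "https://api.douguo.net/recipe/v2/search/{}/20".format(offset)
    page_response = handle_request(url=page_url, data=data)
    page_dict = json.loads(page_response.text)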
Then you need to go to the details page
The number in the request path is the ID you got above
Corresponding code
detail_url = "https://api.douguo.net/recipe/v2/detail/" + str(shicai_id)
detail_data = {
    "client": "4",
    "author_id": "0",
    "_vs": "11104",
    # kw is quoted so the _ext value is valid JSON
    "_ext": '{"query":{"kw":"' + str(shicai) + '","src":"11104","idx":"3","type":"13","id":' + str(shicai_id) + '}}',
    "is_new_user": "1",
}
detail_response = handle_request(detail_url, detail_data)
# Parse to JSON format
detail_response_dict = json.loads(detail_response.text)
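Not every recipe necessarily carries tips or cookstep; if you want the crawler to survive missing keys (an assumption about the API, not something the capture confirms), a defensive variant of the extraction:
# Fall back to empty values when keys are missing (hypothetical defensive variant)
recipe = detail_response_dict.get('result', {}).get('recipe', {})
tips = recipe.get('tips', '')
cook_step = recipe.get('cookstep', [])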
The complete code
import requests
import json
import pymysql
from multiprocessing import Queue
# Create queue
queue_list = Queue()
headers = {
    "client": "4",
    "version": "7008.2",
    "device": "SM-G973N",
    "sdk": "22,5.1.1",
    "channel": "qqkp",
    "resolution": "1280*720",
    "display-resolution": "1280*720",
    "dpi": "1.5",
    "pseudo-id": "b2b0e205b84a6ca1",
    "brand": "samsung",
    "scale": "1.5",
    "timezone": "28800",
    "language": "zh",
    "cns": "2",
    "carrier": "CMCC",
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; MI 9 Build/NMF26X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36",
    "act-code": "1626316304",
    "act-timestamp": "1626316305",
    "uuid": "12697ae9-66dd-4071-94e5-778c10ce6dd1",
    "battery-level": "1.00",
    "battery-state": "3",
    "bssid": "82:06:3A:49:9E:44",
    "syscmp-time": "1619000613000",
    "rom-version": "beyond1qlteue-user 5.1.1 PPR1.190810.011 500210421 release-keys",
    "terms-accepted": "1",
    "newbie": "1",
    "reach": "1",
    "app-state": "0",
    "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "Keep-Alive",
    "Host": "api.douguo.net",
}
# Replace table_name with your own table; seven columns need seven placeholders
sql = "INSERT INTO table_name (shicai, user_name, caipu_name, describes, zuoliao_list, tips, cook_step) VALUES (%s, %s, %s, %s, %s, %s, %s)"
def handle_request(url, data):
    response = requests.post(url=url, headers=headers, data=data)
    return response
# Request home page
def handle_index():
    url = "https://api.douguo.net/recipe/flatcatalogs"
    data = {
        "client": "4",
        "_vs": "0",
    }
    count = 0
    response = handle_request(url, data)
    # Convert to JSON
    index_response_dict = json.loads(response.text)
    for index_item in index_response_dict['result']['cs']:
        for index_item_1 in index_item['cs']:
            if count > 5:
                return
            for item in index_item_1['cs']:
                item_data = {
                    "client": "4",
                    "keyword": item['name'],
                    "order": "3",
                    "_vs": "400",
                }
                queue_list.put(item_data)
            count += 1
def handle_caipu_list(data):
    print("Currently processed ingredient:", data['keyword'])
    caipu_list_url = "https://api.douguo.net/recipe/v2/search/0/20"
    caipu_list_response = handle_request(url=caipu_list_url, data=data)
    caipu_list_response_dict = json.loads(caipu_list_response.text)
    for item in caipu_list_response_dict['result']['list']:
        shicai = data['keyword']
        user_name = item['r']['an']
        shicai_id = item['r']['id']
        # Strip newlines and spaces from the cook story
        describes = item['r']['cookstory'].replace("\n", "").replace(" ", "")
        caipu_name = item['r']['n']
        zuoliao_list = item['r']['major']
        detail_url = "https://api.douguo.net/recipe/v2/detail/" + str(shicai_id)
        detail_data = {
            "client": "4",
            "author_id": "0",
            "_vs": "11104",
            # kw is quoted so the _ext value is valid JSON
            "_ext": '{"query":{"kw":"' + str(shicai) + '","src":"11104","idx":"3","type":"13","id":' + str(shicai_id) + '}}',
            "is_new_user": "1",
        }
        detail_response = handle_request(detail_url, detail_data)
        detail_response_dict = json.loads(detail_response.text)
        tips = detail_response_dict['result']['recipe']['tips']
        cook_step = detail_response_dict['result']['recipe']['cookstep']
        print("The recipe currently being stored is:", caipu_name)
        # Execute the insert statement
        cur.execute(sql, (shicai, user_name, caipu_name, describes, str(zuoliao_list), tips, str(cook_step)))
def init_mysql():
    dbparams = {
        'host': '127.0.0.1',
        'port': 3306,
        'user': 'Username',
        'password': 'password',
        'database': 'Database name',
        'charset': 'utf8'
    }
    conn = pymysql.connect(**dbparams)
    cur = conn.cursor()
    return conn, cur
def close_mysql(conn, cur):
    cur.close()
    conn.close()
if __name__ == '__main__':
    # Initialize MySQL
    conn, cur = init_mysql()
    handle_index()
    while queue_list.qsize() > 0:
        handle_caipu_list(queue_list.get())
    # Commit the transaction
    conn.commit()
    close_mysql(conn, cur)
Crawl results
For testing, only part of the data was crawled.
Finally
I am Code Pipi Shrimp, a mantis shrimp who loves sharing knowledge. I will keep updating my blog in the future, and I look forward to your follow!!
Creation is not easy; if this blog helped you, I hope you'll ==like, favorite, and follow in one click!== Thanks for your support, see you next time~~~
Shared outline
Big factory interview questions column
Python Crawler Column
The source code for this crawler is available on GitHub at github.com/2335119327/… It has already been included there (along with more crawlers not covered in this blog; interested friends can take a look) and will be continuously updated. Welcome to Star!