Preface
My WeChat official account, JavaCodes, has more quality posts. Welcome to follow!!
App data packet capture analysis
Open the Douguo Food app (豆果美食)
Get the corresponding JSON data
The corresponding code
url = "https://api.douguo.net/recipe/flatcatalogs"
data = {
"client": "4,"."_vs": "0",
}
count = 0
response = handle_request(url, data)
# Convert to JSON
index_response_dict = json.loads(response.text)
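The handle_request helper used above is defined in the complete code at the end of this post; it is just a thin wrapper around requests.post that attaches the headers captured from the app:
def handle_request(url, data):
    # POST the form data with the app's captured headers
    response = requests.post(url=url, headers=headers, data=data)
    return response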
Parsing the response with an online JSON parsing site, we can see that it contains the data we need.
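If you would rather not paste responses into an online site, the same inspection can be done locally; a minimal sketch using the json module already imported above:
# Pretty-print the parsed response to inspect its structure locally
print(json.dumps(index_response_dict, indent=2, ensure_ascii=False))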
Then we search for braised pork in brown sauce (红烧肉) 😁 and find that there are three ways to sort the results.
We can find three corresponding HTTPS requests in Fiddler
They look identical on the surface, but all three are POST requests, so the difference is in the parameters. In practice, I found that the three sort options correspond to three different values of the order field.
Looking at the specific JSON data returned, you can see the one-to-one correspondence.
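To illustrate, all three searches can be reproduced by varying only the order field in the POST body. The value "3" is the one used in the complete code below; the other two values here are hypothetical placeholders, so read the real ones from your own Fiddler capture:
search_url = "https://api.douguo.net/recipe/v2/search/0/20"
base_data = {"client": "4", "keyword": "红烧肉", "_vs": "400"}
for order in ("0", "2", "3"):  # hypothetical values; check Fiddler for the real ones
    data = dict(base_data, order=order)
    response = handle_request(search_url, data)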
Corresponding code
caipu_list_url = "https://api.douguo.net/recipe/v2/search/0/20"
caipu_list_response = handle_request(url=caipu_list_url, data=data)
caipu_list_response_dict = json.loads(caipu_list_response.text)
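The 0/20 at the end of the search path looks like an offset and a page size. Assuming that reading (it is not confirmed by the capture), later pages could be fetched by stepping the offset:
# Fetch the first three pages, assuming /<offset>/<page_size> semantics
for offset in range(0, 60, 20):
    page_url = "https://api.douguo.net/recipe/v2/search/{}/20".format(offset)
    page_response = handle_request(url=page_url, data=data)
    page_dict = json.loads(page_response.text)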
Then you need to go to the details page
The number in the request path is the ID you got above
Corresponding code
detail_url = "https://api.douguo.net/recipe/v2/detail/" + str(shicai_id)
detail_data = {
    "client": "4",
    "author_id": "0",
    "_vs": "11104",
    # kw is quoted so the _ext value is valid JSON
    "_ext": '{"query":{"kw":"' + str(shicai) + '","src":"11104","idx":"3","type":"13","id":' + str(shicai_id) + '}}',
    "is_new_user": "1",
}
detail_response = handle_request(detail_url, detail_data)
# Parse to JSON format
detail_response_dict = json.loads(detail_response.text)
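Not every recipe necessarily carries tips or cookstep; if you want the crawler to survive missing keys (an assumption about the API, not something the capture confirms), a defensive variant of the extraction:
# Fall back to empty values when keys are missing (hypothetical defensive variant)
recipe = detail_response_dict.get('result', {}).get('recipe', {})
tips = recipe.get('tips', '')
cook_step = recipe.get('cookstep', [])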
The complete code
import requests
import json
import pymysql
from multiprocessing import Queue
# Create queue
queue_list = Queue()
headers = {
    "client": "4",
    "version": "7008.2",
    "device": "SM-G973N",
    "sdk": "22,5.1.1",
    "channel": "qqkp",
    "resolution": "1280*720",
    "display-resolution": "1280*720",
    "dpi": "1.5",
    "pseudo-id": "b2b0e205b84a6ca1",
    "brand": "samsung",
    "scale": "1.5",
    "timezone": "28800",
    "language": "zh",
    "cns": "2",
    "carrier": "CMCC",
    "User-Agent": "Mozilla/5.0 (Linux; Android 5.1.1; MI 9 Build/NMF26X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36",
    "act-code": "1626316304",
    "act-timestamp": "1626316305",
    "uuid": "12697ae9-66dd-4071-94e5-778c10ce6dd1",
    "battery-level": "1.00",
    "battery-state": "3",
    "bssid": "82:06:3A:49:9E:44",
    "syscmp-time": "1619000613000",
    "rom-version": "beyond1qlteue-user 5.1.1 PPR1.190810.011 500210421 release-keys",
    "terms-accepted": "1",
    "newbie": "1",
    "reach": "1",
    "app-state": "0",
    "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "Keep-Alive",
    "Host": "api.douguo.net",
}
# Replace table_name with your own table; seven columns need seven placeholders
sql = "INSERT INTO table_name (shicai, user_name, caipu_name, describes, zuoliao_list, tips, cook_step) VALUES (%s, %s, %s, %s, %s, %s, %s)"
def handle_request(url, data):
    response = requests.post(url=url, headers=headers, data=data)
    return response
# Request home page
def handle_index():
    url = "https://api.douguo.net/recipe/flatcatalogs"
    data = {
        "client": "4",
        "_vs": "0",
    }
    count = 0
    response = handle_request(url, data)
    # Convert to JSON
    index_response_dict = json.loads(response.text)
    for index_item in index_response_dict['result']['cs']:
        for index_item_1 in index_item['cs']:
            if count > 5:
                return
            for item in index_item_1['cs']:
                item_data = {
                    "client": "4",
                    "keyword": item['name'],
                    "order": "3",
                    "_vs": "400",
                }
                queue_list.put(item_data)
            count += 1
def handle_caipu_list(data):
    print("Currently processed ingredient:", data['keyword'])
    caipu_list_url = "https://api.douguo.net/recipe/v2/search/0/20"
    caipu_list_response = handle_request(url=caipu_list_url, data=data)
    caipu_list_response_dict = json.loads(caipu_list_response.text)
    for item in caipu_list_response_dict['result']['list']:
        shicai = data['keyword']
        user_name = item['r']['an']
        shicai_id = item['r']['id']
        # Strip newlines and spaces from the cook story
        describes = item['r']['cookstory'].replace("\n", "").replace(" ", "")
        caipu_name = item['r']['n']
        zuoliao_list = item['r']['major']
        detail_url = "https://api.douguo.net/recipe/v2/detail/" + str(shicai_id)
        detail_data = {
            "client": "4",
            "author_id": "0",
            "_vs": "11104",
            # kw is quoted so the _ext value is valid JSON
            "_ext": '{"query":{"kw":"' + str(shicai) + '","src":"11104","idx":"3","type":"13","id":' + str(shicai_id) + '}}',
            "is_new_user": "1",
        }
        detail_response = handle_request(detail_url, detail_data)
        detail_response_dict = json.loads(detail_response.text)
        tips = detail_response_dict['result']['recipe']['tips']
        cook_step = detail_response_dict['result']['recipe']['cookstep']
        print("The recipe currently being stored is:", caipu_name)
        # Execute the insert statement
        cur.execute(sql, (shicai, user_name, caipu_name, describes, str(zuoliao_list), tips, str(cook_step)))
def init_mysql():
    dbparams = {
        'host': '127.0.0.1',
        'port': 3306,
        'user': 'Username',
        'password': 'password',
        'database': 'Database name',
        'charset': 'utf8'
    }
    conn = pymysql.connect(**dbparams)
    cur = conn.cursor()
    return conn, cur
def close_mysql(conn, cur):
    cur.close()
    conn.close()
if __name__ == '__main__':
    # Initialize MySQL
    conn, cur = init_mysql()
    handle_index()
    while queue_list.qsize() > 0:
        handle_caipu_list(queue_list.get())
    # Commit the transaction
    conn.commit()
    close_mysql(conn, cur)
Crawl results
For testing, only part of the data was crawled.
Finally
I am Code Pipi Shrimp, a mantis shrimp who loves sharing knowledge. I will keep updating my blog in the future, and I look forward to your follow!!
Creation is not easy; if this blog helped you, I hope you'll ==like, favorite, and follow in one click!== Thanks for your support, see you next time~~~
Shared outline
Big factory interview questions column
Python Crawler Column
The source code for this crawler is available on GitHub at github.com/2335119327/… It has already been included there (along with more crawlers not covered in this blog; interested friends can take a look) and will be continuously updated. Welcome to Star!