Introduction:

Suppose you want to save every video posted by a Douyin vlogger you are particularly interested in. How would you do it? This article shows how to use Python to export all video information for a specific user.

Packet capture analysis

  • Chrome Developer Tools

In the Douyin app, copy the vlogger's home page address, for example http://v.douyin.com/kGcU4y/. On a PC, open Chrome, switch on mobile device emulation (iPhone is chosen here), and open the copied address in the browser. The page redirects to https://www.iesdouyin.com/share/user/110677980134

  • Scroll down the home page, select the Network => XHR tab, and look for a request similar to this one:
:authority: www.iesdouyin.com
:method: GET
:path: /web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1561112910000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.Ld13.o&dytk=061ae6e81229e178146aa674327eba89
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6,da;q=0.5
cookie: tt_webid=6690145457198417412; _ga=GA1.2.605400954.1557670882; _ba=BA0.2-20181226-5199-e-GIJXgXk9ajNkyFhmv7Wy; _gid=GA1.2.1914501522.1562857517
referer: https://www.iesdouyin.com/share/user/110677980134
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest

Screenshot of the returned data

Analyzing the URL of the Ajax request https://www.iesdouyin.com/web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1559299764000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.Ld13.o&dytk=061ae6e81229e178146aa674327eba89 shows that the GET request parameters mainly include:

Field        Type     Description
user_id      int      ID of the Douyin account
count        int      Number of items returned per request; the default value 21 is used
max_cursor   int      Request cursor; each request passes the max_cursor returned by the previous request
aid          int      The default value 1128 is used
_signature   string   Parameter signature, generated for each request
dytk         string   A token sent with each request

How to obtain the parameters:

  • user_id: taken from the URL that the share link redirects to (https://www.iesdouyin.com/share/user/110677980134).
  • dytk: included in the source code of the home page. The relevant snippet of the page source:
(function() {
    $(function() {
        __M.require('douyin_falcon:page/reflow_user/index').init({
            uid: "110677980134",
            dytk: '061ae6e81229e178146aa674327eba89'
        });
    });
})();

This parameter can be extracted from the page source with a regular expression (Python's re module).
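A minimal sketch of both extractions described above, assuming the redirected URL and the page source have the shapes shown in this article. The function names `extract_user_id` and `extract_dytk` are illustrative, not part of the original project, and the patterns may need adjusting if the page markup differs.

```python
import re


def extract_user_id(share_url):
    """Return the numeric user ID from a /share/user/<id> URL, or None."""
    match = re.search(r"/share/user/(\d+)", share_url)
    return match.group(1) if match else None


def extract_dytk(page_source):
    """Return the dytk value embedded in the page's init() call, or None."""
    # Matches dytk: '...' or dytk: "..." and captures the quoted value.
    match = re.search(r"dytk\s*:\s*['\"]([^'\"]+)['\"]", page_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Sample inputs copied from the article.
    url = "https://www.iesdouyin.com/share/user/110677980134"
    source = """uid: "110677980134", dytk: '061ae6e81229e178146aa674327eba89'"""
    print(extract_user_id(url))
    print(extract_dytk(source))
```

Both functions return None rather than raising when nothing matches, so the caller can decide how to handle a changed page layout.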

  • _signature is harder to obtain. Douyin obfuscates the front-end JS code, so it is difficult to analyze the signing algorithm directly. However, you can execute the signing code as-is and take the signature it returns.
  • You can execute the JS code with NodeJS or Selenium WebDriver. Selenium WebDriver is recommended: the NodeJS execution environment differs from a browser's, so the signature it computes fails verification, whereas Selenium WebDriver drives a local browser and produces the same signature as direct browser access.
  • After formatting the JS code, execute the JS method _bytedAcrawler.sign("110677980134") to sign the parameters.
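The signing step can be sketched roughly as follows. It assumes `driver` is a Selenium WebDriver (for example `webdriver.Chrome()`) that has already loaded a page in which the formatted signing script has defined `_bytedAcrawler`. The helper name `sign_user_id` is hypothetical, and loading the script into the page is left out here.

```python
def sign_user_id(driver, user_id):
    """Run the page's signing function in the browser and return its result.

    Assumes the current page has already defined _bytedAcrawler.
    """
    # Selenium exposes extra Python arguments to the script as arguments[0..n].
    return driver.execute_script(
        "return _bytedAcrawler.sign(arguments[0]);", str(user_id)
    )
```

Passing the ID through `arguments[0]` instead of formatting it into the script string avoids quoting problems.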

Code to export the home page video list

import json

import requests

# signature(), dict_merge(), CHROME_HEADER and tk_logger are helpers defined
# elsewhere in the author's project.


def get_user_video_list_by_uid(user_id, cursor=0):
    """Fetch one page of a user's video list, starting at the given cursor."""
    url = 'https://www.iesdouyin.com/web/api/v2/aweme/post/?'
    sign, dytk = signature(user_id)
    tk_logger.info("sign:%s,dytk:%s" % (sign, dytk))
    if sign is None or dytk is None:
        tk_logger.error("sign [%s] or dytk [%s] is none" % (sign, dytk))
        return None
    headers = dict_merge(CHROME_HEADER, {
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })
    params = {
        "user_id": user_id,
        "count": "21",
        "max_cursor": cursor,
        "aid": "1128",
        "_signature": sign,
        "dytk": dytk
    }
    res = requests.get(url, headers=headers, params=params)
    tk_logger.info("request url: %s" % res.url)
    content = res.content.decode("utf8")
    jsn = json.loads(content)
    return jsn
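To export every video rather than a single page, the cursor has to be fed back into the next request, as the parameter list describes. Below is a rough sketch of that loop. It assumes each response is a dict with `aweme_list`, `has_more` and `max_cursor` keys; only `max_cursor` is mentioned in this article, so the other two are assumptions about the response shape. `iter_user_videos` and its `fetch_page` argument are illustrative names.

```python
def iter_user_videos(fetch_page, user_id, max_pages=50):
    """Yield video entries page by page, following max_cursor.

    fetch_page(user_id, cursor) should return one parsed response dict,
    for example get_user_video_list_by_uid from the snippet above.
    """
    cursor = 0
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(user_id, cursor)
        if not page:
            break
        for item in page.get("aweme_list", []):
            yield item
        if not page.get("has_more"):
            break
        # The next request uses the cursor returned by this one.
        cursor = page.get("max_cursor", 0)
```

Taking the page fetcher as an argument keeps the loop independent of the signing and HTTP details, which makes it easy to try against canned responses.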

Get the video list information

Code snippet to get the video details

import json

import requests

# get_video_info(), get_user_info(), get_play_address(), get_video_cover() and TKException are defined elsewhere in the author's project.
# The vid and idfa values did not survive intact; all device IDs, tokens and cookies below are tied to the author's session, so substitute your own captured values.


def get_video_detail_by_id(video_id):
    url = ("https://aweme-hl.snssdk.com/aweme/v1/aweme/detail/?version_code=6.5.0&pass-region=1&pass-route=1"
           "&js_sdk_version=1.16.2.7&app_name=aweme&vid=9d5f078ef64A1A9-4-81-c7-F89CA6A3B1DC&app_version=6.5.0"
           "&device_id=34712926793&channel=App%20Store&mcc_mnc=46011&aid=1128&screen_width=750"
           "&openudid=263bd93f02801d126ca004edccbff8f6e1b19f51&os_api=18&ac=WIFI&os_version=12.3.1"
           "&device_platform=iphone&build_number=65014&device_type=iPhone9,1&iid=74239983401&idfa=b4fF39B285A-4-4874-9d7e-C728A892BF6D")
    data = {"aweme_id": video_id}
    headers = {
        "sdk-version": "1",
        "x-Tt-Token": "00fc1e7950db67b5f43a312e9265cdfee513ea70c36d918c871f3bb553347f3db50ffca143b8722327b345816a75efca071d",
        "User-Agent": "Aweme 6.5.0 rv:65014 (iPhone; iOS 12.3.1; en_CN) Cronet",
        "Content-Type": "application/x-www-form-urlencoded",
        "Cookie": "tt_webid=6636348554880222728; __tea_sdk__user_unique_id=6636348554880222728; odin_tt=76d9b82d6e6f2ddfc99719a5b5d44a7d703cf977f0f7bddf8537f93920d57cb9ec33162ee47868b760f6b09e69209bb2f90bad220b75678af850a0dfa9f056e2; install_id=74239983401; ttreq=1$dab0516952a4157c0c11d4993533c09d6e45fc94; sid_guard=fc1e7950db67b5f43a312e9265cdfee5%7C1559955316%7C5184000%7CWed%2C+07-Aug-2019+00%3A55%3A16+GMT; uid_tt=0afcb06309f632d872799ec0ac3b2c80; sid_tt=fc1e7950db67b5f43a312e9265cdfee5; sessionid=fc1e7950db67b5f43a312e9265cdfee5",
        "X-Khronos": "1559956401",
        "X-Gorgon": "8300000000002e40eee38cad71d14037bd1385d18bc973f094f5",
    }
    ret = {}
    res = requests.post(url, data=data, headers=headers)
    if res.status_code == 200:
        # tk_logger.info("video detail raw:%s" % res.content.decode("utf8"))
        jsn = json.loads(res.content)
        detail = jsn.get("aweme_detail", {})
        video_info = get_video_info(detail)
        user_info = get_user_info(detail)
        play_addr = get_play_address(detail)
        video_cover = get_video_cover(detail)
        ret["video_info"] = video_info
        ret["user_info"] = user_info
        ret["play_addr"] = play_addr
        ret["video_cover"] = video_cover
    else:
        raise TKException("get video detail failed [%s][%d]" % (url, res.status_code))
    return ret

Code snippet to download the video

import os

import requests

# DOWNLOAD_DIR, IOS_HEADER, tk_logger, TKException and get_remote_file_size()
# are defined elsewhere in the author's project.


def download_video(detail):
    """Download the video described by detail = get_video_detail_by_id(video_id)."""
    url = detail.get("play_addr", {}).get("url_list", [])
    if len(url) == 0:
        raise TKException("cannot get video url list [%s]" % detail)

    url = url[0]
    folder = DOWNLOAD_DIR + '/' + detail.get('user_info', {}).get("uid", "unknown")
    if not os.path.exists(folder):
        os.mkdir(folder)
    video_id = detail.get('video_info', {}).get('statistics', {}).get('aweme_id')
    # filename = "%s/%s" % (folder, detail.get("video_info", {}).get("desc", video_id) + ".mp4")
    filename = "%s/%s" % (folder, video_id + ".mp4")
    tk_logger.info("download video %s" % url)
    if os.path.isfile(filename):
        # Skip the download when the local file already has the remote size.
        file_size = get_remote_file_size(url)
        if file_size == os.path.getsize(filename):
            tk_logger.info("file already downloaded, skip ...")
            return
        else:
            tk_logger.info("download file , file size:%d" % file_size)
    # stream=True lets iter_content read the body in chunks instead of loading it all at once.
    res = requests.get(url, headers=IOS_HEADER, stream=True)
    if res.status_code == 200:
        with open(filename, "wb") as fp:
            for chunk in res.iter_content(chunk_size=1024):
                fp.write(chunk)
    else:
        raise TKException("download video [%s] failed [%d]" % (url, res.status_code))
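The download snippet calls `get_remote_file_size`, which this article does not show. One possible stand-in, written with the standard library, is sketched below. It assumes the server answers a HEAD request with a Content-Length header, which is not guaranteed. The `opener` parameter is an illustrative addition that allows the network call to be swapped out.

```python
import urllib.request


def get_remote_file_size(url, headers=None, opener=urllib.request.urlopen):
    """Return the Content-Length reported for url, or -1 if it is missing.

    A hypothetical stand-in for the author's helper of the same name.
    """
    # HEAD asks for the response headers only, without the video body.
    request = urllib.request.Request(url, headers=headers or {}, method="HEAD")
    with opener(request) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else -1
```

Returning -1 when the header is absent means the size comparison in the download code simply fails and the file is downloaded again.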

Download the video

Disclaimer

This tutorial is for learning and discussion only and is not for commercial use.