Introduction:
Suppose you want to save every video posted by a Douyin vlogger you are particularly interested in. How would you do it? This article shows how to use Python to export all video information for a specific user.
Packet capture analysis
- Chrome Developer Tools
In the Douyin app, copy the vlogger's home page address, for example http://v.douyin.com/kGcU4y/. Open Chrome on a PC, turn on mobile device emulation in Developer Tools (iPhone is chosen here), and visit the copied address. The page redirects to https://www.iesdouyin.com/share/user/110677980134
- Scroll down the home page and open the Network => XHR tab. You will see a request similar to this one:
:authority: www.iesdouyin.com
:method: GET
:path: /web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1561112910000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.Ld13.o&dytk=061ae6e81229e178146aa674327eba89
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7,zh-TW;q=0.6,da;q=0.5
cookie: tt_webid=6690145457198417412; _ga=GA1.2.605400954.1557670882; _ba=BA0.2-20181226-5199-e-GIJXgXk9ajNkyFhmv7Wy; _gid=GA1.2.1914501522.1562857517
referer: https://www.iesdouyin.com/share/user/110677980134
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest
Screenshot of the returned data
Analyzing the URL of the Ajax request, https://www.iesdouyin.com/web/api/v2/aweme/post/?user_id=110677980134&sec_uid=&count=21&max_cursor=1559299764000&aid=1128&_signature=3Xf-nxAQgGfUO4SKisB.Ld13.o&dytk=061ae6e81229e178146aa674327eba89, shows that the request parameters mainly include:
Field | Type | Description |
---|---|---|
user_id | int | ID of the Douyin account |
count | int | Number of items returned per request; the default value 21 is used |
max_cursor | int | Cursor for the request; each request passes the max_cursor returned by the previous one |
aid | int | Use the default value 1128 |
_signature | string | Signature of the parameters, computed for each request |
dytk | string | A parameter required on each request |
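As a small illustration of how these parameters fit together, the sketch below assembles the request URL with the standard library. The function name `build_post_list_url` and the constant `API_URL` are made up for this example, and the signature and dytk values are simply the ones from the captured request, so they will not be valid for a new session.

```python
from urllib.parse import urlencode

# Endpoint taken from the captured request above.
API_URL = "https://www.iesdouyin.com/web/api/v2/aweme/post/"


def build_post_list_url(user_id, signature, dytk, max_cursor=0, count=21, aid=1128):
    """Assemble the video-list URL from the parameters listed in the table.

    This is an illustrative helper, not part of the original project.
    """
    params = {
        "user_id": user_id,
        "sec_uid": "",              # sent empty in the captured request
        "count": count,
        "max_cursor": max_cursor,   # 0 for the first page
        "aid": aid,
        "_signature": signature,
        "dytk": dytk,
    }
    return API_URL + "?" + urlencode(params)


example_url = build_post_list_url(
    "110677980134",
    "3Xf-nxAQgGfUO4SKisB.Ld13.o",
    "061ae6e81229e178146aa674327eba89",
    max_cursor=1559299764000,
)
print(example_url)
```

With the captured values, the result should match the Ajax URL quoted above character for character, which is a quick way to check that no parameter was missed.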
How to obtain the parameters:
- user_id: taken from the URL that the share link redirects to (https://www.iesdouyin.com/share/user/110677980134).
- dytk: this parameter is embedded in the source code of the home page. The relevant snippet of the page source:
(function () {
    $(function () {
        __M.require('douyin_falcon:page/reflow_user/index').init({
            uid: "110677980134",
            dytk: '061ae6e81229e178146aa674327eba89'
        });
    });
})();
This parameter can be extracted with a regular expression.
- _signature is more complicated to obtain. Douyin obfuscates the front-end JS code, so it is difficult to analyze the signing algorithm directly. However, the signing code can simply be executed to return the corresponding signature.
- You can use NodeJS or Selenium WebDriver to execute the JS code. Selenium WebDriver is recommended: the NodeJS execution environment differs from a browser's, and signatures computed there fail verification. Selenium WebDriver drives a local browser, so the computed signature is the same as the one produced when the browser accesses the page directly.
- After formatting the JS code, execute the JS method
_bytedAcrawler.sign("110677980134")
to sign the parameters.
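The three lookups described above can be sketched roughly as follows. The regular expressions are written against the redirect URL and the page-source snippet shown earlier, so they are assumptions about the markup rather than a guaranteed format. The helper names are invented for this example, and `compute_signature` assumes that `driver` behaves like a Selenium WebDriver whose current page already exposes `_bytedAcrawler`.

```python
import re

# Patterns based on the redirect URL and the page-source snippet shown above;
# the real markup may differ or change over time.
USER_ID_RE = re.compile(r"/share/user/(\d+)")
DYTK_RE = re.compile(r"dytk:\s*['\"](\w+)['\"]")


def extract_user_id(redirected_url):
    """Return the numeric user id from the redirected share URL, or None."""
    match = USER_ID_RE.search(redirected_url)
    return match.group(1) if match else None


def extract_dytk(page_source):
    """Return the dytk value found in the home page source, or None."""
    match = DYTK_RE.search(page_source)
    return match.group(1) if match else None


def compute_signature(driver, user_id):
    """Run the page's own signing function inside the browser session.

    `driver` only needs an execute_script(script, *args) method, as a
    Selenium WebDriver has. Assumption: the share page is already loaded
    in that session and defines _bytedAcrawler.
    """
    script = "return _bytedAcrawler.sign(arguments[0]);"
    return driver.execute_script(script, str(user_id))
```

With Selenium installed, `driver` would typically be a `webdriver.Chrome()` instance on which `driver.get(...)` has already opened the share page. Keeping the driver as a parameter leaves the browser setup to the caller and makes the function easy to exercise with a stub.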
Code to export the home page video list
import json
import requests

# signature, dict_merge, CHROME_HEADER and tk_logger are helpers defined elsewhere in the project.
def get_user_video_list_by_uid(user_id, cursor=0):
    url = 'https://www.iesdouyin.com/web/api/v2/aweme/post/?'
    # Compute the request signature and read dytk for this user.
    sign, dytk = signature(user_id)
    tk_logger.info("sign:%s,dytk:%s" % (sign, dytk))
    if sign is None or dytk is None:
        tk_logger.error("sign [%s] or dytk [%s] is none" % (sign, dytk))
        return None
    headers = dict_merge(CHROME_HEADER, {
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })
    params = {
        "user_id": user_id,
        "count": "21",
        "max_cursor": cursor,
        "aid": "1128",
        "_signature": sign,
        "dytk": dytk
    }
    res = requests.get(url, headers=headers, params=params)
    tk_logger.info("request url: %s" % res.url)
    content = res.content.decode("utf8")
    jsn = json.loads(content)
    return jsn
Get the video list information
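Since each request passes the max_cursor returned by the previous one, exporting the full list means calling the endpoint in a loop. The generator below is a minimal sketch of that loop. It takes the page-fetching function as an argument (for instance `get_user_video_list_by_uid`), and it assumes that the response JSON carries the videos in an `aweme_list` field and signals further pages with `has_more`. Those two field names are assumptions and are not confirmed by the text above.

```python
def iter_user_videos(fetch_page, user_id, max_pages=50):
    """Yield video entries page by page, following max_cursor.

    fetch_page(user_id, cursor) should return the decoded JSON dict,
    or None when the request could not be made.
    """
    cursor = 0
    for _ in range(max_pages):          # hard cap so a bad cursor cannot loop forever
        page = fetch_page(user_id, cursor)
        if not page:
            break
        for item in page.get("aweme_list", []):   # assumed field name
            yield item
        if not page.get("has_more"):               # assumed field name
            break
        next_cursor = page.get("max_cursor")
        if next_cursor is None or next_cursor == cursor:
            break                       # no progress, stop paging
        cursor = next_cursor
```

A call such as `list(iter_user_videos(get_user_video_list_by_uid, "110677980134"))` would then collect every entry. The `max_pages` cap is a precaution for this sketch, not something the API requires.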
Code snippet to get the video details
import json
import requests

# get_video_info, get_user_info, get_play_address, get_video_cover and TKException are defined elsewhere in the project.
def get_video_detail_by_id(video_id):
    # The query string and headers were captured from the iOS app and are device-specific.
    url = "https://aweme-hl.snssdk.com/aweme/v1/aweme/detail/?version_code=6.5.0&pass-region=1&pass-route=1&js_sdk_version=1.16.2.7&app_name=aweme&vid=9D5F078E-A1A9-4F64-81C7-F89CA6A3B1DC&app_version=6.5.0&device_id=34712926793&channel=App%20Store&mcc_mnc=46011&aid=1128&screen_width=750&openudid=263bd93f02801d126ca004edccbff8f6e1b19f51&os_api=18&ac=WIFI&os_version=12.3.1&device_platform=iphone&build_number=65014&device_type=iPhone9,1&iid=74239983401&idfa=F39B285A-4B4F-4874-9D7E-C728A892BF6D"
    data = {"aweme_id": video_id}
    headers = {
        "sdk-version": "1",
        "x-Tt-Token": "00fc1e7950db67b5f43a312e9265cdfee513ea70c36d918c871f3bb553347f3db50ffca143b8722327b345816a75efca071d",
        "User-Agent": "Aweme 6.5.0 rv:65014 (iPhone; iOS 12.3.1; en_CN) Cronet",
        "Content-Type": "application/x-www-form-urlencoded",
        "Cookie": "tt_webid=6636348554880222728; __tea_sdk__user_unique_id=6636348554880222728; odin_tt=76d9b82d6e6f2ddfc99719a5b5d44a7d703cf977f0f7bddf8537f93920d57cb9ec33162ee47868b760f6b09e69209bb2f90bad220b75678af850a0dfa9f056e2; install_id=74239983401; ttreq=1$dab0516952a4157c0c11d4993533c09d6e45fc94; sid_guard=fc1e7950db67b5f43a312e9265cdfee5%7C1559955316%7C5184000%7CWed%2C+07-Aug-2019+00%3A55%3A16+GMT; uid_tt=0afcb06309f632d872799ec0ac3b2c80; sid_tt=fc1e7950db67b5f43a312e9265cdfee5; sessionid=fc1e7950db67b5f43a312e9265cdfee5",
        "X-Khronos": "1559956401",
        "X-Gorgon": "8300000000002e40eee38cad71d14037bd1385d18bc973f094f5",
    }
    ret = {}
    res = requests.post(url, data=data, headers=headers)
    if res.status_code == 200:
        # tk_logger.info("video detail raw:%s" % res.content.decode("utf8"))
        jsn = json.loads(res.content)
        detail = jsn.get("aweme_detail", {})
        video_info = get_video_info(detail)
        user_info = get_user_info(detail)
        play_addr = get_play_address(detail)
        video_cover = get_video_cover(detail)
        ret["video_info"] = video_info
        ret["user_info"] = user_info
        ret["play_addr"] = play_addr
        ret["video_cover"] = video_cover
    else:
        raise TKException("get video detail failed [%s][%d]" % (url, res.status_code))
    return ret
Code snippet to download the video
import os
import requests

# Usage: detail = get_video_detail_by_id(video_id), then download_video(detail).
# DOWNLOAD_DIR, IOS_HEADER, get_remote_file_size, tk_logger and TKException are defined elsewhere in the project.
def download_video(detail):
    url = detail.get("play_addr", {}).get("url_list", [])
    if len(url) == 0:
        raise TKException("cannot get video url list [%s]" % detail)
    url = url[0]
    folder = DOWNLOAD_DIR + '/' + detail.get('user_info', {}).get("uid", "unknown")
    if not os.path.exists(folder):
        os.makedirs(folder)
    video_id = detail.get('video_info', {}).get('statistics', {}).get('aweme_id')
    # filename = "%s/%s" % (folder, detail.get("video_info", {}).get("desc", video_id) + ".mp4")
    filename = "%s/%s" % (folder, video_id + ".mp4")
    tk_logger.info("download video %s" % url)
    if os.path.isfile(filename):
        # Skip the download when the local file already matches the remote size.
        file_size = get_remote_file_size(url)
        if file_size == os.path.getsize(filename):
            tk_logger.info("file already downloaded, skip ...")
            return
        else:
            tk_logger.info("download file , file size:%d" % file_size)
    # stream=True lets iter_content write the body chunk by chunk instead of loading it all into memory.
    res = requests.get(url, headers=IOS_HEADER, stream=True)
    if res.status_code == 200:
        with open(filename, "wb") as fp:
            for chunk in res.iter_content(chunk_size=1024):
                fp.write(chunk)
    else:
        raise TKException("download video [%s] failed [%d]" % (url, res.status_code))
Download the video
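The download snippet calls `get_remote_file_size`, which is not shown in the article. One plausible way to write such a helper, offered here only as a sketch, is to send a HEAD request and read the `Content-Length` response header. The version below uses the standard library and returns -1 when the size is unknown. Whether the video servers answer HEAD requests, and whether they need the same headers as the download itself, is an assumption that would have to be checked.

```python
import urllib.request


def parse_content_length(headers):
    """Return Content-Length as an int, or -1 when it is absent or malformed."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return -1


def get_remote_file_size(url, timeout=10):
    """Ask the server for the file size with a HEAD request (no body is downloaded).

    Hypothetical stand-in for the helper of the same name used in the
    download snippet; the original implementation is not shown.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

Because -1 can never equal a real file size, a failed size lookup falls through to a fresh download instead of skipping the file.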
Disclaimer
This tutorial is for learning and communication purposes only, not for commercial use.