Preface
The text and images in this article come from the internet and are shared only for learning and discussion, with no commercial purpose. If you have any concerns, please contact us and we will deal with them.
Python Development Environment
- Python 3.6
- PyCharm
Module used:
import requests
Install it with:
pip install requests
Target Page Analysis
The target page is a feed of short selfie videos, all from very pretty girls ~
Let's pack up all of these selfie videos and take them home ~
The site loads its data dynamically, and you can find the corresponding request in the browser's developer tools.
The response contains a nickname, a title, a cover image, and a video address. Opening the copied video address in a browser downloads the video automatically, so we only need to simulate this request to get the corresponding data.
import requests
import pprint

# Data interface found in the developer tools (first page, 30 items)
url = 'https://v.6.cn/minivideo/getMiniVideoList.php?act=recommend&page=1&pagesize=30'
# Send a browser-like user-agent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
response = requests.get(url=url, headers=headers)
html_data = response.json()
pprint.pprint(html_data)
The interface returns JSON, so after response.json() we can get the video address by indexing into the resulting dictionary ~
pprint is a pretty-printing module: it formats the output so that the structure of the returned data can be seen clearly.
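To make the two points above concrete, here is a small sketch that runs without the network. The sample dictionary is made up for illustration: it only mimics the nesting used in this article (content, list, title, playurl), and the real response carries more fields than this.

```python
import pprint

# Hypothetical sample that mimics the shape of the interface's JSON response.
sample_data = {
    'content': {
        'list': [
            {'title': 'demo video 1', 'playurl': 'https://example.com/video/1.mp4'},
            {'title': 'demo video 2', 'playurl': 'https://example.com/video/2.mp4'},
        ]
    }
}

# pformat returns the same text that pprint.pprint would print.
formatted = pprint.pformat(sample_data, width=60)
print(formatted)

# Plain dictionary and list indexing walks down to the first video's address.
first_url = sample_data['content']['list'][0]['playurl']
print(first_url)
```

Compared with a bare print, the formatted text spreads the nested structure over several lines, which makes it easier to spot which keys to index.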
Parse the data to get the video address and title
lis = html_data['content']['list']
for li in lis:
    title = li['title']
    play_url = li['playurl']
    print(title, play_url)
After obtaining the video address, request it and save the response body to a file
# Runs inside the for loop above, once per video
response_2 = requests.get(url=play_url)
path = 'D:\\python\\demo\\' + title + '.mp4'
with open(path, mode='wb') as f:
    f.write(response_2.content)
print(title)
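One practical caveat with the saving step: a video title may contain characters that are not allowed in a Windows file name, and response.content holds the whole video in memory before it is written. Below is a minimal sketch of one way to handle both. The helper names safe_filename and save_video are hypothetical (they are not part of requests), and the list of replaced characters is an assumption based on the usual Windows restrictions. save_video only relies on the response's iter_content method, which requests responses provide.

```python
import re


def safe_filename(title, default='untitled'):
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', title).strip()
    # Fall back to a default name if nothing usable is left.
    return cleaned or default


def save_video(response, path, chunk_size=1024 * 64):
    """Write a response to disk chunk by chunk instead of all at once."""
    with open(path, mode='wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
```

In the loop this could be used as response_2 = requests.get(url=play_url, stream=True), followed by save_video(response_2, 'D:\\python\\demo\\' + safe_filename(title) + '.mp4'). Passing stream=True is what lets requests fetch the body piece by piece rather than up front.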
The saved video plays fine, but this is only one page of data, and with videos like these, one page is certainly not enough for me ~
To crawl more pages, we need to analyze how the URL of the data interface changes from page to page
The Liujianfang (6.cn) site loads its data as a waterfall flow (infinite scroll). This differs from the usual pagination, where clicking "next page" jumps to a new page: here you only need to scroll down, and the site requests more data for you
In the requests sent while scrolling, you can clearly see that the page parameter in the URL corresponds to the page number
So we just need to build the URL inside a loop over the page number to achieve the effect of turning pages
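The page-turning idea can be sketched without sending any request. The example below only builds the URLs for the first three pages. The helper name build_page_url is hypothetical, and the parameter names (act, page, pagesize) are taken from the interface URL used in this article.

```python
from urllib.parse import urlencode

BASE_URL = 'https://v.6.cn/minivideo/getMiniVideoList.php'


def build_page_url(page, pagesize=30):
    """Return the list-interface URL for one page number."""
    query = urlencode({'act': 'recommend', 'page': page, 'pagesize': pagesize})
    return BASE_URL + '?' + query


# Only the page parameter changes from one URL to the next.
page_urls = [build_page_url(page) for page in range(1, 4)]
for page_url in page_urls:
    print(page_url)
```

Building the query from a dictionary keeps the changing parameter separate from the fixed ones. The str.format approach used in the complete code produces the same URLs.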
The complete code
import requests
import pprint

# Crawl pages 1 to 10 of the recommend list
for page in range(1, 11):
    url = 'https://v.6.cn/minivideo/getMiniVideoList.php?act=recommend&page={}&pagesize=30'.format(page)
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
    response = requests.get(url=url, headers=headers)
    html_data = response.json()
    lis = html_data['content']['list']
    for li in lis:
        title = li['title']
        play_url = li['playurl']
        # Download the video and save it under its title
        response_2 = requests.get(url=play_url)
        path = 'D:\\python\\demo\\a back video\\video\\' + title + '.mp4'
        with open(path, mode='wb') as f:
            f.write(response_2.content)
        print(title)