Preface

The text and images in this article come from the internet and are for learning and discussion only; they are not used for any commercial purpose. If you have any concerns, please contact us and we will deal with them.


Python Development Environment

  • Python 3.6

  • PyCharm

The only third-party module used is requests. Install it with pip:

    pip install requests

Then import it in the script:

    import requests

Target Page Analysis

![](https://p6-tt-ipv6.byteimg.com/origin/pgc-image/494b89b87d9e4ec08802cf9830ce02c8)

The page is full of videos from pretty young women. Love it ~

![](https://p1.pstatp.com/origin/pgc-image/e4f4be323ceb470182335f18592d9f7b)

I want to download all of these selfie videos and take them home ~

The site loads its data dynamically, and you can find the request that returns it in the browser's developer tools.

![](https://p26-tt.byteimg.com/origin/pgc-image/65aee7c3387d40c5a2c9062c977f883d)

The response contains a nickname, a title, a cover image, and a video address. Opening the copied video address in the browser downloads the video automatically, so you only need to simulate this request to get the corresponding data.

    import pprint

    import requests

    # Data interface found in the developer tools (page 1, 30 items per page)
    url = 'https://v.6.cn/minivideo/getMiniVideoList.php?act=recommend&page=1&pagesize=30'
    # Send a browser-like User-Agent with the request
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)
    html_data = response.json()  # decode the JSON body into a dict
    pprint.pprint(html_data)

![](https://p26-tt.byteimg.com/origin/pgc-image/c8da08674f0549058d9b230ccd282b95)

The response is JSON, so we can get the video address by indexing into the resulting dictionary ~

pprint is a pretty-printing module; it formats the output so that the returned data is easy to read.
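To make the difference concrete, here is a minimal sketch comparing print with pprint on a nested dictionary. The sample data is made up and only imitates the content → list → title/playurl shape used in this article; the example.com addresses are placeholders.

```python
import pprint

# Hypothetical sample that imitates the shape of the response; values are made up
sample = {
    'content': {
        'list': [
            {'title': 'first clip', 'playurl': 'https://example.com/a.mp4'},
            {'title': 'second clip', 'playurl': 'https://example.com/b.mp4'},
        ]
    }
}

print(sample)          # the whole structure on one long line
pprint.pprint(sample)  # the same structure wrapped across indented lines

# pformat returns the formatted text instead of printing it
formatted = pprint.pformat(sample, width=60)
```

pprint.pformat is handy when the formatted text should go to a log file rather than the console.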

Parse the data to get the video address and title

    lis = html_data['content']['list']  # list of video entries
    for li in lis:
        title = li['title']        # video title
        play_url = li['playurl']   # video address
        print(title, play_url)

![](https://p6-tt-ipv6.byteimg.com/origin/pgc-image/8b582a8185874ac8a7781991776eb06c)
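The loop above indexes the dictionary directly, which raises KeyError if a field is missing. The following is a sketch of a more defensive variant, assuming the same content → list → title/playurl layout. The helper name extract_videos and the sample data are hypothetical and not part of the original code.

```python
def extract_videos(html_data):
    """Return (title, play_url) pairs, skipping entries that lack either field."""
    # Fall back to an empty list if 'content' or 'list' is missing or empty
    items = (html_data.get('content') or {}).get('list') or []
    videos = []
    for item in items:
        title = item.get('title')
        play_url = item.get('playurl')
        if title and play_url:
            videos.append((title, play_url))
    return videos


# Hypothetical sample data; the second entry has no video address
sample = {
    'content': {
        'list': [
            {'title': 'first clip', 'playurl': 'https://example.com/a.mp4'},
            {'title': 'broken entry'},
        ]
    }
}

for title, play_url in extract_videos(sample):
    print(title, play_url)
```

Whether the real interface ever omits these fields is not something the article shows, so treat this as a precaution rather than a requirement.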

After obtaining the video address, request it and save the response body to disk.

    # Place this inside the loop above so that it runs once per video
    response_2 = requests.get(url=play_url)
    path = 'D:\\python\\demo\\' + title + '.mp4'
    with open(path, mode='wb') as f:
        f.write(response_2.content)  # write the raw video bytes
    print(title)

![](https://p9-tt-ipv6.byteimg.com/origin/pgc-image/58e473ae1cd943b6aa71eb65afe20600)
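The save step builds the file name straight from the video title. On Windows, a title containing characters such as ? or : would make open() fail. Below is a minimal sketch of one way to guard against that; safe_filename and build_path are hypothetical helpers, not part of the original code.

```python
import os
import re


def safe_filename(title, default='untitled'):
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]', '_', title).strip()
    # Fall back to a default name if nothing usable is left
    return cleaned or default


def build_path(folder, title):
    """Join the target folder with a sanitized '<title>.mp4' file name."""
    return os.path.join(folder, safe_filename(title) + '.mp4')


print(build_path('videos', 'What is this? A "test": part 1/2'))
```

With this in place, the path line could become path = build_path('D:\\python\\demo', title). Two different titles can still map to the same file name, so a real crawler may also want to add an index or ID to the name.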

The saved videos play fine, but this is only one page of data. With so few videos, I am certainly not satisfied ~

To crawl multiple pages, analyze how the URL of the data interface changes.

The Liujianfang site loads its data in a waterfall-flow (infinite scroll) style. This differs from the usual pagination, where clicking "next page" jumps to a new page: here you only need to scroll down and more data is loaded.

![](https://p26-tt.byteimg.com/origin/pgc-image/16bff3e443c74fe6a1f4766dbadd9090)

You can clearly see that the page parameter in the URL corresponds to the page number.

So we just need to wrap the request in a loop over the page number to get the effect of turning pages.
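As a small illustration of the paging idea, the sketch below builds the URL for each page from its query parameters using the standard library. The parameter names come from the URL shown earlier; the helper name page_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://v.6.cn/minivideo/getMiniVideoList.php'


def page_url(page, pagesize=30):
    """Build the data interface URL for the given page number."""
    query = urlencode({'act': 'recommend', 'page': page, 'pagesize': pagesize})
    return BASE_URL + '?' + query


# Only the page parameter changes from one request to the next
for page in range(1, 4):
    print(page_url(page))
```

Keeping the parameters in a dictionary makes it easy to change the page size later without editing a long URL string.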

The complete code

    import requests

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }

    for page in range(1, 11):  # pages 1 to 10
        url = 'https://v.6.cn/minivideo/getMiniVideoList.php?act=recommend&page={}&pagesize=30'.format(page)
        response = requests.get(url=url, headers=headers)
        html_data = response.json()
        lis = html_data['content']['list']
        for li in lis:
            title = li['title']
            play_url = li['playurl']
            # Download the video and save it under its title
            response_2 = requests.get(url=play_url)
            path = 'D:\\python\\demo\\a back video\\video\\' + title + '.mp4'
            with open(path, mode='wb') as f:
                f.write(response_2.content)
            print(title)

![](https://p1-tt-ipv6.byteimg.com/origin/pgc-image/bf253c8fdd3c4544bfbbe5b465fcecbd)
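The complete code always requests a fixed range of pages. One possible refinement, sketched here, is to stop as soon as a page comes back empty and to pause briefly between requests. It assumes that the interface returns an empty list past the last page, which the article does not confirm. crawl_pages and fake_fetch are hypothetical names, and fake_fetch stands in for the real request so that the sketch runs offline.

```python
import time


def crawl_pages(fetch_page, max_pages=10, delay=0.0):
    """Collect (title, play_url) pairs, stopping at the first empty page.

    fetch_page is any callable that takes a page number and returns
    the decoded JSON dictionary for that page.
    """
    results = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        items = (data.get('content') or {}).get('list') or []
        if not items:
            break  # assumed end of data
        for item in items:
            results.append((item['title'], item['playurl']))
        time.sleep(delay)  # pause between page requests
    return results


def fake_fetch(page):
    """Offline stand-in: two pages with one video each, then nothing."""
    if page > 2:
        return {'content': {'list': []}}
    return {'content': {'list': [
        {'title': 'clip %d' % page,
         'playurl': 'https://example.com/%d.mp4' % page},
    ]}}


videos = crawl_pages(fake_fetch)
print(videos)
```

For real use, fetch_page could wrap the request from the complete code, for example a function that calls requests.get(url, headers=headers, timeout=10) for the given page and returns response.json(). The timeout keeps a stalled connection from hanging the whole crawl.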