Hi, I’m Latiao.

Result preview

Crawl target

Website: Six Rooms (6.cn)

Tools used

Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Libraries: requests, lxml

Key learning points

  • Fetching dynamic data
  • Using requests
  • Extracting JSON data

Project approach

The first step is to be clear about which page you need data from, that is, your collection target. Today's target is the short-video data on Six Rooms.

The page's data is clearly loaded dynamically, so the task is to find the corresponding data interface and get each video's play address from it. To locate dynamic data, open a packet capture tool, for example the Network panel of the browser's developer tools (an essential crawler skill that is not covered in detail here), refresh the page so the data loads, and look through the captured requests for the one that returns the video data.
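The interface found in the packet capture carries its paging options as query parameters. As a minimal sketch, the helper below rebuilds that URL from its parts. `build_list_url` is a hypothetical name, and whether the server honors other `page` or `pagesize` values is an assumption that has not been verified here.

```python
from urllib.parse import urlencode

# Interface address as it appears in the captured request.
BASE_URL = "https://v.6.cn/minivideo/getMiniVideoList.php"


def build_list_url(page=1, pagesize=25, act="recommend"):
    """Return the list-interface URL for one page of results."""
    # urlencode keeps the dictionary's insertion order, so the result
    # matches the parameter order seen in the captured request.
    query = urlencode({"act": act, "page": page, "pagesize": pagesize})
    return f"{BASE_URL}?{query}"


print(build_list_url(page=1))
```

Changing `page` would be the natural way to walk through more results, if the interface supports it.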

Send a request to that URL, with request headers. Convert the response into a dictionary, take the list out of its content entry, and loop over the list. For each video record, read the play address and title, then save the video data.
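The dictionary-handling part of these steps can be sketched without touching the network. The sample below is a hypothetical response, shaped after the `content` and `list` entries described above; the `title` key and all values are assumptions for illustration, and `extract_videos` is a made-up helper name.

```python
import json

# Hypothetical sample shaped like the interface response. The 'title' key
# and the example.com addresses are assumptions, not real data.
SAMPLE_RESPONSE = """
{"content": {"list": [
    {"title": "clip one", "playurl": "https://example.com/a.mp4"},
    {"title": "clip two", "playurl": "https://example.com/b.mp4"},
    {"title": "no address"}
]}}
"""


def extract_videos(payload):
    """Return (title, playurl) pairs from a parsed response dictionary."""
    items = payload.get("content", {}).get("list", [])
    videos = []
    for item in items:
        playurl = item.get("playurl")
        if not playurl:
            continue  # skip records without a play address
        videos.append((item.get("title", "untitled"), playurl))
    return videos


videos = extract_videos(json.loads(SAMPLE_RESPONSE))
for title, playurl in videos:
    print(title, playurl)
```

Using `.get` with defaults means a missing key skips one record rather than stopping the whole run with a `KeyError`.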

Simple source code

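    import requests

    url = 'https://v.6.cn/minivideo/getMiniVideoList.php?act=recommend&page=1&pagesize=25'
    # Browser-like request headers; the value here is only a placeholder
    headers = {'User-Agent': 'Mozilla/5.0'}

    # Request the interface and convert the JSON response into a dictionary
    response = requests.get(url, headers=headers).json()
    content = response['content']['list']

    for i in content:
        playurl = i['playurl']
        # Assumption: each record carries a 'title' key; the file extension
        # is taken from the end of the play URL
        title = i['title'] + '.' + playurl.split('.')[-1]
        video = requests.get(playurl, headers=headers).content
        # The ./video/ directory must already exist
        with open('./video/{}'.format(title), 'wb') as f:
            f.write(video)
        print('{} downloaded'.format(title))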
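Video titles can contain characters that Windows does not allow in file names, which would make the save step fail. The helper below is one possible way to guard against that, not part of the original script: `safe_filename` is a hypothetical name, and falling back to `.mp4` when the URL has no extension is an assumption.

```python
import os
import re
from urllib.parse import urlparse


def safe_filename(title, playurl, default_ext=".mp4"):
    """Build a Windows-safe file name from a video title and its play URL."""
    # Replace the characters Windows forbids in file names: \ / : * ? " < > |
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(playurl).path)[1] or default_ext
    return cleaned + ext


print(safe_filename("a/b:c?", "https://example.com/v/clip.mp4?x=1"))
```

The returned name could then be used in place of the raw title when opening the output file.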

Finally, I would like to share with you an interesting comparison chart of Tencent, Alibaba and ByteDance.

From BAT to BAT, the only change is that ByteDance has replaced Baidu. The old BAT was the hegemon of the PC era; the new BAT is the hegemon of the mobile internet.

In the first decade of the 21st century, Baidu became king in China through search; in the second decade, ByteDance went global on the strength of its algorithms.

In the third decade, who will stand alone at the top, and who will step aside?