Preface

The text and images in this article come from the internet and are for learning and discussion only, with no commercial purpose. If you have any concerns, please contact us and we will address them.

Station B (Bilibili) is a well-known Chinese video site with bullet-screen comments, known for timely new anime releases, an ACG atmosphere, and creative uploaders (Up hosts). Each video on the site is stored as two separate parts: the video images and the audio data.

Today I’m going to show you how to download and merge the videos from Station B.

Introduction to Python data analysis

https://www.bilibili.com/video/BV1LX4y1u7VA

Environment introduction:

Python 3.6
PyCharm
requests
re
json
subprocess

Parse web pages

Target Page Analysis

Station B serves the video and the audio as separate streams, and the audio URL and the video URL are both in ****

Extract the data

1. Match the data with a regular expression

2. re.findall returns the matches as a list, so take the first element by index

3. Convert the matched string to JSON data

4. Extract the video URL and the audio URL by dictionary lookup
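
The four steps above can be sketched on a small inline page fragment. The HTML string and its example.com URLs are made up for illustration, and extract_urls is a hypothetical helper; only the window.__playinfo__ script tag and the data, dash, audio/video, backupUrl lookup path are taken from the crawler code in this article.

```python
import json
import re

# Hypothetical page fragment; the URLs are placeholders, not real addresses.
SAMPLE_HTML = (
    '<script>window.__playinfo__={"data": {"dash": {'
    '"audio": [{"backupUrl": ["https://example.com/audio.m4s"]}], '
    '"video": [{"backupUrl": ["https://example.com/video.m4s"]}]}}}</script>'
)


def extract_urls(html_data):
    # Steps 1 and 2: findall returns a list, so take the first match.
    raw = re.findall(r'<script>window\.__playinfo__=(.*?)</script>', html_data)[0]
    # Step 3: turn the matched string into Python dictionaries and lists.
    info = json.loads(raw)
    # Step 4: index into the dictionaries to reach the two addresses.
    audio_url = info['data']['dash']['audio'][0]['backupUrl'][0]
    video_url = info['data']['dash']['video'][0]['backupUrl'][0]
    return audio_url, video_url


audio_url, video_url = extract_urls(SAMPLE_HTML)
print(audio_url)
print(video_url)
```

If the page layout changes and the pattern no longer matches, findall returns an empty list and the [0] index raises IndexError, which is a useful early signal that the regular expression needs updating.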

The crawler code

Import the tools

import requests
import re  # regular expressions
import pprint
import json
import subprocess

Request header

# The name must be lowercase because send_request refers to it as headers
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}

Request the data

def send_request(url):
    response = requests.get(url=url, headers=headers)
    return response

Parsing video data

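def get_video_data(html_data):
    """Parse the title, audio URL, and video URL out of the page source."""
    # Extract the video title
    title = re.findall(r'<span class="tit">(.*?)</span>', html_data)[0]
    # Extract the playback info embedded in the page
    json_data = re.findall(r'<script>window\.__playinfo__=(.*?)</script>', html_data)[0]
    # print(json_data)  # at this point json_data is still a string
    json_data = json.loads(json_data)
    pprint.pprint(json_data)
    # Extract the audio URL
    audio_url = json_data['data']['dash']['audio'][0]['backupUrl'][0]
    print('Parsed audio URL:', audio_url)
    # Extract the video URL
    video_url = json_data['data']['dash']['video'][0]['backupUrl'][0]
    print('Parsed video URL:', video_url)
    video_data = [title, audio_url, video_url]
    return video_data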

Save the data

def save_data(file_name, audio_url, video_url):
    # Request the audio bytes
    audio_data = send_request(audio_url).content
    print('Requesting video data')
    video_data = send_request(video_url).content
    # Write the audio to <file_name>.mp3
    with open(file_name + '.mp3', mode='wb') as f:
        f.write(audio_data)
        print('Saving audio data')
    # Write the video to <file_name>.mp4
    with open(file_name + '.mp4', mode='wb') as f:
        f.write(video_data)
        print('Saving video data')
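
Because the page title is used as the file name, a title containing characters that Windows rejects in file names would make open() fail. A minimal sketch of cleaning the title first follows; safe_file_name is a hypothetical helper that is not part of the original code, and the fallback name 'video' is an arbitrary choice.

```python
import re


def safe_file_name(title, replacement='_'):
    # Replace the characters Windows does not allow in file names: \ / : * ? " < > |
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title)
    # Drop surrounding whitespace and trailing dots, which Windows also dislikes.
    cleaned = cleaned.strip().rstrip('.')
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'video'


print(safe_file_name('  Demo: part 1/2  '))
```

The cleaned name could then be passed to save_data in place of the raw title.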

Data consolidation

def merge_data(video_name):
    print('Starting video merge:', video_name)
    # ffmpeg -i video.mp4 -i audio.wav -c:v copy -c:a aac -strict experimental output.mp4
    COMMAND = f'ffmpeg -i {video_name}.mp4 -i {video_name}.mp3 -c:v copy -c:a aac -strict experimental output.mp4'
    # wait() blocks until ffmpeg exits, so the message below is printed only after the merge is done
    subprocess.Popen(COMMAND, shell=True).wait()
    print('Finished merging:', video_name)
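
The command string in merge_data is handed to the shell, so a title containing spaces or other shell-special characters can break it. One possible alternative, sketched here under that assumption, is to pass ffmpeg an argument list so that each file name stays a single argument. The names build_merge_command and merge_with_list are hypothetical; the ffmpeg options are the same ones used in the article's command.

```python
import subprocess


def build_merge_command(video_name, output_name='output.mp4'):
    # Same options as the command in merge_data, one list item per argument.
    return [
        'ffmpeg',
        '-i', f'{video_name}.mp4',
        '-i', f'{video_name}.mp3',
        '-c:v', 'copy',
        '-c:a', 'aac',
        '-strict', 'experimental',
        output_name,
    ]


def merge_with_list(video_name):
    # run() blocks until ffmpeg exits; check=True raises CalledProcessError
    # if ffmpeg returns a non-zero exit code.
    subprocess.run(build_merge_command(video_name), check=True)


print(build_merge_command('my video'))
```

With a list, no shell is involved, so 'my video.mp4' reaches ffmpeg as one argument without any manual quoting.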

Merge video and audio

The tool used here is FFmpeg, a set of open source programs for recording, converting, and streaming digital audio and video.

Download and unpack it, then add it to the environment variables so that the ffmpeg command can be found from any directory.

1. Right-click My Computer and choose Properties

2. Select Advanced System Settings

3. Select Environment Variables

4. Edit the Path variable, copy the path of the folder that contains ffmpeg.exe, then select New and paste it in
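
After changing the environment variables, it can help to confirm from Python that ffmpeg is actually reachable before running the crawler. The sketch below uses shutil.which from the standard library; find_executable is a hypothetical helper name. A terminal or IDE that was already open before the change may need to be restarted to see the new Path.

```python
import shutil


def find_executable(name='ffmpeg'):
    # shutil.which returns the full path of the program if it is on PATH,
    # or None if it cannot be found.
    return shutil.which(name)


path = find_executable()
if path is None:
    print('ffmpeg was not found on PATH; recheck the environment variable')
else:
    print('ffmpeg found at', path)
```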
