Hello, I’m Charlie

Today we share a small tool, mainly used for B station video download, just need to enter the corresponding video web address can be downloaded to the local.

Directory:

    1. Introduction of the principle
    1. Web analytics
    1. Video crawl
    1. In the local
    1. GUI tool Making
    1. The complete code

1. Principle introduction

The principle is very simple, is to obtain the source address of the video resource, and then climb the binary content of the video, and then write to the local.

2. Web page analysis

Open the page, then F12 to enter developer mode, and then click On Network – > All, because video resources are generally large, I sorted them by size from large to small, and found the first one, which may be related to the video source address.

Then, we copy the invariant part of the URL in the found section and search back to the element with CTRL +F to find the node that may be related to the source address of the video.

Sure enough, we copied the content and used the JSON online parsing tool to discover that we actually had the address we needed to look like the video file.

Then I copied the address and opened it in my browser and found 403.

But it doesn’t matter. Let’s see what happens next!

3. Video crawl

In the part of web page analysis, we can obtain the source address of the video file through various data analysis methods in the source code of the video station B address web page. Here, I use regular expression.

import requests import re import json url = 'https://www.bilibili.com/video/BV1BU4y1H7E3' headers = { "User-Agent": "Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36", "referer": "https://www.bilibili.com" } resp = requests.get(url, headers=headers) palyinfo = re.findall(r'<script>window.__playinfo__=(.*?)</script>', resp.text)[0] palyinfo_data = json.loads(palyinfo)Copy the code

Since the result being retrieved from the expression is a string, which is actually A JSON (dictionary), you need to introduce a JSON library to do the conversion.

After analyzing the data, we can find the information of the final video file. We can directly perform key-value operation. The interesting thing is that the video and audio files are separate, so we need to crawl them separately and then merge them.

Video_url = jSON_data ['data']['dash']['video'][0][' base_URL '] Audio_url = json_data['data']['dash']['audio'][0]['base_url']Copy the code

As some of you may notice, base_url seems to have several. Yes, because there are many kinds of video definition. Here I choose is the first ultra CLEAR 4K, you can choose according to their own needs!

Of course, when we save the video locally, we need to give it a name, so we just need to find any node here and parse out the file name.

Re.findall (r'<h1 title="(.*?) " class="video-title">', resp.text)[0]Copy the code

4. Save the file to the local PC

Now that we have resolved and obtained the file address, audio address and file name of the video, let’s directly arrange the download!

However, when analyzing the web page, we found that directly opening the address of the video and audio files will prompt 403, so because the source of the skip is not clear, we just need to adjust the request header as follows:

Headers = {" user-agent ": "Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36", "https://www.bilibili.com" }Copy the code

With that done, let’s start writing files to local functions.

Def down_file(file_url, file_type): headers = {" user-agent ": "Mozilla/5.0 (Windows NT 10.0); Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36", "referer": "https://www.bilibili.com"} resp = requests. Get (url = file_url, headers=headers) print(resp.status_code) print(f' File_size = int(resp.headers['content-length']) # record the size of the file that has been downloaded File_size_MB = file_size / 1024/1024 print(f' file size: {file_size_MB:0.2f} MB') start_time = time.time() with open(title + '.' + file_type, mode='wb') as f: for chunk in resp.iter_content(chunk_size=chunk_size): F.rite (chunk) done_size += len(chunk) print(f'\r download schedule: {done_size/file_size*100:0.2f}%',end= ") end_time = time.time() cost_time = end_time-start_time print(f'\n {cost_time:0.2f} SEC ') print(f' Download speed: {file_size_MB/cost_time:0.2f}M/s')Copy the code

Running results:

Down_file (video_url, 'mp4') 200 下载 : down_file(video_url, 'mp4') 200 下载 : down_file(video_url, 'mp4') 200 下载 : Down_file (audio_url, 'mp3') 200 下载 : >>> _file(audio_url, 'mp3') 200 下载 : >>> _file(audio_url, 'mp3') 6.42 M/sCopy the code

We can see the downloaded video file locally:

Since the video and audio are separated, there is no sound in opening this video alone, so we need to merge the operation.

The moviepy library is used for merging operations, and we will introduce more applications of this library in the future, so stay tuned

From moviepy import * from moviepy.editor import * video_path = title + '.mp4' Audio_path = title + '.mp3 Video = video.set_audio(audio) # fileclip (video_path Video.write_videofile (f"{title}.mp4")Copy the code

And that’s it:

Moviepy - Building Video 【 影 片 名 称 】第20集 5 Moviepy - Building Video 【 影 片 名 称 】第20集 MoviePy - Writing video 【 影 片 名 称 】第20集 5 影 片 名 称 : Temp_mpy_wvf_snd.mp3 影 片 名 称 : MoviePy - Writing video 【 影 片 名 称 】 影 片 名 称 : Temp_mpy_wvf_snd.mp3 影 片 名 称 : MoviePy - Writing video 【 影 片 名 称 】 影 片 名 称 : Temp_MPy_wvf_snd.mp3 影 片 名 称 : MoviePy - Writing video 【 影 片 名 称 】 影 片 名 称 : Temp_MPy_wvf_snd.mp4 Moviepy - Done ! Moviepy - Video Ready 【 影 片 名 称 】第20 年 5 月 解 handsome Is a Bit too Much. Mp4Copy the code

5. Make GUI tools

This bar, is to use my commonly used PySimpleGUI to operate, relatively simple.

Import PySimpleGUI as sg # theme set sg.theme('SystemDefaultForReal') # Layout set layout = [[sg.Text(' select B from the video address :',font=(" Microsoft Yahei ", 12)), sg. InputText (key = "url", size = (50, 1), the font = (" Microsoft black ", 10), enable_events = True)], # [sg. The Output (size = (66, 8), the font = (" Microsoft black ", 10))], [sg. The Button (' start download ', the font = (" Microsoft black ", 10), button_color = 'Orange'), sg. The Button (' shut down the program, the font = (" Microsoft black ", > < span style = "box-sizing: border-box; color: RGB (51, 51, 51); font-size: 14px! Important; white-space: normal;" While True: event, values = window.read() if event in (None, 'close program '): Break if event == 'start downloading ': Url = values['url'] print(' get video information ') title, video_url, audio_url = get_file_info(url) print(' download video resources ') down_file(title, Video_url, 'mp4') print(' download audio resource ') down_file(title, audio_url, 'mp3') print(' merge ') print(' finish ') window.close()Copy the code

6. Complete code

Not elegant enough, we can optimize ha!