I. Project Overview
1. Project background
One day I wanted something to do, and I remembered the C language, which I had always meant to learn but never had, so I decided to learn it. But how? Reading a book is too boring, and I had no money to sign up for a class. Why not try Station B (Bilibili)? Sure enough, searching for the keyword "C language" turned up a lot of lecture videos. Station B (www.bilibili.com/) is an amazing place, a treasure trove that can satisfy almost all of your needs and viewing interests. Whether you want animation, anime series, games, guichu remix videos, technology, or all kinds of instructional videos, you can find almost anything you can think of there. For programmers and future programmers, the programming resources on Station B are endless, but the site does not provide a download function, so saving a video to watch later is a hassle. I ran into this problem too, so I looked into how to download videos with one click, and finally made it work with Python, that magic language.
2. Configure the environment
This project does not require much environment configuration, but you do need to download FFmpeg (an open-source program that can record, convert, and stream digital audio and video) and set up the environment variables. FFmpeg is mainly used to combine the downloaded video and audio into a complete video.
Download ffmpeg
Download it from download.csdn.net/download/CU… or from ffmpeg.org/download.ht…, then unzip it to the directory where you want to keep it.
Set environment variables
- Copy the bin path of FFmpeg, for example
xxx\ffmpeg-20190921-ba24b24-win64-shared\bin
- Right-click This PC and choose Properties to open Control Panel\System and Security\System
- Click Advanced system settings → the System Properties window opens → click Environment Variables → the Environment Variables window opens → select Path under System variables → click Edit → the Edit environment variable window opens
- Click New → paste the bin path that you copied earlier
- Click OK in each window to save and exit
In addition to FFmpeg, you also need to install the PyInstaller library for packaging the program. Install it with the following command:
pip install pyinstaller
If the installation fails or the download is slow, you can switch to a mirror source:
pip install pyinstaller -i https://pypi.doubanio.com/simple/
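Before moving on, it can help to confirm that FFmpeg is actually reachable through the Path variable. The following is a minimal sketch rather than part of the original project; the helper names find_ffmpeg and ffmpeg_version_line are made up for illustration.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg was not found on PATH; check the environment variable settings")
    else:
        print("ffmpeg found at:", ffmpeg_path)
        print(ffmpeg_version_line(ffmpeg_path))
```

If the script reports that ffmpeg was not found, reopen the command prompt after editing the environment variables, since windows that are already open keep the old Path.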
II. Project implementation
1. Import required libraries
import json
import os
import re
import shutil
import ssl
import time
import requests
from concurrent.futures import ThreadPoolExecutor
from lxml import etree
The imports include libraries for crawling and parsing web pages, for creating thread pools, and for other tasks. Most of them are part of the Python standard library; requests and lxml are third-party packages. Any library that is not installed yet can be installed with the pip install xxx command.
2. Set request parameters
# Set the request headers and parameters to reduce the chance of being blocked by anti-crawling measures
headers = {
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36'
}
params = {
    'from': 'search',
    'seid': '9698329271136034665'
}
Setting the request headers and parameters reduces the chance of being blocked by anti-crawling measures.
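To see what these two dictionaries turn into, the sketch below builds a request object without sending it. It uses only the standard library (urllib) instead of Requests, so that it runs without extra packages; the helper name build_request is an assumption for illustration, and the header set is shortened.

```python
from urllib.parse import urlencode
from urllib.request import Request

headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
}
params = {'from': 'search', 'seid': '9698329271136034665'}


def build_request(url, params, headers):
    """Append the query parameters to the URL and attach the headers."""
    full_url = url + '?' + urlencode(params)
    return Request(full_url, headers=headers)


# Nothing is sent over the network here; the object is only inspected.
request = build_request('https://www.bilibili.com/video/av91748877', params, headers)
print(request.full_url)
print(request.get_header('User-agent'))
```

With Requests, passing params=params and headers=headers to requests.get() has the same effect: the parameters end up in the query string and the headers are sent with the request.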
3. Basic processing
def re_video_info(text, pattern):
    """Use a regular expression to match the video information and convert it to JSON."""
    match = re.search(pattern, text)
    return json.loads(match.group(1))


def create_folder(aid):
    """Create the folder if it does not exist yet."""
    if not os.path.exists(aid):
        os.mkdir(aid)


def remove_move_file(aid):
    """Delete temporary files and move the final videos into the target folder."""
    file_list = os.listdir('./')
    for file in file_list:
        # Remove the temporary video-only and audio-only files
        if file.endswith('_video.mp4'):
            os.remove(file)
        elif file.endswith('_audio.mp4'):
            os.remove(file)
        # Save the final video file, replacing any earlier copy
        elif file.endswith('.mp4'):
            if os.path.exists(aid + '/' + file):
                os.remove(aid + '/' + file)
            shutil.move(file, aid)
Basic processing covers two things that prepare for the actual crawl and download:
- Extract information using regular expressions
The web page fetched with the Requests library is plain text. A regular expression extracts the useful information about the video to be downloaded for later processing.
- File processing
After the videos are downloaded, the related files are processed: the temporary separate audio and video files are deleted, and the final video files are moved to the specified folder.
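The first of these steps can be tried out on a small, made-up page. This is only a sketch: the sample HTML, the closing </script> used to end the pattern, and the helper name extract_json are assumptions for illustration, and real pages carry far more data.

```python
import json
import re

# A made-up page fragment that mimics a script tag carrying JSON data.
SAMPLE_PAGE = (
    '<html><head><script>window.__playinfo__='
    '{"data": {"accept_description": ["HD", "Clear"], "dash": {"duration": 125}}}'
    '</script><script>window.other = 1</script></head></html>'
)


def extract_json(text, pattern):
    """Return the JSON captured by group 1 of the pattern, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return json.loads(match.group(1))


# The non-greedy group stops at the first closing script tag.
info = extract_json(SAMPLE_PAGE, r'__playinfo__=(.*?)</script>')
print(info['data']['dash']['duration'])
print(extract_json('<html></html>', r'__playinfo__=(.*?)</script>'))
```

Returning None when nothing matches makes it easier to report a page whose layout has changed, instead of failing with an AttributeError on match.group(1).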
4. Download videos
def download_video_batch(referer_url, video_url, audio_url, video_name, index):
    """Download one video of a series (used for batch downloads)."""
    # Update the request header
    headers.update({"Referer": referer_url})
    # Get the file name without the './video/' prefix
    short_name = video_name.split('/')[2]
    print("%d.\tVideo download start: %s" % (index, short_name))
    # Download and save the video (picture only, no sound)
    video_content = requests.get(video_url, headers=headers)
    print('%d.\t%s\tVideo size: ' % (index, short_name),
          round(int(video_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_video = 0
    with open('%s_video.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_video) + '-'
        response = requests.get(video_url, headers=headers)
        output.write(response.content)
    # Download and save the audio
    audio_content = requests.get(audio_url, headers=headers)
    print('%d.\t%s\tAudio size: ' % (index, short_name),
          round(int(audio_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_audio = 0
    with open('%s_audio.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_audio) + '-'
        response = requests.get(audio_url, headers=headers)
        output.write(response.content)
        received_audio += len(response.content)
    return video_name, index


def download_video_single(referer_url, video_url, audio_url, video_name):
    """Download a single video."""
    # Update the request header
    headers.update({"Referer": referer_url})
    print("Video download begins: %s" % video_name)
    # Download and save the video (picture only, no sound)
    video_content = requests.get(video_url, headers=headers)
    print('%s\tVideo size: ' % video_name,
          round(int(video_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_video = 0
    with open('%s_video.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_video) + '-'
        response = requests.get(video_url, headers=headers)
        output.write(response.content)
    # Download and save the audio
    audio_content = requests.get(audio_url, headers=headers)
    print('%s\tAudio size: ' % video_name,
          round(int(audio_content.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
    received_audio = 0
    with open('%s_audio.mp4' % video_name, 'ab') as output:
        headers['Range'] = 'bytes=' + str(received_audio) + '-'
        response = requests.get(audio_url, headers=headers)
        output.write(response.content)
        received_audio += len(response.content)
    print("End of video download: %s" % video_name)
    video_audio_merge_single(video_name)
This part covers the batch download of a video series and the download of a single video. The two work on the same principle, but their parameters differ, so they are implemented separately. In both, the request header is updated first; the video link is then requested and the video (without sound) saved, followed by the audio link and the audio. The sizes of the video and audio files are obtained along the way.
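Two small calculations recur in both functions: converting the content-length header into megabytes and building the Range header. They can be isolated as below. This is a sketch with hypothetical helper names (size_in_mb, range_header), and it uses a plain dictionary in place of a real response's headers.

```python
def size_in_mb(response_headers):
    """Convert the content-length header (bytes) to megabytes, rounded to 2 places."""
    length = int(response_headers.get('content-length', 0))
    return round(length / 1024 / 1024, 2)


def range_header(received_bytes):
    """Build a Range header value that asks for everything from the given offset on."""
    return 'bytes=' + str(received_bytes) + '-'


# A plain dict stands in for response.headers here (assumption for the demo).
sample_headers = {'content-length': '5242880'}
print(size_in_mb(sample_headers), 'MB')
print(range_header(0))
```

A Range value of bytes=0- requests the whole file from the first byte; a larger offset would only be needed if an interrupted download were resumed.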
5. Combine video and audio into a complete video
def video_audio_merge_batch(result):
    """Merge video and audio with FFmpeg (callback for batch downloads)."""
    video_name = result.result()[0]
    index = result.result()[1]
    import subprocess
    video_final = video_name.replace('video', 'video_final')
    command = 'ffmpeg -i "%s_video.mp4" -i "%s_audio.mp4" -c copy "%s.mp4" -y -loglevel quiet' % (
        video_name, video_name, video_final)
    subprocess.Popen(command, shell=True)
    print("%d.\tEnd of video download: %s" % (index, video_name.split('/')[2]))


def video_audio_merge_single(video_name):
    """Merge the video and audio of a single video with FFmpeg."""
    print("Video composition begins: %s" % video_name)
    import subprocess
    command = 'ffmpeg -i "%s_video.mp4" -i "%s_audio.mp4" -c copy "%s.mp4" -y -loglevel quiet' % (
        video_name, video_name, video_name)
    subprocess.Popen(command, shell=True)
    print("End of video composition: %s" % video_name)
The subprocess module is used to spawn a child process, and its Popen class executes the shell command. Since FFmpeg has been added to the environment variables, the shell command can call ffmpeg directly to merge the audio and video.
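As an alternative sketch, not the approach used in this project, the same FFmpeg options can be passed as an argument list. This avoids shell quoting problems with unusual file names, and subprocess.run waits until FFmpeg has finished. The function names build_merge_command and merge are hypothetical, and the global options -y and -loglevel are placed before the inputs here.

```python
import subprocess


def build_merge_command(video_name, output_name=None):
    """Build the FFmpeg argument list that merges <name>_video.mp4 and <name>_audio.mp4."""
    if output_name is None:
        output_name = video_name
    return [
        'ffmpeg',
        '-y',                     # overwrite the output file without asking
        '-loglevel', 'quiet',     # suppress FFmpeg's console output
        '-i', '%s_video.mp4' % video_name,
        '-i', '%s_audio.mp4' % video_name,
        '-c', 'copy',             # copy the streams without re-encoding
        '%s.mp4' % output_name,
    ]


def merge(video_name, output_name=None):
    """Run FFmpeg, wait for it to finish, and return its exit code."""
    completed = subprocess.run(build_merge_command(video_name, output_name))
    return completed.returncode


print(build_merge_command('demo'))
```

Because subprocess.run blocks until the merge is done, a caller can delete the temporary files right afterwards without relying on a fixed sleep.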
6. Implementation of the three download modes
def batch_download():
    """Batch download a video series using multiple threads."""
    # Ask for the ID of the video series to download
    aid = input('Please enter the ID of the video series to download (example: the ID in https://www.bilibili.com/video/av91748877?p=1 is 91748877); default is 91748877\t')
    if not aid:
        aid = '91748877'
    # Prompt to select the quality
    quality = input('Please select the quality (1 for HD, 2 for clear, 3 for smooth); default is HD\t')
    if quality not in ('2', '3'):
        quality = '1'
    acc_quality = int(quality) - 1
    # Use an unverified SSL context by default to work around HTTPS request failures
    ssl._create_default_https_context = ssl._create_unverified_context
    # Get the title of the video series
    url = 'https://www.bilibili.com/video/av{}?p=1'.format(aid)
    html = etree.HTML(requests.get(url, params=params, headers=headers).text)
    title = html.xpath('//*[@id="viewbox_report"]/h1/span/text()')[0]
    print('The video series you are about to download is:', title)
    # Create temporary folders
    create_folder('video')
    create_folder('video_final')
    # Define a thread pool of size 3
    pool = ThreadPoolExecutor(3)
    # Get the list of videos through the API
    res_json = requests.get('https://api.bilibili.com/x/player/pagelist?aid={}'.format(aid)).json()
    video_name_list = res_json['data']
    print('{} videos to download in total'.format(len(video_name_list)))
    for i, video_content in enumerate(video_name_list):
        # Replace spaces in the part name so that it is safe to use as a file name
        video_name = ('./video/' + video_content['part']).replace(" ", "-")
        origin_video_url = 'https://www.bilibili.com/video/av{}'.format(aid) + '?p=%d' % (i + 1)
        # Request the video page to get its information
        res = requests.get(origin_video_url, headers=headers)
        # Parse out the video details as JSON
        # NOTE: the end of this pattern was lost in the source text; '</script>' is an assumed terminator
        video_info_temp = re_video_info(res.text, '__playinfo__=(.*?)</script>')
        video_info = {}
        # Get the video quality
        quality = video_info_temp['data']['accept_description'][acc_quality]
        # Get the duration of the video
        video_info['duration'] = video_info_temp['data']['dash']['duration']
        # Get the video link
        video_url = video_info_temp['data']['dash']['video'][acc_quality]['baseUrl']
        # Get the audio link
        audio_url = video_info_temp['data']['dash']['audio'][acc_quality]['baseUrl']
        # Work out the length of the video
        video_time = int(video_info.get('duration', 0))
        video_minute = video_time // 60
        video_second = video_time % 60
        print('{}.\tCurrent video quality {}, duration {} minutes {} seconds'.format(i + 1, quality, video_minute, video_second))
        # Add the task to the thread pool; the callback merges video and audio when the task is done
        pool.submit(download_video_batch, origin_video_url, video_url, audio_url, video_name, i + 1).add_done_callback(
            video_audio_merge_batch)
    pool.shutdown(wait=True)
    time.sleep(5)
    # Tidy up the downloaded files
    if os.path.exists(title):
        shutil.rmtree(title)
    os.rename('video_final', title)
    try:
        shutil.rmtree('video')
    except OSError:
        # Try once more if the first attempt fails
        shutil.rmtree('video')


def multiple_download():
    """Download multiple independent videos in bulk."""
    # Prompt for all the aids
    aid_str = input(
        'Please enter all video IDs to download, separated by spaces\nExample: for the five links https://www.bilibili.com/video/av89592082, https://www.bilibili.com/video/av68716174, https://www.bilibili.com/video/av87216317,\nhttps://www.bilibili.com/video/av83200644 and https://www.bilibili.com/video/av88252843, enter 89592082 68716174 87216317 83200644 88252843\nDefault: 89592082 68716174 87216317 83200644 88252843\t')
    if not aid_str:
        aid_str = '89592082 68716174 87216317 83200644 88252843'
    if os.path.exists(aid_str):
        shutil.rmtree(aid_str)
    aids = aid_str.split(' ')
    # Prompt to select the video quality
    quality = input('Please select the quality (1 for HD, 2 for clear, 3 for smooth); default is HD\t')
    if quality not in ('2', '3'):
        quality = '1'
    acc_quality = int(quality) - 1
    # Create the target folder
    create_folder(aid_str)
    # Create a thread pool to run several tasks at once
    pool = ThreadPoolExecutor(3)
    for aid in aids:
        # Add the task to the thread pool
        pool.submit(single_download, aid, acc_quality)
    pool.shutdown(wait=True)
    time.sleep(5)
    # Delete temporary files and move the final files
    remove_move_file(aid_str)


def single_download(aid, acc_quality):
    """Download a single video."""
    # Request the video link to get its information
    origin_video_url = 'https://www.bilibili.com/video/av' + aid
    res = requests.get(origin_video_url, headers=headers)
    html = etree.HTML(res.text)
    title = html.xpath('//*[@id="viewbox_report"]/h1/span/text()')[0]
    print('You are currently downloading:', title)
    # NOTE: the end of this pattern was lost in the source text; '</script>' is an assumed terminator
    video_info_temp = re_video_info(res.text, '__playinfo__=(.*?)</script>')
    video_info = {}
    # Get the video quality
    quality = video_info_temp['data']['accept_description'][acc_quality]
    # Get the duration of the video
    video_info['duration'] = video_info_temp['data']['dash']['duration']
    # Get the video link
    video_url = video_info_temp['data']['dash']['video'][acc_quality]['baseUrl']
    # Get the audio link
    audio_url = video_info_temp['data']['dash']['audio'][acc_quality]['baseUrl']
    # Work out the length of the video
    video_time = int(video_info.get('duration', 0))
    video_minute = video_time // 60
    video_second = video_time % 60
    print('Current video quality {}, duration {} minutes {} seconds'.format(quality, video_minute, video_second))
    # Call the function that downloads and saves the video
    download_video_single(origin_video_url, video_url, audio_url, title)


def single_input():
    """Get the parameters for a single-video download."""
    # Get the video aid
    aid = input('Please enter the ID of the video to download (example: the ID in https://www.bilibili.com/video/av89592082 is 89592082); default is 89592082\t')
    if not aid:
        aid = '89592082'
    # Prompt to select the video quality
    quality = input('Please select the quality (1 for HD, 2 for clear, 3 for smooth); default is HD\t')
    if quality not in ('2', '3'):
        quality = '1'
    acc_quality = int(quality) - 1
    # Call the download function
    single_download(aid, acc_quality)
In general, there are three kinds of download requirements:
- Download a single video
There is only one video, with no other videos in the same series. On such a page there are no other videos apart from the related recommendations at the lower right, and the upper right shows only a bullet-comment list, not a video list. To reuse code, the part that prompts the user for input when downloading a single video is extracted into single_input(), and the download itself is a separate function, single_download(aid, acc_quality). In that function, the web page behind a video link such as www.bilibili.com/video/av895… is parsed, and the relevant string is extracted and converted to JSON. The JSON string can be formatted online at www.sojson.com/editor.html. The function obtains the video title, the video quality chosen by the user, the duration, the video link, and the audio link, and then calls download_video_single() to download the video.
- Multiple video downloads
In this case the videos are unrelated to each other. Downloading multiple videos simply means collecting all the aids, looping over them, and calling the single-video download function with each aid as a parameter. A thread pool of size 3 is used, which does not demand too many resources while still running tasks concurrently and improving download efficiency.
- Series of video downloads
Here the videos belong to the same series; for example, www.bilibili.com/video/av917… is a course series. The page shows a video list at the upper right with 65 sub-videos, and each video is marked with a P number, like the second video, www.bilibili.com/video/av917…. For each video, the relevant information is obtained first, and the download task is then added to the thread pool. When a task finishes, the callback video_audio_merge_batch() merges the audio and video and files the result.
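The thread pool and callback mechanism used for series downloads can be shown in isolation. In this sketch a stand-in function replaces the real download so that it runs without network access; fake_download, on_done, and the part names are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

finished = []
lock = threading.Lock()


def fake_download(name, index):
    """Stand-in for a real download; returns the same (name, index) tuple shape."""
    return name, index


def on_done(future):
    """Callback run when a task completes; receives the finished Future."""
    name, index = future.result()
    with lock:
        finished.append((index, name))


# Leaving the with-block waits for all submitted tasks, like shutdown(wait=True).
with ThreadPoolExecutor(max_workers=3) as pool:
    for i, part in enumerate(['intro', 'variables', 'loops', 'functions']):
        pool.submit(fake_download, part, i + 1).add_done_callback(on_done)

print(sorted(finished))
```

The callback receives the Future object rather than the return value, which is why video_audio_merge_batch() calls result.result() to get the video name and index.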
7. The main function
def main():
    """Main function: prompt the user to choose one of the three download modes."""
    download_choice = input('Please enter the type of download:\n1 to download a single video, 2 to download a series of videos in batch, 3 to download multiple unrelated videos in batch; the default is a single video\t')
    # Batch download a series of videos
    if download_choice == '2':
        batch_download()
    # Batch download multiple single videos
    elif download_choice == '3':
        multiple_download()
    # Download a single video
    else:
        single_input()


if __name__ == '__main__':
    # Call the main function
    main()
The main function calls the function that corresponds to the chosen download mode.
III. Project analysis and description
1. Results testing
All three download modes were tested and work well, with download speeds comparable to normal download speeds. The code can be downloaded from Download.csdn.net/download/CU… or Github.com/corleytd/Py….
Points to improve
Station B's website changes all the time, so the download procedure may need to change too. Possible improvements are listed below:
- URL parameter changes
For example, the link to a video series on Station B has since changed to www.bilibili.com/video/BV1x7…, that is, an irregular string (possibly encoded or encrypted by some algorithm), so the aid of the video (series) can no longer be read from the link. In that case, you can capture the requests with the browser's developer tools and inspect the data to find the aid. Look for the request whose name begins with stat; it is followed by the aid. The full link of the API request is Api.bilibili.com/x/web-inter…, and the first item of the returned data is the aid. The irregular string itself appears there as the bvid, so there is probably a one-to-one mapping between aid and bvid.
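A first step towards supporting both link styles is to detect which kind of ID a link carries. The sketch below does only that; it does not convert a bvid to an aid, which would still have to come from the API request described above. The function name parse_video_link is hypothetical, and the BV string in the example is a made-up placeholder, not a real video ID.

```python
import re


def parse_video_link(url):
    """Return ('aid', digits) for av links, ('bvid', string) for BV links, else None."""
    match = re.search(r'/video/(av\d+|BV\w+)', url)
    if match is None:
        return None
    token = match.group(1)
    if token.startswith('av'):
        return 'aid', token[2:]
    return 'bvid', token


print(parse_video_link('https://www.bilibili.com/video/av91748877?p=1'))
# 'BVxxxxxxxxxx' is a placeholder used only to exercise the pattern
print(parse_video_link('https://www.bilibili.com/video/BVxxxxxxxxxx'))
```

With such a helper, the input prompts could accept a full link and branch on the result instead of requiring a bare aid.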
2. Software packaging
On the command line, in the directory that contains the code, run
pyinstaller bilibili_downloader_1.py
136 INFO: PyInstaller: 3.6
137 INFO: Python: 3.7.4
138 INFO: Platform: Windows-10-10.0.18362-sp0
140 INFO: wrote xxxx\Bili_Video_Batch_Download\bilibili_downloader_1.spec
205 INFO: UPX is not available.
209 INFO: Extending PYTHONPATH with paths ['xxxx\\Bili_Video_Batch_Download', 'xxxx\\Bili_Video_Batch_Download']
210 INFO: checking Analysis
211 INFO: Building Analysis because Analysis-00.toc is non existent
211 INFO: Initializing module dependency graph...
218 INFO: Caching module graph hooks...
247 INFO: Analyzing base_library.zip ...
5499 INFO: Caching module dependency graph...
5673 INFO: running Analysis Analysis-00.toc
5702 INFO: Adding Microsoft.Windows.Common-Controls to dependent assemblies of final executable required by xxx\python\python37\python.exe
6231 INFO: Analyzing xxxx\Bili_Video_Batch_Download\bilibili_downloader_1.py
7237 INFO: Processing pre-safe import module hook urllib3.packages.six.moves
10126 INFO: Processing pre-safe import module hook six.moves
14287 INFO: Processing module hooks...
14288 INFO: Loading module hook "hook-certifi.py"...
14296 INFO: Loading module hook "hook-cryptography.py"...
14936 INFO: Loading module hook "hook-encodings.py"...
15093 INFO: Loading module hook "hook-lxml.etree.py"...
15097 INFO: Loading module hook "hook-pydoc.py"...
15099 INFO: Loading module hook "hook-xml.py"...
15330 INFO: Looking for ctypes DLLs
15334 INFO: Analyzing run-time hooks ...
15339 INFO: Including run-time hook 'pyi_rth_multiprocessing.py'
15344 INFO: Including run-time hook 'pyi_rth_certifi.py'
15355 INFO: Looking for dynamic libraries
15736 INFO: Looking for eggs
15737 INFO: Using Python library xxx\python\python37\python37.dll
15757 INFO: Found binding redirects: []
15776 INFO: Warnings written to xxxx\Bili_Video_Batch_Download\build\bilibili_downloader_1\warn-bilibili_downloader_1.txt
15942 INFO: Graph cross-reference written to xxxx\Bili_Video_Batch_Download\build\bilibili_downloader_1\xref-bilibili_downloader_1.html
15967 INFO: checking PYZ
15968 INFO: Building PYZ because PYZ-00.toc is non existent
15968 INFO: Building PYZ (ZlibArchive) xxxx\Bili_Video_Batch_Download\build\bilibili_downloader_1\PYZ-00.pyz
16944 INFO: Building PYZ (ZlibArchive) xxxx\Bili_Video_Batch_Download\build\bilibili_downloader_1\PYZ-00.pyz completed successfully.
16980 INFO: checking PKG
16981 INFO: Building PKG because PKG-00.toc is non existent
16981 INFO: Building PKG (CArchive) PKG-00.pkg
17030 INFO: Building PKG (CArchive) PKG-00.pkg completed successfully.
17034 INFO: Bootloader xxx\python\python37\lib\site-packages\PyInstaller\bootloader\Windows-64bit\run.exe
17034 INFO: checking EXE
17035 INFO: Building EXE because EXE-00.toc is non existent
17035 INFO: Building EXE from EXE-00.toc
17037 INFO: Appending archive to EXE xxxx\Bili_Video_Batch_Download\build\bilibili_downloader_1\bilibili_downloader_1.exe
17046 INFO: Building EXE from EXE-00.toc completed successfully.
17053 INFO: checking COLLECT
17053 INFO: Building COLLECT because COLLECT-00.toc is non existent
17055 INFO: Building COLLECT COLLECT-00.toc
When INFO: Building EXE from EXE-00.toc completed successfully. appears, the packaging has succeeded. In the current path, find bilibili_downloader_1.exe in the bilibili_downloader_1 directory under dist or build; this is the packaged program. Double-click it to open it, make your selections and inputs, and the corresponding video starts to download. The downloaded videos are saved in the same directory as bilibili_downloader_1.exe.
3. Analysis of improvements
This project is the author's first attempt at downloading videos from Station B, and it inevitably has many shortcomings. During implementation and in the later summary, the following problems became apparent:
- It cannot download every video on Station B. It is currently limited to ordinary videos such as tutorials; live streams, premium-member series, and the like cannot be downloaded. This can be improved later.
- The code is verbose, with a lot of duplicated code for similar functions; it can be simplified further and made more reusable.
- There are no proper measures against Station B's anti-crawling mechanisms, so downloads may fail when too many requests are made.
These points can be optimized later to make the whole program more robust.
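As one possible way to reduce the duplication mentioned above, the quality prompt handling and the duration calculation, which appear several times in the code, could be moved into small helpers. This is only a sketch; the names quality_index and split_duration are hypothetical and not part of the project.

```python
def quality_index(choice):
    """Map the user's answer ('1', '2' or '3') to a list index; anything else means HD."""
    if choice in ('2', '3'):
        return int(choice) - 1
    return 0


def split_duration(seconds):
    """Split a duration in seconds into (minutes, seconds)."""
    return divmod(int(seconds), 60)


print(quality_index(''), quality_index('3'))
print('{} minutes {} seconds'.format(*split_duration(125)))
```

Each download mode could then call quality_index(input(...)) once instead of repeating the same if/elif chain.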
4. Note on legality
- The purpose of this project is to make it easy to download learning videos from Station B in order to study all kinds of tutorials; it is meant as a convenience for programmers and has no commercial purpose. Readers may refer to the implementation ideas and the program code, but must not use them for malicious or illegal purposes (frequent malicious downloading, illegal profit, etc.). Anyone who does so bears the responsibility themselves.
- This project may have drawn on the implementation ideas of other experts. If it infringes on anyone's interests, please get in touch so that it can be changed or deleted.
- This project is the first in a series on batch-downloading Station B videos. Many things remain to be improved, and it will be updated later. Readers are welcome to comment and exchange ideas so that it can keep improving.