M3U8 is the playlist format used by HTTP Live Streaming (HLS); the file stores the basic information about a video and the list of its segments.
You have probably come across M3U8 files before, so let's first compare the two kinds and then walk through downloading and merging the segments with Python multiprocessing.
One, how the two kinds differ
- Unencrypted M3U8 files
- Encrypted M3U8 files
- The difference is the #EXT-X-KEY line, which appears only in the encrypted file
- This information is used to decrypt the video content. The key it refers to is usually just a short string, which is the KEY value used for decryption
- We will come back to how decryption works; first, here is what each line means
- Line 1: #EXTM3U declares that this is an M3U8 file
- Line 2: #EXT-X-VERSION is the protocol version number
- Line 3: #EXT-X-MEDIA-SEQUENCE. Every media URI in the playlist has a unique sequence number, and the number increases by 1 between adjacent URIs
- Line 4: #EXT-X-KEY records the encryption method, usually AES-128, and where to find the KEY
- Line 5: #EXTINF gives the duration of the video segment that follows
- Line 6: sa3lra6g.ts is the name of a video segment. When fetching it, you need to prepend the domain name (the base URL) to get the correct path of the file
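To make the line-by-line description above concrete, here is a minimal sketch that reads those tags out of a playlist. The sample playlist is made up for illustration (only sa3lra6g.ts comes from the example above), and `parse_attributes` and `summarize_playlist` are hypothetical helper names, not part of the crawler below.

```python
import re

# Hypothetical sample playlist; the second segment name and the key URI are invented.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-KEY:METHOD=AES-128,URI="key.key"
#EXTINF:10.0,
sa3lra6g.ts
#EXTINF:8.5,
b7k2m9qx.ts
#EXT-X-ENDLIST
"""


def parse_attributes(value):
    """Split an attribute list such as METHOD=AES-128,URI="key.key" into a dict."""
    pairs = re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)', value)
    return {name: raw.strip('"') for name, raw in pairs}


def summarize_playlist(text):
    """Collect the version, media sequence, key attributes and segments of a playlist."""
    info = {"version": None, "media_sequence": 0, "key": None, "segments": []}
    duration = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-VERSION:"):
            info["version"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            info["media_sequence"] = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-KEY:"):
            info["key"] = parse_attributes(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # "#EXTINF:10.0,title" -> 10.0
            duration = float(line.split(":", 1)[1].split(",")[0])
        elif line and not line.startswith("#"):
            # Any non-tag line is a media URI; pair it with the preceding #EXTINF duration
            info["segments"].append((line, duration))
            duration = None
    return info


info = summarize_playlist(SAMPLE)
print(info["key"], len(info["segments"]))
```

A playlist is encrypted exactly when `info["key"]` is not None, which is the same test the crawler performs by looking for `#EXT-X-KEY:`.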
Two, crawler source code
#!/usr/bin/env python
# encoding: utf-8
'''
#-------------------------------------------------------------------
# CONFIDENTIAL --- CUSTOM STUDIOS
#-------------------------------------------------------------------
#
# @Project Name : Multi-process M3U8 Video Download Assistant
#
# @File Name : main.py
#
# @Programmer : Felix
#
# @Start Date : 2020/7/30 14:42
#
# @Last Update : 2020/7/30 14:42
#
#-------------------------------------------------------------------
'''
import multiprocessing
import os
import platform
import time

import requests
from Crypto.Cipher import AES
from retrying import retry


class M3u8:
    '''
    Main class of the downloader.
    It parses an m3u8 playlist, downloads every fragment in a process pool,
    decrypts the fragments when the playlist is encrypted,
    and finally merges them into a single video file.
    '''

    def __init__(self):
        '''
        Set the default state: not encrypted, browser-like request headers
        '''
        self.encrypt = False
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"
        }

    def hello(self):
        '''
        This is a welcome speech
        :return: self
        '''
        print("*" * 50)
        print(' ' * 15 + 'M3U8 link download helper')
        print(' ' * 5 + 'Author: Felix Date: 2020-05-20 13:14')
        print(' ' * 10 + 'Works with unencrypted | encrypted links')
        print("*" * 50)
        return self

    def checkUrl(self, url):
        '''
        Determine whether it is a usable m3u8 link
        :return: bool
        '''
        if '.m3u8' not in url:
            return False
        elif not url.startswith('http'):
            return False
        else:
            return True

    def parse(self, url):
        '''
        Analyze an m3u8 link
        :param url: string, the link to analyze
        :return: list of fragment names
        '''
        container = list()
        response = self.request(url).text.split('\n')
        for ts in response:
            ts = ts.strip()  # drop a trailing '\r' from Windows line endings
            if '.ts' in ts:
                container.append(ts)
            if '#EXT-X-KEY:' in ts:
                self.encrypt = True
        return container

    def getEncryptKey(self, url):
        '''
        Fetch the secret key
        :param url: string, base URL of the playlist
        :return: bytes
        '''
        # Assumes the key is served as key.key next to the playlist
        encryptKey = self.request("{}/key.key".format(url)).content
        return encryptKey

    def aesDecode(self, data, key):
        '''
        Decrypt the data
        :param data: bytes, the data to decrypt
        :param key: secret key
        :return: the decrypted data
        '''
        # Note: the key is also used as the IV here, which only suits streams encrypted that way
        crypt = AES.new(key, AES.MODE_CBC, key)
        plain_text = crypt.decrypt(data)
        return plain_text.rstrip(b'\0')

    def download(self, queue, sort, file, downPath, url):
        '''
        Download one fragment of the video
        :param queue: the queue
        :param sort: the sequence number of the fragment
        :param file: the link of the fragment
        :param downPath: the path to save fragments
        :param url: the link of the m3u8
        :return: None
        '''
        # Report progress first, so fragments that already exist on disk are still counted
        queue.put(file)
        baseUrl = '/'.join(url.split("/")[:-1])
        if self.encrypt:
            self.encryptKey = self.getEncryptKey(baseUrl)
        if not file.startswith("http"):
            file = baseUrl + '/' + file
        debrisName = "{}/{}.ts".format(downPath, sort)
        if not os.path.exists(debrisName):
            response = self.request(file)
            with open(debrisName, "wb") as f:
                if self.encrypt:
                    data = self.aesDecode(response.content, self.encryptKey)
                    f.write(data)
                else:
                    f.write(response.content)
                f.flush()

    def progressBar(self, queue, count):
        '''
        Show progress
        :param queue: the queue
        :param count: the total number of fragments
        :return: None
        '''
        print('---{} fragments in total...'.format(count))
        offset = 0
        while True:
            offset += 1
            file = queue.get()
            rate = offset * 100 / count
            print("\r%s downloaded, progress %0.2f%%, %s/%s" % (file, rate, offset, count))
            if offset >= count:
                break

    @retry(stop_max_attempt_number=3)
    def request(self, url, params=None):
        '''
        Send a request, retrying up to 3 times on failure
        :param url: the url of the request
        :param params: optional query parameters of the request
        :return: the response
        '''
        response = requests.get(url, params=params, headers=self.headers, timeout=10)
        # Raise on HTTP errors so that @retry tries again
        response.raise_for_status()
        return response

    def run(self):
        '''
        Program entry: ask for the basic settings, then download and merge
        '''
        downPath = str(input("Path to save fragments, default ./Download: ")) or "./Download"
        savePath = str(input("Path to save the video, default ./Complete: ")) or "./Complete"
        # bool(input(...)) would be True for any non-empty answer, so compare the text instead
        answer = input("Remove fragments afterwards? Default True: ").strip().lower()
        clearDebris = answer not in ("false", "no", "n", "0")
        saveSuffix = str(input("Video format, default ts: ")) or "ts"
        while True:
            url = str(input("Please enter a valid m3u8 link: "))
            if self.checkUrl(url):
                break
        # create the folders that do not exist yet
        os.makedirs(downPath, exist_ok=True)
        os.makedirs(savePath, exist_ok=True)
        # start analyzing the m3u8 link
        print('---Analyzing the link...')
        container = self.parse(url)
        print('---Link analyzed successfully...')
        # run the downloads in a process pool
        print('---Processes starting...')
        po = multiprocessing.Pool(30)
        queue = multiprocessing.Manager().Queue()
        size = 0
        for file in container:
            sort = str(size).zfill(5)
            po.apply_async(self.download, args=(queue, sort, file, downPath, url,))
            size += 1
        po.close()
        self.progressBar(queue, len(container))
        # Each worker reports before it downloads, so wait until every file is written
        po.join()
        print('---Processes finished...')
        # merge the fragments and clean up
        sys = platform.system()
        saveName = time.strftime("%Y%m%d_%H%M%S", time.localtime())
        print('---Merging files and cleaning up...')
        if sys == "Windows":
            # cmd's copy reads "/" as the start of a switch, so convert the paths to backslashes
            source = os.path.normpath("{}/*.ts".format(downPath))
            target = os.path.normpath("{}/{}.{}".format(savePath, saveName, saveSuffix))
            os.system('copy /b "{}" "{}"'.format(source, target))
            if clearDebris:
                os.system('rmdir /s/q "{}"'.format(os.path.normpath(downPath)))
        else:
            os.system("cat {}/*.ts>{}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
            if clearDebris:
                os.system("rm -rf {}".format(downPath))
        print('---Merge and cleanup complete...')
        print('---Download task complete...')
        print('---Welcome back any time...')


if __name__ == "__main__":
    M3u8().hello().run()
Three, detailed explanation of the crawler
- Initialize the M3U8 download class
if __name__ == "__main__":
    M3u8().hello().run()
- Hello method
def hello(self):
    '''
    This is a welcome speech
    :return: self
    '''
    print("*" * 50)
    print(' ' * 15 + 'M3U8 link download helper')
    print(' ' * 5 + 'Author: Felix Date: 2020-05-20 13:14')
    print(' ' * 10 + 'Works with unencrypted | encrypted links')
    print("*" * 50)
    return self
- The hello method is basically a greeting that prints some basic information
- For the chained call M3u8().hello().run() to work, hello has to return self; beginners should take note of this
- Run method
def run(self):
    downPath = str(input("Path to save fragments, default ./Download: ")) or "./Download"
    savePath = str(input("Path to save the video, default ./Complete: ")) or "./Complete"
    # bool(input(...)) would be True for any non-empty answer, so compare the text instead
    answer = input("Remove fragments afterwards? Default True: ").strip().lower()
    clearDebris = answer not in ("false", "no", "n", "0")
    saveSuffix = str(input("Video format, default ts: ")) or "ts"
    while True:
        url = str(input("Please enter a valid m3u8 link: "))
        if self.checkUrl(url):
            break
    # create the folders that do not exist yet
    os.makedirs(downPath, exist_ok=True)
    os.makedirs(savePath, exist_ok=True)
- These prompts set where the fragments and the merged video are saved, and whether the fragments are removed after the merge
- The default video format is ts, because most video players can open ts files; if you are unsure, you can enter mp4 instead
- The link prompt repeats until checkUrl accepts the input as a valid M3U8 link
- Then any folders that do not exist yet are created
def checkUrl(self, url):
    '''
    Determine whether it is a usable m3u8 link
    :return: bool
    '''
    if '.m3u8' not in url:
        return False
    elif not url.startswith('http'):
        return False
    else:
        return True
- Here I only do a simple check of whether the link is an M3U8 link
- First, the link must contain .m3u8 (the check looks for it anywhere in the link, not only at the end)
- Second, the link needs to start with http
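The two checks above are deliberately loose. As a stricter variant, here is a sketch that parses the link with the standard library and looks at the scheme and the path separately. `is_m3u8_url` is a hypothetical name, and rejecting links whose path does not end in .m3u8 is my assumption about what "valid" should mean, not the author's rule.

```python
from urllib.parse import urlparse


def is_m3u8_url(url):
    """Return True for http(s) links whose path ends in .m3u8 (query strings are ignored)."""
    parts = urlparse(url.strip())
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and parts.path.lower().endswith(".m3u8")
    )


print(is_m3u8_url("https://example.com/video/index.m3u8?token=abc"))
print(is_m3u8_url("https://example.com/page?next=a.m3u8"))
```

The second link would pass the original checkUrl because it contains .m3u8 and starts with http, even though it does not point at a playlist. Some sites do serve playlists from paths without the .m3u8 suffix, so the looser check is the more permissive choice.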
- Analyze the input link [key step]
# start analyzing the m3u8 link
print('---Analyzing the link...')
container = self.parse(url)
print('---Link analyzed successfully...')
def parse(self, url):
    '''
    Analyze an m3u8 link
    :param url: string, the link to analyze
    :return: list of fragment names
    '''
    container = list()
    response = self.request(url).text.split('\n')
    for ts in response:
        ts = ts.strip()  # drop a trailing '\r' from Windows line endings
        if '.ts' in ts:
            container.append(ts)
        if '#EXT-X-KEY:' in ts:
            self.encrypt = True
    return container
- Request the link and determine whether the M3U8 is encrypted or unencrypted
- Return all fragment files
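The fragment names returned by parse still have to be turned into full links. The crawler does this by joining the base URL and the name with a slash; the sketch below shows an alternative with `urllib.parse.urljoin`, which also copes with root-relative and absolute entries. `resolve_segments` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_segments(playlist_url, lines):
    """Turn the media lines of a playlist into absolute URLs, skipping tags and blank lines."""
    resolved = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            resolved.append(urljoin(playlist_url, line))
    return resolved


urls = resolve_segments(
    "https://example.com/hls/video/index.m3u8",
    ["#EXTINF:10,", "sa3lra6g.ts", "/cdn/b.ts", "https://other.example.com/c.ts"],
)
for u in urls:
    print(u)
```

A plain name is resolved next to the playlist, a name starting with / is resolved against the domain, and a full link is left untouched.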
- Enable multiprocessing with a process pool to speed up the download
# run the downloads in a process pool
print('---Processes starting...')
po = multiprocessing.Pool(30)
queue = multiprocessing.Manager().Queue()
size = 0
for file in container:
    sort = str(size).zfill(5)
    po.apply_async(self.download, args=(queue, sort, file, downPath, url,))
    size += 1
po.close()
- The zfill method pads the number with leading zeros. I want the downloaded files to be named 00000.ts, 00001.ts and so on, so that the final merge puts them in the right order
- The queue is a way for multiple processes to share data; here it carries the name of each fragment back to the main process so that the progress can be displayed
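Why the zero padding matters is easiest to see by sorting the names, since the merge step picks the fragments up by name. A small sketch, with `fragment_name` as a hypothetical helper that mirrors the `str(size).zfill(5)` call:

```python
def fragment_name(index, width=5):
    """Build a fragment file name with a zero-padded index, e.g. 00007.ts."""
    return "{}.ts".format(str(index).zfill(width))


plain = ["{}.ts".format(i) for i in range(12)]
padded = [fragment_name(i) for i in range(12)]

# Without padding, name order differs from numeric order: 10.ts sorts before 2.ts
print(sorted(plain)[:4])
# With padding, name order and numeric order agree
print(sorted(padded) == padded)
```

A width of 5 keeps the order intact for up to 100000 fragments.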
- The download method
def download(self, queue, sort, file, downPath, url):
    '''
    Download one fragment of the video
    :param queue: the queue
    :param sort: the sequence number of the fragment
    :param file: the link of the fragment
    :param downPath: the path to save fragments
    :param url: the link of the m3u8
    :return: None
    '''
    queue.put(file)
    baseUrl = '/'.join(url.split("/")[:-1])
    if self.encrypt:
        self.encryptKey = self.getEncryptKey(baseUrl)
    if not file.startswith("http"):
        file = baseUrl + '/' + file
    debrisName = "{}/{}.ts".format(downPath, sort)
    if not os.path.exists(debrisName):
        response = self.request(file)
        with open(debrisName, "wb") as f:
            if self.encrypt:
                data = self.aesDecode(response.content, self.encryptKey)
                f.write(data)
            else:
                f.write(response.content)
            f.flush()
- The fragment is put on the queue first, so that fragments which already exist on disk (and are therefore skipped) are still counted and the progress total comes out right
- If the M3U8 is encrypted, getEncryptKey is used to obtain the KEY value
- If the M3U8 is encrypted, each fragment is decrypted by aesDecode before it is written; see the source code above
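aesDecode above passes the key as the IV and strips trailing zero bytes. As far as I understand the HLS specification (RFC 8216), AES-128 segments are encrypted in CBC mode with PKCS#7 padding, and the IV comes from the IV attribute of #EXT-X-KEY or, when that is absent, from the segment's media sequence number written as a 16-byte big-endian value. The sketch below covers only those two pieces with the standard library; `segment_iv` and `pkcs7_unpad` are hypothetical helpers, and whether a given site follows the specification is something to check per stream.

```python
def segment_iv(iv_attribute, sequence_number):
    """Return the 16-byte IV: the IV attribute if given, else the media sequence number."""
    if iv_attribute:
        hex_digits = iv_attribute[2:] if iv_attribute.lower().startswith("0x") else iv_attribute
        return bytes.fromhex(hex_digits.zfill(32))
    return sequence_number.to_bytes(16, "big")


def pkcs7_unpad(data, block_size=16):
    """Remove PKCS#7 padding and raise ValueError when the padding is malformed."""
    if not data or len(data) % block_size:
        raise ValueError("data length is not a multiple of the block size")
    pad = data[-1]
    if pad < 1 or pad > block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad]


# With pycryptodome installed, a fragment would then be decrypted roughly like this:
#   plain = pkcs7_unpad(AES.new(key, AES.MODE_CBC, segment_iv(iv, seq)).decrypt(data))
print(segment_iv(None, 7).hex())
print(pkcs7_unpad(b"hello world" + b"\x05" * 5))
```

Unlike rstrip(b'\0'), this removes exactly the padding that was added and leaves any genuine trailing zero bytes of the video data in place.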
- Progress display
def progressBar(self, queue, count):
    '''
    Show progress
    :param queue: the queue
    :param count: the total number of fragments
    :return: None
    '''
    print('---{} fragments in total...'.format(count))
    offset = 0
    while True:
        offset += 1
        file = queue.get()
        rate = offset * 100 / count
        print("\r%s downloaded, progress %0.2f%%, %s/%s" % (file, rate, offset, count))
        if offset >= count:
            break
- It basically compares the number of fragments reported so far with the total number of fragments
- Once the number is greater than or equal to the total, it exits the loop
- Cleaning up after a successful download [file merge, fragment removal]
# merge the fragments and clean up
sys = platform.system()
saveName = time.strftime("%Y%m%d_%H%M%S", time.localtime())
print('---Merging files and cleaning up...')
if sys == "Windows":
    # cmd's copy reads "/" as the start of a switch, so convert the paths to backslashes
    source = os.path.normpath("{}/*.ts".format(downPath))
    target = os.path.normpath("{}/{}.{}".format(savePath, saveName, saveSuffix))
    os.system('copy /b "{}" "{}"'.format(source, target))
    if clearDebris:
        os.system('rmdir /s/q "{}"'.format(os.path.normpath(downPath)))
else:
    os.system("cat {}/*.ts>{}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
    if clearDebris:
        os.system("rm -rf {}".format(downPath))
print('---Merge and cleanup complete...')
print('---Download task complete...')
print('---Welcome back any time...')
- This uses the merge and delete commands of both Windows and Linux, so the script works on either
- Whether the fragments are removed is decided by the answer given at the start of run
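The merge can also be done without shell commands, which sidesteps the differences between copy and cat. This is a sketch of that alternative rather than what the script does; `merge_fragments` is a hypothetical function and the paths in the usage lines are the script's defaults.

```python
import glob
import os
import shutil


def merge_fragments(down_path, target_file, clear=False):
    """Concatenate every .ts file in down_path, in name order, into target_file."""
    fragments = sorted(glob.glob(os.path.join(down_path, "*.ts")))
    with open(target_file, "wb") as merged:
        for name in fragments:
            with open(name, "rb") as part:
                # Stream each fragment so large files never sit in memory at once
                shutil.copyfileobj(part, merged)
    if clear:
        shutil.rmtree(down_path)
    return len(fragments)


if __name__ == "__main__" and os.path.isdir("./Download"):
    os.makedirs("./Complete", exist_ok=True)
    print(merge_fragments("./Download", "./Complete/merged.ts"))
```

Because the fragments are sorted by name, this relies on the zero-padded file names in the same way the shell commands do.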
This article is a reprint and the copyright belongs to the original author. In case of infringement, please contact the editor to have it removed.
Original address: blog.csdn.net/weixin_4163…