M3U8 is the playlist format used by HTTP Live Streaming (HLS). An M3U8 file stores the basic information about a video and the list of segments that make it up.

You have probably all seen M3U8 files, so let's first compare the two variants and then go through how to download and merge the segments with Python multiprocessing.

First, how the two differ

Unencrypted M3U8 files

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/45d6271f2d524cbc8aa278da85562bab)

Encrypted M3U8 files

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/b0b2cb0e595c4fd5acb0163820307f86)

The difference is the extra #EXT-X-KEY line in the encrypted file.
This line carries the information needed to decrypt the video content. What it points to is mostly just a string, and that string is the KEY value used at decryption time.
We will get to decryption later; first, here is what each line means.
Line 1: #EXTM3U declares that this is an M3U8 file.
Line 2: #EXT-X-VERSION is the protocol version number.
Line 3: #EXT-X-MEDIA-SEQUENCE: every media URI in the playlist has a unique sequence number, and the numbers of adjacent URIs differ by 1.
Line 4: #EXT-X-KEY records the encryption method, usually AES-128, and the information about the encryption KEY.
Line 5: #EXTINF gives the duration of the video segment that follows.
Line 6: sa3lra6g.ts is the name of the video segment. To fetch it, prepend the domain name so that the file's full path is correct.
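To make the line-by-line description concrete, here is a minimal sketch that reads a playlist and pulls out the #EXT-X-KEY attributes and the segment names. The playlist text, the `key.key` URI, the second segment name, and the helper names `parse_key_attributes` and `summarize_playlist` are all made up for illustration; they are not part of the original project.

```python
import re

# Hypothetical playlist text; the key URI and segment names are invented.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-KEY:METHOD=AES-128,URI="key.key"
#EXTINF:10.0,
sa3lra6g.ts
#EXTINF:8.5,
sa3lra6h.ts
#EXT-X-ENDLIST
"""

# An attribute is NAME=value, where value is a quoted string or a bare token.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_key_attributes(line):
    """Return the attributes of an #EXT-X-KEY line as a dict."""
    _, _, attr_text = line.partition(":")
    return {name: value.strip('"') for name, value in ATTR_RE.findall(attr_text)}


def summarize_playlist(text):
    """Collect the key attributes (if any) and the segment URIs."""
    key, segments = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-KEY:"):
            key = parse_key_attributes(line)
        elif line and not line.startswith("#"):
            # Any non-empty line that is not a tag or comment is a media URI.
            segments.append(line)
    return key, segments


key, segments = summarize_playlist(SAMPLE)
print(key)
print(segments)
```

If `key` comes back as `None`, the playlist is unencrypted and the segments can be written to disk as they are.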

Second, the crawler source code

```python
#!/usr/bin/env python
# encoding: utf-8
'''
#-------------------------------------------------------------------
#                   CONFIDENTIAL --- CUSTOM STUDIOS
#-------------------------------------------------------------------
#
#                   @Project Name : Multi-process M3U8 Video Download Assistant
#
#                   @File Name    : main.py
#
#                   @Programmer   : Felix
#
#                   @Start Date   : 2020/7/30 14:42
#
#                   @Last Update  : 2020/7/30 14:42
#
#-------------------------------------------------------------------
'''
import requests, os, platform, time
from Crypto.Cipher import AES
import multiprocessing
from retrying import retry

class M3u8:
    '''
    Main class of the downloader.
    It parses an m3u8 playlist, downloads the fragments in a process pool,
    decrypts them when the playlist is encrypted,
    and merges them into a single video file.
    '''
    def __init__(self):
        '''
        Set the default state and the request headers
        '''
        self.encrypt = False
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"
        }

    def hello(self):
        '''
        This is a welcome speech
        :return: self
        '''
        print("*" * 50)
        print(' ' * 15 + 'm3u8 link download assistant')
        print(' ' * 5 + 'Author: Felix  Date: 2020-05-20 13:14')
        print(' ' * 10 + 'Works with unencrypted | encrypted links')
        print("*" * 50)
        return self

    def checkUrl(self, url):
        '''
        Check whether the url is a usable m3u8 link
        :return: bool
        '''
        if '.m3u8' not in url:
            return False
        elif not url.startswith('http'):
            return False
        else:
            return True

    def parse(self, url):
        '''
        Analyze an m3u8 link
        :param url: string, the link to analyze
        :return: list of fragment entries
        '''
        container = list()
        response = self.request(url).text.split('\n')
        for ts in response:
            if '.ts' in ts:
                container.append(ts)
            if '#EXT-X-KEY:' in ts:
                self.encrypt = True
        return container

    def getEncryptKey(self, url):
        '''
        Fetch the secret key
        :param url: string, base url of the playlist
        :return: bytes
        '''
        # assumes the key file is named key.key and sits next to the playlist
        encryptKey = self.request("{}/key.key".format(url)).content
        return encryptKey

    def aesDecode(self, data, key):
        '''
        Decrypt the data
        :param data: bytes, the data to decrypt
        :param key: secret key
        :return: the decrypted data
        '''
        # the key is also used as the IV here; a playlist that declares its own IV needs that value instead
        crypt = AES.new(key, AES.MODE_CBC, key)
        plain_text = crypt.decrypt(data)
        return plain_text.rstrip(b'\0')

    def download(self, queue, sort, file, downPath, url):
        '''
        Download one fragment of the video
        :param queue: the queue
        :param sort: the index of the fragment
        :param file: the link of the fragment
        :param downPath: the path to save fragments
        :param url: the link of m3u8
        :return: None
        '''
        queue.put(file)

        baseUrl = '/'.join(url.split("/")[:-1])

        if self.encrypt:
            self.encryptKey = self.getEncryptKey(baseUrl)

        if not file.startswith("http"):
            file = baseUrl + '/' +file

        debrisName = "{}/{}.ts".format(downPath, sort)

        if not os.path.exists(debrisName):
            response = self.request(file)
            with open(debrisName, "wb") as f:
                if self.encrypt:
                    data = self.aesDecode(response.content, self.encryptKey)
                    f.write(data)
                    f.flush()
                else:
                    f.write(response.content)
                    f.flush()

    def progressBar(self, queue, count):
        '''
        Show progress bar
        :param queue: the queue
        :param count: the total number of fragments
        :return: None
        '''
        print('---{} fragments in total...'.format(count))
        offset = 0
        while True:
            offset += 1
            file = queue.get()
            rate = offset * 100 / count
            print("\r%s downloaded, progress %0.2f%%, %s/%s" % (file, rate, offset, count))
            if offset >= count:
                break

    @retry(stop_max_attempt_number=3)
    def request(self, url, params=None):
        '''
        Send a request
        :param url: the url of request
        :param params: the params of request, optional
        :return: the result of request
        '''
        response = requests.get(url, params=params, headers=self.headers, timeout=10)
        assert response.status_code == 200
        return response

    def run(self):
        '''
        program entry, Input basic information
        '''
        downPath = str(input("Save path for fragments, default ./Download: ")) or "./Download"
        savePath = str(input("Save path for the video, default ./Complete: ")) or "./Complete"
        # bool() of any non-empty string is True, so compare the typed text instead
        answer = input("Clear fragments after merging, default True: ").strip().lower()
        clearDebris = answer not in ("false", "no", "n", "0")
        saveSuffix = str(input("Video format, default ts: ")) or "ts"

        while True:
            url = str(input("Please enter a valid m3u8 link: "))
            if self.checkUrl(url):
                break

        # create the folders if they do not exist
        if not os.path.exists(downPath):
            os.mkdir(downPath)

        if not os.path.exists(savePath):
            os.mkdir(savePath)

        # start analyze a link of m3u8
        print('---Analyzing the link...')
        container = self.parse(url)
        print('---Link analyzed successfully...')

        # run processing to do something
        print('---Processes started...')
        po = multiprocessing.Pool(30)
        queue = multiprocessing.Manager().Queue()
        size = 0
        for file in container:
            sort = str(size).zfill(5)
            po.apply_async(self.download, args=(queue, sort, file, downPath, url,))
            size += 1

        po.close()
        self.progressBar(queue, len(container))
        # names are queued before each download starts, so wait for the workers to really finish
        po.join()
        print('---Processes finished...')

        # handler debris
        sys = platform.system()
        saveName = time.strftime("%Y%m%d_%H%M%S", time.localtime())

        print('---Merging files and clearing fragments...')
        if sys == "Windows":
            os.system("copy /b {}/*.ts {}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
            if clearDebris:
                os.system("rmdir /s/q {}".format(downPath))
        else:
            os.system("cat {}/*.ts>{}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
            if clearDebris:
                os.system("rm -rf {}".format(downPath))
        print('---Merge and cleanup complete...')
        print('---Download task complete...')
        print('---See you next time...')

if __name__ == "__main__":
    M3u8().hello().run()
```

Detailed explanation of the crawler

Initialize the M3U8 download class

```python
if __name__ == "__main__":
    M3u8().hello().run()
```
The hello method

```python
def hello(self):
    '''
    This is a welcome speech
    :return: self
    '''
    print("*" * 50)
    print(' ' * 15 + 'm3u8 link download assistant')
    print(' ' * 5 + 'Author: Felix  Date: 2020-05-20 13:14')
    print(' ' * 10 + 'Works with unencrypted | encrypted links')
    print("*" * 50)
    return self
```
The hello method is basically a greeting that prints some basic information.
For a chained call to work, the method has to return self; beginners should watch out for this.
The run method

```python
def run(self):
    downPath = str(input("Save path for fragments, default ./Download: ")) or "./Download"
    savePath = str(input("Save path for the video, default ./Complete: ")) or "./Complete"
    # bool() of any non-empty string is True, so compare the typed text instead
    answer = input("Clear fragments after merging, default True: ").strip().lower()
    clearDebris = answer not in ("false", "no", "n", "0")
    saveSuffix = str(input("Video format, default ts: ")) or "ts"

    while True:
        url = str(input("Please enter a valid m3u8 link: "))
        if self.checkUrl(url):
            break

    # create the folders if they do not exist
    if not os.path.exists(downPath):
        os.mkdir(downPath)

    if not os.path.exists(savePath):
        os.mkdir(savePath)
```
This part asks where to save the fragments and the merged video, and whether the fragments should be removed after the merge.
The default video format is ts, because most video players can open ts files. If you do not trust that, you can enter mp4; note that this only changes the file extension, since the fragments are simply concatenated.
The link is validated by a method called checkUrl, which checks whether it is a usable M3U8 link.
Then the folders that do not exist yet are created.

```python
def checkUrl(self, url):
    '''
    Check whether the url is a usable m3u8 link
    :return: bool
    '''
    if '.m3u8' not in url:
        return False
    elif not url.startswith('http'):
        return False
    else:
        return True
```
Here I simply check whether the link is an M3U8 link.
First, the link must contain .m3u8.
Second, the link must start with http.
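The two checks above are deliberately loose: any string that starts with http and contains .m3u8 somewhere passes. If a stricter test is wanted, one possible sketch is to parse the URL and look at the scheme and the path separately. The function name `is_m3u8_url` and the example.com URLs are hypothetical, and this is not how the original project validates links.

```python
from urllib.parse import urlparse


def is_m3u8_url(url):
    """Return True for http(s) URLs whose path ends with .m3u8."""
    parts = urlparse(url.strip())
    # The query string (tokens, signatures) is ignored; only the path is checked.
    return parts.scheme in ("http", "https") and parts.path.lower().endswith(".m3u8")


print(is_m3u8_url("https://example.com/video/index.m3u8?token=abc"))
print(is_m3u8_url("https://example.com/index.m3u8.html"))
```

Checking the path rather than the whole string means a signed link with a query string is still accepted, while a page that merely mentions .m3u8 in its name is not.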
Analyze the input link [key step]

```python
# start analyze a link of m3u8
print('---Analyzing the link...')
container = self.parse(url)
print('---Link analyzed successfully...')
```

```python
def parse(self, url):
    '''
    Analyze an m3u8 link
    :param url: string, the link to analyze
    :return: list of fragment entries
    '''
    container = list()
    response = self.request(url).text.split('\n')
    for ts in response:
        if '.ts' in ts:
            container.append(ts)
        if '#EXT-X-KEY:' in ts:
            self.encrypt = True
    return container
```
This requests the link and determines whether the M3U8 is encrypted or unencrypted.
It returns all the fragment files.
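The entries returned by parse can be bare file names, paths, or full URLs, depending on the site. The source handles this by splicing the playlist's base path onto names that do not start with http. As an alternative sketch, the standard library's `urljoin` covers the same cases plus root-relative paths. The helper name `resolve_segments` and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import urljoin


def resolve_segments(playlist_url, entries):
    """Turn playlist entries into absolute URLs, relative to the playlist itself."""
    return [urljoin(playlist_url, entry.strip()) for entry in entries]


playlist = "https://example.com/hls/video/index.m3u8"
entries = ["seg0.ts", "/other/seg1.ts", "https://cdn.example.com/seg2.ts"]
for resolved in resolve_segments(playlist, entries):
    print(resolved)
```

A relative name is resolved against the playlist's directory, a path starting with / is resolved against the host, and an entry that is already a full URL is returned unchanged.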
Use multiple processes and a process pool to speed up the download

```python
# run processing to do something
print('---Processes started...')
po = multiprocessing.Pool(30)
queue = multiprocessing.Manager().Queue()
size = 0
for file in container:
    sort = str(size).zfill(5)
    po.apply_async(self.download, args=(queue, sort, file, downPath, url,))
    size += 1

po.close()
```
The zfill method pads the number with leading zeros. I want the downloaded files to be named 00000.ts, 00001.ts and so on, so that the final merge keeps them in order.
The queue lets the worker processes pass data back to the main process, which uses it to display the download progress bar.
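To see why the zero padding matters, here is a small sketch that compares how padded and unpadded names sort. The merge step relies on the `*.ts` wildcard, which lists files in name order, so name order has to match download order. The variable names are only for illustration.

```python
# Names without padding sort as text, so 10.ts lands before 2.ts.
plain = ["{}.ts".format(i) for i in range(12)]
# Names padded with zfill sort in the same order as the numbers.
padded = ["{}.ts".format(str(i).zfill(5)) for i in range(12)]

print(sorted(plain)[:5])
print(sorted(padded) == padded)
```

With five digits the order stays correct for up to 100000 fragments, which is far more than a typical playlist holds.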

The download method

```python
def download(self, queue, sort, file, downPath, url):
    '''
    Download one fragment of the video
    :param queue: the queue
    :param sort: the index of the fragment
    :param file: the link of the fragment
    :param downPath: the path to save fragments
    :param url: the link of m3u8
    :return: None
    '''
    queue.put(file)

    baseUrl = '/'.join(url.split("/")[:-1])

    if self.encrypt:
        self.encryptKey = self.getEncryptKey(baseUrl)

    if not file.startswith("http"):
        file = baseUrl + '/' +file

    debrisName = "{}/{}.ts".format(downPath, sort)

    if not os.path.exists(debrisName):
        response = self.request(file)
        with open(debrisName, "wb") as f:
            if self.encrypt:
                data = self.aesDecode(response.content, self.encryptKey)
                f.write(data)
                f.flush()
            else:
                f.write(response.content)
                f.flush()
```

The file name is put on the queue at the very start so that the progress count still adds up when a fragment already exists on disk and its download is skipped.
If the M3U8 is encrypted, getEncryptKey is used to obtain the KEY value.
If the file is encrypted, it is decrypted with aesDecode; see the source code.
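The aesDecode method in the source passes the key itself as the IV and strips trailing zero bytes. The HLS specification (RFC 8216) describes AES-128 segments as CBC-encrypted with PKCS7 padding, and says that when #EXT-X-KEY carries no IV attribute, the segment's media sequence number is used as the IV. The sketch below shows only those two pieces with the standard library. The helper names `iv_from_sequence` and `pkcs7_unpad` are my own, and whether a given site follows the spec here is something to check per playlist.

```python
def iv_from_sequence(sequence_number):
    """Build the 16-byte IV: the sequence number as a big-endian integer, left-padded with zeros."""
    return sequence_number.to_bytes(16, byteorder="big")


def pkcs7_unpad(data, block_size=16):
    """Strip PKCS7 padding from decrypted bytes, validating it first."""
    if not data or len(data) % block_size:
        raise ValueError("decrypted data is not block aligned")
    pad = data[-1]
    if pad < 1 or pad > block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS7 padding")
    return data[:-pad]


print(iv_from_sequence(7).hex())
print(pkcs7_unpad(b"A" * 12 + b"\x04" * 4))
```

With these helpers, the decrypt call would take the form `AES.new(key, AES.MODE_CBC, iv_from_sequence(n))`, followed by `pkcs7_unpad` on the result, where `n` is the fragment's sequence number.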
Progress bar display

```python
def progressBar(self, queue, count):
    '''
    Show progress bar
    :param queue: the queue
    :param count: the total number of fragments
    :return: None
    '''
    print('---{} fragments in total...'.format(count))
    offset = 0
    while True:
        offset += 1
        file = queue.get()
        rate = offset * 100 / count
        print("\r%s downloaded, progress %0.2f%%, %s/%s" % (file, rate, offset, count))
        if offset >= count:
            break
```
It basically compares the number of fragments downloaded with the total number of fragments.
Once the count reaches the total, it exits the loop.
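In the source, download puts the file name on the queue before the request is sent, so the bar counts fragments that have started rather than fragments that have finished. One way to report only finished work is to iterate over the pool's results as they complete. The sketch below uses `ThreadPool` so that it runs anywhere without pickling concerns; `multiprocessing.Pool` offers the same `imap_unordered` method. `fake_download` is a hypothetical stand-in for the real download.

```python
from multiprocessing.pool import ThreadPool


def fake_download(index):
    """Stand-in for a real fragment download (hypothetical); returns its index."""
    return index


def run_with_progress(tasks, workers=4):
    """Run the tasks in a pool and record a progress line after each one finishes."""
    lines = []
    total = len(tasks)
    with ThreadPool(workers) as pool:
        # imap_unordered yields each result as soon as its task completes.
        for done, index in enumerate(pool.imap_unordered(fake_download, tasks), start=1):
            lines.append("fragment %s done, %0.2f%% (%s/%s)" % (index, done * 100 / total, done, total))
    return lines


progress = run_with_progress(list(range(5)))
print("\n".join(progress))
```

Because the loop ends only when every result has been consumed, the merge step that follows cannot start while fragments are still being written.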
After a successful download, clean up [file merge, fragment removal]

```python
# handler debris
sys = platform.system()
saveName = time.strftime("%Y%m%d_%H%M%S", time.localtime())

print('---Merging files and clearing fragments...')
if sys == "Windows":
    os.system("copy /b {}/*.ts {}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
    if clearDebris:
        os.system("rmdir /s/q {}".format(downPath))
else:
    os.system("cat {}/*.ts>{}/{}.{}".format(downPath, savePath, saveName, saveSuffix))
    if clearDebris:
        os.system("rm -rf {}".format(downPath))
print('---Merge and cleanup complete...')
print('---Download task complete...')
print('---See you next time...')
```
This uses the merge and delete commands of both Windows and Linux, so it works on either system.
Whether the fragments are cleared is decided by the choice made at the start.
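The merge can also be done in Python itself, which avoids depending on shell commands and on how each shell treats paths. This is a sketch of that alternative, not what the project does; `merge_segments` is a hypothetical helper, and the self-check at the bottom uses throwaway files in a temporary directory.

```python
import glob
import os
import shutil
import tempfile


def merge_segments(down_path, out_file):
    """Concatenate every .ts file in down_path, in name order, into out_file."""
    parts = sorted(glob.glob(os.path.join(down_path, "*.ts")))
    with open(out_file, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each fragment so large files are never fully held in memory.
                shutil.copyfileobj(src, merged)
    return len(parts)


# Small self-check with three fake fragments.
with tempfile.TemporaryDirectory() as tmp:
    down = os.path.join(tmp, "Download")
    os.mkdir(down)
    for i, payload in enumerate([b"aa", b"bb", b"cc"]):
        with open(os.path.join(down, str(i).zfill(5) + ".ts"), "wb") as f:
            f.write(payload)
    out = os.path.join(tmp, "merged.ts")
    count = merge_segments(down, out)
    with open(out, "rb") as f:
        merged_bytes = f.read()

print(count, merged_bytes)
```

Removing the fragments afterwards would then be a call to `shutil.rmtree(down_path)` in place of the rmdir and rm commands.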

This article is a reprint and the copyright belongs to the original author. In case of infringement, contact the editor to have it removed.

Original address: blog.csdn.net/weixin_4163…
