Text: so-and-so white rice
Source: Python Technology [official ID: PYTHonAll]


Download Bilibili videos using Python

Bilibili (B站), which has about 172 million monthly active users, is a site Pythonistas often want to download videos from. For some unknown reason, videos that have been added to your favorites sometimes stop being available.

Analysis of the page

First, open a video on Bilibili (www.bilibili.com/video/BV1Vh…) and press F12 to inspect it. As the figure below shows, there are multiple links ending in .m4s, and their response type is video/mp4.

Switch the panel to the Elements tab and find a JavaScript variable called window.__playinfo__. Its content matches the URL shown in the figure above: it holds the .m4s link, so the target is found.
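
As a rough illustration of this step, the sketch below pulls the JSON assigned to window.__playinfo__ out of a page and takes the first video link. The sample HTML, the example.com URLs, and the helper names extract_playinfo and first_video_url are made up for illustration; only the data → dash → video → baseUrl path comes from the code in this article, and the regular expression here is a simplified variant rather than the one used later.

```python
import json
import re

# Hypothetical page fragment standing in for the real HTML (an assumption).
SAMPLE_HTML = (
    '<script>window.__playinfo__='
    '{"data": {"dash": {"video": ['
    '{"codecs": "unknown"}, '
    '{"baseUrl": "https://example.com/first.m4s"}, '
    '{"baseUrl": "https://example.com/second.m4s"}]}}}'
    '</script>'
)

# Capture the JSON object that sits between the assignment and </script>.
PLAYINFO_RE = re.compile(r"window\.__playinfo__=(\{.*?\})\s*</script>", re.DOTALL)


def extract_playinfo(html):
    """Return the parsed playinfo dict, or None if the page has none."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def first_video_url(playinfo):
    """Return the first baseUrl under data.dash.video, or None."""
    for item in playinfo["data"]["dash"]["video"]:
        if "baseUrl" in item:
            return item["baseUrl"]
    return None


print(first_video_url(extract_playinfo(SAMPLE_HTML)))
```

Returning None instead of raising keeps the caller in charge of deciding what a missing link means.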

Get the title and link

Fetch the video page with requests and parse it with the BeautifulSoup module to get the video title and link (www.bilibili.com/video/BV17K…)

import json
import re
import time

import requests
from bs4 import BeautifulSoup

# The functions below are methods of the downloader class
# (the class statement itself is not shown here).

def __init__(self, bv):
    # Video page address
    self.url = 'https://www.bilibili.com/video/' + bv
    # Download start time
    self.start_time = time.time()

def get_vedio_info(self):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
        }

        response = requests.get(url=self.url, headers=headers)
        if response.status_code == 200:

            bs = BeautifulSoup(response.text, 'html.parser')
            # Get the video title
            video_title = bs.find('span', class_='tit').get_text()

            # Get the video link: find the <script> that assigns
            # window.__playinfo__ and capture the JSON after the "="
            pattern = re.compile(r"window\.__playinfo__=(.*?)$", re.MULTILINE | re.DOTALL)
            script = bs.find("script", string=pattern)
            result = pattern.search(script.string).group(1)

            temp = json.loads(result)
            # Take the first video link
            video_url = None
            for item in temp['data']['dash']['video']:
                if 'baseUrl' in item:
                    video_url = item['baseUrl']
                    break

            return {
                'title': video_title,
                'url': video_url
            }
    except requests.RequestException:
        print('Video link error, please replace it')

Example results:

{
    'title': "'Jay Chou's Love Song 2.0': quietly recalling the 20 years in Jay's company",
    'url': 'http://cn-jszj-dx-v-06.bilivideo.com/upgcxcode/34/57/214635734/214635734_nb2-1-30080.m4s?expires=1595538100&platform=pc&ssig=Q5uom_rGdPasJhHBvna8tw&oi=3027480765&trid=347f5dc41e9647e2a6dce48286d0b478u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2725&mid=0&cip=222.186.35.71&orderid=0,3&logo=80000000'
}


Download the video

Download the video using the urllib.request module's urlretrieve(url, filename=None, reporthook=None) function, which saves remote data directly to a local file

import urllib.request

def download_video(self, video):
    # Replace characters that are not allowed in file names (backslash included)
    title = re.sub(r'[\\/:*?"<>|]', '-', video['title'])
    url = video['url']
    filename = title + '.mp4'
    # Install an opener that sends browser-like headers with the request
    opener = urllib.request.build_opener()
    opener.addheaders = [('Origin', 'https://www.bilibili.com'),
                         ('Referer', self.url),
                         ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36')]
    urllib.request.install_opener(opener)
    urllib.request.urlretrieve(url=url, filename=filename)

Example results:

A video download is complete
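
The title clean-up step is easy to try on its own. The sketch below is a minimal standalone version: the helper name safe_filename and its extension parameter are made up for illustration, and the character set assumes the goal is to avoid the characters Windows rejects in file names.

```python
import re

# Characters that Windows does not allow in file names.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, extension=".mp4"):
    """Replace each invalid character with '-' and append the extension."""
    return INVALID_CHARS.sub("-", title).strip() + extension


print(safe_filename('a/b: c?'))
```

Replacing rather than deleting the characters keeps the words of the title separated, which makes the saved file easier to recognise.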

The progress bar

There’s still a progress bar missing, and a download without a progress bar is a soulless download

import sys

def schedule(self, blocknum, blocksize, totalsize):
    """Callback function for urllib.request.urlretrieve.
    :param blocknum: number of data blocks downloaded so far
    :param blocksize: size of a data block
    :param totalsize: size of the remote file
    """
    percent = 100.0 * blocknum * blocksize / totalsize
    if percent > 100:
        percent = 100
    s = ('#' * round(percent)).ljust(100, '-')
    sys.stdout.write("%.2f%%" % percent + '[' + s + '] ' + '\r')
    sys.stdout.flush()
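
One possible variation, sketched below, separates building the progress line from printing it, which makes the arithmetic easy to check. The function name render_progress and the width parameter are made up for illustration; the handling of a non-positive totalsize assumes the server may not report a size, in which case urlretrieve passes -1.

```python
def render_progress(blocknum, blocksize, totalsize, width=50):
    """Build one progress line from the urlretrieve callback arguments."""
    if totalsize <= 0:
        # Size unknown: show the bytes read so far instead of a percentage.
        return "%d bytes" % (blocknum * blocksize)
    # The last block is usually partial, so cap the value at 100%.
    percent = min(100.0, 100.0 * blocknum * blocksize / totalsize)
    filled = int(width * percent / 100)
    bar = ("#" * filled).ljust(width, "-")
    return "%6.2f%% [%s]" % (percent, bar)


print(render_progress(1, 50, 100, width=10))
```

A reporthook could then write render_progress(...) followed by '\r' to sys.stdout, as the callback above does.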

Example results:

Finally update the download video code to add the reporthook parameter

urllib.request.urlretrieve(url = url, filename = filename, reporthook = self.schedule)
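
To see what urlretrieve actually passes to the reporthook without touching the network, the sketch below copies a temporary local file through a file:// URL and records every callback call. The helper names retrieve_with_log and demo, the file names, and the 20000-byte size are made up for illustration.

```python
import tempfile
import urllib.request
from pathlib import Path


def retrieve_with_log(url, filename):
    """Download url to filename and return the recorded reporthook calls."""
    calls = []

    def hook(blocknum, blocksize, totalsize):
        calls.append((blocknum, blocksize, totalsize))

    urllib.request.urlretrieve(url, filename=filename, reporthook=hook)
    return calls


def demo():
    """Copy a temporary file via a file:// URL; return (calls, copied size)."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.bin"
        target = Path(tmp) / "copy.bin"
        source.write_bytes(b"x" * 20000)
        calls = retrieve_with_log(source.as_uri(), str(target))
        return calls, target.stat().st_size


if __name__ == "__main__":
    recorded, size = demo()
    for call in recorded:
        print(call)
    print("copied bytes:", size)
```

The hook is called once before any data is read, with blocknum equal to 0, and then once per block, so blocknum * blocksize can overshoot totalsize on the final call. That is why the progress callback caps the percentage at 100.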

Conclusion

That completes a simple Bilibili video download tool. If you are interested, try downloading Bilibili's bangumi (番剧) series; they appear to work differently from ordinary videos