Article | so-and-so rice
Source: Python Technology (ID: PYTHonAll)
Bilibili ("Station B") is well known, especially for its dance section, which hosts more than a million videos. Watching them without Wi-Fi is impractical, so, as a Python programmer, I wrote a Python script overnight to download every video in the dance section.
Grab the video list
First, open the dance section and select the "house dance" list.
Then open the browser's developer tools (F12). Among the requests you will find https://api.bilibili.com/x/web-interface/newlist?rid=20&type=0&pn=1&ps=20&jsonp=jsonp&callback=jsonCallback_bili_57905715749828263, where rid is the Bilibili subcategory ID, pn is the page number and ps is the number of videos per page.
Opening this address directly in the browser returned a 404, even though the response shown in the developer tools was clearly the video list. Removing the callback parameter, somewhat unexpectedly, returned the expected result.
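A minimal sketch of building the list URL without the callback parameter, using only the standard library. The query values mirror the request captured above; the helper name `build_newlist_url` is hypothetical, and the default page size of 20 is taken from that captured URL (the article's own code uses 50).

```python
from urllib.parse import urlencode

NEWLIST_API = 'https://api.bilibili.com/x/web-interface/newlist'


def build_newlist_url(rid, page, page_size=20):
    # Same query as the request seen in the developer tools, but without the
    # 'callback' parameter, which the plain request did not work with
    params = {'rid': rid, 'type': 0, 'pn': page, 'ps': page_size, 'jsonp': 'jsonp'}
    return NEWLIST_API + '?' + urlencode(params)


print(build_newlist_url(20, 1))
# https://api.bilibili.com/x/web-interface/newlist?rid=20&type=0&pn=1&ps=20&jsonp=jsonp
```

Because `urlencode` keeps the dictionary's insertion order, the generated URL matches the captured one minus the callback.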
The BV ID (called bid in the code) uniquely identifies a Bilibili video. To get it, extract the aid (the older AV number) of each video from the response of the URL above, then convert the aid to a bid.
```python
import json
import requests

# aid -> bid conversion, from zhuanlan.zhihu.com/p/117358823
Str = 'fZodR9XQDSUm21yCkr6zBqiveYah8bt4xsWpHnJE7jL5VG3guMTKNPAwcF'
Dict = {}
# Put every character of the string into a dictionary: f -> 0, Z -> 1, and so on
for i in range(58):
    Dict[Str[i]] = i
s = [11, 10, 3, 8, 4, 6, 2, 9, 5, 7]  # BV-string position of each base-58 digit
xor = 177451812
add = 100618342136696320


def algorithm_enc(av):
    av = int(av)
    av = (av ^ xor) + add
    # BV format: 'BV' followed by 10 characters
    r = list('BV' + ' ' * 10)
    for i in range(10):
        r[s[i]] = Str[av // 58 ** i % 58]
    return ''.join(r)


def find_bid(p):
    bids = []
    r = requests.get(
        'https://api.bilibili.com/x/web-interface/newlist?rid=20&type=0&pn={}&ps=50&jsonp=jsonp'.format(p))
    data = json.loads(r.text)
    archives = data['data']['archives']
    for item in archives:
        aid = item['aid']
        bid = algorithm_enc(aid)
        bids.append(bid)
    return bids
```
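To sanity-check the conversion, here is a standalone sketch of the same scheme in its common six-digit form, together with the inverse direction, using the constants from the zhihu article referenced above. The names `av_to_bv` and `bv_to_av` are hypothetical. In the six-digit form, four positions of the BV string are fixed ('1' at index 2, '4' at index 5, '1' at index 7, '7' at index 9); the ten-digit version above produces those same characters through its larger `add` constant.

```python
TABLE = 'fZodR9XQDSUm21yCkr6zBqiveYah8bt4xsWpHnJE7jL5VG3guMTKNPAwcF'
INDEX = {c: i for i, c in enumerate(TABLE)}  # character -> base-58 digit
POS = [11, 10, 3, 8, 4, 6]                   # BV-string position of each digit
XOR = 177451812
ADD = 8728348608


def av_to_bv(av):
    x = (int(av) ^ XOR) + ADD
    bv = list('BV1  4 1 7  ')  # fixed characters at indexes 2, 5, 7 and 9
    for i in range(6):
        bv[POS[i]] = TABLE[x // 58 ** i % 58]
    return ''.join(bv)


def bv_to_av(bv):
    # Read the six base-58 digits back, then undo the offset and the XOR
    x = sum(INDEX[bv[POS[i]]] * 58 ** i for i in range(6))
    return (x - ADD) ^ XOR


print(av_to_bv(170001))          # BV17x411w7KC
print(bv_to_av('BV17x411w7KC'))  # 170001
```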
Get the video's CID
To download a video in 1080p, the bid alone is not enough: you also need the video's cid and the SESSDATA value from the cookie that is set after you log in.
First, log in to Bilibili and copy the SESSDATA value from the cookie into the request headers. Then request https://api.bilibili.com/x/player/pagelist?bvid= followed by the bid; the response contains the cid.
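The `get_cid` and `get_video_list` functions in this article hardcode the cookie, so the source must be edited whenever SESSDATA expires. A small sketch that builds the header dictionary in one place and reads SESSDATA from an environment variable; the variable name `BILI_SESSDATA` and the helper `make_headers` are hypothetical choices, not part of the original script.

```python
import os

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/90.0.4430.212')


def make_headers(env_var='BILI_SESSDATA'):
    # Read the logged-in session cookie from the environment (hypothetical variable name)
    sessdata = os.environ.get(env_var)
    if not sessdata:
        raise RuntimeError('set %s to the SESSDATA value from your browser cookie' % env_var)
    return {
        'User-Agent': USER_AGENT,
        'Cookie': 'SESSDATA=' + sessdata,
        'Host': 'api.bilibili.com',
    }
```

With this in place, a call such as `requests.get(url, headers=make_headers())` replaces the inline dictionaries.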
```python
import requests


def get_cid(bid):
    url = 'https://api.bilibili.com/x/player/pagelist?bvid=' + bid
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212',
        # Replace with the SESSDATA value from your own logged-in cookie
        'Cookie': 'SESSDATA=182cd036%2C1636985829%2C3b393%2A51',
        'Host': 'api.bilibili.com'
    }
    html = requests.get(url, headers=headers).json()
    infos = []
    # 'data' is a list with one entry per part (P) of the video
    cid_list = html['data']
    for item in cid_list:
        cid = item['cid']
        title = item['part']
        infos.append({'bid': bid, 'cid': cid, 'title': title})
    return infos
```
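Separating the JSON parsing from the HTTP call makes `get_cid` easier to test and lets it fail clearly when a request is rejected. This sketch assumes the usual Bilibili response envelope, where a numeric `code` field (0 on success) sits next to `data`; the helper name `parse_pagelist` and the sample payload are illustrative, not captured from the API.

```python
def parse_pagelist(bid, payload):
    # Bilibili APIs wrap results as {'code': ..., 'message': ..., 'data': ...};
    # a non-zero code is treated as an error instead of being ignored
    if payload.get('code', 0) != 0:
        raise ValueError('pagelist failed for %s: %s' % (bid, payload.get('message')))
    # One entry per part (P) of the video: cid identifies the stream, part is its title
    return [{'bid': bid, 'cid': item['cid'], 'title': item['part']}
            for item in payload['data']]


sample = {'code': 0, 'message': '0',
          'data': [{'cid': 111, 'part': 'P1'}, {'cid': 222, 'part': 'P2'}]}
print(parse_pagelist('BV17x411w7KC', sample))
```

`get_cid` could then shrink to one request followed by `return parse_pagelist(bid, html)`.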
Download the video
Each video's playback URLs come from https://api.bilibili.com/x/player/playurl, queried with the cid, the bid and the desired quality (qn).
Finally, the video is downloaded with the urllib.request.urlretrieve function.
```python
import os
import sys
import urllib.request

import requests


def get_video_list(bid, cid, quality):
    # qn selects the quality; 1080p requires a logged-in SESSDATA cookie
    url_api = 'https://api.bilibili.com/x/player/playurl?cid={}&bvid={}&qn={}'.format(cid, bid, quality)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212',
        'Cookie': 'SESSDATA=182cd036%2C1636985829%2C3b393%2A51',
        'Host': 'api.bilibili.com'
    }
    html = requests.get(url_api, headers=headers).json()
    video_list = []
    # 'durl' holds the download URL(s) for the requested quality
    for i in html['data']['durl']:
        video_list.append(i['url'])
    return video_list


def schedule_cmd(blocknum, blocksize, totalsize):
    # Progress callback for urlretrieve: draws a 100-character progress bar
    if totalsize <= 0:
        return  # size unknown, nothing to draw
    percent = min(100.0 * blocknum * blocksize / totalsize, 100.0)
    s = ('#' * round(percent)).ljust(100, '-')
    sys.stdout.write('%.2f%%' % percent + '[' + s + '] ' + '\r')
    sys.stdout.flush()


def download(video_list, title, bid):
    for i in video_list:
        opener = urllib.request.build_opener()
        # Browser-like headers; the video server expects a bilibili.com Referer
        opener.addheaders = [
            ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212'),
            ('Accept', '*/*'),
            ('Accept-Language', 'en-US,en;q=0.5'),
            ('Accept-Encoding', 'gzip, deflate, br'),
            ('Range', 'bytes=0-'),
            ('Referer', 'https://www.bilibili.com/video/' + bid),
            ('Origin', 'https://www.bilibili.com'),
            ('Connection', 'keep-alive'),
        ]
        filename = os.path.join('D:\\video', '{}_{}.mp4'.format(bid, title))
        try:
            urllib.request.install_opener(opener)
            urllib.request.urlretrieve(url=i, filename=filename, reporthook=schedule_cmd)
        except Exception as e:
            print(bid + ' download failed, file: ' + filename + ' (' + str(e) + ')')
```
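Two details in `download` can cause trouble in practice: part titles may contain characters that Windows does not allow in file names (such as `/`, `?` or `:`), and if `durl` returns more than one segment, every segment is written to the same path and overwrites the previous one. A sketch of a hypothetical helper, `make_filenames`, that addresses both; the `_` replacement character and the `_partN` suffix are arbitrary choices.

```python
import os
import re

# Characters that Windows does not allow in file names, plus control characters
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_title(title, replacement='_'):
    # Replace forbidden characters and trim trailing dots/spaces, which Windows also rejects
    cleaned = INVALID_CHARS.sub(replacement, title).rstrip(' .')
    return cleaned or 'untitled'


def make_filenames(folder, bid, title, n_segments):
    base = '{}_{}'.format(bid, safe_title(title))
    if n_segments == 1:
        return [os.path.join(folder, base + '.mp4')]
    # Give each segment its own file instead of overwriting one path
    return [os.path.join(folder, '{}_part{}.mp4'.format(base, k + 1))
            for k in range(n_segments)]
```

Inside `download`, the loop could then become `for url, filename in zip(video_list, make_filenames('D:\\video', bid, title, len(video_list))):`.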
Conclusion
This article showed how to download Bilibili videos with a crawler in roughly 130 lines of code. You could also extend it to automatically download new videos from the uploaders you follow.
Reference
- [1] Converting AV numbers (aid) to BV IDs (bid): zhuanlan.zhihu.com/p/117358823