Python crawler: learning JS encryption to crawl NetEase Cloud Music

Preface

Hi everyone, I am IL_persistent _LI. I chose this name because I hope that whatever I do, once I have decided to start, I will stick with it regardless of the result. Now for today's topic: using a Python crawler to crawl NetEase Cloud Music. I have previously crawled QQ Music, Kugou Music and Kuwo Music, but I think NetEase Cloud Music is the hardest of them. Why? Apart from the fact that it uses POST requests, the request parameters are encrypted. I had planned to try it earlier, but I did not know how to work with breakpoints in the browser. Now I do, and I have successfully crawled NetEase Cloud Music. A reminder to readers: please crawl responsibly. That is:

1. Do not run this program during the website's peak hours, so as not to affect its normal operation.

2. This code is for learning only and must not be used in commercial activities. If it is, and the relevant parties find out, I accept no responsibility! Please keep this in mind.

1. Understand NetEase Cloud Music's encryption

After several attempts, my understanding of how NetEase Cloud Music's encryption works is as follows. The URL we need to crawl takes a POST request, and a POST request needs request parameters. NetEase Cloud Music encrypts the request parameters first and only then sends the request (as an anti-crawling measure), so the request data looks like a long, unreadable string. Since the site encrypts the request, we can simulate the same encryption ourselves. To do that, we need to know two things: the initial (unencrypted) request parameters and the encryption algorithm. Once both are known, crawling this URL is not as hard as it looks.

2. Find the initial parameters for sending a request

So how do we find the initial request parameters? Take searching for a song as an example. The search page is loaded dynamically, so the song names and song IDs cannot be obtained from the page URL directly. Press F12, or right-click and open the developer tools in your browser.

There you can find the request URL music.163.com/weapi/cloud…, whose response contains the names and IDs of the songs. Since this is a POST request and its parameters are encrypted, how do we get the initial request parameters?

Set a breakpoint first and then refresh the page, and check whether the URL above appears under Scope. If it does not, keep clicking the button marked in my screenshot until it does. At that point the URL is visible, but the request parameters are already encrypted. Next, click through the entries under Call Stack one by one until you reach a frame in which the request parameters are not yet encrypted. Those are the original request parameters.

The request itself is sent with the two encrypted parameters marked in my screenshot (params and encSecKey). The original request parameters behind them are:

{"hlpretag": "<span class=\"s-fc7\">", "hlposttag": "</span>", "s": song name, "type": "1", "offset": "0", "total": "true", "limit": "30", "csrf_token": ""}
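As a minimal sketch, the unencrypted search parameters recovered above could be assembled in Python as follows. The helper name build_search_payload is hypothetical, and the "</span>" value for hlposttag is taken from the crawl code in section 4. The result is still plain text; it has to be encrypted before it is sent.

```python
import json


def build_search_payload(song_name):
    """Return the plain-text (not yet encrypted) search parameters as JSON.

    build_search_payload is a hypothetical helper name; the keys and values
    are the ones recovered from the browser in this section.
    """
    params = {
        "hlpretag": "<span class=\"s-fc7\">",
        "hlposttag": "</span>",
        "s": song_name,          # the song name typed by the user
        "type": "1",
        "offset": "0",
        "total": "true",
        "limit": "30",
        "csrf_token": "",
    }
    return json.dumps(params)


if __name__ == "__main__":
    print(build_search_payload("hello"))
```

Because json.dumps escapes non-ASCII characters by default, the resulting string is pure ASCII even when the song name is not.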

3. Understand the encryption algorithm

So what is the encryption algorithm? Searching the JS code for params shows that the request carries two fields, params and encSecKey, whose values are bWv0x.encText and bWv0x.encSecKey respectively, where

bWv0x = window.asrsea(JSON.stringify(i0x), bsK6E(["tears", "strong"]), bsK6E(XR7K.md), bsK6E(["love", "girl", "panic", "laugh"]));

window.asrsea() is a function that takes four arguments. bsK6E() is always called with fixed arguments here, so its results are fixed as well: it looks up each string of the input array as a key and joins the corresponding values, for example bsK6E(["tears", "strong"]) = "010001".

Using Ctrl+F to search for asrsea takes us to the place where it is defined.

Analysing the functions a, b, c and d defined there shows that function a simply generates a random string of a given length. I did not fully understand function b at first and guessed that it performs the encryption. encSecKey is computed from the random value i, so how can it be made fixed? Just capture one value of i together with the encSecKey returned for it, and reuse that pair for every request. Since I still could not work out function b by reading it, I looked it up on Baidu.
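The lookup that bsK6E() performs can be sketched in Python as below. This is only an illustration: the table and the name lookup_keys are hypothetical, and the way the characters are split across the individual keys is an assumption. It is chosen so that the joined results match the values that appear in this article, "010001" and the fixed key '0CoJUm6Qyw8W8jud' that the Python code uses as param4.

```python
# Hypothetical lookup table. Only the joined results are known from the
# article; how the characters are split between the keys is an assumption.
EMOJI_TABLE = {
    "tears": "01",
    "strong": "0001",
    "love": "0CoJU",
    "girl": "m6Qyw",
    "panic": "8W8ju",
    "laugh": "d",
}


def lookup_keys(names, table=EMOJI_TABLE):
    """Look up every name in the table and join the values in order."""
    return "".join(table[name] for name in names)


if __name__ == "__main__":
    print(lookup_keys(["tears", "strong"]))
    print(lookup_keys(["love", "girl", "panic", "laugh"]))
```

Since the arguments never change, the results can be treated as constants, which is why the Python version hard-codes them instead of reproducing the table.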
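For illustration, a function in the spirit of a could look like this in Python. The name random_string and the alphabet (ASCII letters and digits) are assumptions; the default length of 16 matches the 16-character enc values used in the code later in this article.

```python
import random
import string


def random_string(length=16):
    """Return a random string of the given length.

    The alphabet (ASCII letters and digits) is an assumption made for this
    sketch, not something read from the site's JS code.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(random_string())
```

A freshly generated value cannot simply be dropped into the crawler, because encSecKey has to match it. That is why the approach here reuses one captured pair instead of generating a new string for each request.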

The same effect can be achieved in Python, but you need to install a package first: run pip install pycryptodome in CMD.

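from base64 import b64encode
from Crypto.Cipher import AES

def to_16(data):
    # Pad the text to a multiple of 16 characters (the AES block size)
    len1 = 16 - len(data) % 16
    data += chr(len1) * len1
    return data

def encryption(data, key):
    # AES in CBC mode with a fixed IV; the result is returned as Base64 text
    iv = '0102030405060708'
    aes = AES.new(key.encode('utf-8'), AES.MODE_CBC, iv=iv.encode('utf-8'))
    data1 = to_16(data)
    bs = aes.encrypt(data1.encode('utf-8'))
    return str(b64encode(bs), 'utf-8')

def get_enc(data):
    # Encrypt twice: first with the fixed key, then with the captured i value
    param4 = '0CoJUm6Qyw8W8jud'
    #enc='NA5SxhePf6dxIxX7'
    #enc='GLvjERPvSFUw6EVQ'
    enc = 'g4PXsCuqYE6icH3R'
    first = encryption(data, param4)
    return encryption(first, enc)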
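The to_16 helper appends N copies of the character with code N, where N is the number of characters missing to reach the next multiple of 16. This is the PKCS#7 padding scheme. The sketch below restates it next to a matching unpad function so the round trip can be checked without any third-party package; the names pad and unpad are hypothetical. Like to_16, it counts characters rather than bytes, so it assumes ASCII input, which holds for the default output of json.dumps and for Base64 text.

```python
def pad(text, block_size=16):
    """PKCS#7-style padding on a str, equivalent to to_16 for block_size=16."""
    n = block_size - len(text) % block_size   # always between 1 and block_size
    return text + chr(n) * n


def unpad(text):
    """Remove the padding added by pad (hypothetical helper)."""
    n = ord(text[-1])                         # the last character encodes the pad length
    return text[:-n]


if __name__ == "__main__":
    padded = pad('{"csrf_token": ""}')
    print(len(padded), repr(padded[-1]))
    print(unpad(padded))
```

Note that a text whose length is already a multiple of 16 still receives a full extra block of padding, which is what allows the padding to be removed unambiguously.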

The enc value in the code is the i value mentioned above, and the encryption function is the Python counterpart of the b function mentioned above. With it we can reproduce the same result as the browser.

For the search step, all I need is the ID of a song, in preparation for what follows. The download URL of a song can be found in the same way. It is also a POST request with encrypted request parameters, so we again need the original parameter values. The procedure is the same as before, so I will not go through it step by step; it is rather tedious and took me a lot of time. The initial request parameters I finally obtained are:

{"ids": "[{}]".format(song_id), "level": "standard", "encodeType": "aac", "csrf_token": ""}

The encryption algorithm is the same as for the search request.
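A small sketch of how the download parameters could be built from a song ID before encryption. The helper name build_download_payload is hypothetical; the keys and values are the ones listed above.

```python
import json


def build_download_payload(song_id):
    """Return the plain-text (not yet encrypted) download parameters as JSON.

    build_download_payload is a hypothetical helper name. The ids field is
    a string holding a bracketed list, e.g. "[123]".
    """
    params = {
        "ids": "[{}]".format(song_id),
        "level": "standard",
        "encodeType": "aac",
        "csrf_token": "",
    }
    return json.dumps(params)


if __name__ == "__main__":
    print(build_download_payload(123))
```

The returned string goes through the same two-pass encryption as the search parameters before it is posted.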

4. Implement the crawl code

from Crypto.Cipher import AES
from base64 import b64encode
import requests
import random
import json
import os


def to_16(data):
    # Pad the text to a multiple of 16 characters (the AES block size)
    len1 = 16 - len(data) % 16
    data += chr(len1) * len1
    return data

def encryption(data, key):
    # AES in CBC mode with a fixed IV; the result is returned as Base64 text
    iv = '0102030405060708'
    aes = AES.new(key.encode('utf-8'), AES.MODE_CBC, iv=iv.encode('utf-8'))
    data1 = to_16(data)
    bs = aes.encrypt(data1.encode('utf-8'))
    return str(b64encode(bs), 'utf-8')

def get_enc(data):
    # Encrypt twice: first with the fixed key, then with the captured i value
    param4 = '0CoJUm6Qyw8W8jud'
    #enc='NA5SxhePf6dxIxX7'
    #enc='GLvjERPvSFUw6EVQ'
    enc = 'g4PXsCuqYE6icH3R'
    first = encryption(data, param4)
    return encryption(first, enc)

if __name__ == '__main__':
    # Get the list of songs matching the search
    song = input('Please enter song name: ')
    param1 = {"hlpretag": "<span class=\"s-fc7\">", "hlposttag": "</span>", "s": song, "type": "1", "offset": "0", "total": "true", "limit": "30", "csrf_token": ""}
    data = json.dumps(param1)
    # Encrypt the request parameters
    params = get_enc(data)
    data={
        'params': params,
        #'encSecKey': 'c2bcf219b2d727ff351d8fc4e5cbb86b09c32055345c098b8a8faf9c1c8b2bc506623ffc2b45db3e72cf040c750848f4408147c881a494c99dc8596415ce27d7b8ff7128e41a2b987bc9b78b3f4d4e0f0f5925b9ae24d99d1923a0d0c5cae5a3ebaf83c1097cfc3fd876f77582f38b79bbd03718cc562c15877abe9628e89ff1'
        #'encSecKey': 'cd99d0f0c4210c9dfbd2fafec8640dae914f5d359e593338f699d98c0643dcc385a3889c89c98b3dcbe8f389aa91f47608ec236cd204adbd0236aae23125776c294f28d1753b685710e0173349e71715153e76c93a100ad682eab00033d3ebf3b5001a0046994800332cfc43445e59f28f5e874cb1dc04482d57da9cc67f6e8e'
        'encSecKey': 'bb20ee9409e57057e4d1b55e4d77c94bff4d8cbf181c467bbd3fa156e3419665c6c1e643621d5d82c128251fb85f0cb34d4f08c88407b4148924ffa818f59a64b3814784e7e3837bad4f6f9690cb2cf721d9ea1af12c16a32a9df00be710b70ee8ed32036cc6a465b28ef43f4382cbcb4595b3121be75ecba9171876b611b8fc'
    }
    url='https://music.163.com/weapi/cloudsearch/get/web?csrf_token='
    headerList = [   # user-agent list, used to pick a random value
        "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3314.0 Safari/537.36 SE 2.x MetaSr 1.0",
        "Mozilla/5.0 (Windows; U; Windows NT 6.1;) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12",
        "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
        "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3861.400 QQBrowser/10.7.4313.400"
    ]
    value = random.choice(headerList)
    headers = {'user-agent': value}
    response = requests.post(url=url, data=data, headers=headers)
    dict1 = json.loads(response.text)
    lists = dict1['result']['songs']
    # Print a numbered list: [number] song name --> first artist
    for i in range(len(lists)):
        print('[{}] {} --> {}'.format(i+1, lists[i]['name'], lists[i]['ar'][0]['name']))
    index = int(input('Please enter the number of the song you want to download (starting from 1): '))
    song_id = lists[index-1]['id']
    song_name = lists[index-1]['name'] + "_" + lists[index-1]['ar'][0]['name']   # song title

    # The following code downloads the song

    url2 = 'https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token='
    param2 = {"ids": "[{}]".format(song_id), "level": "standard", "encodeType": "aac", "csrf_token": ""}   # parameters for the POST request
    data2 = json.dumps(param2)
    params = get_enc(data2)
    data['params'] = params
    headers['user-agent'] = random.choice(headerList)
    response2 = requests.post(url=url2, data=data, headers=headers)
    dict2 = json.loads(response2.text)
    downloadUrl = dict2['data'][0]['url']
    downloadDir = './NetEase Cloud Music'
    os.makedirs(downloadDir, exist_ok=True)   # create the folder if it does not exist yet

    # Download the song
    response3 = requests.get(url=downloadUrl, headers=headers)
    # Write it to a file in binary mode
    with open(file='{}/{}.mp3'.format(downloadDir, song_name), mode='wb') as f:
        f.write(response3.content)

Running results: after the download finishes, a NetEase Cloud Music folder appears in the same directory as the script, and the downloaded music is inside that folder.
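Two small helpers may make the script a little more robust; both are sketches and not part of the original program. list_songs extracts the fields that the crawl code reads from the search response (result, songs, id, name, ar), and safe_filename replaces characters that are not allowed in file names on common systems, since a song title containing "/" would otherwise break the open() call. The function names and the sample dictionary are hypothetical; the sample only mimics the structure that the code above accesses.

```python
import re


def list_songs(search_result):
    """Extract (id, title, first artist) tuples from a decoded search response."""
    songs = search_result.get("result", {}).get("songs", [])
    return [(s["id"], s["name"], s["ar"][0]["name"]) for s in songs]


def safe_filename(name):
    """Replace characters that are invalid in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


# Hypothetical sample with the same shape as the fields used by the crawler
sample = {"result": {"songs": [{"id": 1, "name": "A/B", "ar": [{"name": "X"}]}]}}

if __name__ == "__main__":
    for song_id, title, artist in list_songs(sample):
        print(song_id, safe_filename(title + "_" + artist))
```

With these in place, the file name could be built from safe_filename(song_name) instead of song_name directly.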

5. Summary

Some of the explanations above may not be very clear. If you have questions, please leave them in the comments section below. And if you are interested, you can take a look at my other articles:

1. Python automation: automatically replying to QQ messages

2. Use Java to build a music player of your own

3. Python crawlers often fail to crawl data, so you might want to check out this article

4. The Selenium module is so powerful that it can even download NetEase Cloud Music

5. Python multithreaded crawler: download emojis quickly and stop worrying about losing

6. The heroes on the Honor of Kings wallpapers are so cool, why not download them?