Disclaimer: This article is for study and research only; illegal use is prohibited and you bear the consequences. If anything here infringes, please let me know and it will be deleted. Thank you!

Project scenario:

In the previous article I covered decrypting NetEase Cloud Music's request parameters; this time we'll crawl all the comments on the song "Drawing the baby".

Solution:


1. Take a look at the comments interface: https://music.163.com/weapi/comment/resource/comments/get?csrf_token=ff57cff46ebe79b9a51dd10f8c9181bb. This interface also carries the two encrypted parameters, params and encSecKey, which are generated and decrypted the same way as for the song interface.
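For reference, the request itself is just a form POST carrying those two fields. A minimal sketch, with placeholder values (the headers dict is an assumption; real requests need a browser User-Agent and usually your cookies, and the encrypted values are produced by the js_tool used in step 4):

import requests

url = 'https://music.163.com/weapi/comment/resource/comments/get'
headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: add your own cookies as needed
pdata = {'params': '<encrypted params>', 'encSecKey': '<encrypted key>'}  # placeholders
response = requests.post(url, headers=headers, data=pdata)
print(response.json())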



2. The encrypted parameters params and encSecKey of the comment interface are generated the same way, so I'll just post the plaintext parameters here. Two parameters need attention: cursor and pageSize. The pageSize of each page is 20.
{"rid":"R_SO_4_1474342935"."threadId":"R_SO_4_1474342935"."pageNo":"3"."pageSize":"20"."cursor":"1600190813154"."offset":"0"."orderType":"1"."csrf_token":"ff57cff46ebe79b9a51dd10f8c9181bb"}
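Since this dict gets rebuilt several times below, a small helper (hypothetical, not in the original post) can assemble the plaintext parameters shown above:

def build_param(song_id, page_no, cursor, page_size='20', offset='0'):
    # Hypothetical helper assembling the plaintext comment-API parameters.
    return {
        "rid": "R_SO_4_" + song_id,
        "threadId": "R_SO_4_" + song_id,
        "pageNo": page_no,
        "pageSize": page_size,
        "cursor": cursor,
        "offset": offset,
        "orderType": "1",
        "csrf_token": "ff57cff46ebe79b9a51dd10f8c9181bb",
    }

param = build_param('1474342935', '3', '1600190813154')  # reproduces the dict above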



3. Then the problem comes: I don't know the time of the last comment on each page, so I can't generate the corresponding cursor. The cursor is a 13-digit millisecond timestamp; for example, 2020-08-27 23:59:59 corresponds to the cursor 1598543999000. So instead I walk the song's history day by day: use each day's 23:59:59 as the cursor, set pageSize=1000 (1000 is the maximum) and orderType=1 (sort by time), and skip repeats so the same comments aren't collected twice on adjacent days. The comments obtained this way are in fact not complete; you can subdivide a day's comments by pageNo and offset to get the full comment set (see the sketch after the crawler code below). If you have other good ideas, feel free to comment!
param = {"rid": "R_SO_4_" + song_id, "threadId": "R_SO_4_" + song_id, "pageNo": "1"."pageSize": "1000"."cursor": cursor, "offset": "0"."orderType": "1"."csrf_token": "ff57cff46ebe79b9a51dd10f8c9181bb"}
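As a quick check of the cursor format described above, a minimal sketch (assuming the machine runs in UTC+8 like the original post, since time.mktime uses the local timezone):

import time

time_array = time.strptime('2020-08-27 23:59:59', '%Y-%m-%d %H:%M:%S')
cursor = str(int(time.mktime(time_array))) + '000'  # append '000' to get 13-digit milliseconds
print(cursor)  # '1598543999000' on a UTC+8 machine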


4. After this analysis, we can obtain most of the comments on the song. The code is as follows:
    # Assumes earlier in the script: import datetime, json, time, requests;
    # js_tool (the compiled encryption JS from the previous article), headers
    # (with User-Agent/cookies) and song_id are already defined.
    now_day = datetime.date.today()  # today's date
    flag_info = None  # marker of the last seen comment, used to skip repeats
    num = 0
    for i in range(20, -1, -1):  # walk the dates 2020-08-27 to 2020-09-16
        pre_day = str(now_day - datetime.timedelta(days=i)) + " 23:59:59"  # end of that day
        # first convert to a time struct
        timeArray = time.strptime(pre_day, "%Y-%m-%d %H:%M:%S")
        # then convert to a timestamp
        cursor = str(int(time.mktime(timeArray))) + '000'  # splice into a 13-digit millisecond timestamp
        print(pre_day, cursor)
        # comment interface parameters
        param = {"rid": "R_SO_4_" + song_id, "threadId": "R_SO_4_" + song_id, "pageNo": "1", "pageSize": "1000", "cursor": cursor, "offset": "0", "orderType": "1", "csrf_token": "ff57cff46ebe79b9a51dd10f8c9181bb"}
        pdata = js_tool.call('d', str(param))
        response = requests.post('https://music.163.com/weapi/comment/resource/comments/get', headers=headers, data=pdata)
        # get the comment info
        data = json.loads(response.text)['data']
        comments = data.get('comments')
        # store the comment information
        with open('comments.txt', 'a', encoding='utf8') as f:
            for comment in comments:
                info = comment.get('content')
                if flag_info == info:  # reaching an already-saved comment means the rest are repeats, so stop
                    break
                print(info)
                f.write(info + '\n')
                # follow_comments = comment.get('beReplied')  # replies to this comment
                # if follow_comments:
                #     for follow_comment in follow_comments:
                #         print(follow_comment.get('content'))
                num += 1  # one more comment collected
        flag_info = comments[0]['content']  # remember the first (newest) comment of this request
        print('First comment of this request:', flag_info, '\n')
    print('Comments collected:', num)
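As noted in step 3, a single day can hold more than 1000 comments. The original post doesn't include code for this, but a sketch of the suggested pageNo/offset subdivision might look like the following (the offset arithmetic is an assumption inferred from the parameter names, not verified against the API):

        # Hedged sketch: page within one day when it holds more than 1000 comments.
        for page in range(2, 6):  # pages 2..5 under the same day's cursor
            param["pageNo"] = str(page)
            param["offset"] = str((page - 1) * 1000)  # assumption: offset skips already-fetched comments
            pdata = js_tool.call('d', str(param))
            response = requests.post('https://music.163.com/weapi/comment/resource/comments/get',
                                     headers=headers, data=pdata)
            page_comments = json.loads(response.text)['data'].get('comments') or []
            if not page_comments:
                break  # no more comments for this day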


5. Now that we have the comment data, we use jieba to segment it, run word-frequency statistics, and output a word cloud:
# Imports assumed for this part (not shown in the original snippet):
import collections

import jieba
import matplotlib.pyplot as plt
import numpy as np
import wordcloud
from PIL import Image

# word segmentation
def fc_CN(text):
    # accept a string and segment it
    word_list = jieba.cut(text)
    # join the segmented tokens with spaces (word_cloud() below splits on ' ')
    result = " ".join(word_list)
    return result

# output the word cloud
def word_cloud():
    with open("./comments.txt", encoding='utf8') as fp:
        text = fp.read()
        # segment the Chinese document we just read
        text = fc_CN(text).replace('\n', ' ').split(' ')
        # filter out some tokens (common stopwords and punctuation)
        filter_str = ['the', ', ', '了', '我', '[', 'you', 'is', 'it', '] ', '! ', '. ', '? ', 'it', 'no', 'or', 'all', 'it', 'ah', 'in', 'it', 'and', 'it', 'listen', '有', 'said', 'to', 'good', 'people', 'to', 'he', '... ', 'small', 'to', 'and', 'no', '一', ' ']
        new_text = []
        for data in text:
            if data not in filter_str:
                new_text.append(data)
        print(new_text)
        # Word frequency statistics
        word_counts = collections.Counter(new_text)  # Do word frequency statistics for participles
        word_counts_top10 = word_counts.most_common(10)  # get the 10 most frequent words
        print(word_counts_top10)  # output check

        # Word frequency display
        mask = np.array(Image.open('./love.jpg'))  # background shape for the cloud, loaded via PIL
        wc = wordcloud.WordCloud(
            # background_color='white',  # set the background color
            font_path=r'C:\Windows\Fonts\simhei.TTF',  # a font that supports Chinese
            mask=mask,  # set the background image
            max_words=200,  # maximum number of words displayed
            max_font_size=300,  # maximum font size
            # scale=32,  # adjust image sharpness; the larger the value, the clearer
        )

        wc.generate_from_frequencies(word_counts)  # generate the cloud from the frequency dict
        image_colors = wordcloud.ImageColorGenerator(mask)  # build a color scheme from the background image
        wc.recolor(color_func=image_colors)  # recolor the cloud with the image's scheme
        wc.to_file("./tmp.jpg")  # save the image to a file
        plt.imshow(wc)  # display the word cloud
        plt.axis('off')  # hide the axes
        plt.show()  # show the image

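To run this part end to end (assuming comments.txt was produced by the crawler in step 4):

if __name__ == '__main__':
    word_cloud()  # reads ./comments.txt, prints the top-10 tokens, writes ./tmp.jpg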


6. For "Smart water bottle", the total comment count is 8,544; we crawled 8,230, so a few hundred are still missing.




7. Finally, look at the word cloud we output. Hahaha, a capital GIAO leaps into view, worthy of my Giao brother! The complete code can be obtained from my Git: Github.com/934050259/w…