This is the 18th original article in my "Learn Python Every Day" series.

This time, let's use a Python crawler to scrape something fun.

These days the NBA playoffs are on, and no fan will want to miss them, especially since this year's Western Conference finals are the Rockets versus the Warriors. This year's Rockets are very strong, because whenever a critical moment comes, someone always steps up. Of course, the Warriors are also very strong; after all, no one can underestimate a team with four stars like Curry and Durant.

As for the Eastern Conference finals, I always thought the Celtics would struggle against the Cavaliers. Who knew that, even without two of their key players, the Celtics would still be strong enough to take a 2-0 lead over the Cavaliers? It looks like the Cavaliers are in trouble. Will the Celtics get their revenge? Let's wait and see!

Every live broadcast comes with fan comments, so I wanted to scrape those comments and see what everyone is talking about!

Preparation

Libraries to use:

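Requests: sends the HTTP requests

Jieba: splits the Chinese comment text into words

WordCloud: generates the word cloud

NumPy: turns the background image into an array used as the word cloud mask

Pillow (PIL): opens the background image and displays the result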

Background image for the word cloud (the code below loads it as 1.jpg):

All of the above libraries can be installed directly with pip, but installing WordCloud this way may fail with an error.

In that case, download the wheel (.whl) file from the website below and install it manually.

Website: https://www.lfd.uci.edu/~gohlke/pythonlibs/

Find the wheel that matches your Python version and architecture (32- or 64-bit), and download it.

Finally, install it from the command line:

pip install <path to the downloaded .whl file>

Next, let's find the target page.

Text live broadcast page: https://www.zhibo8.cc/zhibo/nba/2018/0517123898.htm?redirect=zhibo

The following link returns the comment data as JSON:

https://cache.zhibo8.cc/json/2018/nba/0517123898_384.htm?key=0.6512348313080727

After analyzing several of these links, the pattern is clear. The part before the underscore (0517123898) identifies the live broadcast room, and the number after the underscore (384 here) is the comment page number. The key parameter at the end is just a random number, and the request works whether or not you include it.
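The URL pattern can be made concrete with a minimal sketch that builds the comment URL for any page. The helper name comment_url, its room_id argument, and the way the optional random key is generated are our own assumptions. They only mimic the shape of the example link and are not part of the original code.

```python
import random

# URL pattern inferred from the example link: <room id>_<page number>.htm
URL_TEMPLATE = "https://cache.zhibo8.cc/json/2018/nba/{room}_{page}.htm"


def comment_url(room_id: str, page: int, with_key: bool = False) -> str:
    """Build the JSON comment URL for one page of a live broadcast room (hypothetical helper)."""
    url = URL_TEMPLATE.format(room=room_id, page=page)
    if with_key:
        # The key is only a random number (assumed to just vary the URL);
        # fixed-point formatting keeps it looking like 0.6512348313080727
        url += f"?key={random.random():.16f}"
    return url


print(comment_url("0517123898", 384))
# → https://cache.zhibo8.cc/json/2018/nba/0517123898_384.htm
```

Because the key does not affect the response, leaving with_key at False is the simpler choice.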

Now let's fetch the comments with code:

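import requests


# Method of the crawler class (the full class is on GitHub)
def __get_json(self, index):
    # index is the comment page number; the key parameter is random and optional
    url = 'https://cache.zhibo8.cc/json/2018/nba/0517123898_%d.htm?key=0.1355540028791382' % index
    response = requests.get(url)
    if response.status_code == 200:
        for item in response.json():
            # Write each comment's text to the file
            self.__write_file(item['content'])
            self.num += 1  # count the comments saved so far
        return 1
    else:
        # Non-200 response: this page is not available
        return 0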
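The method above handles a single page. It relies on a __write_file helper and a num counter that are defined elsewhere in the class, and the full code is on GitHub. Below is a minimal standalone sketch of a loop that could drive it. It assumes pages are numbered from 1 and that the first non-200 response means there are no more pages. The names crawl_pages and requests_fetcher are hypothetical and are not the author's code.

```python
from typing import Callable, List, Optional


def crawl_pages(fetch_page: Callable[[int], Optional[List[dict]]],
                write: Callable[[str], None],
                start: int = 1,
                max_pages: int = 10000) -> int:
    """Fetch pages start, start+1, ... and write each comment's 'content'.

    Stops at the first page for which fetch_page returns None (assumed to
    mean "no more pages"). Returns the number of comments written.
    """
    count = 0
    for index in range(start, start + max_pages):
        items = fetch_page(index)
        if items is None:
            break
        for item in items:
            write(item["content"])
            count += 1
    return count


def requests_fetcher(url_template: str) -> Callable[[int], Optional[List[dict]]]:
    """Wrap requests.get into a fetch_page function; url_template takes the page via %d."""
    import requests  # imported here so crawl_pages itself needs only the standard library

    def fetch(index: int) -> Optional[List[dict]]:
        response = requests.get(url_template % index, timeout=10)
        return response.json() if response.status_code == 200 else None

    return fetch
```

To get roughly what the original class does, pass requests_fetcher('https://cache.zhibo8.cc/json/2018/nba/0517123898_%d.htm') as fetch_page and a function that appends each comment to comments.txt as write. Keeping the loop separate from the network call also lets you test it with fake pages.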

Now that we have the comments, let's generate the word cloud:

import jieba
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def __get_wordcloud(self):
    with open('comments.txt', 'r', encoding='utf-8') as comments:
        text = comments.read()  # load the saved comments
    # jieba full mode (cut_all=True) outputs every word it can find, including overlapping ones
    words = ' '.join(jieba.cut(text, cut_all=True))
    image = np.array(Image.open('1.jpg'))  # background image used as the mask
    # Initialize the word cloud; a Chinese font is needed, or Chinese words render as boxes
    wc = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf',
                   background_color='white', mask=image,
                   max_font_size=100, max_words=2000)
    wc.generate(words)  # generate the word cloud
    wc.to_file('img.png')  # save it as an image
    image_file = Image.open('img.png')  # open the saved image
    image_file.show()
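wc.generate(words) sizes each word by how often it appears in the text. You may want to look at those counts yourself, for example to spot filler words worth filtering out. A minimal standard-library sketch can count the space-separated output of jieba.cut. The word_frequencies helper, its min_len cutoff for dropping single-character tokens, and the toy input below are our own assumptions and are not part of the original code.

```python
from collections import Counter


def word_frequencies(segmented: str, min_len: int = 2) -> Counter:
    """Count space-separated tokens, ignoring tokens shorter than min_len characters."""
    return Counter(tok for tok in segmented.split() if len(tok) >= min_len)


# Toy input shaped like ' '.join(jieba.cut(...)) output, not real comment data
freq = word_frequencies("火箭 杜兰特 三 火箭 塔克 火箭 杜兰特")
print(freq.most_common(2))  # → [('火箭', 3), ('杜兰特', 2)]
```

A Counter is a dict, so you can pass it straight to wc.generate_from_frequencies(freq) instead of calling wc.generate(words). That way you decide exactly which words are counted.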

OK, the code is done. Let's look at the result:

The word cloud shows at a glance what fans are talking about in the comments. I scraped Game 2, with the Rockets hosting the Warriors, so the most discussed topic is naturally the mighty Rockets. Close behind is Kevin Durant, "the god of death": he played brilliantly and scored 38 points but still lost to the Rockets, so of course he gets a lot of attention. Next comes Tucker, who made 5 of 6 three-pointers and set a new playoff career high in points, so it is natural that people talk about him too. "Third quarter" also stands out. Many people think of the Warriors as "crazy in the third quarter" and expected them to explode in the third quarter of this game. In fact, the Rockets have been strong in third quarters this season too, no weaker than the Warriors.

The complete code is on my GitHub, so feel free to check it out. If you find the program useful, please give it a star!

GitHub: https://github.com/SergioJune/gongzhonghao_code

Final words

If you found this article useful, please give it a thumbs-up! Likes and shares are the biggest support for me and keep me motivated to write high-quality original articles.

"Fans, give me a thumbs-up! I'd like to see how many of you are fans."

Recommended articles:

A simple crawler exercise using Requests + BeautifulSoup

Requests, a Python crawler library

Learn Python every day

Code isn't only about bugs; it can also be beautiful and fun