This is the 18th original article in my "Learn Python Every Day" series.
This time, let's use a Python crawler to scrape something fun.
The NBA playoffs are on right now, and no fan wants to miss them. This year's Western Conference finals is the Rockets versus the Warriors. This year's Rockets are very strong, because someone always steps up at the critical moment. Of course, the Warriors are also very strong; after all, you can't underestimate a team with four stars like Curry and Durant.
I wasn't sure about the Eastern Conference finals. I always thought the Celtics would struggle against the Cavaliers, but even without two of their key players the Celtics are still strong and have taken a 2-0 lead over the Cavaliers. It looks like the Cavaliers are in trouble. Will the Celtics get their revenge? Let's wait and see!
Every live broadcast comes with a stream of comments, so I wanted to scrape the fans' comments and see what they were talking about!
Preparation
Libraries to use:
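Requests: for network requests
Jieba: for Chinese word segmentation
Wordcloud: for generating the word cloud
Numpy: for turning the background image into an array used as the word cloud's mask
Pillow (PIL): for opening the background image and the generated picture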
Word cloud background picture:
All of the libraries above can be installed directly with pip, but installing WordCloud this way may fail with an error.
In that case, download the .whl file from the site below and install it manually.
Website: https://www.lfd.uci.edu/~gohlke/pythonlibs/
Find the file that matches your Python version and download it.
Finally, install it from the command line:
pip install <path to the downloaded .whl file>
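To choose the right file, you need to know your Python version and whether your Python is 32-bit or 64-bit. Below is a small sketch that prints both in the form used in wheel filenames. It assumes Windows (the font path later in this article is a Windows path) and the usual naming convention, where a filename contains a tag such as cp36 for Python 3.6 and win32 or win_amd64 for the platform; wheel_tags is a hypothetical helper, not part of any library.

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, platform_tag) to match against a Windows .whl filename."""
    # e.g. "cp36" for CPython 3.6
    py_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one
    bits = struct.calcsize("P") * 8
    # Assumption: Windows wheels, which use these two platform tags
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return py_tag, platform_tag


if __name__ == "__main__":
    print(*wheel_tags())
```

If this prints `cp36 win_amd64`, for example, pick the WordCloud file whose name contains both of those tags.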
Next, find the target page.
Text live-broadcast page: https://www.zhibo8.cc/zhibo/nba/2018/0517123898.htm?redirect=zhibo
The following link returns the comment data as JSON:
https://cache.zhibo8.cc/json/2018/nba/0517123898_384.htm?key=0.6512348313080727
Comparing several of these links shows that 0517123898 identifies the live-broadcast room, the number after the underscore (384 here) is the comment page number, and the key parameter at the end is just a random number: the request works whether or not you include it.
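Based on that analysis, the comment URL can be assembled from the room ID and the page number. Here is a minimal sketch; build_comment_url is a hypothetical helper, and filling key with random.random() is an assumption that simply mirrors "key is a random number".

```python
import random

# URL pattern observed for the 2018 NBA comment pages
BASE = "https://cache.zhibo8.cc/json/2018/nba/{room}_{page}.htm"


def build_comment_url(room_id, page, with_key=True):
    """Build the comment URL for one page of a live-broadcast room."""
    url = BASE.format(room=room_id, page=page)
    if with_key:
        # key appears to be just a random number in [0, 1), so any value should do
        url += "?key={}".format(random.random())
    return url


print(build_comment_url("0517123898", 384, with_key=False))
# https://cache.zhibo8.cc/json/2018/nba/0517123898_384.htm
```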
Fetch the comments with code:
```python
import requests


# A method of the crawler class (the full class is on GitHub)
def __get_json(self, index):
    url = 'https://cache.zhibo8.cc/json/2018/nba/0517123898_%d.htm?key=0.1355540028791382' % index
    response = requests.get(url)
    if response.status_code == 200:
        for item in response.json():
            # Write each comment's text to the file
            self.__write_file(item['content'])
            self.num += 1
        return 1
    else:
        return 0
```
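The article only shows `__get_json`; the `__write_file` helper and the loop that calls it are in the full code on GitHub. Here is one possible way to drive it, written as plain functions. The names save_comments, fetch_page and crawl are hypothetical, and the sketch assumes that comment pages are numbered consecutively from 1 and that a failed request or an empty page means there are no more comments. The article does not confirm either assumption.

```python
def save_comments(items, path):
    """Append each comment's 'content' to path, one per line; return how many were written."""
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for item in items:
            f.write(item["content"] + "\n")
            count += 1
    return count


def fetch_page(index, room_id="0517123898"):
    """Fetch one page of comments; return the parsed JSON list, or None on a non-200 response."""
    import requests  # imported here so save_comments/crawl work without requests installed
    url = "https://cache.zhibo8.cc/json/2018/nba/%s_%d.htm" % (room_id, index)
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return None
    return response.json()


def crawl(fetch, path, start=1):
    """Fetch pages start, start+1, ... until fetch returns None or an empty page."""
    total = 0
    page = start
    while True:
        items = fetch(page)
        if not items:  # None (request failed) or [] (no comments): assume we are done
            break
        total += save_comments(items, path)
        page += 1
    return total


if __name__ == "__main__":
    print(crawl(fetch_page, "comments.txt"), "comments saved")
```

Passing the fetch function in as a parameter keeps the loop independent of the network, so it can be tried with fake pages before hitting the real site.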
Now that we have the comments, let's make a word cloud:
```python
import jieba
import numpy as np
from PIL import Image
from wordcloud import WordCloud


# A method of the crawler class (the full class is on GitHub)
def __get_wordcloud(self):
    with open('comments.txt', 'r', encoding='utf-8') as comments:
        text = comments.read()  # load the scraped comments
    words = ' '.join(jieba.cut(text, cut_all=True))  # jieba full-mode word segmentation
    image = np.array(Image.open('1.jpg'))  # background image, used as the mask
    # Initialize the word cloud; a Chinese font is needed, otherwise the characters render as boxes
    wc = WordCloud(font_path=r'C:\Windows\Fonts\simkai.ttf',
                   background_color='white', mask=image,
                   max_font_size=100, max_words=2000)
    wc.generate(words)  # generate the word cloud
    wc.to_file('img.png')  # save it as an image
    image_file = Image.open('img.png')  # open the image
    image_file.show()
```
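The size of each word in the cloud reflects how often it occurs in the segmented text. To see the actual numbers behind the picture, here is a minimal sketch using collections.Counter. top_words is a hypothetical helper; dropping one-character tokens and stopwords roughly imitates the filtering WordCloud does before drawing, but WordCloud's exact rules differ.

```python
from collections import Counter


def top_words(segmented_text, n=10, stopwords=frozenset(), min_len=2):
    """Count space-separated tokens, skipping stopwords and tokens shorter than min_len."""
    tokens = (t for t in segmented_text.split()
              if len(t) >= min_len and t not in stopwords)
    return Counter(tokens).most_common(n)


print(top_words("火箭 火箭 杜兰特 的 火箭 杜兰特 塔克", n=2))
# [('火箭', 3), ('杜兰特', 2)]
```

Calling `wc.generate_from_frequencies(dict(top_words(words, n=2000)))` in place of `wc.generate(words)` would draw the cloud from counts you filtered yourself, for example after adding your own Chinese stopwords.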
OK, the code is done. Let's look at the result:
The word cloud shows at a glance what fans were talking about. Since I scraped Game 2, where the Rockets hosted the Warriors, the mighty Rockets naturally dominate the discussion. Right behind them is Kevin Durant, the "Grim Reaper": Durant went off for 38 points and still lost to the Rockets, so of course he was discussed the most. Then there is Tucker, who made 5 of 6 three-pointers and set a new playoff career high in points, so it's natural that people talked about him. "Third quarter" also stands out. Many people think of the Warriors as "third-quarter maniacs" and expected them to explode in the third quarter of this game. In fact, the Rockets have also been strong in third quarters this season, no weaker than the Warriors.
The complete code has been uploaded to my GitHub; check it out if you need it. If you like the program, I hope you'll give it a star!
GitHub: https://github.com/SergioJune/gongzhonghao_code
Final words
If this article was useful to you, please give it a like! Likes and shares are the biggest support you can give me, and they keep me motivated to write high-quality original articles.
"If you're a basketball fan, give me a like! I want to see how many fans are out there."
Recommended articles:
A simple crawler exercise using Requests + BeautifulSoup
Requests, a Python crawler library
Learn Python Every Day
Code is not just bugs; it can also be beautiful and fun