Preface
I am taking part in the "Spring Festival Creative Submission Contest"; for details, see: Spring Festival Creative Submission Contest.
Yesterday I saw the Nuggets Spring Festival essay event. I have been looking at web crawlers recently, so I crawled some Spring Festival greetings and generated a word cloud for fun. If you are interested, give it a try; the source code is included and it is very simple. The result looks like this:
Environment
- Environment: Windows
- Language: Python 3.7
- Third-party packages:
  - Selenium: crawls the site and collects the greetings. This library is commonly used for automated UI testing. I did not use the Requests library for crawling; the advantage of Selenium is that the page is visible in real time while it is being crawled.
  - Wordcloud: generates the word cloud.
  - PIL: shapes the word cloud into the desired outline. Note that on Python 3.7 it is installed with pip install pillow.
  - Numpy: also required for generating a shaped word cloud; it represents the image of the given shape as a large matrix.
  - Jieba: word cloud tokenizes English text by default. Because the crawled greetings are in Chinese, this library is needed to segment the Chinese text so that the words are recognized correctly instead of coming out garbled.
If you are interested, you can look into these libraries in more depth.
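Before running anything, it can help to check which of the packages listed above are actually installed. The following is a minimal sketch, not part of the original source; the helper name missing_packages and the REQUIRED mapping are made up for illustration. It only looks up whether each import name can be found, and it maps PIL to its pip name pillow.

```python
import importlib.util

# Import name -> pip package name (they differ for PIL/pillow).
# This mapping is an illustration based on the package list above.
REQUIRED = {
    "selenium": "selenium",
    "wordcloud": "wordcloud",
    "PIL": "pillow",
    "numpy": "numpy",
    "jieba": "jieba",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```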
Approach
(1) I crawl Baidu, searching for Spring Festival greetings, and save the greetings to a file. In detail:
Selenium WebDriver is used here, with the Firefox browser. First, create a Firefox browser object.
On the search results page, I simulate a manual click on the first search result to jump to the target web page, as shown in the picture.
Collect all the greetings from that page and save them in wishes.txt.
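Step (1) can be sketched roughly as follows. This is not the original source code but a minimal sketch under several assumptions: it uses the Selenium 4 locator style (By), the search query string is made up, the element ids "kw" and "su" and the CSS selector for the first result are guesses about Baidu's page that may need adjusting, the first result is assumed to open in a new tab, and the greetings are assumed to sit in p elements on the target page. The helper names clean_lines, save_wishes, and crawl_wishes are hypothetical.

```python
def clean_lines(raw_lines):
    """Strip whitespace and drop empty or duplicate lines, keeping order."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            cleaned.append(line)
    return cleaned


def save_wishes(lines, path="wishes.txt"):
    """Write one greeting per line, UTF-8 encoded so Chinese text survives."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))


def crawl_wishes(query="春节祝福语"):
    """Search Baidu, open the first result, return its paragraph texts."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # needs Firefox and geckodriver available
    try:
        wait = WebDriverWait(driver, 10)
        driver.get("https://www.baidu.com")
        # Assumed locators: "kw" = search box, "su" = search button.
        driver.find_element(By.ID, "kw").send_keys(query)
        driver.find_element(By.ID, "su").click()

        # Assumed selector for the first search result link.
        first = wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "#content_left h3 a"))
        )
        first.click()

        # Assumption: the result opens in a new tab; wait for it and switch.
        wait.until(EC.number_of_windows_to_be(2))
        driver.switch_to.window(driver.window_handles[-1])

        # Assumption: the greetings are in <p> elements on the target page.
        paragraphs = wait.until(
            EC.presence_of_all_elements_located((By.TAG_NAME, "p"))
        )
        return [p.text for p in paragraphs]
    finally:
        driver.quit()


# Usage (opens a real browser window):
#     save_wishes(clean_lines(crawl_wishes()))
```

Keeping the file-writing helpers separate from the browser code means they can be tried without launching Firefox.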
(2) Then parse the file with the relevant libraries and generate the word cloud. Note the Chinese font used for the word cloud: font_path points to a font in the Windows font library, and you can change the font here.
# Raw string so the backslashes in the Windows path are kept literally
word_cloud = WordCloud(mask=mask, font_path=r'C:\Windows\Fonts\STXINGKA.TTF').generate(text)
Windows font library
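Step (2) can be sketched as follows. Again, this is a minimal sketch rather than the original source: the function names tokens_to_text and build_word_cloud and the file names mask.png and wishes_cloud.png are hypothetical. It segments the greetings with jieba, joins the words with spaces so that WordCloud can split them, loads the shape image as a Numpy matrix through PIL, and writes the result to an image file.

```python
def tokens_to_text(tokens):
    """Join segmented words with spaces, dropping blank tokens."""
    return " ".join(t.strip() for t in tokens if t.strip())


def build_word_cloud(wishes_path, mask_path, font_path, out_path):
    """Read the greetings, segment them, and save a shaped word cloud."""
    # Third-party imports are local so tokens_to_text has no dependencies.
    import jieba
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    with open(wishes_path, encoding="utf-8") as f:
        raw = f.read()

    # Segment the Chinese text into words, then join them with spaces.
    text = tokens_to_text(jieba.cut(raw))

    # The shape image becomes a matrix; pure white areas are left empty.
    mask = np.array(Image.open(mask_path))

    cloud = WordCloud(mask=mask, font_path=font_path).generate(text)
    cloud.to_file(out_path)


# Usage (file names are placeholders):
#     build_word_cloud("wishes.txt", "mask.png",
#                      r"C:\Windows\Fonts\STXINGKA.TTF", "wishes_cloud.png")
```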
Source code
Notes
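You can change the background color, contour color, and font colors, for example (in wordcloud, contour_color only has a visible effect when contour_width is also set above its default of 0):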
word_cloud = WordCloud(mask=mask, background_color='white', contour_color='red', colormap='brg',
                       max_words=600,
                       font_path=r'C:\Windows\Fonts\STXINGKA.TTF').generate(text)
After re-running, the result is shown in the figure.
For the font color sets supported by colormap, refer to the following link: matplotlib.org/2.0.2/examp…
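The colormap names can also be listed from the installed matplotlib itself, which makes it easy to catch a misspelled name before generating the cloud. The sketch below is an illustration only; the helper names check_colormap and matplotlib_colormaps are hypothetical, and the only matplotlib call it relies on is pyplot.colormaps(), which returns the registered colormap names.

```python
import difflib


def check_colormap(name, available):
    """Return name if it is a known colormap, else raise with suggestions."""
    if name in available:
        return name
    close = difflib.get_close_matches(name, available, n=3)
    hint = (" Did you mean: " + ", ".join(close) + "?") if close else ""
    raise ValueError("Unknown colormap %r.%s" % (name, hint))


def matplotlib_colormaps():
    """List the colormap names registered in the installed matplotlib."""
    # Imported here so check_colormap can be used without matplotlib.
    import matplotlib.pyplot as plt
    return sorted(plt.colormaps())


# Usage:
#     colormap = check_colormap("brg", matplotlib_colormaps())
```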