Step by step, this tutorial will help you make your first word cloud in Python, starting from scratch. Give it a try!
Motivation
In the era of big data, you often come across beautiful infographics in the media and on websites.
Like this.
Or something like this.
How did you feel when you saw them? Would you like to make your own?
If your answer is yes, let's not delay: today we'll build a word cloud step by step, from scratch. Of course, as a basic word cloud, ours certainly won't be as cool as those two infographics. But never mind: well begun is half done. You can upgrade your skills later and get on your way to success.
There are many online tutorials that teach you how to make infographics. Many of them use specialized tools. These tools are great: convenient and powerful. They are just too specialized and limited in scope. Today we're going to make a word cloud with Python, a general-purpose programming language.
Python is one of today's most popular programming languages. You can use it not only for data analysis and visualization, but also to build websites, crawl data, solve math problems, and write scripts that automate tedious work…
Have you heard of Douban? It was originally written in Python.
At the time of writing, Python is the fourth most popular programming language (of course, many people disagree, which is why there are so many competing rankings of programming languages). But we should look at the trend: as data science grows, Python's popularity is poised to explode. It pays to get on board early.
If you have no programming background, that's fine. Starting from scratch means I'll show you how to install the Python runtime environment and walk you through building the word cloud. I hope you won't just read along, but try it yourself. By the end, you will not only have created your first word cloud, but also written your first useful program.
Tempted? Let's get started.
Installation
First, we need to install the Python runtime environment.
If you’re using macOS, you actually have Python pre-installed on your system.
However, we will use many extension packages, so it is best to install a Python distribution that bundles them. With a single installation, most of the functionality you need is in place, and you don't have to install a new package piecemeal every time you use a new feature.
There are many Python distributions, and I recommend Anaconda. After more than four years of trying and comparing them, I find Anaconda the most convenient to install, with the most sensible coverage and organization of extension packages.
Please download Anaconda from this website. Scroll down to find the download section, and choose the version that matches your operating system.
Because my system is macOS, the website recommended the macOS version to me directly. If you're running Windows or Linux, switch to the appropriate tab.
Whichever operating system you're running, note the two buttons on the right, which correspond to Python 2.x and 3.x respectively. Why would anyone pick the old version when there's a new one?
Not so fast. Until 2020, both versions of Python will coexist. The Python developers really want people to upgrade to 3.x. Unfortunately, at the time of writing, 3.x has fewer extension packages than 2.x, especially for data science. So if you are a beginner, I recommend downloading version 2.x (currently 2.7) so that you'll run into fewer problems later. Once you're comfortable, you can migrate to 3.x. Trust me, you'll get used to the new version in no time.
Once downloaded, just execute the installation file.
Installation time varies with the speed of your computer. Be patient: you only need to do this once.
Once Anaconda is installed, install a "modern" browser. If you're using macOS, Safari is fine. Other options include Firefox and Google Chrome.
Install one of these browsers and set it as your system's default browser.
OK, now open the command line.
On macOS and Linux, open a terminal.
On Windows, open "Start" > "Accessories" > "Command Prompt".
Type the following command:
mkdir demo
cd demo
You now have a dedicated directory called demo. Find it in Finder on macOS or in My Computer on Windows, and open it.
Back in the terminal, macOS and Linux users should type the following command:
pip install wordcloud
macOS may prompt you to install the Xcode command line tools first; just accept the default settings step by step. Note, however, that you should do this over Wi-Fi. If you're on 4G mobile data, the download is going to cost you a fortune.
If you use Windows, installing the wordcloud package takes a little more work. You need to go here to download the file wordcloud-1.3.1-cp27-cp27m-win32.whl. Once it has downloaded, drag it into your demo directory.
On the command line, run the following command:
pip install wheel
Then, execute:
pip install wordcloud-1.3.1-cp27-cp27m-win32.whl
Well, everything we need for the Python runtime environment is finally installed.
Be sure to follow the steps above and confirm that each one completed successfully. If you miss a step, the programs that follow will report errors.
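If you want to confirm that the installation worked before moving on, a quick check like the one below can help. It is a minimal sketch: check_packages is a hypothetical helper name, and the package list simply mirrors what this tutorial uses (wordcloud, plus matplotlib for displaying the image). You can run it with python in the terminal, or later in a notebook cell.

```python
import sys


def check_packages(names):
    """Return {package_name: True/False} depending on whether it can be imported."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Which Python is running? This tutorial assumes Anaconda's 2.7 build.
print(sys.version)
# The packages the word cloud code below relies on.
print(check_packages(["wordcloud", "matplotlib"]))
```

Any package reported as False still needs to be installed before the later code will run.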
Data
The input to word cloud analysis is text.
In theory, the text can be in any language: English, Chinese, French, Arabic…
For simplicity, let's use an English text as the example. You can find any English article online to analyze. I'm a big fan of the British TV series "Yes Minister", so I took its introduction from Wikipedia.
I copied the text and saved it in a text file named yes-minister.txt.
Move this file into our working directory, demo.
OK, the text data is ready. Now let's enter the magical world of programming!
Code
On the command line, run:
jupyter notebook
Your browser opens automatically and displays the following page.
This is the fruit of our labor so far: a working runtime environment. We haven't written any programs yet, so the directory we just created contains only one text file.
Open the file and browse the contents.
Go back to the Jupyter Notebook home page. Click the New button to create a new notebook, and under Notebooks choose the Python 2 option.
You will be prompted for a name for the notebook. You can name it whatever you like, but I suggest a meaningful name so you can find it later. Since we're trying out word clouds, let's call it wordcloud.
You now have a blank notebook. Enter the following three statements in the code cell on the page. Be sure to type the sample code exactly as shown, with the same number of spaces. Pay particular attention to the third line, which starts with 4 spaces (or 1 Tab). When you're done, press Shift+Enter to run the code.
filename = "yes-minister.txt"
with open(filename) as f:
    mytext = f.read()
Nothing appears.
That's expected: we haven't asked for any output here. The code simply opens your yes-minister.txt file, reads its contents, and stores them in a variable called mytext.
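Depending on your Python version and system, reading text copied from the web can run into encoding errors, for example when it contains curly quotes or accented letters. One way to avoid this, sketched below under the assumption that the file is saved as UTF-8, is to open it with an explicit encoding. read_text is a hypothetical helper name, not part of any library.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read a whole text file and return its contents as a Unicode string."""
    # io.open accepts an explicit encoding on both Python 2.7 and Python 3,
    # unlike the built-in open() on Python 2.
    with io.open(path, encoding=encoding) as f:
        return f.read()
```

In the notebook you would then write mytext = read_text("yes-minister.txt") in place of the three lines above.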
Next, let's display the contents of mytext. After entering the following statement, you still have to press Shift+Enter for it to actually run.
mytext
In each of the following steps, don't forget to press Shift+Enter to run the code.
The result is shown in the following figure.
It looks like the text stored in the mytext variable is the text we copied from the web. So far, so good.
Next, we import the word cloud package and use it to make a word cloud from the text stored in mytext.
from wordcloud import WordCloud
wordcloud = WordCloud().generate(mytext)
The program may print a warning at this point. Don't worry about it; the warning does not affect how the program runs.
At this point the word cloud analysis is complete. You read that right: the core of creating a word cloud is just these two lines, and the first one merely imports the extension package. But the program hasn't shown us anything yet.
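What do those two lines actually compute? Roughly, the package splits the text into words, drops very common "stopwords" such as "the" and "and", and counts how often each remaining word occurs. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not the wordcloud package's actual implementation (which, among other things, uses a much larger stopword list and, in recent versions, can also count two-word phrases); word_frequencies and DEMO_STOPWORDS are hypothetical names.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only. The real
# wordcloud package ships a much larger built-in list named STOPWORDS.
DEMO_STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def word_frequencies(text, stopwords=DEMO_STOPWORDS):
    """Count how often each non-stopword occurs, ignoring case."""
    # Lowercase the text, then pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stopwords and one-letter tokens, then tally the rest.
    return Counter(w for w in words if w not in stopwords and len(w) > 1)


freqs = word_frequencies("Hacker and Sir Humphrey. Hacker is the Minister.")
print(freqs.most_common(3))
```

The word counts produced this way are what decide how large each word is drawn.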
Where's the promised word cloud? All this effort, and nothing to show for it. Were you lying to us?!
Calm down. Enter the following four lines and get ready for a miracle.
%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
The result is shown in the figure:
Try not to get too excited.
You can right-click the word cloud image and export it with the “Save as” function.
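If you would rather save the image directly from code than right-click it, the WordCloud object provides a to_file method that writes the rendered image to disk. The sketch below wraps the whole pipeline in one function; save_wordcloud and the output file name are hypothetical, and it assumes the text file is UTF-8 encoded.

```python
import io


def save_wordcloud(text_path, image_path="wordcloud.png"):
    """Generate a word cloud from a text file and write it to an image file."""
    # Imported here so this helper can be defined even before
    # the wordcloud package is installed.
    from wordcloud import WordCloud

    with io.open(text_path, encoding="utf-8") as f:
        text = f.read()
    cloud = WordCloud().generate(text)
    cloud.to_file(image_path)  # writes the rendered word cloud as an image
    return image_path
```

For example, save_wordcloud("yes-minister.txt", "yes-minister.png") would write the image into the current directory.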
In this word cloud, you can see how often different words and phrases occur: the more frequent a word, the larger its font. Note that the most prominent word, Hacker, doesn't mean a computer hacker; it refers to one of the show's main characters, Prime Minister Hacker.
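The link between frequency and font size can be illustrated with a simple linear mapping: the most frequent word gets the largest size, and every other word is scaled by its count relative to that maximum. This is a simplified sketch for intuition only; the wordcloud package uses its own sizing rules (for instance, it shrinks a word's font when the word no longer fits on the canvas). font_sizes and its min_size and max_size defaults are hypothetical.

```python
def font_sizes(freqs, min_size=10, max_size=100):
    """Map word counts to font sizes, scaling linearly with frequency."""
    top = float(max(freqs.values()))  # float() avoids integer division on Python 2
    return {
        word: int(round(min_size + (max_size - min_size) * count / top))
        for word, count in freqs.items()
    }


print(font_sizes({"hacker": 10, "minister": 5, "sir": 1}))
```

Here "hacker" gets size 100, "minister" 55, and "sir" 19: a word that appears half as often as the top word lands halfway between the smallest and largest sizes.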
The ipynb file containing the program's complete code is also shared and can be downloaded here.
Good luck with your own attempt. Are you satisfied with your word cloud? If not, you can explore the WordCloud package's more advanced features. Give it a try and see if you can make a word cloud like this.
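As a starting point for exploring those advanced features, the sketch below passes a few of WordCloud's constructor parameters (width, height, background_color, max_words, and stopwords) together with the package's built-in STOPWORDS list. The values chosen, the helper name make_custom_cloud, and the extra stopword in the usage example are illustrative assumptions, not recommendations from this tutorial.

```python
def make_custom_cloud(text, extra_stopwords=()):
    """Build a word cloud with a few non-default settings (illustrative values)."""
    # Imported here so this helper can be defined even before
    # the wordcloud package is installed.
    from wordcloud import WordCloud, STOPWORDS

    # Start from the package's built-in stopword list and add your own words.
    stopwords = set(STOPWORDS) | set(extra_stopwords)
    cloud = WordCloud(
        width=800,                 # canvas width in pixels
        height=400,                # canvas height in pixels
        background_color="white",  # the default background is black
        max_words=100,             # keep at most 100 words
        stopwords=stopwords,
    )
    return cloud.generate(text)
```

You could then call wordcloud = make_custom_cloud(mytext, ["series"]) and display the result with the same plt.imshow code as before.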
Discussion
What kinds of word clouds have you made with this method? Besides the method described in this article, what other easy ways do you know to create word clouds or other infographics? Please leave a comment and share them with us, so we can discuss them together.
If you liked this article, please give it a thumbs-up. You can also follow and pin my WeChat official account "Nkwangshuyi".
If you're interested in data science, check out my tutorial index post "How to Get Started in Data Science Effectively", which covers more interesting problems and their solutions.