PyCharm is a Python IDE that comes with a set of tools to help users become more productive when developing in Python. In addition, the IDE provides some advanced functionality for professional Web development within the Django framework.
Have you tried the Jupyter Notebook integration in PyCharm 2019.2 yet? If not, give it a try! In this blog post, we’ll explore some data using PyCharm and its Jupyter Notebook integration. First, we need the data. Whenever I need a new data set to play with, I usually head to Kaggle, where I’m sure to find something interesting. This time, a dataset called “Pizza Restaurants and the Pizzas They Sell” caught my attention. Who doesn’t love pizza? Let’s analyze these pizzerias and try to learn a thing or two.
Since this data is not part of any of my existing PyCharm projects, I will create a new project.
Make sure to use PyCharm Professional; the Community Edition does not include Jupyter Notebook integration.
Tip: When using Jupyter Notebooks in the browser, I tend to create multiple temporary notebooks for experiments. Creating a separate PyCharm project for each of them would be quite tedious, so instead you can keep a single project for such experiments.
I like my things to be in order, so once the project is created, I will add some structure: a data directory, where I will move the downloaded data set, and another directory for the notebooks.
Once I create my first notebook, Pizza.ipynb, PyCharm suggests installing the Jupyter package and provides a link in the upper right corner to do so.
Once the Jupyter package is installed, we are ready!
The first thing 90% of data scientists do in their Jupyter notebooks is type import pandas as pd. At this point, PyCharm suggests installing pandas in the venv, which takes a single click.
Once we have pandas installed, we can read the CSV data into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("../data/Datafiniti_Pizza_Restaurants_and_the_Pizza_They_Sell_May19.csv")
To execute this cell, press Shift+Enter, or click the green arrow icon in the gutter next to the cell.
When you run the cell for the first time, PyCharm will start a local Jupyter server to execute the code in it – you do not need to do this manually from the terminal.
First, we’ll look at the basics of the data set. How many rows does it have? What are the columns? What does the data look like?
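A minimal sketch of those first checks might look like the following. The tiny DataFrame here is a made-up stand-in so the snippet runs on its own; in the notebook, df would be the DataFrame loaded from the CSV file.

```python
import pandas as pd

# Made-up stand-in for the real data set, so the snippet runs on its own.
df = pd.DataFrame({
    "name": ["Pizza Place", "Slice House", "Pizza Place"],
    "city": ["Portland", "Austin", "Portland"],
    "country": ["US", "US", "US"],
})

# How many rows and columns does it have?
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# What are the columns?
print(list(df.columns))

# What does the data look like? head() returns the first few rows.
print(df.head())
```

In a notebook cell you can drop the print calls: the value of the last expression in a cell is displayed automatically.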
I suspect this data only contains information about restaurants in the United States. To confirm this, let’s count the values in the country column.
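One way to run this check is with value_counts(), sketched here on a small made-up DataFrame (the "US" values are placeholders, not taken from the real file):

```python
import pandas as pd

# Hypothetical stand-in for the loaded data set.
df = pd.DataFrame({"country": ["US", "US", "US"]})

# Count how often each distinct value occurs in the column.
country_counts = df["country"].value_counts()
print(country_counts)

# A single distinct value means the column carries no information.
print(df["country"].nunique())
```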
Yes, the only country present in this dataset is the United States, so it is safe to remove the country column completely. Similarly, menus.currency and priceRangeCurrency each hold a single value, the dollar. I will also drop menuPageURL, because it doesn’t add much value to the analysis, and key, because it duplicates information from other columns (country, state, city, etc.).
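Dropping these columns could be sketched as follows. The column names are the ones mentioned above; the sample values are invented for illustration.

```python
import pandas as pd

# Invented sample rows using the column names discussed in the text.
df = pd.DataFrame({
    "country": ["US", "US"],
    "menus.currency": ["USD", "USD"],
    "priceRangeCurrency": ["USD", "USD"],
    "menuPageURL": ["http://example.com/a", "http://example.com/b"],
    "key": ["us/or/portland/a", "us/tx/austin/b"],
    "city": ["Portland", "Austin"],
})

# Columns that are constant or duplicate other columns.
redundant = ["country", "menus.currency", "priceRangeCurrency", "menuPageURL", "key"]

# drop() returns a new DataFrame; reassign to keep the result.
df = df.drop(columns=redundant)
print(list(df.columns))
```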
Another cleanup I’ll do here is rename the province column to state, because that makes more sense in this context, and replace the state abbreviations with the states’ full names for better readability.
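A rough sketch of this step, assuming the abbreviations are two-letter postal codes. The state_names mapping below is deliberately partial and only illustrative; a real one would cover every state.

```python
import pandas as pd

# Invented sample values for the province column.
df = pd.DataFrame({"province": ["OR", "TX", "NY"]})

# Illustrative, partial mapping from abbreviation to full name.
state_names = {"OR": "Oregon", "TX": "Texas", "NY": "New York"}

# Rename the column, then swap abbreviations for full names.
df = df.rename(columns={"province": "state"})
df["state"] = df["state"].replace(state_names)
print(df)
```

Values that are missing from the mapping are left unchanged by replace(), which makes gaps in the mapping easy to spot afterwards.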
Once we’re done cleaning up the data, how about plotting it? As humans, we’re better at taking in information when it is presented visually.
First, let’s look at the most common pizza types in this data set. Given the subject matter, it seems only appropriate to draw them as a pie chart with Matplotlib.
The pie chart doesn’t show up yet. To make it display, I need to add the %matplotlib inline magic command for IPython. While I’m at it, I’ll add another magic command to let IPython know to render images properly for a retina screen.
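A pie chart like this could be built roughly as follows. The column name menus.name is an assumption about where the pizza names live, and the rows are made up so that the snippet runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; "menus.name" is an assumed column name for the pizza type.
df = pd.DataFrame({
    "menus.name": [
        "Cheese Pizza", "White Pizza", "Cheese Pizza",
        "Veggie Pizza", "Cheese Pizza", "White Pizza",
    ]
})

# Count each pizza type and keep the most common ones.
top_pizzas = df["menus.name"].value_counts().head(10)

# Draw one wedge per pizza type, labeled with its share.
fig, ax = plt.subplots()
ax.pie(top_pizzas.values, labels=top_pizzas.index, autopct="%1.0f%%")
ax.set_title("Most common pizza types")
```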
I could add these lines to the same cell and run it again, but I prefer to define this type of magic command at the very beginning of my notebook.
To navigate to the beginning of the notebook, you can use Cmd+[ (Ctrl+Alt+Left on Windows). Inserting a new cell is as easy as typing #%% (or, if you prefer a shortcut, insert a cell above the current one with Option+Shift+A on Mac or Alt+Shift+A on Windows). Now all I need to do is add the magic commands and run all the cells below.
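For reference, the two magic commands in question are usually written as shown here. These are IPython magics rather than plain Python, so they only work inside a notebook or an IPython session.

```python
# Render Matplotlib figures inline, directly below the cell.
%matplotlib inline

# Ask the inline backend for high-resolution (retina) images.
%config InlineBackend.figure_format = 'retina'
```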
From the pie chart, we now know that the most common type of pizza is the cheese pizza, followed closely by the white pizza.
What about the restaurants? We have their geographic locations in the data set, so we can easily see where they are.
Each restaurant has a unique ID, and the dataset can have multiple entries per restaurant, each representing a pizza on that restaurant’s menu. So, to plot restaurants instead of pizzas, we need to group the entries by restaurant ID.
Now we can plot them on a map. For geographic plots, I like to use plotly. Be sure to get the latest version of it (4.0.0) so that the graphics output renders well in PyCharm.
There are plenty of other questions we could try to answer with this data set, e.g., which city has the most (or the cheapest) veggie pizzas? Or what is the most common chain of pizza restaurants? If you want to play with the dataset and answer these or other questions, you can download the data and run your own analysis. And if you want to do it in PyCharm, make sure you are using PyCharm Professional 2019.2.
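The grouping could be sketched like this. The column names id, name, latitude, longitude and menus.name are assumptions about the data set, and the rows are invented.

```python
import pandas as pd

# Invented rows: restaurant r1 appears twice, once per pizza on its menu.
df = pd.DataFrame({
    "id": ["r1", "r1", "r2"],
    "name": ["Pizza Place", "Pizza Place", "Slice House"],
    "latitude": [45.52, 45.52, 30.27],
    "longitude": [-122.68, -122.68, -97.74],
    "menus.name": ["Cheese Pizza", "White Pizza", "Veggie Pizza"],
})

# Collapse the per-pizza rows into one row per restaurant.
restaurants = (
    df.groupby("id")
    .agg(
        name=("name", "first"),
        latitude=("latitude", "first"),
        longitude=("longitude", "first"),
        pizzas=("menus.name", "count"),
    )
    .reset_index()
)
print(restaurants)
```

Taking the first value of name, latitude and longitude is safe here only because those values repeat on every row of the same restaurant.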