If you like working on projects in Python, you have probably asked this question: which visualization toolkit is good for making charts? Whenever a beautiful chart has appeared in previous articles, readers have left comments asking which tool was used to make it. Below, the author introduces eight visualization toolkits implemented in Python, some of which are available in other languages as well. Try them out and see which one you like.

From Medium, by Aaron Frederick. Compiled by Shimeng Li and Shuting Wang.

There are many ways to create graphics in Python, but which is the best? Before visualizing anything, we need to be clear about the goal of the image: do you want a first look at how the data is distributed? Do you want to impress people in a presentation? Or do you want to show someone an internal figure and need something in between?

This article introduces some commonly used Python visualization packages, including their advantages, their disadvantages, and the scenarios they suit. It covers only 2D graphics, leaving 3D graphics and dashboards for next time, although many of the packages covered here support both well.

Matplotlib, Seaborn, and Pandas

There are several reasons to group these three packages together. First, Seaborn and Pandas plotting are built on top of Matplotlib. When you use Seaborn, or df.plot() in Pandas, you are actually using code that someone else has written in Matplotlib. As a result, the charts look similar, and the syntax used to customize them is very similar too.
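To make the "built on top of Matplotlib" point concrete, here is a minimal sketch. The data frame, its column names, and the output file name are made up for illustration. It shows that df.plot() hands back an ordinary Matplotlib Axes, so the usual Matplotlib methods can be used to customize the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"team": ["A", "B", "C"], "salary": [3.2, 2.1, 4.5]})

# Pandas does the plotting through Matplotlib and returns a Matplotlib Axes
ax = df.plot(x="team", y="salary", kind="bar", legend=False)

# Because ax is a Matplotlib object, Matplotlib's own methods apply to it
ax.set_title("Median salary by team")
ax.set_ylabel("Salary (millions)")
plt.tight_layout()
plt.savefig("pandas_bar.png")
```

In a notebook, the backend line can be dropped and the chart appears inline.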

When I think of these visualization tools, three words come to mind: Exploratory, Data, and Analysis. These packages are great for a first exploration of the data, but they fall short for presentations.

Matplotlib is a relatively low-level library, but it supports an incredible level of customization (so don’t simply rule it out as a presentation package!). Still, other tools are better suited for presentations.

Matplotlib also offers style selection, which emulates popular looks such as ggplot2 and xkcd. Here is a sample diagram I made using Matplotlib and related tools:

When working with salary data for basketball teams, I wanted to find the team with the highest median salary. To illustrate the results, I color-coded the salaries for each team as a bar chart to show which team the players would be better off joining.

    import seaborn as sns
    import matplotlib.pyplot as plt

    # One xkcd color name per team, in plotting order
    color_order = ['xkcd:cerulean', 'xkcd:ocean', 'xkcd:black', 'xkcd:royal purple',
                   'xkcd:royal purple', 'xkcd:navy blue', 'xkcd:powder blue',
                   'xkcd:light maroon', 'xkcd:lightish blue', 'xkcd:navy']

    # top10 is a DataFrame with Team and Salary columns, prepared elsewhere
    sns.barplot(x=top10.Team,
                y=top10.Salary,
                palette=color_order).set_title('Teams with Highest Median Salary')

    plt.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))

The second diagram is a Q-Q plot of the residuals from a regression experiment. The main purpose of this plot is to show how to make a useful figure in as few lines as possible, even if it is not the prettiest.

    import matplotlib.pyplot as plt
    import scipy.stats as stats

    # model2 is a regression model
    log_resid = model2.predict(X_test) - y_test
    stats.probplot(log_resid, dist="norm", plot=plt)
    plt.title("Normal Q-Q plot")
    plt.show()
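The snippet above depends on a fitted model and test data that are not shown. As a self-contained sketch of the same idea, the version below swaps in synthetic, normally distributed "residuals" (an assumption made purely for illustration, as are the variable and file names) and draws the Q-Q plot on an explicit Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for regression residuals (assumed to be roughly normal)
rng = np.random.default_rng(0)
resid = rng.normal(loc=0.0, scale=1.0, size=200)

fig, ax = plt.subplots()
# probplot draws the sample quantiles and a least-squares line on the Axes,
# and also returns the quantiles plus the fit as (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm", plot=ax)
ax.set_title("Normal Q-Q plot")
fig.savefig("qq_plot.png")
```

Points that hug the fitted line suggest the residuals are close to normal; the returned r value gives a rough numeric check of the same thing.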

Matplotlib and its related tools turned out to be very efficient, but they weren’t the best tools for demonstration purposes.
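As a small illustration of the style selection mentioned earlier, the sketch below draws the same made-up line twice, once with Matplotlib's built-in "ggplot" style sheet and once in its xkcd sketch mode. The data and file names are hypothetical, and the xkcd look may fall back to a default font (with a warning) if the hand-drawn fonts are not installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# 'ggplot' is one of the style sheets that ship with Matplotlib
with plt.style.context("ggplot"):
    fig1, ax1 = plt.subplots()
    ax1.plot(x, y)
    ax1.set_title("ggplot style")
    fig1.savefig("style_ggplot.png")

# plt.xkcd() turns on the hand-drawn xkcd look inside the with-block
with plt.xkcd():
    fig2, ax2 = plt.subplots()
    ax2.plot(x, y)
    ax2.set_title("xkcd style")
    fig2.savefig("style_xkcd.png")
```

Using the context-manager form keeps each style local to its block, so the rest of a script keeps the default look.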

ggplot(2)

You might ask, “Aaron, ggplot is the most common visualization package in R, but aren’t you writing about Python packages?” It turns out that people have implemented ggplot2 in Python, copying everything from the styling to the syntax.

From all the material I have looked at, it looks and feels just like ggplot2, with the added benefit that it relies on the Pandas Python package. However, Pandas has recently deprecated some methods, which leaves the Python version of ggplot incompatible with current Pandas releases.

If you want to use the real ggplot in R (which has the same look, feel, and syntax, minus the dependencies), I discussed it in another article.

That said, if you must use ggplot in Python, you will have to install Pandas 0.19.2, but I recommend against downgrading Pandas just to use a lesser plotting package.

ggplot2 (and, I think, Python’s ggplot as well) is important because it uses the “Grammar of Graphics” to build images. The basic premise is that you instantiate a plot and then add different features to it separately; that is, the title, axes, data points, trend lines, and so on are each added on their own.

Here is a simple example of ggplot code. We instantiate the plot with ggplot, set the aesthetics and data, and then add points, a theme, and axis and title labels.

    #All Salaries
    ggplot(data=df, aes(x=season_start, y=salary, colour=team)) +
      geom_point() +
      theme(legend.position="none") +
      labs(title = 'Salary Over Time', x='Year', y='Salary ($)')

Bokeh

Bokeh is beautiful. Conceptually, Bokeh is similar to ggplot in that it uses the Grammar of Graphics to build images, but Bokeh has a user-friendly interface that produces both professional graphics and business reports. To illustrate this point, I wrote code to make a histogram from the 538 Masculinity Survey data set:

    import pandas as pd
    from bokeh.plotting import figure
    from bokeh.io import show

    # is_masc is a one-hot encoded dataframe of responses to the question:
    # "Do you identify as masculine?"

    #Dataframe Prep
    counts = is_masc.sum()
    resps = is_masc.columns

    #Bokeh
    p2 = figure(title='Do You View Yourself As Masculine?',
                x_range=list(resps))
    p2.vbar(x=list(resps), top=counts, width=0.6, fill_color='red', line_color='black')
    show(p2)

    #Pandas
    counts.plot(kind='bar')

Survey results plotted with Bokeh

The red bar chart shows how 538 people answered the question “Do you consider yourself masculine?” The Bokeh portion of the code above builds an elegant, professional histogram of response counts, with sensible font sizes, y-axis ticks, and formatting.

Most of the code I wrote was for marking axes and headings, and adding color and borders to bar charts. When making beautiful and expressive images, I prefer to use Bokeh — it has done a lot of beautifying for us.

The same data plotted with Pandas

The blue chart comes from the last line of the code above, the one-line Pandas call. The two histograms show the same values but serve different purposes. In an exploratory setting, it is convenient to write a single line of Pandas to look at the data, but Bokeh’s styling is much stronger.

Everything Bokeh provides out of the box can also be achieved in Matplotlib, but only by customizing it by hand: the angle of the x-axis labels, the background lines, the y-axis ticks, and the font (size, italic, bold). The figure below shows some random trends with a few more customizations: a legend and different colors and line styles.

Bokeh is also a great tool for creating interactive business reports.
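To give a sense of the manual Matplotlib work mentioned above (label angle, background lines, y-axis ticks, font), here is a minimal sketch. The labels and counts are invented placeholders rather than the survey data, and the specific values chosen for rotation, tick spacing, and font are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical response labels and made-up counts
labels = ["Yes", "Somewhat", "Not very", "Not at all"]
counts = [40, 30, 20, 10]

fig, ax = plt.subplots()
ax.bar(labels, counts, color="red", edgecolor="black", width=0.6)

ax.tick_params(axis="x", labelrotation=45)    # angle of the x-axis labels
ax.grid(axis="y", linestyle="--", alpha=0.5)  # background lines
ax.set_yticks(range(0, 51, 10))               # y-axis ticks every 10 units
ax.set_title("Responses", fontsize=14, fontstyle="italic", fontweight="bold")

fig.tight_layout()
fig.savefig("custom_bar.png")
```

Each of these settings is a separate call in Matplotlib, which is the kind of effort that Bokeh's defaults save.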

Plotly

Plotly is very powerful, but setting it up and creating figures takes a lot of time, and none of it is intuitive. After spending most of a morning working with Plotly, I went off to eat with almost nothing to show for it: a bar chart with no axis labels and a “scatter chart” with lines I could not remove. Plotly comes with a few caveats:

  • Installation requires an API key and registration, not just pip install;

  • Plotly plots data and layout objects that are unique to it and not intuitive;

  • The figure layout never worked for me (40 lines of code that had no effect!).

But it has its advantages, and there are workarounds for all of the setup drawbacks:

  • You can edit images on the Plotly website and in the Python environment;

  • Support for interactive images and business reports;

  • Plotly works with Mapbox to customize maps;

  • It has the potential to do great graphics.

Here’s the code I wrote for this package:

    # Imports assumed from the calls below; in Plotly 4+ the online module is chart_studio.plotly
    import plotly.plotly as py
    import plotly.graph_objs as go

    #plot 1 - barplot
    # **note** - the layout lines do nothing and trip no errors
    data = [go.Bar(x=team_ave_df.team,
                   y=team_ave_df.turnovers_per_mp)]

    layout = go.Layout(
        title=go.layout.Title(
            text='Turnovers per Minute by Team',
            xref='paper',
            x=0
        ),
        xaxis=go.layout.XAxis(
            title=go.layout.xaxis.Title(
                text='Team',
                font=dict(
                    family='Courier New, monospace',
                    size=18,
                    color='#7f7f7f'
                )
            )
        ),
        yaxis=go.layout.YAxis(
            title=go.layout.yaxis.Title(
                text='Average Turnovers/Minute',
                font=dict(
                    family='Courier New, monospace',
                    size=18,
                    color='#7f7f7f'
                )
            )
        ),
        autosize=True,
        hovermode='closest')

    py.iplot(figure_or_data=data, layout=layout, filename='jupyter-plot', sharing='public', fileopt='overwrite')

    #plot 2 - attempt at a scatterplot
    data = [go.Scatter(x=player_year.minutes_played,
                       y=player_year.salary,
                       marker=go.scatter.Marker(color='red',
                                                size=3))]

    layout = go.Layout(title="test",
                       xaxis=dict(title='why'),
                       yaxis=dict(title='plotly'))

    py.iplot(figure_or_data=data, layout=layout, filename='jupyter-plot2', sharing='public')

Bar chart showing average turnovers per minute for different NBA teams.

A scatter diagram showing the relationship between salary and playing time in the NBA.

Overall, the out-of-the-box styling looks good, but my repeated attempts to copy the documentation verbatim and change the axis labels failed. Still, here’s a look at Plotly’s potential and why I’d spend hours with it:



Some sample diagrams on the Plotly page

Pygal

Less well known is Pygal, which, like the other commonly used plotting packages, uses the Grammar of Graphics framework to build images. It is a relatively simple plotting package because its plot objects are relatively simple. Using Pygal takes three steps:

  • Instantiate the figure;

  • Format it using the figure object’s attributes;

  • Add data to the figure using figure.add().

The main problem I encountered with Pygal was rendering the figure. I had to use the render_to_file option and then open the file in a web browser to see what I had just built.

It’s worth it in the end because the images are interactive, and have satisfying and customizable beautification features. All in all, the package looks good, but the file creation and rendering parts are a bit cumbersome.

Networkx

Although Networkx is based on Matplotlib, it is still an excellent solution for graph analysis and visualization. Graphs and networks are not my area of expertise, but Networkx is a quick and easy way to represent the connections in a network graphically. Here are the different representations I built for a simple graph, along with some code for drawing a small Facebook network downloaded from Stanford SNAP.

I color-coded each node by number (1 to 10) as follows:

    options = {
        'node_color': range(len(G)),
        'node_size': 300,
        'width': 1,
        'with_labels': False,
        'cmap': plt.cm.coolwarm
    }
    nx.draw(G, **options)

The code for visualizing the sparse Facebook graph mentioned above is as follows:

    import itertools
    import networkx as nx
    import matplotlib.pyplot as plt

    # Each line of the circles file is a circle name followed by member node ids
    f = open('data/facebook/1684.circles', 'r')
    circles = [line.split() for line in f]
    f.close()

    network = []
    for circ in circles:
        cleaned = [int(val) for val in circ[1:]]
        network.append(cleaned)

    G = nx.Graph()
    for v in network:
        G.add_nodes_from(v)

    # Connect every pair of nodes that share a circle
    edges = [itertools.combinations(net, 2) for net in network]
    for edge_group in edges:
        G.add_edges_from(edge_group)

    options = {
        'node_color': 'lime',
        'node_size': 3,
        'width': 1,
        'with_labels': False,
    }
    nx.draw(G, **options)

This graph is very sparse, and Networkx shows this sparsity by maximizing the separation between clusters.

There are many data visualization packages out there, but it’s impossible to say which one is the best. Hopefully, after reading this article, you’ve seen how different styling tools and code fit different situations.
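To make the edge-building step in the Facebook code concrete, here is a self-contained sketch that replaces the SNAP file with three made-up circles. The circle contents are hypothetical; the logic of connecting every pair of members within a circle follows the code above.

```python
import itertools
import networkx as nx

# Hypothetical circles, standing in for the parsed lines of a .circles file
network = [[1, 2, 3], [3, 4], [5, 6, 7, 8]]

G = nx.Graph()
for circle in network:
    G.add_nodes_from(circle)
    # Every pair of members in the same circle becomes an edge
    G.add_edges_from(itertools.combinations(circle, 2))

print(G.number_of_nodes(), G.number_of_edges())  # 8 10
print(nx.number_connected_components(G))         # 2
```

Nodes that share no circle end up in separate connected components, and those are the clusters that the drawing spreads apart.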

Original link:

Towardsdatascience.com/reviewing-p…