From Towards Data Science, by George Seif. Compiled by Geek AI and Liu Xiaokun.
Have you used heat maps, 2D density plots, spider plots, and tree diagrams?
Data visualization is an important part of data science or machine learning projects. Often, you need to do exploratory data analysis (EDA) early in the project to get some understanding of the data, and creating visualizations can really make the task of analysis clearer and easier to understand, especially with large, high-dimensional data sets. Towards the end of a project, it’s also important to present the end result in a clear, concise and compelling way that your audience (usually non-technical customers) can understand.
You may have read my previous article, “5 Quick and Easy Data Visualizations in Python with Code,” where I introduced five basic data visualization methods: scatter plots, line plots, histograms, bar plots, and box plots. These are simple and powerful methods for gaining insight into a data set. In this article, we’ll look at four more data visualization methods! They are more specialized than the basic methods and can be used, once you have worked through the previous article, to extract deeper information from your data.
Heat map
A heat map is a matrix representation of data in which the value of each matrix element is represented by a color. Different colors represent different magnitudes, and the row and column indices of the matrix connect the two items or features being compared. Heat maps are great for showing relationships between multiple feature variables because you can read the magnitude of the matrix element at any position directly from its color. By looking at other cells in the heat map, you can also see how each relationship compares with the other relationships in the data set. Color is so intuitive that it gives us a very simple way to interpret the data.
Now let’s look at the implementation code. “seaborn” is suited to somewhat more advanced graphics than “matplotlib”, typically ones that need more components, such as multiple colors, plots, or variables. Here “matplotlib” is used to display the figure, “NumPy” to generate the data, and “pandas” to hold it! The plotting itself is a single “seaborn” function call.
# Importing libs
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a random dataset
data = pd.DataFrame(np.random.random((1, 6)), columns=["Iron Man", "Captain America", "Black Widow", "Thor", "Hulk", "Hawkeye"])
print(data)
# Plot the heatmap
heatmap_plot = sns.heatmap(data, center=0, cmap='gist_ncar')
plt.show()
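The example above uses random values, so its colors carry no real structure. As a minimal sketch of the "relationships between feature variables" use case, the following builds a small synthetic table, computes the pairwise correlation matrix with pandas, and draws it with the same `sns.heatmap` call. The column names and the way the columns are generated are assumptions made up for illustration; they do not come from the original article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line to open a window

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data (illustrative assumption): "speed" is built from "power",
# "armor" is built to oppose "speed", and "luck" is independent noise.
rng = np.random.default_rng(0)
power = rng.normal(size=200)
speed = 0.8 * power + 0.2 * rng.normal(size=200)
armor = -0.6 * speed + 0.4 * rng.normal(size=200)
luck = rng.normal(size=200)
features = pd.DataFrame({"power": power, "speed": speed, "armor": armor, "luck": luck})

# Pairwise Pearson correlations: a square matrix indexed by feature on both axes
corr = features.corr()

# center=0 puts the midpoint of the diverging colormap at zero correlation
fig, ax = plt.subplots()
sns.heatmap(corr, center=0, vmin=-1, vmax=1, cmap="coolwarm", annot=True, ax=ax)
ax.set_title("Feature correlations (synthetic data)")
# plt.show()  # uncomment when running interactively
```

With a diverging colormap centered at zero, strongly positive and strongly negative pairs stand out at opposite ends of the color scale, while unrelated pairs fade toward the neutral midpoint.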
2D density plot
The 2D density plot is an intuitive extension of the one-dimensional density plot: instead of the distribution of a single variable, it shows the joint probability distribution of two variables. For example, in the 2D density plot below, the color scale on the right maps each color to the estimated probability density at that point. The place where our data is most likely to appear (i.e. where the data points are most concentrated) seems to be around size=0.5 and speed=1.4. As you can see, a 2D density plot is very useful for quickly finding where the data is most concentrated across two variables, as opposed to a one-dimensional density plot with a single variable. It is especially useful when you have two variables that are both important to the output and you want to understand how they jointly shape its distribution.
Once again, the “seaborn” code is very convenient! This time, we will create a skewed distribution to make the visualization more interesting. Most of the optional parameters can be adjusted to make the plot clearer.
# Importing libs
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
# Create the data
speed = skewnorm.rvs(4, size=50)
size = skewnorm.rvs(4, size=50)
# Create and show the 2D density plot
# (keyword names for seaborn 0.11+; older releases used shade= and bw=)
ax = sns.kdeplot(x=speed, y=size, cmap="Reds", fill=False, bw_method=.15, cbar=True)
ax.set(xlabel='speed', ylabel='size')
plt.show()
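To read off "where the data is most concentrated" numerically rather than by eye, one option is to evaluate a kernel density estimate on a grid and take its highest point. This sketch uses `scipy.stats.gaussian_kde` directly instead of seaborn's plotting function. The grid resolution (100) and the random seeds are arbitrary choices for illustration, and because the data are random draws, the peak will not match the size=0.5, speed=1.4 reading quoted above.

```python
import numpy as np
from scipy.stats import gaussian_kde, skewnorm

# Same kind of data as above: two right-skewed samples (seeds fixed for repeatability)
speed = skewnorm.rvs(4, size=50, random_state=0)
size = skewnorm.rvs(4, size=50, random_state=1)

# Fit a 2D Gaussian kernel density estimate; gaussian_kde expects shape (n_dims, n_points)
kde = gaussian_kde(np.vstack([speed, size]))

# Evaluate the density on a regular grid covering the data
xs = np.linspace(speed.min(), speed.max(), 100)
ys = np.linspace(size.min(), size.max(), 100)
grid_x, grid_y = np.meshgrid(xs, ys)
density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)

# The grid cell with the highest estimated density is where the data is most concentrated
row, col = np.unravel_index(np.argmax(density), density.shape)
peak_speed, peak_size = xs[col], ys[row]
print(f"Density peaks near speed={peak_speed:.2f}, size={peak_size:.2f}")
```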
Spider plot
Spider plots are one of the best ways to show one-to-many relationships. In other words, you can plot and view the values of several variables with respect to a single item or category. In a spider plot, the prominence of one variable over another is clear and obvious because the covered area and the distance from the center grow in that variable’s direction. If you want to see how several different categories of objects compare on these variables, you can draw them side by side. In the chart below, it’s easy to compare the different stats of the Avengers and see where each one excels! (Please note that these numbers are set randomly, and I have no bias against the Avengers.)
In this case, we can use “matplotlib” directly instead of “seaborn” to create the visualization. We need each attribute to be equally spaced around the circle. We set a label at each angle and then plot each value as a point whose distance from the center depends on its magnitude. Finally, for clarity, we fill the area enclosed by the lines connecting the attribute points with a translucent color.
# Import libs
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Get the data
df=pd.read_csv("avengers_data.csv")
print(df)
"""
   #             Name  Attack  Defense  Speed  Range  Health
0  1         Iron Man      83       80     75     70      70
1  2  Captain America      60       62     63     80      80
2  3             Thor      80       82     83    100     100
3  3             Hulk      80      100     67     44      92
4  4      Black Widow      52       43     60     50      65
5  5          Hawkeye      58       64     58     80      65
"""
# Get the data for Iron Man
labels=np.array(["Attack", "Defense", "Speed", "Range", "Health"])
stats=df.loc[0,labels].values
# Make some calculations for the plot
angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)
stats=np.concatenate((stats,[stats[0]]))
angles=np.concatenate((angles,[angles[0]]))
# Plot stuff
fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2)
ax.fill(angles, stats, alpha=0.25)
# angles repeats its first value to close the loop, so skip the last entry for the tick labels
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
ax.set_title(df.loc[0, "Name"])
ax.grid(True)
plt.show()
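The text above mentions drawing several spider plots side by side. A minimal sketch of that idea follows, with the stats from the printed table typed in directly so that it runs without avengers_data.csv. The helper name `close_loop`, the 2x3 grid layout, and the shared 0 to 100 radial scale are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to open a window

import numpy as np
import matplotlib.pyplot as plt

labels = ["Attack", "Defense", "Speed", "Range", "Health"]
# Values copied from the table printed above
heroes = {
    "Iron Man": [83, 80, 75, 70, 70],
    "Captain America": [60, 62, 63, 80, 80],
    "Thor": [80, 82, 83, 100, 100],
    "Hulk": [80, 100, 67, 44, 92],
    "Black Widow": [52, 43, 60, 50, 65],
    "Hawkeye": [58, 64, 58, 80, 65],
}

def close_loop(values):
    """Append the first value to the end so the outline returns to its start."""
    values = np.asarray(values, dtype=float)
    return np.concatenate((values, values[:1]))

# One equally spaced angle per attribute
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

fig, axes = plt.subplots(2, 3, subplot_kw={"polar": True}, figsize=(12, 8))
for ax, (name, stats) in zip(axes.ravel(), heroes.items()):
    ax.plot(close_loop(angles), close_loop(stats), "o-", linewidth=2)
    ax.fill(close_loop(angles), close_loop(stats), alpha=0.25)
    ax.set_thetagrids(np.degrees(angles), labels)  # one tick label per attribute
    ax.set_ylim(0, 100)  # shared radial scale so the shapes are comparable
    ax.set_title(name)
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

Fixing the radial limits on every subplot matters here: without a shared scale, each plot would stretch to its own maximum and the areas could not be compared across characters.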
Tree diagram
We’ve been using tree diagrams since elementary school! Tree diagrams are natural and intuitive, which makes them easy to interpret. Nodes that are directly connected are closely related, while nodes separated by several connections are less similar. In the visualization below, I plotted a tree diagram of a small subset of the Pokemon dataset from Kaggle, based on each Pokemon’s stats (HP, Attack, Defense, Special Attack, Special Defense, Speed).
As a result, the Pokemon whose stats match most closely are linked most directly. For example, at the top of the graph, Arbok and Fearow are directly connected. If we look at the data, Arbok has a total score of 438 and Fearow 442, which is very close! But Raticate has a total of 413, quite different from Arbok and Fearow, so it is split off from them in the tree! As we move up the tree, the Pokemon in the green group are more similar to each other than to any Pokemon in the red group, even when there is no direct green connection between them.
For tree diagrams, we actually use “SciPy” to do the plotting! After reading in the dataset, we drop the string columns. This is done here only to keep the visualization quick to set up; in practice, converting these strings into categorical variables would give better results and comparisons. We also set the data frame’s index so that the names can be used to label each node. Finally, computing the tree and plotting it in SciPy each take a single line of code.
# Import libs
import pandas as pd
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy
import numpy as np
# Read in the dataset
# Drop any fields that are strings
# Only get the first 40 because this dataset is big
df = pd.read_csv('Pokemon.csv')
df = df.set_index('Name')
df.index.name = None  # clear the index name
df = df.drop(["Type 1", "Type 2", "Legendary"], axis=1)
df = df.head(n=40)
# Calculate the distance between each sample
Z = hierarchy.linkage(df, 'ward')
# Plot the dendrogram with the root on the left
hierarchy.dendrogram(Z, orientation="left", labels=df.index.tolist())
plt.show()
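The colored groups in a dendrogram can also be recovered as explicit cluster labels. The sketch below is an illustration on made-up data, not the Pokemon dataset: the six names and their two stats are invented so that the example runs without Pokemon.csv, and cutting the tree into two clusters is an assumption. It uses the same `hierarchy.linkage` call as above, then `hierarchy.fcluster` to cut the tree.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy

# Invented stats for illustration: three low-stat and three high-stat creatures
df = pd.DataFrame(
    {"Attack": [10, 12, 11, 80, 82, 79], "Defense": [20, 19, 22, 90, 88, 91]},
    index=["A", "B", "C", "X", "Y", "Z"],
)

# Ward linkage merges, at each step, the pair of clusters whose merge
# least increases the total within-cluster variance
Z = hierarchy.linkage(df.values, "ward")

# Cut the tree so that at most two flat clusters remain
cluster_ids = hierarchy.fcluster(Z, t=2, criterion="maxclust")
groups = pd.Series(cluster_ids, index=df.index)
print(groups)

# no_plot=True returns the leaf ordering without drawing anything
leaf_order = hierarchy.dendrogram(Z, labels=df.index.tolist(), no_plot=True)["ivl"]
print(leaf_order)
```

Rows that end up with the same cluster id correspond to leaves that sit under the same branch when the dendrogram is cut at that height.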
The original link: https://towardsdatascience.com/4-more-quick-and-easy-data-visualizations-in-python-with-code-da9030ab3429