1. Introduction to Seaborn
Seaborn is a visualization library built on matplotlib, and its data structures are consistent with pandas.
The library is designed to make visualization a central part of exploring and understanding data.
Seaborn provides dataset-oriented plotting functions that operate on DataFrame columns and arrays; internally they perform the semantic mapping and statistical aggregation needed to draw the whole dataset.
Seaborn covers most of the statistical charts you are likely to need.
2. Sample data
Most of the plots in this article use Seaborn's built-in restaurant tipping dataset, tips; a few examples load other built-in datasets. The first two rows of the tips dataset are as follows:
No | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 2 |
(Columns: total_bill, tip, sex, smoker, day, time, size)
3. Seaborn overview
Relational plots
Relational plots are generally used to show the relationship between two variables.
function | role |
---|---|
relplot(kind='line') / lineplot() | Draw a line plot; main parameters: data, x, y, hue |
relplot(kind='scatter') / scatterplot() | Draw a scatter plot; main parameters: data, x, y, hue |
parameter | meaning |
---|---|
data | pandas.DataFrame object |
x | The x-axis variable of the plot |
y | The y-axis variable of the plot |
hue | Grouping variable that sets the color; usually a categorical variable |
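import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='darkgrid')
# Scatter plot of tip against total bill (relplot defaults to kind='scatter')
tips = sns.load_dataset('tips')
sns.relplot(x='total_bill', y='tip', data=tips)
# Color the points by smoker status
sns.relplot(x='total_bill', y='tip', hue='smoker', data=tips)
# Line plot: repeated observations at each timepoint are aggregated into a mean with a confidence band
fmri = sns.load_dataset('fmri')
sns.relplot(x='timepoint', y='signal', kind='line', data=fmri)
plt.show()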
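The table above pairs each relplot kind with a standalone function. A minimal sketch of the difference between the two, assuming a small hand-made DataFrame (the `toy` values are invented and only stand in for tips): the standalone function draws on a matplotlib Axes, while relplot builds its own figure and returns a FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for tips (values are invented for illustration)
toy = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.0, 30.5, 35.0],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.0, 5.5],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
})

# Axes-level function: draws on the Axes it is given and returns that Axes
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(data=toy, x="total_bill", y="tip", hue="smoker", ax=ax)

# Figure-level function: creates its own figure and returns a FacetGrid
grid = sns.relplot(data=toy, x="total_bill", y="tip", hue="smoker", kind="scatter")
grid.set_axis_labels("Total bill", "Tip")
```

The axes-level form is convenient when the plot has to sit inside a figure you lay out yourself; the figure-level form is convenient when you want Seaborn to manage the figure and legend.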
Categorical plots
Categorical plots visualize data in which one variable is categorical. They come in three forms: scatter plots, distribution plots, and estimate plots.
function | role |
---|---|
catplot(kind='strip') / stripplot() | Categorical scatter plot |
catplot(kind='swarm') / swarmplot() | Categorical scatter plot |
catplot(kind='box') / boxplot() | Categorical distribution plot |
catplot(kind='violin') / violinplot() | Categorical distribution plot |
catplot(kind='boxen') / boxenplot() | Categorical distribution plot |
catplot(kind='point') / pointplot() | Categorical estimate plot |
catplot(kind='bar') / barplot() | Categorical estimate plot |
catplot(kind='count') / countplot() | Categorical estimate plot |
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', color_codes=True)
tips = sns.load_dataset('tips')
# Categorical scatter plots: strip (the default kind) and swarm
sns.catplot(x='day', y='total_bill', data=tips)
sns.catplot(x='day', y='total_bill', kind='swarm', data=tips)
# Categorical distribution plots: box, boxen and violin
sns.catplot(x='day', y='total_bill', kind='box', data=tips)
diamonds = sns.load_dataset('diamonds')
sns.catplot(x='color', y='price', kind='boxen', data=diamonds.sort_values('color'))
sns.catplot(x='total_bill', y='day', hue='time', kind='violin', data=tips)
# Categorical estimate plots: point, bar and count
titanic = sns.load_dataset('titanic')
sns.catplot(x='sex', y='survived', hue='class', kind='point', data=titanic)
sns.catplot(x='sex', y='survived', hue='class', kind='bar', data=titanic)
sns.catplot(x='deck', kind='count', palette='ch:.25', data=titanic)
plt.show()
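The estimate plots aggregate the values in each category before drawing; for barplot the default estimator is the mean. A minimal sketch of that idea, assuming a hand-made DataFrame (the `toy` values are invented): it computes the per-category means with pandas and then reads the bar heights back from the Axes for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data (values are invented for illustration)
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 12.0, 18.0, 24.0],
})

# The aggregation a bar plot performs by default: the mean of y within each x category
means = toy.groupby("day")["total_bill"].mean()

fig, ax = plt.subplots()
sns.barplot(data=toy, x="day", y="total_bill", order=["Thur", "Fri"], ax=ax)

# Bar heights read back from the Axes, in the order given by `order`
heights = [patch.get_height() for patch in ax.patches]
print(means.to_dict(), heights)
```

The error bars that barplot adds by default are drawn as lines, so they do not appear among the bar patches inspected here.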
Regression plots
Regression plots fit a regression model to the data and draw the fitted line.
function | role |
---|---|
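lmplot() | Plot the data and a regression fit on a FacetGrid (figure-level) |
regplot() | Plot the data and a linear regression fit on a single Axes |
residplot() | Plot the residuals of a linear regression |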
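import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
# Scatter plot with a fitted regression line and confidence band
sns.lmplot(x="total_bill", y="tip", data=tips)
# Residuals of a linear fit on Anscombe's dataset II
anscombe = sns.load_dataset("anscombe")
plt.figure()  # start a new figure so the residuals are not drawn over the lmplot
sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"), scatter_kws={"s": 80})
# Axes-level regression plot drawn on an explicitly sized figure
f, ax = plt.subplots(figsize=(5, 6))
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax)
plt.show()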
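To make the relationship between regplot and residplot concrete, here is a minimal sketch on invented data. It uses numpy.polyfit as a stand-in for the least-squares fit (an assumption made for illustration, not a description of Seaborn's internal code) and computes the residuals that a residual plot displays.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical data: a roughly linear relationship with small deviations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

fig, (ax_fit, ax_resid) = plt.subplots(1, 2, figsize=(8, 4))
sns.regplot(x=x, y=y, ax=ax_fit)      # data plus fitted line
sns.residplot(x=x, y=y, ax=ax_resid)  # residuals scattered around zero
```

If the residuals show a clear pattern instead of scattering around zero, a straight line is probably not a good model for the data.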
Distribution plots
Distribution plots are used to examine univariate or bivariate distributions.
function | role |
---|---|
distplot() | Univariate distribution (deprecated since seaborn 0.11 in favor of histplot() and displot()) |
kdeplot() | Kernel density estimate |
pairplot() | Pairwise bivariate distributions |
jointplot() / jointplot(kind='hex') / jointplot(kind='reg') | Bivariate distribution |
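import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
# Univariate distribution of 100 standard-normal samples
x = np.random.normal(size=100)
sns.distplot(x)  # deprecated since seaborn 0.11; histplot() and displot() are the replacements
plt.figure()  # draw the density estimate on a separate figure
sns.kdeplot(x=x, fill=True)  # fill replaces the deprecated shade argument (seaborn 0.11+)
# Bivariate distribution of two correlated normal variables
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
sns.jointplot(x="x", y="y", data=df)
# Pairwise relationships between all numeric columns of iris
iris = sns.load_dataset("iris")
sns.pairplot(iris)
plt.show()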
Matrix plots
Visualize the data set as a matrix.
function | role |
---|---|
heatmap( ) | Heat map |
clustermap() | Hierarchically clustered heat map |
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set_theme()
# Load the example flights dataset (long-form) and pivot it to wide-form
flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")
# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
sns.set_theme()
# Load the brain networks example dataset
df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
# Select a subset of the networks
used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]
# Create a categorical palette to identify the networks
network_pal = sns.husl_palette(8, s=.45)
network_lut = dict(zip(map(str, used_networks), network_pal))
# Convert the palette to vectors that will be drawn on the side of the matrix
networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks, index=df.columns).map(network_lut)
# Draw the full plot
g = sns.clustermap(df.corr(), center=0, cmap="vlag",
                   row_colors=network_colors, col_colors=network_colors,
                   dendrogram_ratio=(.1, .2),
                   cbar_pos=(.02, .32, .03, .2),
                   linewidths=.75, figsize=(12, 13))
g.ax_row_dendrogram.remove()
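heatmap expects wide-form data: one row label, one column label, and a value in each cell. A minimal sketch of the long-to-wide step on a hand-made table (the passenger counts are invented and the column names only mimic the flights dataset), which also runs without downloading anything:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per (month, year) observation
long_form = pd.DataFrame({
    "month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "year": [1950, 1951, 1952, 1950, 1951, 1952],
    "passengers": [115, 145, 171, 126, 150, 180],
})

# Wide form: months as rows, years as columns, passenger counts as cell values
wide = long_form.pivot(index="month", columns="year", values="passengers")
wide = wide.loc[["Jan", "Feb"]]  # pivot sorts the index alphabetically; restore calendar order

fig, ax = plt.subplots(figsize=(4, 2))
sns.heatmap(wide, annot=True, fmt="d", linewidths=.5, ax=ax)
```

Passing index, columns and values by keyword keeps the pivot call working on recent pandas versions, where positional arguments to pivot are no longer accepted.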
Multi-plot grids
Multi-plot grids draw the same kind of plot across a grid of subplots, either for subsets of the data or for pairs of variables.
function | role |
---|---|
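FacetGrid | Grid of subplots, one per subset of the data, for plotting conditional relationships |
PairGrid | Grid of subplots for plotting pairwise relationships between variables |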