This article was originally written by Bao Kazuo, data analyst at Kesci.


Seaborn is a great visualization library, especially for data with many dimensions. It lets us draw descriptive statistical plots with minimal code and explore how the variables relate to one another.

The previous article, Python Visualization: Seaborn (I), used Seaborn for distribution visualization. This article covers categorical visualization with Seaborn: Stripplot and Swarmplot, Boxplot and Violinplot, Barplot and Pointplot, and the more abstract Factorplot.


Here we use the Iris data set, publicly available on Kesci, for demonstration.

The complete source code is available and can be reproduced on K-Lab, an online data analysis collaboration tool. K-Lab covers mainstream languages such as Python and R and has more than 90% of common data analysis and mining libraries already deployed, including Seaborn, Pandas and NumPy, so that data professionals can focus on analysis and work more efficiently.

Iris data set: a commonly used data set for classification experiments, collected and compiled by Fisher in 1936, and a standard example for multivariate analysis. It contains 150 samples divided into 3 classes of 50 samples each, and every sample has 4 attributes: sepal length (sepal_length), sepal width (sepal_width), petal length (petal_length) and petal width (petal_width). These four attributes can be used to predict which of the three species (Setosa, Versicolour, Virginica) an iris flower belongs to.


Import libraries

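import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter notebook magic; omit when running as a plain script
import seaborn as sns
# Assumption: the original does not show how `iris` is loaded.
# seaborn's sample copy of the data set has the column names used below.
iris = sns.load_dataset("iris")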


Stripplot

A Stripplot is essentially a scatter plot of a quantitative variable, drawn separately for each category of a categorical variable.

Here we draw a Stripplot of the sepal length of the different species in the iris dataset.

plt.figure(1, figsize=(12, 6))
plt.subplot(1, 2, 1)
# stripplot without jitter (recent seaborn versions jitter by default, so it is switched off explicitly)
sns.stripplot(x='species', y='sepal_length', data=iris, jitter=False)
plt.title('Stripplot of sepal length of Iris species')
with sns.axes_style("whitegrid"):  # temporary style for this subplot only; without it the default style is used ('darkgrid' in older seaborn versions)
    plt.subplot(1, 2, 2)
    plt.title('Stripplot of sepal length of Iris species')
    sns.stripplot(x='species', y='sepal_length', data=iris, jitter=True)  # jitterplot
plt.show()



The plot on the left is a Stripplot drawn without jitter. In many cases the points in a Stripplot overlap, which makes it hard to see how they are distributed. A simple solution is to draw a jittered Stripplot (Jitterplot), which reveals the distribution by randomly shifting each point a small distance along the categorical axis, as in the plot on the right.
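The idea behind jitter can be sketched in a few lines. This is only an illustration of the principle, not seaborn's implementation; the function name `jitter_positions`, the `width` of 0.1 and the fixed seed are assumptions made for the example.

```python
import random

def jitter_positions(category_index, n_points, width=0.1, seed=0):
    """Return n_points x-positions scattered uniformly around a category's axis position."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [category_index + rng.uniform(-width, width) for _ in range(n_points)]

# Five points in category 0 that would otherwise all be drawn at x = 0
xs = jitter_positions(category_index=0, n_points=5)
print(xs)
```

Only the position along the categorical axis changes; the quantitative value of each point is left untouched, so the plot still shows the true data values.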


Swarmplot

Another way to solve the problem of overlapping points in a Stripplot is to draw a Swarmplot, which uses an algorithm to spread the overlapping points out along the categorical axis so that they no longer overlap. Here we draw Swarmplots of the petal length and petal width of the different species in the iris data set.

plt.figure(1, figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.swarmplot(x='species', y='petal_length', data=iris)
with sns.axes_style("ticks"):  # temporary style for the second subplot
    plt.subplot(1, 2, 2)
    sns.swarmplot(x='species', y='petal_width', data=iris)
plt.show()
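To make the "spreading out" idea concrete, here is a deliberately simplified sketch. It is not seaborn's actual swarm algorithm: the function `simple_swarm`, the `diameter` parameter and the sample values are hypothetical, and for simplicity the offset and the value are measured in the same units.

```python
import math

def simple_swarm(values, diameter=0.1):
    """Place points so that no two centres are closer than `diameter`.

    Returns a list of (offset, value) pairs, where offset is the shift
    along the categorical axis.
    """
    placed = []
    for v in sorted(values):
        step = 0
        while True:
            # Try the centre first, then 1, 2, ... diameters to either side
            candidates = [0.0] if step == 0 else [step * diameter, -step * diameter]
            free = [x for x in candidates
                    if all(math.hypot(x - px, v - pv) >= diameter for px, pv in placed)]
            if free:
                placed.append((free[0], v))
                break
            step += 1
    return placed

# Made-up values: three identical points, one very close to them, one far away
points = simple_swarm([1.0, 1.0, 1.0, 1.02, 1.5])
for offset, value in points:
    print(offset, value)
```

Points that do not collide with anything stay on the axis, while crowded values fan out sideways, which is what gives a Swarmplot its characteristic shape.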




Boxplot

A box plot summarizes a set of data with six kinds of values: after sorting the data, it computes the upper edge, upper quartile Q3, median, lower quartile Q1 and lower edge, and marks the outliers. Below, the four variables sepal_length, sepal_width, petal_length and petal_width in the iris dataset are visualized as box plots.

var = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
fig = plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # use a different axes style for each subplot
        plt.subplot(2, 2, i + 1)
        sns.boxplot(x='species', y=var[i], data=iris)
plt.show()
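The values a box plot draws can also be computed by hand. The sketch below assumes the common convention that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box; the helper `box_stats` and the sample numbers are made up for illustration.

```python
import statistics

def box_stats(data, whis=1.5):
    """Compute the values a box plot draws, using the 1.5 * IQR whisker rule."""
    data = sorted(data)
    # "inclusive" interpolates linearly between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "lower_edge": min(inside),  # lowest point within the fence
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_edge": max(inside),  # highest point within the fence
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this made-up sample the value 30 lies beyond the upper fence, so it is reported as an outlier and the upper edge stops at 9.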




Violinplot

A Violinplot combines a box plot with a kernel density plot, which shows the shape of the distribution of a quantitative variable more clearly.

context = ['notebook', 'paper', 'talk', 'poster']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set the context style (default: notebook)
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.violinplot(x='species', y=var[i], data=iris)
plt.show()



Violinplot uses kernel density estimation (KDE) to describe the distribution of quantitative variables.
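As a rough illustration of what kernel density estimation does, the sketch below builds a density curve by averaging one Gaussian bump per sample. The function name `gaussian_kde`, the fixed `bandwidth` of 0.3 and the sample values are assumptions for this example; seaborn chooses the bandwidth automatically and its implementation differs in detail.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x as an average of Gaussian bumps."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Made-up measurements: a tight cluster near 5 and a looser one near 6.4
density = gaussian_kde([4.9, 5.0, 5.1, 6.3, 6.5], bandwidth=0.3)
print(density(5.0), density(5.7))
```

The estimate is higher at 5.0, inside the cluster, than at 5.7, in the gap between clusters. A Violinplot draws this kind of curve mirrored on both sides of each category's axis.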

Swarmplot can also be combined with Boxplot or Violinplot to describe quantitative variables. Using the iris data set:

context = ['notebook', 'paper', 'talk', 'poster']
axes_style = ['ticks', 'white', 'whitegrid', 'dark']
plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        sns.set_context(context[i])  # set context
        plt.subplot(2, 2, i + 1)
        plt.title(str(var[i]) + ' in Iris species')
        sns.swarmplot(x='species', y=var[i], data=iris, color="w", alpha=.5)
        # swarmplot + violinplot on even panels, swarmplot + boxplot on odd panels
        if i % 2 == 0:
            sns.violinplot(x='species', y=var[i], data=iris, inner=None)
        else:
            sns.boxplot(x='species', y=var[i], data=iris)
plt.show()



Barplot

A Barplot mainly shows the mean of a quantitative variable within each category, and uses bootstrapping to compute a confidence interval for that estimate, which is drawn as an error bar. Using the iris data set:

plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        plt.subplot(2, 2, i + 1)
        sns.set_context(context[i])  # set context style (default: notebook)
        plt.title(str(var[i]) + ' in Iris species')
        sns.barplot(x='species', y=var[i], data=iris)
plt.show()
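The bootstrapped error bar can be approximated by hand. The following is a minimal sketch of a percentile bootstrap for the mean, assuming a 95% interval and 1000 resamples; the helper `bootstrap_ci`, the seed and the sample values are illustrative choices, not seaborn internals.

```python
import random
import statistics

def bootstrap_ci(data, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    )
    tail = n_boot * (100 - ci) // 200  # number of resampled means cut off on each side
    return means[tail], means[n_boot - 1 - tail]

# Illustrative sepal-length-like values for one category
sample = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
low, high = bootstrap_ci(sample)
print(statistics.fmean(sample), (low, high))
```

The bar height corresponds to the sample mean, and the error bar spans from `low` to `high`.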



Countplot

If you want to know how many observations fall under each category, you can use a Countplot, which simply counts the observations per category. Using the iris data set:

plt.figure(figsize=(5, 5))
sns.countplot(y="species", data=iris)  # setting y='species' draws the countplot horizontally
plt.title('Iris species count')
plt.show()
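What a Countplot displays is just a tally per category. A minimal sketch using only the standard library, with a short made-up list of labels standing in for the species column:

```python
from collections import Counter

# Made-up labels for illustration; the real data set has 50 rows per species
species = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
counts = Counter(species)

# Print a crude horizontal bar for each category
for name, n in counts.items():
    print(f"{name:<12}{'#' * n} {n}")
```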



Pointplot

Pointplot is a horizontal extension of Barplot. Like Barplot, it presents a point estimate together with a confidence interval. In addition, when each major category is further divided into sub-categories, Pointplot makes it easy to see how the sub-categories compare across the major categories. Using the iris data set:

plt.figure(1, figsize=(12, 12))
for i in range(4):
    with sns.axes_style(axes_style[i]):  # set axes_style
        plt.subplot(2, 2, i + 1)
        sns.set_context(context[i])  # set context style (default: notebook)
        plt.title(str(var[i]) + ' in Iris species')
        sns.pointplot(x='species', y=var[i], data=iris)
plt.show()



Factorplot

Factorplot (renamed catplot in seaborn 0.9) can be regarded as the core of categorical visualization in Seaborn: all the plots described above are specific cases of Factorplot. We can also use PairGrid to visualize several numerical features across the categories using the same kind of plot.

sns.set(style="ticks")
g = sns.PairGrid(iris,
                 x_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                 y_vars=['species'],
                 aspect=0.75, height=4)  # set aspect ratio and image size (`height` was called `size` before seaborn 0.9)
g.map(sns.violinplot, palette='pastel')
plt.show()





Kesci.com is an online community for data professionals and industry problems. Its K-Lab online data analysis and collaboration platform focuses on creating a brand-new study and work experience for data workers.