This article introduces the iris data set and two ways of drawing scatter plots from it: matplotlib.pyplot.scatter and matplotlib.axes.Axes.scatter.
What will you learn?
1. The iris data set
- Importing the data set and viewing its attributes
- DESCR
- data
- feature_names
- target
- target_names
- Converting the iris data set to a DataFrame
2. Drawing a scatter plot with matplotlib.pyplot.scatter (parameters explained)
3. Drawing a scatter plot with matplotlib.axes.Axes.scatter (parameters explained)
Use Python matplotlib to draw scatter plots.
1. The iris data set in detail
- Importing the data set and viewing its attributes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
from sklearn import datasets
iris = datasets.load_iris()
dir(iris)
['DESCR', 'data', 'feature_names', 'target', 'target_names']
DESCR
#DESCR is the description of the data set.
print(iris.DESCR)
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:    # the four feature columns
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:               # the three types of iris
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica
    :Summary Statistics:       # simple statistics for the four columns

    ============== ==== ==== ===== ===== ==================
                    Min  Max  Mean   SD   Class Correlation
    ============== ==== ==== ===== ===== ==================
    sepal length:   4.3  7.9  5.84  0.83   0.7826
    sepal width:    2.0  4.4  3.05  0.43  -0.4194
    petal length:   1.0  6.9  3.76  1.76   0.9490  (high!)
    petal width:    0.1  2.5  1.20  0.76   0.9565  (high!)
    ============== ==== ==== ===== ===== ==================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%[email protected])
    :Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

References
----------
   - Fisher,R.A. "The use of multiple measurements in taxonomic problems" Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II conceptual clustering system finds 3 classes in the data.
   - Many, many more ...
data
The measured values of the four features for each iris sample.
print(type(iris.data))
print(iris.data.shape)
iris.data[:10, :]  # first ten rows
# array([[5.1, 3.5, 1.4, 0.2],
#        [4.9, 3. , 1.4, 0.2],
#        [4.7, 3.2, 1.3, 0.2],
#        [4.6, 3.1, 1.5, 0.2],
#        [5. , 3.6, 1.4, 0.2],
#        [5.4, 3.9, 1.7, 0.4],
#        [4.6, 3.4, 1.4, 0.3],
#        [5. , 3.4, 1.5, 0.2],
#        [4.4, 2.9, 1.4, 0.2],
#        [4.9, 3.1, 1.5, 0.1]])
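As a quick sanity check, the Min, Max and Mean columns printed in DESCR can be recomputed from iris.data. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names col_min, col_max and col_mean are illustrative and not part of the original article.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Column-wise statistics over the 150 rows (axis=0 collapses the row axis).
col_min = iris.data.min(axis=0)
col_max = iris.data.max(axis=0)
col_mean = iris.data.mean(axis=0)

# Print one line per feature so the values can be compared with DESCR.
for name, lo, hi, mean in zip(iris.feature_names, col_min, col_max, col_mean):
    print(f"{name}: min={lo:.1f} max={hi:.1f} mean={mean:.2f}")
```

Small differences from the DESCR table are possible, since the bundled data may have been corrected between scikit-learn releases.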
feature_names
The names of the four columns above are, from left to right, sepal length, sepal width, petal length and petal width, all in cm.
print(iris.feature_names)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
target
The numbers 0, 1 and 2 identify which kind of iris each row of data represents.
print(iris.target)  # an array of 150 elements
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
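To confirm that each class has 50 rows, as stated in DESCR, the labels can be counted. This is a small sketch using np.bincount, which is one of several ways to do it; the name counts is illustrative.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# counts[k] is the number of rows whose label equals k.
counts = np.bincount(iris.target)
print(counts)  # [50 50 50]
```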
target_names
The names of the three iris species: setosa, versicolor and virginica.
print(iris.target_names)
['setosa' 'versicolor' 'virginica']
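Because target holds integer codes and target_names is a NumPy array, the codes can be turned into species names by indexing one array with the other. A minimal sketch (the variable name species is an assumption for illustration):

```python
from sklearn import datasets

iris = datasets.load_iris()

# Indexing with an integer array returns one name per row of the data set.
species = iris.target_names[iris.target]

# Rows 0, 50 and 100 are the first row of each class.
print(species[0], species[50], species[100])
```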
Convert the iris data set to a DataFrame
x, y = iris.data, iris.target
# np.hstack() joins arrays side by side, similar to paste in Linux
# np.vstack() stacks arrays on top of each other, similar to cat in Linux
pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),
                       columns=['sepal length(cm)', 'sepal width(cm)', 'petal length(cm)', 'petal width(cm)', 'class'])
pd_iris.head()
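The paste/cat analogy for np.hstack and np.vstack is easier to see on a tiny array. The sketch below uses made-up toy values (features, labels) rather than the real data set, and only NumPy.

```python
import numpy as np

# Toy stand-ins for iris.data and iris.target (illustrative values only).
features = np.array([[5.1, 3.5],
                     [4.9, 3.0],
                     [4.7, 3.2]])   # 3 rows, 2 columns
labels = np.array([0, 0, 1])        # 1-D, one label per row

# hstack needs matching row counts, so the labels become a (3, 1) column first.
side_by_side = np.hstack((features, labels.reshape(-1, 1)))

# vstack needs matching column counts; here the array is stacked on itself.
stacked = np.vstack((features, features))

print(side_by_side.shape)  # (3, 3)
print(stacked.shape)       # (6, 2)
```

Note that hstack produces a single floating-point array, so the class column in the resulting DataFrame holds 0.0, 1.0 and 2.0 rather than integers.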
2. Drawing a scatter plot with matplotlib.pyplot.scatter (parameters explained)
- Take the first two columns of the data set to draw a simple scatter plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
# Data preparation
from sklearn import datasets
iris = datasets.load_iris()
x, y = iris.data, iris.target
pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),
                       columns=['sepal length(cm)', 'sepal width(cm)', 'petal length(cm)', 'petal width(cm)', 'class'])

plt.figure(dpi=100)
# Each row contributes one point: its x coordinate comes from the sepal length(cm) column
# and its y coordinate from the sepal width(cm) column.
plt.scatter(pd_iris['sepal length(cm)'], pd_iris['sepal width(cm)'])
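The same simple plot can also be drawn straight from the NumPy array, with axis labels taken from feature_names. This is a sketch under a few assumptions: the Agg backend is selected so that the script runs without a display, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()

plt.figure(dpi=100)
# Column 0 is sepal length and column 1 is sepal width.
points = plt.scatter(iris.data[:, 0], iris.data[:, 1])
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.savefig("iris_sepal_scatter.png")  # hypothetical output file name
```

In an interactive session, dropping the matplotlib.use line and calling plt.show() instead of plt.savefig would display the figure directly.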
- Represent the three iris species with different markers and colors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
# Data preparation
from sklearn import datasets
iris = datasets.load_iris()
x, y = iris.data, iris.target
pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),
                       columns=['sepal length(cm)', 'sepal width(cm)', 'petal length(cm)', 'petal width(cm)', 'class'])

plt.figure(dpi=150)  # set the resolution of the image
plt.style.use('Solarize_Light2')  # draw using the Solarize_Light2 style
iris_type = pd_iris['class'].unique()  # the three class values found in the class column
iris_name = iris.target_names  # the name of each class
colors = ['#c72e29', '#098154', '#fb832d']  # three different colors
markers = [r'$\clubsuit$', '.', '+']  # three different shapes

for i in range(len(iris_type)):
    plt.scatter(pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal length(cm)'],  # x data
                pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal width(cm)'],  # y data
                s=50,  # marker size
                c=colors[i],  # marker color
                marker=markers[i],  # marker shape
                # marker=matplotlib.markers.MarkerStyle(marker=markers[i], fillstyle='full'),  # set the marker fill style
                alpha=0.8,  # marker transparency, range 0-1
                facecolors='r',  # marker fill color; when c above is set, c takes priority
                edgecolors='none',  # marker edge line color
                linewidths=1,  # marker edge line width; no visible effect on filled markers when edgecolors is 'none'
                label=iris_name[i])  # the legend entry takes its name from label
plt.legend(loc='upper right')
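When only the color needs to differ between classes, the loop can be replaced by a single scatter call that passes the class codes as c. The sketch below is an alternative to the article's approach, not a restatement of it; it assumes a matplotlib version that provides PathCollection.legend_elements, and the colormap choice is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()

plt.figure(dpi=100)
# One call: the integer labels are mapped to colors through the colormap.
sc = plt.scatter(iris.data[:, 0], iris.data[:, 1],
                 c=iris.target, cmap="viridis", s=50, alpha=0.8)

# legend_elements() builds one legend handle per distinct value of c;
# its default numeric labels are replaced with the species names.
handles, _ = sc.legend_elements()
plt.legend(handles, list(iris.target_names), loc="upper right")
```

The trade-off is that a single scatter call accepts only one marker shape, so the per-class loop is still needed when each class should have its own marker.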
3. Drawing a scatter plot with matplotlib.axes.Axes.scatter (parameters explained)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
# Data preparation
from sklearn import datasets
iris = datasets.load_iris()
x, y = iris.data, iris.target
pd_iris = pd.DataFrame(np.hstack((x, y.reshape(150, 1))),
                       columns=['sepal length(cm)', 'sepal width(cm)', 'petal length(cm)', 'petal width(cm)', 'class'])

fig, ax = plt.subplots(dpi=150)  # create a figure and a single Axes
iris_type = pd_iris['class'].unique()  # the three class values found in the class column
iris_name = iris.target_names  # the name of each class
colors = ['#c72e29', '#098154', '#fb832d']  # three different colors
markers = [r'$\clubsuit$', '.', '+']  # three different shapes

for i in range(len(iris_type)):
    # Call scatter on the Axes object rather than on pyplot
    ax.scatter(pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal length(cm)'],  # x data
               pd_iris.loc[pd_iris['class'] == iris_type[i], 'sepal width(cm)'],  # y data
               s=50,  # marker size
               c=colors[i],  # marker color
               marker=markers[i],  # marker shape
               # marker=matplotlib.markers.MarkerStyle(marker=markers[i], fillstyle='full'),  # set the marker fill style
               alpha=0.8,  # marker transparency, range 0-1
               facecolors='r',  # marker fill color; when c above is set, c takes priority
               edgecolors='none',  # marker edge line color
               linewidths=1,  # marker edge line width; no visible effect on filled markers when edgecolors is 'none'
               label=iris_name[i])  # the legend entry takes its name from label
ax.legend(loc='upper right')
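One practical reason to prefer the Axes interface is that each subplot can be addressed explicitly. As an illustrative sketch (the two-panel layout and the names ax_sepal and ax_petal are assumptions, not part of the original article), the sepal and petal measurements can be drawn side by side:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()

# One row, two columns of Axes; each Axes gets its own scatter calls.
fig, (ax_sepal, ax_petal) = plt.subplots(1, 2, figsize=(10, 4), dpi=100)

for k, name in enumerate(iris.target_names):
    rows = iris.target == k  # boolean mask selecting the rows of class k
    ax_sepal.scatter(iris.data[rows, 0], iris.data[rows, 1], label=name)
    ax_petal.scatter(iris.data[rows, 2], iris.data[rows, 3], label=name)

ax_sepal.set_xlabel(iris.feature_names[0])
ax_sepal.set_ylabel(iris.feature_names[1])
ax_petal.set_xlabel(iris.feature_names[2])
ax_petal.set_ylabel(iris.feature_names[3])
ax_sepal.legend(loc="upper right")
```

With pyplot alone, the same figure would rely on tracking which subplot is currently active, whereas here every call names its target Axes.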