This article is participating in Python Theme Month.
Introduction
Matplotlib is an important and convenient tool for visualizing data. This article explains in detail how to plot data in Python with matplotlib, mostly through the plotting interface that pandas builds on top of it.
Basic plotting
To use matplotlib, import it. The examples below also use NumPy and pandas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
For example, generate random data for 365 days starting from January 1, 2020, and plot it:
ts = pd.Series(np.random.randn(365), index=pd.date_range("1/1/2020", periods=365))
ts.plot()
A DataFrame plots all of its columns (each one a Series) at the same time:
df3 = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD"))
df3 = df3.cumsum()
df3.plot()
Use the x and y arguments to choose which columns are plotted against each other:
df3 = pd.DataFrame(np.random.randn(365, 2), columns=["B", "C"]).cumsum()
df3["A"] = pd.Series(list(range(len(df3))))
df3.plot(x="A", y="B");
Other plot types
plot() supports many plot types, including bar, hist, box, kde (density), area, scatter, hexbin and pie. A type can be selected with the kind argument, for example kind="bar", or through the matching method such as df.plot.bar(). Here are some examples.
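As a quick check that the two calling styles are interchangeable, the sketch below draws the same bar chart once with kind="bar" and once with the .plot.bar() method, then compares the bar heights. The small DataFrame df_demo is made-up sample data, and the Agg backend is assumed so that no window is opened.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: 3 rows, 2 columns
df_demo = pd.DataFrame({"x": [1, 2, 3], "y": [3, 1, 2]})

fig, (ax1, ax2) = plt.subplots(1, 2)
df_demo.plot(kind="bar", ax=ax1)  # plot type chosen with kind=
df_demo.plot.bar(ax=ax2)          # same plot through the accessor method

# Each bar is a Rectangle patch; there is one per cell of the DataFrame
heights1 = [p.get_height() for p in ax1.patches]
heights2 = [p.get_height() for p in ax2.patches]
```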
bar
df3.iloc[5].plot(kind="bar");  # row 5 of df3, one bar per column
For a DataFrame with several columns, the bars of each row are grouped together:
df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df2.plot.bar();
stacked bar
df2.plot.bar(stacked=True);
barh
barh draws a horizontal bar chart:
df2.plot.barh(stacked=True);
Histograms
df2.plot.hist(alpha=0.5);
box
df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
df.plot.box();
The colors of the box elements can be customized, and sym sets the marker used for outliers:
color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}
df.plot.box(color=color, sym="r+");
Pass vert=False to draw the boxes horizontally:
df.plot.box(vert=False);
Besides plot.box(), a box plot can also be drawn with DataFrame.boxplot():
df = pd.DataFrame(np.random.rand(10, 5))
bp = df.boxplot()
boxplot() can group the data with the by argument. Start with a two-column DataFrame:
df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df
Out[90]:
Col1 Col2
0 0.047633 0.150047
1 0.296385 0.212826
2 0.562141 0.136243
3 0.997786 0.224560
4 0.585457 0.178914
5 0.551201 0.867102
6 0.740142 0.003872
7 0.959130 0.581506
8 0.114489 0.534242
9 0.042882 0.314845
df.boxplot()
Now add a column X that assigns each row to group A or B:
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
df
Out[92]:
Col1 Col2 X
0 0.047633 0.150047 A
1 0.296385 0.212826 A
2 0.562141 0.136243 A
3 0.997786 0.224560 A
4 0.585457 0.178914 A
5 0.551201 0.867102 B
6 0.740142 0.003872 B
7 0.959130 0.581506 B
8 0.114489 0.534242 B
9 0.042882 0.314845 B
bp = df.boxplot(by="X")
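Each box in the grouped plot summarizes one group of rows. The sketch below rebuilds the DataFrame from the values printed above and uses groupby to compute the per-group medians, which are what the line inside each box marks. It assumes the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")
import pandas as pd

df = pd.DataFrame({
    "Col1": [0.047633, 0.296385, 0.562141, 0.997786, 0.585457,
             0.551201, 0.740142, 0.959130, 0.114489, 0.042882],
    "Col2": [0.150047, 0.212826, 0.136243, 0.224560, 0.178914,
             0.867102, 0.003872, 0.581506, 0.534242, 0.314845],
    "X": ["A"] * 5 + ["B"] * 5,
})

# One subplot per numeric column, with one box per value of X in each subplot
bp = df.boxplot(by="X")

# The median lines of the boxes correspond to these per-group medians
medians = df.groupby("X")[["Col1", "Col2"]].median()
```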
Area
You can draw area plots with Series.plot.area() or DataFrame.plot.area(). By default, the areas are stacked:
df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df.plot.area();
To draw the areas without stacking, pass stacked=False:
df.plot.area(stacked=False);
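Stacking only makes sense when each column is entirely non-negative or entirely non-positive. As we understand it, pandas rejects a mixed-sign column when stacked=True (the default for area plots) with a ValueError; the sketch below, using the made-up DataFrame df_mixed and the Agg backend, shows the error and the stacked=False alternative.

```python
import matplotlib
matplotlib.use("Agg")
import pandas as pd

# Hypothetical data: column "a" mixes positive and negative values
df_mixed = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [1.0, 2.0, 3.0]})

try:
    df_mixed.plot.area()  # stacked=True is the default for area plots
    stacked_error = None
except ValueError as exc:
    stacked_error = str(exc)

# Unstacked areas are drawn independently, so mixed signs are accepted
ax = df_mixed.plot.area(stacked=False)
```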
Scatter
DataFrame.plot.scatter() creates a scatter plot from two columns:
df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
df.plot.scatter(x="a", y="b");
A third column can be mapped to the point color with c:
df.plot.scatter(x="a", y="b", c="c", s=50);
Alternatively, a third column can control the size of the points through s:
df.plot.scatter(x="a", y="b", s=df["c"] * 200);
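The c and s arguments can also be combined. The sketch below uses random data with a fixed seed (df_sc is a hypothetical name) and the Agg backend; it maps column c to color and column d to size, then reads the markers back from the matplotlib collection to confirm what was drawn.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_sc = pd.DataFrame(rng.random((50, 4)), columns=["a", "b", "c", "d"])

# Positions from a and b, color from c, marker size from d
ax = df_sc.plot.scatter(x="a", y="b", c="c", s=df_sc["d"] * 200)

points = ax.collections[0]      # the PathCollection holding the markers
offsets = points.get_offsets()  # one (x, y) pair per row
sizes = points.get_sizes()      # marker areas passed through s
```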
Hexagonal bin
Use DataFrame.plot.hexbin() to create a hexagonal bin plot, which is useful when the data is too dense to plot every point individually:
df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)
df.plot.hexbin(x="a", y="b", gridsize=25);
By default, the color of each hexagon represents the number of points that fall into it. If you pass a column as C, each hexagon instead shows an aggregate of that column's values, chosen with reduce_C_function (np.mean by default; np.max, np.sum and np.std are other options):
df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)
df["z"] = np.random.uniform(0, 3, 1000)
df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25);
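Counting points and aggregating a column are closely related: summing a column of ones per hexagon gives the same totals as the default count. The sketch below checks this with random data, a hypothetical helper column named one, and the Agg backend; in both cases the hexagon values add up to the number of rows.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_hex = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])
df_hex["one"] = 1.0  # hypothetical helper column of ones

# Default: each hexagon holds the number of points inside it
ax1 = df_hex.plot.hexbin(x="a", y="b", gridsize=25)
counts = ax1.collections[0].get_array()

# Summing the column of ones per hexagon reproduces the counts
ax2 = df_hex.plot.hexbin(x="a", y="b", C="one", reduce_C_function=np.sum, gridsize=25)
sums = ax2.collections[0].get_array()
```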
Pie
Use DataFrame.plot.pie() or Series.plot.pie() to build the pie chart:
series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")
series.plot.pie(figsize=(6, 6));
For a DataFrame, pass subplots=True to draw one pie per column:
df = pd.DataFrame(
    3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
)
df.plot.pie(subplots=True, figsize=(8, 4));
The labels, colors, percentage format and font size can also be customized:
series.plot.pie(
    labels=["AA", "BB", "CC", "DD"],
    colors=["r", "g", "b", "c"],
    autopct="%.2f",
    fontsize=20,
    figsize=(6, 6),
);
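If the values add up to less than 1, older matplotlib versions drew only part of the circle (a fan shape). Recent versions normalize the values into a full pie by default; pass normalize=False to get the partial pie:
series = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="series2")
series.plot.pie(figsize=(6, 6), normalize=False);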
Handling NaN data in plots
The table below shows how each plot type handles NaN values by default:
Plot type | How NaN is handled |
---|---|
Line | Leave gaps at NaNs |
Line (stacked) | Fill 0's |
Bar | Fill 0's |
Scatter | Drop NaNs |
Histogram | Drop NaNs (column-wise) |
Box | Drop NaNs (column-wise) |
Area | Fill 0's |
KDE | Drop NaNs (column-wise) |
Hexbin | Drop NaNs |
Pie | Fill 0's |
Other plotting tools
Scatter matrix
A scatter matrix can be drawn with scatter_matrix from pandas.plotting:
from pandas.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(1000, 4), columns=["a", "b", "c", "d"])
scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");
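scatter_matrix returns the grid of Axes it creates, with one row and one column per DataFrame column. The sketch below uses three random columns (df_sm is a hypothetical name) and diagonal="hist" instead of "kde", so that SciPy is not needed; each diagonal cell shows a histogram of one column and each off-diagonal cell a pairwise scatter plot.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
df_sm = pd.DataFrame(rng.standard_normal((200, 3)), columns=["a", "b", "c"])

# Returns an n x n array of Axes for n columns
axes = scatter_matrix(df_sm, alpha=0.2, diagonal="hist")
```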
Density plot
Series.plot.kde() and DataFrame.plot.kde() draw kernel density estimate (KDE) plots; they require SciPy:
ser = pd.Series(np.random.randn(1000))
ser.plot.kde();
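To show what a KDE plot computes, here is a minimal NumPy sketch of a Gaussian kernel density estimate. It is an illustration rather than pandas' own code (pandas relies on SciPy for this); the helper gaussian_kde_1d is hypothetical, and the bandwidth uses Scott's rule, which is also SciPy's default.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth in one dimension
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the density integrates to 1
    return kernels.sum(axis=1) / (n * h)

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
grid = np.linspace(-6, 6, 1201)
dens = gaussian_kde_1d(samples, grid)

# Riemann-sum approximation of the area under the curve
area = dens.sum() * (grid[1] - grid[0])
```

The area comes out very close to 1, and the curve peaks near 0, as expected for standard normal samples.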
Andrews curves
Andrews curves plot multivariate data as a set of curves: each sample becomes one curve, built by using the sample's attributes as the coefficients of a Fourier series. Coloring the curves by class makes clusters visible, because the curves of samples from the same class tend to lie close together and form larger structures.
from pandas.plotting import andrews_curves
data = pd.read_csv("data/iris.data")
plt.figure();
andrews_curves(data, "Name");
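For a sample x = (x1, x2, x3, ...), the usual Andrews curve is f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., typically evaluated for t between -π and π. The sketch below is a small NumPy version of that formula; the helper name andrews_curve is our own.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve of one sample `x` at the points `t`."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1  # harmonic order: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 0:
            result += coef * np.sin(k * t)
        else:
            result += coef * np.cos(k * t)
    return result

t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([np.sqrt(2), 5, 3, 7], t)
```

At t = 0 all sine terms vanish, so the sample (√2, 5, 3, 7) gives 1 + 3 = 4.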
Parallel coordinates
Parallel coordinates are a technique for plotting multivariate data. They let you see clusters in the data and estimate other statistics visually. Each vertical axis represents one attribute, and each data point is drawn as a series of connected line segments across those axes. Points that belong to the same cluster tend to appear close together.
from pandas.plotting import parallel_coordinates
data = pd.read_csv("data/iris.data")
plt.figure();
parallel_coordinates(data, "Name");
Lag plot
A lag plot is a scatter plot of a time series against a lagged copy of itself, that is, y(t) against y(t + lag). It can be used to check for autocorrelation: random data shows no structure, while a visible pattern indicates that consecutive values are related.
from pandas.plotting import lag_plot
plt.figure();
spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))
lag_plot(data);
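Under the hood, a lag plot only pairs each value with the value lag steps later. The sketch below (NumPy only, with a hypothetical helper lag_pairs) builds those pairs for a short sequence and measures their correlation, which is what the shape of the scatter conveys visually.

```python
import numpy as np

def lag_pairs(values, lag=1):
    """Return the (y(t), y(t + lag)) pairs a lag plot scatters; lag must be >= 1."""
    y = np.asarray(values, dtype=float)
    return y[:-lag], y[lag:]

x, y = lag_pairs([0, 1, 4, 9, 16], lag=1)

# Pearson correlation of the pairs: close to 1 means strong lag-1 dependence
r = np.corrcoef(x, y)[0, 1]
```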
Autocorrelation plot
Autocorrelation plots are commonly used to check whether a time series is random. The horizontal axis shows the lag and the vertical axis the autocorrelation coefficient at that lag. For a random series, the autocorrelations should be close to zero at every lag; pandas draws horizontal reference lines for the 95% and 99% confidence bands to help judge this.
from pandas.plotting import autocorrelation_plot
plt.figure();
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))
autocorrelation_plot(data);
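The coefficient plotted at each lag can be computed directly. The sketch below is a NumPy version of the usual sample autocorrelation (the lag-k autocovariance divided by the lag-0 autocovariance), together with the z/√n half-width commonly used for the 95% reference band. The helper names autocorrelation and confidence_band are our own.

```python
import numpy as np

def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag (lag >= 0)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean = x.mean()
    c0 = np.sum((x - mean) ** 2) / n  # lag-0 autocovariance
    ck = np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n
    return ck / c0

def confidence_band(n, z=1.959963984540054):
    """Half-width of the approximate 95% band for a random series of length n."""
    return z / np.sqrt(n)

r1 = autocorrelation([1, 2, 3, 4, 5], 1)
```

For the short trend [1, 2, 3, 4, 5], the lag-1 coefficient is 4 / 10 = 0.4.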
Bootstrap plot
A bootstrap plot is used to visually assess the uncertainty of a statistic such as the mean, median or midrange. A random subset of a given size is drawn from the data, the statistics are computed for that subset, and this is repeated a given number of times. The resulting line plots and histograms together form the bootstrap plot.
from pandas.plotting import bootstrap_plot
data = pd.Series(np.random.rand(1000))
bootstrap_plot(data, size=50, samples=500, color="grey");
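The procedure can be written out in a few lines. The sketch below (NumPy, hypothetical helper bootstrap_stats) collects the mean, median and midrange of 500 random subsets of 50 observations, which correspond to the quantities bootstrap_plot draws. It takes each subset without replacement, matching the "random subset" described above; a classic bootstrap resamples with replacement instead.

```python
import numpy as np

def bootstrap_stats(data, size=50, samples=500, seed=0):
    """Return arrays of means, medians and midranges over random subsets."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means, medians, midranges = [], [], []
    for _ in range(samples):
        # One random subset of `size` observations (without replacement)
        subset = rng.choice(data, size=size, replace=False)
        means.append(subset.mean())
        medians.append(np.median(subset))
        midranges.append((subset.min() + subset.max()) / 2)
    return np.array(means), np.array(medians), np.array(midranges)

data = np.random.default_rng(1).random(1000)
means, medians, midranges = bootstrap_stats(data)
# The spread of `means` reflects the uncertainty of the sample mean
spread = means.std()
```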
RadViz
RadViz is based on a spring-tension minimization algorithm. Each feature of the data set is mapped to a fixed anchor point on the unit circle of a 2-D plane. Every instance starts at the center of the circle, and each feature "pulls" the instance toward its anchor with a strength given by the instance's normalized value for that feature; the instance is drawn where these pulls balance.
from pandas.plotting import radviz
data = pd.read_csv("data/iris.data")
plt.figure();
radviz(data, "Name");
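The balance point of the springs has a closed form: the weighted average of the anchor points, using the normalized feature values as weights. The sketch below (NumPy, hypothetical helper radviz_positions) assumes the features are already scaled to [0, 1] and that no row is all zeros.

```python
import numpy as np

def radviz_positions(normalized):
    """Project rows of normalized features onto the RadViz plane."""
    x = np.asarray(normalized, dtype=float)
    m = x.shape[1]
    # Anchors evenly spaced on the unit circle, one per feature
    angles = 2 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (m, 2)
    # Weighted average of the anchors; each row's weights must not sum to 0
    return (x @ anchors) / x.sum(axis=1, keepdims=True)

pos = radviz_positions([[1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0]])
```

A row that only has the first feature lands on that feature's anchor, while a row with equal values for all features stays at the center.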
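Plot styles
Since matplotlib 1.5, a number of predefined style sheets are available and can be applied with matplotlib.style.use(my_plot_style).
matplotlib.style.available lists all available styles. The exact list depends on the matplotlib version; the output below comes from an older release (since matplotlib 3.6, the seaborn styles are named seaborn-v0_8-*):
import matplotlib.pyplot as plt
plt.style.available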
Out[128]:
['seaborn-dark',
'seaborn-darkgrid',
'seaborn-ticks',
'fivethirtyeight',
'seaborn-whitegrid',
'classic',
'_classic_test',
'fast',
'seaborn-talk',
'seaborn-dark-palette',
'seaborn-bright',
'seaborn-pastel',
'grayscale',
'seaborn-notebook',
'ggplot',
'seaborn-colorblind',
'seaborn-muted',
'seaborn',
'Solarize_Light2',
'seaborn-paper',
'bmh',
'seaborn-white',
'dark_background',
'seaborn-poster',
'seaborn-deep']
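plt.style.use() changes the style for the rest of the session. To apply a style to a single figure, matplotlib also provides plt.style.context(). The sketch below uses "ggplot", one of the styles listed above, with the Agg backend, and checks that the previous settings are restored after the with block.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

before = plt.rcParams["axes.facecolor"]

# The style is active only inside the with block
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3])

after = plt.rcParams["axes.facecolor"]
```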
Hiding the legend
By default, a DataFrame plot includes a legend that labels each column. Pass legend=False to hide it:
df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))
df = df.cumsum()
df.plot(legend=False);
Setting axis labels
Compare the default axis labels with custom ones set through the xlabel and ylabel arguments (available since pandas 1.1):
df.plot();
df.plot(xlabel="new x", ylabel="new y");
Log scale
If the values span several orders of magnitude, the small values are barely visible on a linear axis. Pass logy=True to use a logarithmic y-axis:
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = np.exp(ts.cumsum())
ts.plot(logy=True);
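To confirm what logy does, the sketch below draws the same synthetic, always-positive random-walk series (ts_demo is a hypothetical name) on two axes, once with a linear and once with a logarithmic y-axis, and reads back the axis scale. The axes are passed explicitly because a Series plot otherwise reuses the current axes; the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts_demo = pd.Series(np.exp(rng.standard_normal(1000).cumsum()))

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
ts_demo.plot(ax=ax_lin)             # linear y-axis
ts_demo.plot(ax=ax_log, logy=True)  # logarithmic y-axis
```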
Secondary y-axis
Use secondary_y to plot some columns against a second y-axis on the right; it accepts True or a list of column names:
plt.figure();
ax = df.plot(secondary_y=["A", "B"])
ax.set_ylabel("CD scale");
ax.right_ax.set_ylabel("AB scale");
By default, the legend labels of the columns on the secondary axis get a "(right)" suffix. To remove it, set mark_right=False:
plt.figure();
df.plot(secondary_y=["A", "B"], mark_right=False);
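The "(right)" suffix is part of each line's label. The sketch below uses random-walk data with the same ABCD columns (df_sec is a hypothetical name) and the Agg backend, and collects the labels from both the left axes and ax.right_ax, with and without mark_right.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_sec = pd.DataFrame(rng.standard_normal((100, 4)).cumsum(axis=0), columns=list("ABCD"))

def all_line_labels(ax):
    # Lines live on the left axes (C, D) and on the twin right axes (A, B)
    return sorted(line.get_label() for line in [*ax.get_lines(), *ax.right_ax.get_lines()])

labels = all_line_labels(df_sec.plot(secondary_y=["A", "B"]))
labels_plain = all_line_labels(df_sec.plot(secondary_y=["A", "B"], mark_right=False))
```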
Adjusting x-axis ticks
For regular-frequency time series, pandas automatically adjusts the resolution of the x-axis tick labels. Pass x_compat=True to turn this adjustment off:
plt.figure();
df["A"].plot(x_compat=True);
To apply the setting to several plots at once, use pd.plotting.plot_params.use in a with block:
plt.figure();
with pd.plotting.plot_params.use("x_compat", True):
    df["A"].plot(color="r")
    df["B"].plot(color="g")
    df["C"].plot(color="b")
Subplots
When plotting a DataFrame, pass subplots=True to draw each column in its own subplot:
df.plot(subplots=True, figsize=(6, 6));
The layout argument sets the number of rows and columns of the subplot grid:
df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);
One dimension can be given as -1, in which case it is inferred from the number of subplots; with the four columns here, layout=(2, -1) produces a 2 x 2 grid:
df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);
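The grid shape shows up in the array of Axes that plot() returns. In the sketch below (four random columns in a hypothetical df_sub, Agg backend), layout=(2, 3) yields a 2 x 3 array whose two unused cells are hidden, while layout=(2, -1) infers two columns.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

df_sub = pd.DataFrame(np.random.default_rng(0).standard_normal((50, 4)), columns=list("ABCD"))

axes_23 = df_sub.plot(subplots=True, layout=(2, 3), sharex=False)
axes_2x = df_sub.plot(subplots=True, layout=(2, -1), sharex=False)

# Unused cells of the 2 x 3 grid are kept in the array but made invisible
visible = sum(ax.get_visible() for ax in axes_23.ravel())
```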
A more complex example, which passes lists of existing axes through ax:
fig, axes = plt.subplots(4, 4, figsize=(9, 9))
plt.subplots_adjust(wspace=0.5, hspace=0.5)
target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]
target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]
df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);
(-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False);
Plotting tables
With table=True, the plotted data is also shown as a table below the chart:
fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
ax.xaxis.tick_top()  # Display x-axis ticks on top.
df.plot(table=True, ax=ax)
A table can also be added to the plot with pandas.plotting.table(), for example one holding summary statistics:
from pandas.plotting import table
fig, ax = plt.subplots(1, 1)
table(ax, np.round(df.describe(), 2), loc="upper right", colWidths=[0.2, 0.2, 0.2]);
df.plot(ax=ax, ylim=(0, 2), legend=None);
Using Colormaps
When many columns are plotted at once, the lines can be hard to tell apart with the default colors. In that case, pass the name of a matplotlib colormap through colormap:
df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)
df = df.cumsum()
plt.figure();
df.plot(colormap="cubehelix");
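With colormap set, each column gets its own color sampled from the colormap. The sketch below (ten random-walk columns in a hypothetical df_cm, Agg backend) compares the line colors with ten evenly spaced samples of "cubehelix", which we believe is how pandas picks them.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

rng = np.random.default_rng(0)
df_cm = pd.DataFrame(rng.standard_normal((100, 10)).cumsum(axis=0))

ax = df_cm.plot(colormap="cubehelix")
line_colors = [to_rgba(line.get_color()) for line in ax.get_lines()]

# Ten evenly spaced samples of the same colormap, from 0 to 1
cmap = plt.get_cmap("cubehelix")
expected = [cmap(v) for v in np.linspace(0, 1, 10)]
```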
This article is available at www.flydean.com/09-python-p…