This is the 14th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021
Review
In previous installments we studied the matplotlib module, including the commonly used line chart, which reflects changes in data; the bar chart, which compares differences between categories; and the histogram, which reflects the frequency distribution of data.
Previous installments
- Matplotlib module overview: summary of common Matplotlib methods
- Matplotlib module basic principles: the scripting layer, artist layer, and backend layer explained
- Matplotlib line chart drawing: summary of line chart properties and methods
Matplotlib histogram drawing: summary of histogram related properties and methods
-
Matplotlib Histogram drawing: summary of histogram related properties and methods
Among statistical charts, there is one in which scattered points are distributed across a coordinate system, reflecting how the data trends with respect to the independent variable.
In this installment, we will learn matplotlib's scatter plot properties in detail. Let's go~
1. Overview of scatter plots
- What is a scatter plot?
  - A scatter plot places data points on horizontal and vertical axes, so the data appears as points in a Cartesian coordinate system
  - A scatter plot shows the general tendency of a dependent variable to vary with an independent variable
  - A scatter plot is made up of many coordinate points; examining how they are distributed helps determine whether there is a correlation or a distribution pattern
  - Points of different categories are represented by markers of different shapes or colors
  - Scatter plots mainly come in three kinds: scatter plot matrices, three-dimensional scatter plots, and ArcGIS scatter plots
- Usage scenarios of scatter plots
  - Comparing aggregated data across categories
  - Analyzing trends in linear and polynomial data
  - Four-quadrant analysis
  - Finding a formula for the data trend
  - Supporting more precise charts drawn later
- Steps for drawing a scatter plot
  - Import the matplotlib.pyplot module
  - Prepare the data, for example with numpy or pandas
  - Call pyplot.scatter() to draw the scatter plot
- Example
  In this example, we will analyze the sales distribution of a product at different price points
- The example needs two sets of data, one for the x axis and one for the y axis, and both must contain the same number of values
x_value = np.random.randint(50, 100, 50)    # 50 random prices in [50, 100)
y_value = np.random.randint(500, 1000, 50)  # 50 random sales volumes in [500, 1000)
- Draw the scatter plot
import matplotlib.pyplot as plt
import numpy as np

# Font settings; only needed when labels contain Chinese characters
plt.rcParams["font.sans-serif"] = ['SimHei']
plt.rcParams["axes.unicode_minus"] = False

x_value = np.random.randint(50, 100, 50)
y_value = np.random.randint(500, 1000, 50)

plt.scatter(x_value, y_value)
plt.title("data analyze")
plt.xlabel("Sale price")
plt.ylabel("Sales volume")
plt.show()
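The overview above lists "finding a formula for the data trend" as one use of scatter plots. As a minimal sketch of that idea, the block below fits a straight line through price and volume data with `np.polyfit` and draws it over the points. The helper name `fit_trend`, the synthetic data, and its downward slope are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt


def fit_trend(x, y, degree=1):
    """Return least-squares polynomial coefficients, highest power first."""
    return np.polyfit(x, y, degree)


# Illustrative data (an assumption): volume falls roughly linearly with price.
rng = np.random.default_rng(0)
price = rng.integers(50, 100, 50)
volume = 1500 - 10 * price + rng.normal(0, 40, 50)

coeffs = fit_trend(price, volume)                     # [slope, intercept]
line_x = np.linspace(price.min(), price.max(), 100)   # x positions for the fitted line

fig, ax = plt.subplots()
ax.scatter(price, volume, alpha=0.7, label="observations")
ax.plot(line_x, np.polyval(coeffs, line_x), color="red", label="fitted trend")
ax.set_xlabel("Sale price")
ax.set_ylabel("Sales volume")
ax.legend()
```

Call `plt.show()` to display the figure. Passing `degree=2` to the hypothetical `fit_trend` helper would fit a parabola instead, which matches the polynomial trend case mentioned above.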
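2. Scatter plot properties
- Set the scatter size
  - Keyword: s
  - Accepts a single number or a list of numbers (one per point), measured in points squared; the default is rcParams['lines.markersize'] ** 2, which is 36 with the stock settings (older matplotlib 1.x releases used 20)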
- Set the scatter color
  - Keyword: c
  - The default is the first color of the current color cycle, a shade of blue
  - Accepted values
    - Color names, e.g. "red"
    - Single-letter abbreviations, such as "r" for red and "y" for yellow
    - RGB values: a hexadecimal string such as "#88c999", or an (r, g, b) tuple
    - A list of colors, one per point
- Set the scatter style
  - Keyword: marker
  - The default is 'o', a circle
  - Other available values: 'o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X'
- Set the transparency
  - Keyword: alpha
  - Value range: 0 (fully transparent) to 1 (fully opaque)
- Set the scatter border
  - Keyword: edgecolors
  - The default is "face", meaning the border takes the same color as the marker face
  - Accepted values:
    - "face" | "none"
    - A color name, abbreviation, or RGB value
- Using the example from the previous section, we vary the scatter size, set the border to pink, and set the fill color to #88c999
size = (20 * np.random.rand(50)) ** 2  # one marker area per point
plt.scatter(x_value, y_value, s=size, c="#88c999", edgecolors="pink")
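The properties listed in this section can also be combined in a single call. The sketch below is one possible combination, using random data and an arbitrary rule (red above y = 50) chosen purely for illustration: it passes a per-point list for both `s` and `c`, and sets `marker`, `alpha`, and `edgecolors` at the same time.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 30
x = rng.random(n) * 100
y = rng.random(n) * 100

sizes = (5 + 15 * rng.random(n)) ** 2                 # s: marker area, one value per point
colors = ["red" if v > 50 else "#88c999" for v in y]  # c: one color per point

fig, ax = plt.subplots()
points = ax.scatter(
    x, y,
    s=sizes,
    c=colors,
    marker="^",          # triangles instead of the default circles
    alpha=0.6,           # 0 is fully transparent, 1 is fully opaque
    edgecolors="pink",   # border color
)
```

Call `plt.show()` to display the figure. When `s` or `c` is a list, its length has to match the number of points.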
3. Adding a line to a scatter plot
When reading a scatter plot, we sometimes add a line plot to aid the analysis. Let's continue to analyze the data from the first section.
- We use np.random.rand() to generate 100 random values for each axis
x_value = 100 * np.random.rand(100)
y_value = 100 * np.random.rand(100)
- We need some high school math here, namely the sin and cos functions, to trace a quarter circle.
- Use the pyplot.plot() method to draw the curve
r0 = 80
plt.scatter(x_value, y_value, c="hotpink", edgecolors="blue")
# Angles from 0 to pi/2 trace a quarter circle of radius r0 in the first quadrant
the = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(the), r0 * np.sin(the))
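The curve here comes from the parametric form of a circle: x = r0·cos(θ) and y = r0·sin(θ), with θ running from 0 to π/2 for the first quadrant. The sketch below is a small variation rather than the article's exact code: it uses `np.linspace`, which includes both endpoints, and sets an equal aspect ratio so the arc is not stretched into an ellipse. The names `theta`, `arc_x`, and `arc_y` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

r0 = 80
# linspace includes both endpoints, so the arc reaches the y axis exactly
theta = np.linspace(0, np.pi / 2, 200)
arc_x = r0 * np.cos(theta)
arc_y = r0 * np.sin(theta)

fig, ax = plt.subplots()
ax.plot(arc_x, arc_y)
ax.set_aspect("equal")  # same scale on both axes, so the arc looks circular
```

Every point on the arc satisfies x² + y² = r0², which is what lets the curve act as a boundary between points inside and outside the radius.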
4. Multi-category scatter plots
When examining data, we often compare several categories at once, so we can distinguish them by color or by marker style
- Method 1: to distinguish categories by color, add a new data series and a second scatter() call
x_value = 100 * np.random.rand(100)
y_value = 100 * np.random.rand(100)
y1_value = 100 * np.random.rand(100)
plt.scatter(x_value, y_value, c="hotpink", edgecolors="blue", label="A product")
plt.scatter(x_value, y1_value, c="#88c999", edgecolors="y", label="B")
plt.legend()  # without this call the labels are not displayed
- Method 2: use marker to distinguish categories. For example, we take the example from the previous section and add another scatter() call.
r0 = 80
size = (20 * np.random.rand(100)) ** 2
r = np.sqrt(x_value ** 2 + y_value ** 2)     # distance of each point from the origin
area = np.ma.masked_where(r > r0, size)      # hide points outside the radius
area1 = np.ma.masked_where(r <= r0, size)    # hide points inside the radius
plt.scatter(x_value, y_value, s=area, c="hotpink", edgecolors="blue", label="A product")
plt.scatter(x_value, y_value, s=area1, c="red", edgecolors="y", marker="^", label="B")
the = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(the), r0 * np.sin(the))
plt.legend()  # without this call the labels are not displayed
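The example above splits the points by masking their sizes with `np.ma.masked_where`. An alternative sketch, not the article's approach, does the same split with a boolean array and plain indexing, which some readers may find easier to follow. The variable name `inside` and the seeded random data are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = 100 * rng.random(100)
y = 100 * rng.random(100)
r0 = 80

inside = np.sqrt(x ** 2 + y ** 2) <= r0  # True for points within the radius

fig, ax = plt.subplots()
ax.scatter(x[inside], y[inside], c="hotpink", edgecolors="blue", label="A product")
ax.scatter(x[~inside], y[~inside], c="red", edgecolors="y", marker="^", label="B")
ax.legend()
```

Both approaches draw two groups; masking keeps one full-length array per series, while indexing hands each `scatter()` call only the points it needs.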
5. Scatter plots with a color bar
To express a value at each point through color depth, we can apply a colormap with the cmap keyword and add a color bar
- Colormap keyword: cmap
- The default is viridis; other options include Accent_r, Blues_r, BrBG_r, Greens_r, and so on
- Each color represents a value; in this example the values range from 0 to 100
To display the color bar we need to call pyplot.colorbar()
For example, we add a red-toned color bar to the scatter plot
size = (20 * np.random.rand(100)) ** 2
color = np.random.randint(0, 100, 100)  # one value in [0, 100) per point
plt.scatter(x_value, y_value, s=size, c=color, label="A product", cmap="afmhot_r")
plt.colorbar()
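By default the colormap stretches between the smallest and largest value actually present in `c`. If the color bar should cover the full 0 to 100 range regardless of the sample, one option is to pass `vmin` and `vmax`, as in the sketch below. The data, the variable names, and the label text "value" are placeholders chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = 100 * rng.random(100)
y = 100 * rng.random(100)
values = rng.integers(0, 100, 100)  # the number each point's color encodes

fig, ax = plt.subplots()
# vmin/vmax pin the ends of the colormap to 0 and 100
points = ax.scatter(x, y, c=values, cmap="afmhot_r", vmin=0, vmax=100)
cbar = fig.colorbar(points, ax=ax)  # tie the color bar to this scatter
cbar.set_label("value")
```

Passing the object returned by `scatter()` to `colorbar()` makes it explicit which plot the bar describes, which helps once a figure holds more than one.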
6. Curve scatter plots
A scatter plot is made of individual coordinate points; when those points follow a rule, we can use the scatter plot to draw a curve.
We use scatter() to draw the quadratic function y = x^2
x_value = list(range(1, 100))
y_value = [x ** 2 for x in x_value]
# Color each point by its y value, so the curve darkens as y grows
plt.scatter(x_value, y_value, c=y_value, cmap="hot_r", edgecolors="none", s=50)
plt.show()
Conclusion
In this installment, we studied the scatter method of matplotlib.pyplot and its related properties. Scatter plots help us quickly spot distribution patterns in data whose regularities are not yet apparent.
That's all for this installment. Likes and comments are welcome. See you next time!