1. Brief introduction
NumPy, Matplotlib, and pandas are often called the “three musketeers of machine learning” among Python libraries. To borrow an analogy from the natural sciences: if Matplotlib is “physics” and pandas is “chemistry”, then NumPy is “mathematics”, the “foundation” on which the other disciplines are built.
NumPy is the cornerstone because it provides the underlying data structures and computational support for libraries such as Matplotlib and pandas. NumPy's core data structure is the ndarray (N-dimensional array).
2. Preparation
This section may feel like an abrupt detour, but it is needed so that the code in the following sections runs smoothly.
2.1 Installation
2.1.1 Checking the installation
conda list | grep numpy
Or:
pip freeze | grep numpy
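From inside Python, a quick alternative check is to try importing NumPy and read its version string from the standard `np.__version__` attribute. This is a minimal sketch; the variable name `numpy_version` is illustrative.

```python
# Check from inside Python whether NumPy is importable, and which version is installed.
try:
    import numpy as np
    numpy_version = np.__version__  # e.g. a string such as "1.26.4"
except ImportError:
    numpy_version = None  # NumPy is not installed in this environment

print("numpy version:", numpy_version)
```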
2.1.2 Installing
If NumPy is already installed, skip this step.
conda install numpy
Or:
pip install numpy
2.1.3 Updating
If NumPy is already installed, it can be updated with:
conda update numpy
Or:
pip install --upgrade numpy
2.2 Importing
By convention, NumPy is imported under the alias np.
import numpy as np
3. Multidimensional arrays (numpy.ndarray: N-dimensional array)
If you are familiar with MATLAB (short for MATrix LABoratory), you know that scientific computing in MATLAB is built around the matrix. NumPy's multidimensional array plays the same role.
3.1 Creating arrays
3.1.1 Creating with np.array()
Let's create a NumPy multidimensional array (numpy.ndarray) from a two-dimensional list. In NumPy, dimensions are called axes, and the number of axes is called the rank (the ndim attribute), so the two-dimensional array created here is an array of rank 2 with 2 axes. The shape attribute is a tuple giving the length of each axis. The size attribute is the total number of elements, which equals the product of the entries of shape.
That may sound complicated, but it is actually simple: the printout below shows the relationship between these attributes at a glance.
na = np.array([[1, 2, 3], [4, 5, 6]])
print(
    "object type: {}\nshape: {}\ndimensions (rank): {}\nnumber of elements: {}\nelement type: {}"
    .format(type(na), na.shape, na.ndim, na.size, na.dtype))
na
Output:
object type: <class 'numpy.ndarray'>
shape: (2, 3)
dimensions (rank): 2
number of elements: 6
element type: int64
array([[1, 2, 3],
       [4, 5, 6]])
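The same relationships hold for any number of dimensions. As a quick sketch (the 2 x 3 x 4 array and the name `cube` are illustrative), `ndim` always equals the length of `shape`, and `size` equals the product of its entries:

```python
import math

import numpy as np

# Illustrative 3-dimensional array: 2 blocks x 3 rows x 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print("shape:", cube.shape)   # one length per axis: (2, 3, 4)
print("ndim:", cube.ndim)     # number of axes (rank): 3
print("size:", cube.size)     # total number of elements: 24

# ndim is the length of the shape tuple,
# and size is the product of the entries in shape.
assert cube.ndim == len(cube.shape)
assert cube.size == math.prod(cube.shape)
```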
3.1.2 Creating with np.zeros()
Using np.array() is tedious when you don't know the value of each element in advance. If you just need a multidimensional array initialized with zeros, np.zeros() is a better choice: pass the shape (its first positional argument), and it returns an array of that shape with every element set to 0.
na = np.zeros((2, 3))
print("dtype: ", na.dtype)
na
Output:
dtype: float64
array([[0., 0., 0.],
[0., 0., 0.]])
As the printed dtype attribute shows, the default element type is float64. If you don't want the default, pass a different type through the dtype parameter.
na = np.zeros((2, 3), dtype="uint8")
print("dtype: ", na.dtype)
na
Output:
dtype: uint8
array([[0, 0, 0],
[0, 0, 0]], dtype=uint8)
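Two details of np.zeros() worth seeing side by side: the shape argument can be a plain int (giving a 1-D array) or a tuple, and the dtype choice directly determines memory use, since `nbytes = size * itemsize`. A small sketch with illustrative variable names; `astype()` is the standard way to convert an existing array to another dtype.

```python
import numpy as np

# The shape argument: an int gives a 1-D array, a tuple gives one length per axis.
vec = np.zeros(4)                       # shape (4,)
mat = np.zeros((2, 3))                  # shape (2, 3), default dtype float64
small = np.zeros((2, 3), dtype="uint8")  # uint8 holds integers 0..255

# nbytes = size * itemsize, so the dtype controls how much memory the array uses.
print(mat.itemsize, mat.nbytes)         # 8 48
print(small.itemsize, small.nbytes)     # 1 6

# astype() converts an existing array and returns a new array.
converted = mat.astype(np.uint8)
print(vec.shape, converted.dtype)       # (4,) uint8
```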
3.1.3 Creating with np.ones()
np.ones() works just like np.zeros(), except that it fills the array with 1 instead of 0.
np.ones((2, 3), dtype="float")
Output:
array([[1., 1., 1.],
[1., 1., 1.]])
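For any other fill value there is np.full(), which takes the shape and the value. As a sketch (variable names are illustrative), np.ones((2, 3)) produces the same array as np.full((2, 3), 1.0):

```python
import numpy as np

# np.full generalizes np.zeros/np.ones to an arbitrary fill value.
sevens = np.full((2, 3), 7)
ones_again = np.full((2, 3), 1.0)

# np.ones((2, 3)) and np.full((2, 3), 1.0) give the same float64 array.
print(np.array_equal(ones_again, np.ones((2, 3))), ones_again.dtype)  # True float64
```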
3.1.4 np.arange()
np.arange() is NumPy's version of range(). Pass the start, stop, and step arguments to create a one-dimensional NumPy array; as with range(), stop itself is excluded.
np.arange(2, 10, 2)
Output:
array([2, 4, 6, 8])
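A quick sketch of how closely np.arange() mirrors range(), plus one difference: it also accepts float arguments. With non-integer steps, floating-point rounding can affect how many elements you get, which is why the NumPy documentation suggests np.linspace() for that case. Variable names here are illustrative.

```python
import numpy as np

# Same start/stop/step semantics as Python's range(): stop is excluded.
evens = np.arange(2, 10, 2)
assert evens.tolist() == list(range(2, 10, 2))  # [2, 4, 6, 8]

# Unlike range(), np.arange() also accepts a float step.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```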
Although np.arange() can only create one-dimensional arrays, it is often combined with the array's reshape() method, which changes the shape without changing the size (the number of elements). Note that reshape() is a method of NumPy array instances, so it can be used in any situation where you want to change an array's shape, including with np.linspace(), described below.
np.arange(2, 13, 2).reshape(2, 3)
Output:
array([[ 2, 4, 6],
[ 8, 10, 12]])
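Because reshape() must keep the number of elements, NumPy can infer one axis length for you if you pass -1 for it, and it raises a ValueError when the requested shape doesn't match the size. A minimal sketch; the flag `reshape_failed` is an illustrative name.

```python
import numpy as np

a = np.arange(2, 13, 2)     # 6 elements: [2, 4, 6, 8, 10, 12]

# One axis length may be given as -1; NumPy infers it from the size.
m = a.reshape(3, -1)        # inferred as (3, 2)
print(m.shape)              # (3, 2)

# The new shape must keep the same number of elements.
reshape_failed = False
try:
    a.reshape(4, 2)         # 4 * 2 = 8, but a has 6 elements
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```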
In addition to reshape(), there is also a resize() method that changes the shape. The difference is that reshape() returns a new array and leaves the original array unchanged, whereas resize() modifies the original array in place.
a = np.arange(2, 13, 2)
b = a.reshape(2, 3)   # returns a reshaped array; a keeps its shape
print("after call a.reshape():")
print("a.shape", a.shape)
print("b.shape", b.shape)
a.resize(2, 3)        # reshapes a itself, in place
print("after call a.resize():")
print("a.shape", a.shape)
Output:
after call a.reshape():
a.shape (6,)
b.shape (2, 3)
after call a.resize():
a.shape (2, 3)
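One subtlety: reshape() leaves the original array's shape alone, but for a contiguous array like this one the result is typically a view that shares the same underlying data, so writing through one is visible through the other. A sketch using the standard np.shares_memory() check; call .copy() when you need an independent array.

```python
import numpy as np

a = np.arange(2, 13, 2)
b = a.reshape(2, 3)

# b has its own shape, but here it shares the underlying data with a.
print(np.shares_memory(a, b))  # True

b[0, 0] = 100
print(a[0])  # 100: the change is visible through a

# To get an independent array, copy explicitly.
c = a.reshape(2, 3).copy()
c[0, 0] = -1
print(a[0])  # still 100
```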
3.1.5 np.linspace()
np.linspace() is much like np.arange(): it creates a one-dimensional NumPy array running from start to end. There are two differences, however:
- The third parameter of linspace is not the step size but the number of points in the interval.
- The result includes the end point (by default), while np.arange() excludes it.
np.linspace(2, 10, 5)
Output:
array([ 2., 4., 6., 8., 10.])
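Both differences can be adjusted with keyword arguments of np.linspace(): `endpoint=False` drops the end point, and `retstep=True` also returns the spacing that was used. A short sketch with illustrative variable names:

```python
import numpy as np

# endpoint=False excludes the end point, so this matches np.arange(2, 10, 2).
no_end = np.linspace(2, 10, 4, endpoint=False)
print(no_end.tolist())  # [2.0, 4.0, 6.0, 8.0]

# retstep=True returns (samples, step) instead of just the samples.
samples, step = np.linspace(2, 10, 5, retstep=True)
print(step)             # 2.0
```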
3.1.6 np.random.random()
Pass a shape to get an array of that shape filled with random floats drawn uniformly from the half-open interval [0.0, 1.0).
np.random.random((2, 3))
Output:
array([[0.22031976, 0.91591833, 0.63773627],
       [0.92104449, 0.69246379, 0.82988843]])
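The values above change on every run. When results need to be reproducible, one option is NumPy's newer Generator API (available since NumPy 1.17), which is swapped in here in place of the legacy np.random functions: a generator created with the same seed yields the same numbers. The seed 42 and the variable names are arbitrary choices for this sketch.

```python
import numpy as np

# A seeded Generator produces the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)
first = rng.random((2, 3))

rng = np.random.default_rng(seed=42)
second = rng.random((2, 3))

print(np.array_equal(first, second))           # True
print(first.min() >= 0.0, first.max() < 1.0)   # values lie in [0, 1)
```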
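3.1.7 np.random.normal(mu, sigma, len)
Draw len samples from a normal (Gaussian) distribution with mean mu and standard deviation sigma, returned as a one-dimensional array. (It is the standard normal distribution only when mu = 0 and sigma = 1.)
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic: render plots inside the notebook (omit in a plain .py script)
%matplotlib inline
mu = 2        # mean
sigma = 0.5   # standard deviation
v = np.random.normal(mu, sigma, 10000)   # 10,000 samples
plt.hist(v, bins=50, density=True)       # density=True: normalize so the total bar area is 1
plt.show()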
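Without plotting, the same sample can be sanity-checked numerically: with 10,000 samples the sample mean and standard deviation should land close to mu and sigma, and np.histogram() with `density=True` (the normalization that plt.hist's density option displays) gives bin heights whose total area is 1. A sketch assuming a seeded Generator for repeatability; the seed and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 2, 0.5
v = rng.normal(mu, sigma, 10000)

# With 10,000 samples the sample statistics should be close to mu and sigma.
print(round(v.mean(), 2), round(v.std(), 2))

# density=True normalizes bin heights so the histogram's total area is 1.
heights, edges = np.histogram(v, bins=50, density=True)
area = np.sum(heights * np.diff(edges))
print(area)  # 1.0, up to floating-point rounding
```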
3.2 Reading elements
3.2.1 Tuple indexing
Index a multidimensional array with a tuple whose length equals the rank, that is, one index per axis. The parentheses around the tuple can be omitted.
na = np.random.random((2, 3))
print(na[(1, 2)])
print(na[1, 2])
Output:
0.7547734386512726
0.7547734386512726
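Since the index is just a tuple, it can be stored in a variable, it can use negative positions that count from the end of each axis, and it can contain fewer indices than there are axes, in which case you get a sub-array. A sketch on a deterministic array (the names `na` and `idx` are illustrative):

```python
import numpy as np

na = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

idx = (1, 2)                      # one index per axis
print(na[idx], na[1, 2])          # 5 5

# Negative indices count from the end of each axis.
print(na[-1, -1])                 # 5

# Fewer indices than axes returns a sub-array (here, row 1).
print(na[1])                      # [3 4 5]
```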
3.2.2 Nested-list-style indexing
As with a nested Python list, you can also index one axis at a time:
na[1][2]
Output:
0.7547734386512726
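Chained indexing first builds the intermediate array `na[1]` and then indexes into it, whereas the tuple form selects in a single step. For single elements the results agree, but with slices the two forms diverge, a common pitfall. A sketch on an illustrative array:

```python
import numpy as np

na = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Chained indexing gives the same element as a tuple index...
assert na[1][2] == na[1, 2]

# ...but not once slices are involved:
print(na[:, 1])   # column 1 -> [1 4]
print(na[:][1])   # na[:] is the whole array, so this is row 1 -> [3 4 5]
```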
3.3 Statistical operations
3.3.1 Maximum value
v = np.random.normal(10, 1, 10000)
v.max()
Output:
13.949035793082137
3.3.2 Finding the minimum
v.min()
Output:
6.31475048427698
3.3.3 Sum
v.sum()
Output:
100051.11447780298
3.3.4 Mean
v.mean()
Output:
10.005111447780298
3.3.5 Standard deviation
v.std()
Output:
0.996315432751019
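By default, `std()` divides by N, giving the population standard deviation; passing `ddof=1` divides by N - 1 instead, giving the sample standard deviation that many statistics texts use. A sketch on a small illustrative dataset whose population standard deviation is exactly 2:

```python
import numpy as np

# Illustrative data: mean 5, squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0: divide by N (population standard deviation), sqrt(32 / 8).
print(data.std())        # 2.0

# ddof=1: divide by N - 1 (sample standard deviation), sqrt(32 / 7).
print(data.std(ddof=1))
```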
3.3.6 Finding the median
np.median(v)
Output:
10.005551763169866
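All of these statistics also accept an `axis` argument, which on a multidimensional array collapses only the chosen axis instead of reducing everything to a single number. A sketch on an illustrative 2 x 3 array:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())               # 21: all elements
print(m.sum(axis=0))         # [5 7 9]: collapse axis 0 (down each column)
print(m.mean(axis=1))        # [2. 5.]: collapse axis 1 (across each row)
print(m.max(axis=0))         # [4 5 6]
print(np.median(m, axis=1))  # [2. 5.]
```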
3.4 Why use NumPy multidimensional arrays?
At this point you may be wondering whether NumPy multidimensional arrays are really any different from lists. It is true that they have a lot in common in structure and usage. But in big data analysis, machine learning, and especially deep learning, where large amounts of data have to be processed, NumPy arrays perform far better than ordinary lists.
Let's compute the mean of a sequence of 300,000,000 (300 million) numbers, once with a list and once with a NumPy array. In the run recorded below, the list version took about 21 seconds, while the NumPy version finished in a tiny fraction of that time. The gap is so large because NumPy's core operations run in compiled C code over a contiguous block of typed values, whereas the list version has the Python interpreter handle each element as a separate Python object.
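from time import time
import numpy as np
# Note: a Python list of 300 million ints needs on the order of 10 GB of RAM;
# reduce the length if your machine has less memory.
a = list(range(300000000))
na = np.array(a)  # build the NumPy array from the same data as the list
start_time = time()
sum(a) / len(a)
print("calculate mean by list cost time {} s".format(time() - start_time))
start_time = time()
na.mean()
print("calculate mean by numpy.ndarray cost time {} s".format(time() - start_time))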
Output:
calculate mean by list cost time 20.95879817008972 s
calculate mean by numpy.ndarray cost time 0.0011792182922363281 s
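To try the comparison on an ordinary machine, here is a scaled-down sketch: one million elements instead of 300 million, timed with `time.perf_counter()` (a higher-resolution clock meant for benchmarking, swapped in for `time()`). The element count and variable names are arbitrary choices, and the measured times and speedup depend on your hardware, so the only fixed expectation is that both approaches compute the same mean.

```python
from time import perf_counter

import numpy as np

n = 1_000_000                 # far smaller than 300 million, so it runs anywhere
a = list(range(n))
na = np.array(a)              # convert the same data to a NumPy array

start = perf_counter()
list_mean = sum(a) / len(a)
list_seconds = perf_counter() - start

start = perf_counter()
numpy_mean = na.mean()
numpy_seconds = perf_counter() - start

print(f"list:  {list_seconds:.6f} s")
print(f"numpy: {numpy_seconds:.6f} s")
# Guard against a zero reading on very coarse clocks.
print(f"speedup: ~{list_seconds / max(numpy_seconds, 1e-12):.0f}x")
```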
- Personal website: Kenblog.top
- GitHub site: kenblikylee.github.io