Preface
I’m back after two months away. A lot happened in that time, and I even came to doubt whether I could keep this series going. But after a slump you still have to push forward, and life goes on, so I can only encourage myself with the line “the strong pass is truly like iron, yet now we stride across it from the start.” I’m heading home for quarantine, so I’m setting a goal for myself here.
We’ve covered some of the syntactic basics of Python, including Python’s basic data structures, functions, file reading and writing, and its object-oriented features. Next we’ll look at some common packages for data mining, of which NumPy is the most important foundational package for numerical computation.
ndarray
An ndarray is a highly efficient multidimensional array that provides convenient array-based arithmetic and flexible broadcasting. We can quickly generate a 2×3 array of samples drawn from the standard normal distribution:
import numpy as np
data = np.random.randn(2, 3)
print(data)
The result is:
[[ 0.52828809  0.75873811 -0.81223681]
 [ 2.13722235  0.40123476 -0.07276397]]
In fact, an ndarray is a general-purpose multidimensional container whose elements must all have the same type. We can check an array’s dimensions through its shape attribute and its data type through its dtype attribute. For example:
import numpy as np
data = np.random.randn(2, 3)
print(data.shape)
print(data.dtype)
The results are as follows
(2, 3)
float64
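As a small sketch of how shape and dtype relate to the data passed in, the snippet below builds two arrays from lists and inspects them. The variable names (`ints`, `floats`, `as_float`) are illustrative and not part of the original example; the exact integer dtype is platform-dependent, so it is not asserted here.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers: an integer dtype is inferred
floats = np.array([1.5, 2, 3])    # mixed input is upcast to a single float dtype

print(ints.dtype)                 # a platform-dependent integer type, typically int64
print(floats.dtype)               # float64
print(floats.ndim, floats.shape)  # 1 (3,)

# astype returns a new array converted to the requested type.
as_float = ints.astype(np.float64)
print(as_float.dtype)             # float64
```

Because every element shares one dtype, mixing integers and floats in the input produces a float array rather than an array of mixed types.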
Creating an ndarray
The simplest way to create an ndarray is the array function, which takes any sequence-like object and returns a new NumPy array containing the passed data. For example:
import numpy as np
data1 = [1, 2, 3, 4]
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr1 = np.array(data1)
arr2 = np.array(data2)
arr1 = arr1 * 10
arr2 = arr2 + arr1
print(arr1)
print(arr2)
The results are as follows:
[10 20 30 40]
[[11 22 33 44]
 [15 26 37 48]]
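We can see that the array function converts Python lists to ndarrays, and that ndarrays simplify array operations by eliminating many explicit for loops. Adding the one-dimensional arr1 to the two-dimensional arr2 also shows broadcasting: arr1 is added to each row of arr2.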
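To make the point about for loops concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent array expression. The names `values`, `looped`, `vectorized`, and `grid` are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: multiply each element by 10 with an explicit loop.
looped = []
for v in values:
    looped.append(v * 10)

# NumPy: the same operation is applied to the whole array at once.
vectorized = np.array(values) * 10

print(looped)      # [10, 20, 30, 40]
print(vectorized)  # [10 20 30 40]

# Adding a 1-D array to a 2-D array broadcasts it across every row.
grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(grid + vectorized)
```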
We can also create arrays of all zeros with zeros, arrays of all ones with ones, and uninitialized arrays with empty. For example:
import numpy as np
arr1 = np.zeros(10)
arr2 = np.zeros((5, 2))
print(arr1)
print(arr2)
The results are as follows:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
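The ones function mentioned above works the same way as zeros. The sketch below is an illustration rather than part of the original listing; the `dtype` argument is optional and is shown only to demonstrate that the element type can be chosen.

```python
import numpy as np

ones_1d = np.ones(3)                       # float64 by default
ones_2d = np.ones((2, 3), dtype=np.int64)  # 2 rows, 3 columns of integer ones

print(ones_1d)        # [1. 1. 1.]
print(ones_2d)
print(ones_2d.dtype)  # int64
```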
Note that empty does not initialize the memory it allocates, so the returned array may contain arbitrary garbage values.
For example:
import numpy as np
arr1 = np.empty(10)
arr2 = np.empty((5, 2))
print(arr1)
print(arr2)
The results are as follows (the values are arbitrary and differ from run to run):
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[6.95006917e-310 1.29189234e-316]
 [5.39246171e-317 5.39246171e-317]
 [6.95006798e-310 ...]
 [5.39246171e-317 5.39246171e-317]
 [5.39247752e-317 6.95006795e-310]]
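Because the contents of an empty array are unspecified, a common pattern is to allocate with empty and then overwrite every element before reading any of them. The following is a minimal sketch of that pattern; `buf` and `squares` are hypothetical names.

```python
import numpy as np

buf = np.empty(5)
buf[:] = 0.5                       # overwrite every element before use
print(buf)                         # [0.5 0.5 0.5 0.5 0.5]

squares = np.empty(5)
squares[:] = np.arange(5) ** 2     # fill with computed values
print(squares)
```

If the array needs a known starting value anyway, zeros or ones is the safer choice.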
Basic indexing and slicing
Indexing and slicing one-dimensional arrays is simple and works much like slicing Python lists.
For example:
import numpy as np
arr1 = np.arange(10)
arr2 = arr1[5:8]
print(arr1)
print(arr2)
Here arr2 holds the elements [5 6 7]. Note that a slice of an array is a view of the original array: the data is not copied, and any change made through the view is reflected in the original array. NumPy is designed to handle large arrays, so copying the data on every slice would be expensive.
For example:
import numpy as np
arr1 = np.arange(10)
arr1[5:8] = 12
print(arr1)
The results are as follows:
[ 0  1  2  3  4 12 12 12  8  9]
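When an independent copy is needed instead of a view, the slice can be copied explicitly. The sketch below contrasts the two; the names `view` and `copied` are illustrative, and `np.shares_memory` is used only to confirm which arrays share data.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
copied = arr[5:8].copy()   # an independent copy

view[0] = 100              # writes through to arr
copied[1] = -1             # does not touch arr

print(arr)                            # [  0   1   2   3   4 100   6   7   8   9]
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
```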
Indexing and slicing a multidimensional array works the same way as for a one-dimensional array, except that each element along the first axis is itself an array with one fewer dimension rather than a single number.
For example:
import numpy as np
arr1 = np.random.randn(3, 3)
print(arr1)
print(arr1[:2])
The results are as follows:
[[ 0.60673463 -0.84261761 -0.55674384]
 [ 1.49376061 -1.23850612 -0.10686775]
 [ 1.3516511  -0.65024839 -1.68451601]]
[[ 0.60673463 -0.84261761 -0.55674384]
 [ 1.49376061 -1.23850612 -0.10686775]]
We can treat the three rows of arr1 as three elements; arr1[:2] takes the first two of them.
For multidimensional arrays we can also slice along several axes at once by passing one slice per axis.
For example:
import numpy as np
arr1 = np.random.randn(3, 3)
print(arr1)
print(arr1[1:, :2])
The results are as follows:
[[ 1.51132511 -0.16890946 -0.78987301]
 [ 0.41426026 -0.09105493  1.44744887]
 [ 1.79046674  0.27690028  1.31201169]]
[[ 0.41426026 -0.09105493]
 [ 1.79046674  0.27690028]]
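Since random data makes the selected region hard to follow, here is the same kind of multi-axis slice on a small deterministic array. This is an illustrative sketch; `block` and `row_part` are hypothetical names.

```python
import numpy as np

arr = np.arange(9).reshape((3, 3))
# arr is:
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

block = arr[1:, :2]      # rows 1 onward, first two columns
print(block)             # [[3 4]
                         #  [6 7]]

row_part = arr[1, :2]    # an integer index drops that axis
print(row_part)          # [3 4]
print(block.shape, row_part.shape)  # (2, 2) (2,)
```

Mixing an integer index with a slice returns an array with one fewer dimension, while using slices on every axis keeps the number of dimensions unchanged.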
For more flexibility, we can also index an array with a Boolean array. The Boolean array must have the same length as the axis it indexes.
For example:
import numpy as np
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
print(names == 'Bob')
print(data[names == 'Bob'])
The results are as follows:
[ True False False  True False False False]
[[ 1.20875931  0.54870492 -0.45572233 -0.58897014]
 [-1.42004058 -0.81150623  1.03740228  0.91427144]]
From these results we can see that indexing with a Boolean array returns the rows at the positions where the Boolean array is True.
We can also combine a Boolean array on one axis with a slice on another.
For example:
import numpy as np
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
print(names == 'Bob')
print(data[names == 'Bob', :3])
The results are as follows:
[ True False False  True False False False]
[[ 1.08094968 -0.29838004  0.80950847]
 [ 0.10917791  0.79569972  0.47027354]]
It’s important to note that the Python keywords and and or do not work with Boolean arrays; use & and | instead.
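A short sketch of combining Boolean conditions with & and |, along with what happens when the `and` keyword is used instead. The integer `data` array is a deterministic stand-in for the random data in the earlier examples, and the names `bob_or_will` and `not_joe` are illustrative.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # stand-in for the random data

# Parentheses are required because | and & bind more tightly than ==.
bob_or_will = (names == 'Bob') | (names == 'Will')
not_joe = ~(names == 'Joe')          # ~ negates a Boolean array

print(bob_or_will)
print(data[bob_or_will])             # rows 0, 2, 3 and 4

# The keyword form tries to reduce a whole array to one truth value and fails.
try:
    (names == 'Bob') and (names == 'Will')
except ValueError as err:
    print("and fails:", err)
```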
Transposing arrays and swapping axes
Arrays can be transposed with the T attribute, and matrix products can be computed with np.dot.
For example:
import numpy as np
arr = np.arange(15).reshape((3, 5))
print(arr)
print(arr.T)
print(np.dot(arr, arr.T))
The results are as follows:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
[[ 30  80 130]
 [ 80 255 430]
 [130 430 730]]
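For two-dimensional arrays, the @ operator computes the same matrix product as np.dot. The sketch below checks that equivalence and the resulting shape; `product` is an illustrative name.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# (3, 5) @ (5, 3) gives a (3, 3) result.
product = arr @ arr.T
print(product.shape)                                # (3, 3)
print(np.array_equal(product, np.dot(arr, arr.T)))  # True
```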
For higher-dimensional arrays, the transpose method accepts a sequence of axis numbers and permutes the axes accordingly.
For example:
import numpy as np
arr = np.arange(16).reshape((2, 2, 4))
print(arr)
print(arr.transpose(1, 0, 2))
The results are as follows:
[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]
[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]
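The effect of an axis permutation is easier to see when the axes have different lengths. The following sketch uses a hypothetical (2, 3, 4) array, not the one from the listing above, and shows that transpose(1, 0, 2) swaps the first two axes while leaving the last one in place.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))
swapped = arr.transpose(1, 0, 2)

print(arr.shape)      # (2, 3, 4)
print(swapped.shape)  # (3, 2, 4)

# Element [i, j, k] of the result is element [j, i, k] of the original.
print(swapped[2, 1, 3] == arr[1, 2, 3])             # True

# Swapping just two axes can also be written with swapaxes.
print(np.array_equal(swapped, arr.swapaxes(0, 1)))  # True
```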
Finally
More confusion, less gain. For more content, follow the official account QStack, which pursues the purest technology and the joy of programming.