NumPy is a Python library for array computing that supports a wide range of multidimensional array and matrix operations.
This article is included in the pre-machine learning tutorial series.
Fundamentals of Python
Let's first review the basics of Python. Python has six standard data types: Number, String, List, Tuple, Set, and Dictionary. Immutable types: Number, String, and Tuple. Mutable types: List, Dictionary, and Set.
1. List
A list is enclosed in square brackets [], and the value at each position can be changed.
a = [1, 2, 3, 4, 5, 6]
Access an element by position, for example the value at the second position:
a[1]
This returns 2. To get every value from the third position to the end of the list:
a[2:]
This returns [3, 4, 5, 6].
Change the value at a specified position:
a[0] = 9
The list a is now [9, 2, 3, 4, 5, 6].
2. Tuple
Tuples are enclosed in parentheses (), and the value at each position is immutable. Duplicate values are allowed.
tuple = ('a', 'a', 'c', 1, 2, 3.0)
The output is ('a', 'a', 'c', 1, 2, 3.0). Take the element in the last position:
tuple[-1]
The output is 3.0.
Tuple operations are similar to list operations, except that you cannot change the value of an element in a tuple; attempting to do so raises a TypeError.
tuple[2] = 'caiyongji'
3. Set
A set is a collection of non-repeating elements, enclosed in curly braces {}.
set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}
The output of set1 is {'a', 'b', 'c'}. Note: sets remove duplicate elements. set2 outputs {'b', 'c', 'd', 'e'}. Sets are unordered, so the display order may differ.
Unlike lists and tuples, sets do not support subscripting; the following raises a TypeError:
set1[0]
Now, let's look at set operations.
Difference of set1 and set2:
set1 - set2
#set1.difference(set2)
Output: {'a'}.
Union of set1 and set2:
set1 | set2
#set1.union(set2)
Output: {'a', 'b', 'c', 'd', 'e'}.
Intersection of set1 and set2:
set1 & set2
#set1.intersection(set2)
Output: {'b', 'c'}.
Symmetric difference of set1 and set2:
set1 ^ set2
#(set1 - set2) | (set2 - set1)
#set1.symmetric_difference(set2)
Output: {'a', 'd', 'e'}.
The difference, union, intersection, and symmetric difference above each have a corresponding set method; you can uncomment the method calls to try them.
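As a quick check that each operator and its named method agree, here is a minimal sketch. The variable names difference, union, intersection, and symmetric are chosen only for this illustration.

```python
set1 = {'a', 'b', 'c', 'a'}
set2 = {'b', 'c', 'd', 'e'}

# Operator form of each set operation
difference = set1 - set2
union = set1 | set2
intersection = set1 & set2
symmetric = set1 ^ set2

# Each comparison prints True: the operator and the method give the same set
print(difference == set1.difference(set2))
print(union == set1.union(set2))
print(intersection == set1.intersection(set2))
print(symmetric == set1.symmetric_difference(set2))
```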
4. Dictionary
A dictionary is a mapping: a collection of key-value pairs (insertion-ordered since Python 3.7). Dictionaries do not allow duplicate keys, but they do allow duplicate values.
dict = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}
The dictionary outputs {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}. Note that when a dictionary literal contains a repeated key, the later value overwrites the earlier one.
dict['gongzhonghao']
This outputs the string 'caiyongji'. We can also use the get method to the same effect.
dict.get('gongzhonghao')
View all keys:
dict.keys()
Output: dict_keys(['gongzhonghao', 'website']).
View all values:
dict.values()
Output: dict_values(['caiyongji', 'blog.caiyongji.com']). To change the value of an item:
dict['website'] = 'caiyongji.com'
dict
Output: {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}
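Square brackets and get behave the same only when the key exists. The sketch below shows the difference for a missing key; the name site and the key 'email' are hypothetical and used only for illustration.

```python
site = {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}

# get returns None (or a supplied default) when the key is missing
missing = site.get('email')
fallback = site.get('email', 'n/a')

# Square-bracket access raises KeyError for a missing key
try:
    site['email']
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)  # None n/a True
```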
Now that we know Python’s data types, we can learn to use NumPy.
NumPy
1. Create an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
The output of arr is array([1, 2, 3, 4, 5]).
Enter the following code to create a two-dimensional array:
my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mtrx = np.array(my_matrix)
The output of mtrx is as follows:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
2. Index and slice
Index one-dimensional and two-dimensional arrays as follows:
print('arr[0]=', arr[0], 'mtrx[1,1]=', mtrx[1,1])
The output is arr[0]= 1 mtrx[1,1]= 5.
Slice an array:
arr[:3]
The output is array([1, 2, 3]).
Slice with negative indices, counting from the end:
arr[-3:-1]
The output is array([3, 4]).
Add a step, which determines the slice interval:
arr[1:4:2]
The output is array([2, 4]).
Two-dimensional array slice:
mtrx[0:2, 0:2]
This takes the first and second rows and the first and second columns. The output is:
array([[1, 2],
       [4, 5]])
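A few more slicing patterns, as a minimal sketch that rebuilds the same arr and mtrx as above. The names last_two, every_other, second_column, and block are illustrative. Note that a basic slice is a view, so writing to it changes the original array.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mtrx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_two = arr[-2:]          # from the second-to-last element to the end
every_other = arr[::2]       # whole array with a step of 2
second_column = mtrx[:, 1]   # all rows, column index 1

print(last_two)        # [4 5]
print(every_other)     # [1 3 5]
print(second_column)   # [2 5 8]

# A basic slice shares memory with the original array
block = mtrx[0:2, 0:2]
block[0, 0] = 100
print(mtrx[0, 0])      # 100
```

If an independent copy is needed, call .copy() on the slice before modifying it.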
3. dtype
NumPy dtype uses the following character codes for data types:
- i – integer
- b – boolean
- u – unsigned integer
- f – float
- c – complex float
- m – timedelta
- M – datetime
- O – object
- S – string
- U – unicode string
- V – fixed chunk of memory for other type ( void )
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)
The output is arr1.dtype= int32 arr2.dtype= <U6. The default integer type depends on the platform; on many systems it is int64.
We can specify the dtype explicitly.
arr = np.array(['1', '2', '3'], dtype='f')
The output is array([1., 2., 3.], dtype=float32), where 1. represents 1.0, and you can see that the dtype is set to float32.
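An existing array can also be converted to another dtype with astype, which returns a new array. This is a minimal sketch; the names text, as_float, and as_int are illustrative.

```python
import numpy as np

text = np.array(['1', '2', '3'])   # unicode string array
as_float = text.astype('f')        # 'f' is the character code for float32
as_int = as_float.astype('i')      # 'i' is the character code for a C int

print(text.dtype.kind)   # U
print(as_float.dtype)    # float32
print(as_int.tolist())   # [1, 2, 3]
```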
4. Common methods
4.1 arange
np.arange(0, 101, 2) produces the output below. It generates evenly spaced values within the half-open interval [0, 101), with a step of 2.
array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])
4.2 zeros
np.zeros((2, 5)) produces the output below: an all-zeros matrix (two-dimensional array) with 2 rows and 5 columns.
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
4.3 ones
np.ones((4, 4)) outputs an all-ones matrix with 4 rows and 4 columns.
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
4.4 eye
np.eye(5) produces the output below: a square matrix with 5 rows and 5 columns whose diagonal is 1 and whose other entries are all 0. A square matrix is a matrix with the same number of rows and columns.
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
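The four constructors above (arange, zeros, ones, eye) can be checked together in a short sketch. The variable names evens, zeros, ones, and identity are chosen for this illustration.

```python
import numpy as np

evens = np.arange(0, 101, 2)   # 0, 2, ..., 100 (the stop value 101 is excluded)
zeros = np.zeros((2, 5))       # 2 rows, 5 columns of 0.0
ones = np.ones((4, 4))         # 4 rows, 4 columns of 1.0
identity = np.eye(5)           # 5x5 identity matrix

print(evens.size, evens[0], evens[-1])  # 51 0 100
print(zeros.shape, zeros.dtype)         # (2, 5) float64
print(ones.sum())                       # 16.0
print(identity.sum())                   # 5.0
```

zeros, ones, and eye default to float64; pass dtype=int if integers are wanted.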
4.5 rand
The np.random.rand(5, 2) command generates random numbers in 5 rows and 2 columns.
array([[0.67227856, 0.4880784 ],
       [0.82549517, 0.03144639],
       [0.80804996, 0.56561742],
       [0.2976225 , 0.04669572],
       [0.9906274 , 0.00682573]])
If you want to get the same random numbers as in this example, use the same random seed, set with the np.random.seed method.
np.random.seed(99)
np.random.rand(5, 2)
4.6 randint
np.random.randint(0, 101, (4, 5)) produces the output below. It randomly selects integers within the half-open interval [0, 101) to generate an array with 4 rows and 5 columns.
array([[ 1, 35, 57, 40, 73],
       [82, 68, 69, 52,  1],
       [23, 35, 55, 65, 48],
       [93, 59, 87,  2, 64]])
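To see what the random seed does, the following minimal sketch resets the seed before each call and compares the results. The names first, second, and same are illustrative.

```python
import numpy as np

np.random.seed(99)
first = np.random.randint(0, 101, (4, 5))

np.random.seed(99)                      # reset to the same seed
second = np.random.randint(0, 101, (4, 5))

same = np.array_equal(first, second)
print(same)                             # True: same seed, same sequence
print(first.min() >= 0, first.max() <= 100)
```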
4.7 max min argmax argmin
Let's start by randomly generating a set of numbers:
np.random.seed(99)
ranarr = np.random.randint(0, 101, 10)
ranarr
Output:
array([ 1, 35, 57, 40, 73, 82, 68, 69, 52,  1])
The maximum and minimum values are as follows:
print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())
The output is ranarr.max()= 82 ranarr.min()= 1. The index positions of the maximum and minimum values are:
print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())
The output is ranarr.argmax()= 5 ranarr.argmin()= 0. Note that when the maximum or minimum value occurs more than once, the index of the first occurrence is returned.
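The first-occurrence rule can be seen on the array shown above, where the minimum 1 appears at both ends. This sketch hard-codes that output as a literal instead of regenerating it; all_min is an illustrative name, and np.flatnonzero is one way to list every matching index.

```python
import numpy as np

# The values printed above, written out as a literal
ranarr = np.array([1, 35, 57, 40, 73, 82, 68, 69, 52, 1])

print(ranarr.argmax())   # 5
print(ranarr.argmin())   # 0, the first of the two positions holding 1

# Every index where the minimum occurs
all_min = np.flatnonzero(ranarr == ranarr.min())
print(all_min)           # [0 9]
```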
NumPy advanced usage
1. reshape
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
Here, arr is a one-dimensional array and newarr is a two-dimensional array with 4 rows and 3 columns.
print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)
The output is arr.shape= (12,) newarr.shape= (4, 3).
The output of newarr is as follows:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
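Two details of reshape are worth a short sketch: one dimension may be given as -1 so NumPy infers it, and the new shape must hold exactly the same number of elements. The names auto, flat, and failed are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)        # 1, 2, ..., 12

auto = arr.reshape(4, -1)     # -1 lets NumPy infer the column count (3)
flat = auto.reshape(-1)       # back to one dimension

print(auto.shape)             # (4, 3)
print(flat.shape)             # (12,)

# 12 elements cannot fill a 5x3 array, so this raises ValueError
try:
    arr.reshape(5, 3)
    failed = False
except ValueError:
    failed = True
print(failed)                 # True
```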
2. Merge and split
2.1 concatenate
One-dimensional array merge:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr
Output: array([1, 2, 3, 4, 5, 6]).
Two-dimensional array merge:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr
The output is:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
Now we add the parameter axis=1:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr
The output is:
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
In a Jupyter notebook, place the cursor on concatenate and press Shift+Tab to see the method description. You can see that concatenate joins arrays along an existing axis, with the default axis=0. With axis=1, the arrays are joined along the columns, side by side.
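The effect of axis is easiest to see in the result shapes. The sketch below also compares against np.vstack and np.hstack, which for two-dimensional inputs give the same results; rows and cols are illustrative names.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((arr1, arr2), axis=0)   # stack arr2 below arr1
cols = np.concatenate((arr1, arr2), axis=1)   # place arr2 to the right of arr1

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)

# For 2-D arrays, vstack and hstack match axis=0 and axis=1
print(np.array_equal(rows, np.vstack((arr1, arr2))))   # True
print(np.array_equal(cols, np.hstack((arr1, arr2))))   # True
```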
2.2 array_split
Split array:
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr
The value of newarr is:
[array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]),
 array([[ 9, 10],
        [11, 12]])]
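array_split also accepts a length that does not divide evenly, in which case the leading pieces are one element longer. A minimal sketch with a hypothetical 7-element array; parts and sizes are illustrative names.

```python
import numpy as np

seven = np.arange(7)                 # 0, 1, ..., 6
parts = np.array_split(seven, 3)     # 7 elements into 3 pieces

sizes = [len(p) for p in parts]
print(sizes)                         # [3, 2, 2]
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4], [5, 6]]
```

By contrast, np.split raises an error when the array cannot be divided into equal parts.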
3. Search and filter
3.1 Search
NumPy uses the where method to find the array indices that meet a condition.
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x
Output:
(array([1, 3, 5, 7], dtype=int64),)
3.2 Filter
Let's look at the following code:
bool_arr = arr > 4
arr[bool_arr]
Output: array([5, 6, 7, 8]). This time we get the values of the array, not the indices. Let's see what bool_arr actually contains. The output of bool_arr is:
array([False, False, False, False,  True,  True,  True,  True])
So we can replace the above filtering with the following command.
arr[arr > 4]
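Searching and filtering combine naturally. The sketch below uses the index array returned by where to fetch values, joins two conditions with &, and uses the three-argument form of np.where to replace values; the names even_idx, even_vals, both, and replaced are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

even_idx = np.where(arr % 2 == 0)[0]     # indices of even values
even_vals = arr[even_idx]                # the even values themselves

# Combine conditions with & (each condition needs its own parentheses)
both = arr[(arr > 4) & (arr % 2 == 0)]

# Three-argument where: keep values greater than 4, otherwise use 0
replaced = np.where(arr > 4, arr, 0)

print(even_idx)    # [1 3 5 7]
print(even_vals)   # [2 4 6 8]
print(both)        # [6 8]
print(replaced)    # [0 0 0 0 5 6 7 8]
```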
4. Sort
The sort method sorts an ndarray.
arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)
Output sorted result: array(['apple', 'banana', 'cherry'], dtype='<U6').
For a two-dimensional array, the sort method sorts each row individually.
arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)
Output result:
array([[2, 3, 4],
       [0, 1, 5]])
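Row-wise sorting is the default because np.sort works along the last axis. A minimal sketch of the other options follows; by_row, by_col, and order are illustrative names.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)           # default axis=-1: sort within each row
by_col = np.sort(arr, axis=0)   # sort within each column
order = np.argsort(arr[0])      # indices that would sort the first row

print(by_row.tolist())   # [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())   # [[3, 0, 1], [5, 2, 4]]
print(order.tolist())    # [1, 0, 2]
print(arr.tolist())      # [[3, 2, 4], [5, 0, 1]]  np.sort returns a copy
```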
5. Random
5.1 Random Probability
What if we want to fulfill the following requirements?
Generate a one-dimensional array of 100 values, each of which must be 3, 5, 7, or 9, where the probability of 3 is 0.1, the probability of 5 is 0.3, the probability of 7 is 0.6, and the probability of 9 is 0.
Let's solve it with the following command:
np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))
Output result:
array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7,
       7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3,
       5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5,
       7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7,
       7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])
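One way to confirm that the probabilities are respected is to draw a larger sample and count the values. This is a sketch under assumptions not in the original: the sample size of 10000 and the seed 99 are arbitrary, and sample, values, counts, and freq are illustrative names.

```python
import numpy as np

np.random.seed(99)   # arbitrary seed, only to make the run repeatable
sample = np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=10000)

# Count how often each value was drawn
values, counts = np.unique(sample, return_counts=True)
freq = dict(zip(values.tolist(), (counts / sample.size).tolist()))

# Frequencies should be close to 0.1, 0.3, and 0.6; 9 never appears
print(freq)
```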
5.2 Random Arrangement
5.2.1 permutation
Generate a new random arrangement from the original array.
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr
The output is array([3, 1, 5, 4, 2]). The original array arr remains unchanged.
5.2.2 shuffle
Shuffle the original array in place. The word "shuffle" comes from shuffling cards.
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr
The output is array([3, 1, 5, 4, 2]). The original array arr has been changed.
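The difference between the two functions can be checked directly: permutation returns a new array, while shuffle works in place and returns None. A minimal sketch with illustrative names permuted, untouched, and result:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

permuted = np.random.permutation(arr)   # new array, arr is left alone
untouched = arr.tolist()
print(untouched)                        # [1, 2, 3, 4, 5]

result = np.random.shuffle(arr)         # reorders arr itself
print(result)                           # None
print(sorted(arr.tolist()))             # [1, 2, 3, 4, 5], same elements
```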
5.3 Random Distribution
5.3.1 Normal distribution
The np.random.normal method is used to generate random numbers that follow a normal distribution, where loc is the mean and scale is the standard deviation.
x = np.random.normal(loc=1, scale=2, size=(2, 3))
x
The output is:
array([[ 0.14998973,  3.22564777,  1.48094109],
       [ 2.252752  , -1.64038195,  2.8590667 ]])
If we want to see the distribution of x, we need to install Seaborn to draw the image. Install it with pip:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple seaborn
Then draw the density curve:
import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; kdeplot draws the same density curve
sns.kdeplot(x.ravel())
plt.show()
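With only six values the shape of the distribution is hard to see. As a sketch under assumptions not in the original (a sample size of 100000 and an arbitrary seed of 99), the sample mean and standard deviation come out close to loc and scale:

```python
import numpy as np

np.random.seed(99)   # arbitrary seed for repeatability
x = np.random.normal(loc=1, scale=2, size=100000)

mean = x.mean()      # should be close to loc=1
std = x.std()        # should be close to scale=2
print(round(mean, 2), round(std, 2))
```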
5.3.2 Binomial distribution
The np.random.binomial method is used to generate random numbers that follow a binomial distribution.
x = np.random.binomial(n=10, p=0.5, size=10)
x
The output is an array of 10 integers between 0 and 10, for example array([8, 6, 6, 2, 5, 5, 5, 5, 3, 4]).
Draw the histogram:
import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; histplot draws the histogram
sns.histplot(x)
plt.show()
5.3.3 Multinomial distribution
The multinomial distribution is a generalization of the binomial distribution. Multinomial random numbers are generated with the np.random.multinomial method.
x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x
The code above can be read as rolling a die: n=6 is the number of rolls, and pvals gives each of the six faces a probability of 1/6. The result x counts how many times each face came up.
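The dice reading can be checked in a short sketch: the counts always add up to the number of rolls, and there is one count per face. The seed 99, the size=1000 argument, and the names counts and many are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(99)   # arbitrary seed for repeatability
pvals = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]

counts = np.random.multinomial(n=6, pvals=pvals)
print(len(counts))     # 6, one count per face
print(counts.sum())    # 6, the counts add up to the number of rolls

# size repeats the whole experiment: 1000 rows, each summing to 6
many = np.random.multinomial(n=6, pvals=pvals, size=1000)
print(many.shape)      # (1000, 6)
```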
5.3.4 Other
In addition to the distributions above, NumPy also provides the Poisson, uniform, exponential, chi-square, and Pareto distributions, among others. Interested readers can look them up.