From traditional machine learning to deep learning, vectorization is the most fundamental and indispensable acceleration technique
Machine learning requires training a model on large amounts of data, so how to speed up the training process is a problem that must be considered. So-called vectorization, frankly speaking, means using matrix multiplication to replace the accumulation done by a for loop.
A case in point
Question: if you were given 1 million numbers a1 to a1,000,000 and 1 million numbers b1 to b1,000,000, and asked to compute the sum c of the products of each pair ai and bi (that is, the inner product of the two vectors), what would you do?
The code for the for loop is shown below
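A minimal sketch of such a loop (the array names a and b, the random test data, and timing via time.time() are assumptions, not the original listing):

```python
import time
import numpy as np

# Generate 1 million pairs of random numbers (names a and b are assumptions)
n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Accumulate a[i] * b[i] one element at a time with a plain Python loop
start = time.time()
c = 0.0
for i in range(n):
    c += a[i] * b[i]
end = time.time()

print("Calculation result:", c)
print("for-loop calculation time: %f ms" % ((end - start) * 1000))
```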
Calculation result: 249879.05298545936, for-loop calculation time: 519.999980927 ms
Doing the same operation with NumPy (vectorized):
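A minimal sketch of the vectorized version (same assumed arrays a and b as above; np.dot computes the whole inner product in a single call):

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# One vectorized call replaces the entire loop
start = time.time()
c = np.dot(a, b)
end = time.time()

print("Calculation result:", c)
print("matrix calculation time: %f ms" % ((end - start) * 1000))
```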
Calculation result: 249879.05298545936, matrix calculation time: 0.999927520752 ms
The gap is deeper than the Mariana Trench. The reason is that matrix-computation packages such as NumPy and MATLAB take full advantage of the SIMD instructions of modern CPUs, which greatly improves computational efficiency.
Vectorization in LR
Now let's take the simplest case, LR (logistic regression), and walk through how to use vectorization to train the model by hand.
If we choose BGD (batch gradient descent) or MBGD (mini-batch gradient descent), the parameter update formula of LR is

$$w \leftarrow w + \alpha \, X^\top \big(y - \sigma(Xw)\big)$$

where $X$ is the input data matrix, $y$ is the label vector, $\sigma$ is the sigmoid function, and $\alpha$ is the learning rate (for MBGD, $X$ and $y$ are a mini-batch rather than the full dataset).
As can be seen in the formula above, the big matrix in the middle is just the transpose of the input data matrix, so the implementation is very simple. The LR training algorithm implemented with NumPy is:
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grand_ascent(data_train, data_label):
    dataMatrix = np.mat(data_train)            # m x n input data matrix
    labelMat = np.mat(data_label).transpose()  # m x 1 label column vector
    m, n = np.shape(dataMatrix)
    weights = np.ones((n, 1))                  # initialize all weights to 1
    alpha = 0.001                              # learning rate
    for i in range(500):                       # 500 full-batch iterations
        h = sigmoid(dataMatrix * weights)      # predictions for all m samples at once
        # Vectorized gradient ascent step: w += alpha * X^T (y - h)
        weights = weights + alpha * dataMatrix.transpose() * (labelMat - h)
    return weights
```
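A quick usage sketch on synthetic data (the dataset size, seed, and linearly separable toy labels are assumptions for illustration):

```python
# Toy usage sketch: fit LR on a small linearly separable dataset
np.random.seed(0)
X = np.random.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])        # hypothetical ground-truth weights
y = (X.dot(true_w) > 0).astype(float)      # labels from the true decision boundary

w = grand_ascent(X, y)
print(w)  # learned weights, roughly aligned with true_w in direction
```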