Logistic regression model optimization
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score

# LR_ (a LogisticRegression instance) and data (the dataset) are assumed
# to have been defined earlier in the article.

fullx = []
fsx = []
# 20 candidate thresholds, from 0 up to the largest absolute coefficient
threshold = np.linspace(0, abs(LR_.fit(data.data, data.target).coef_).max(), 20)
k = 0
for i in threshold:
    # Keep only the features whose |coefficient| is above the threshold i
    X_embedded = SelectFromModel(LR_, threshold=i).fit_transform(data.data, data.target)
    fullx.append(cross_val_score(LR_, data.data, data.target, cv=5).mean())
    fsx.append(cross_val_score(LR_, X_embedded, data.target, cv=5).mean())
    print((threshold[k], X_embedded.shape[1]))
    k += 1
plt.figure(figsize=(20, 5))
plt.plot(threshold, fullx, label="full")
plt.plot(threshold, fsx, label="feature selection")
plt.xticks(threshold)
plt.legend()
plt.show()
With feature selection, we have managed to keep the model's fit effective while cutting down the feature set. Now, if a doctor could step in and tell us which of the remaining features are particularly important for the specific condition, perhaps we could reduce the dimensionality even further. Besides the embedding method, the coefficient accumulation method or the wrapper method can also be used.
2. The coefficient accumulation method
The principle of coefficient accumulation is very simple. In PCA, we select hyperparameters by drawing the cumulative explained variance ratio curve; in logistic regression, we can do the same with the coefficients coef_. The logic for selecting the number of features is similar: find the turning point where the curve goes from steep to flat. The features accumulated before the turning point are the ones we need; those after it are not. This method is relatively troublesome, however, because we must first sort the feature coefficients from largest to smallest and make sure we know which original feature each sorted coefficient corresponds to, so that we can correctly identify the important features. Rather than going to this trouble, it is usually more convenient to use the embedding method directly. A minimal sketch of the required bookkeeping follows.
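As a rough sketch of that bookkeeping, assuming the same LR_ estimator and data dataset as in the code above, the following sorts the absolute coefficients in descending order while keeping track of each one's original column index, then plots the cumulative curve whose turning point we look for:

import numpy as np
import matplotlib.pyplot as plt

# Assumes LR_ and data as defined earlier in the article
coef = abs(LR_.fit(data.data, data.target).coef_).ravel()

# argsort records the original column index of each sorted coefficient;
# this is exactly the bookkeeping that makes the method troublesome
order = np.argsort(coef)[::-1]        # column indices, largest |coef_| first
cumulative = np.cumsum(coef[order])   # cumulative coefficient contribution

plt.figure(figsize=(10, 4))
plt.plot(range(1, len(coef) + 1), cumulative)
plt.xlabel("number of features kept (sorted by |coef_|)")
plt.ylabel("cumulative |coef_|")
plt.show()

# If the turning point is read off at n_keep features, the features to
# keep are the original columns: data.data[:, order[:n_keep]]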
3. The simple and fast wrapper method
In contrast, the wrapper method lets us directly set the number of features we need. When logistic regression is applied in practice, there may be a requirement such as "5~8 variables", in which case the wrapper method is very convenient. Using the wrapper method with logistic regression is no different from using it with other algorithms; there is nothing special about it.
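As a minimal hedged sketch, using scikit-learn's RFE (recursive feature elimination) as the wrapper, again with the LR_ and data objects from above, and 8 as a hypothetical target matching a "5~8 variables" requirement:

from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Recursively drop the weakest feature(s) until exactly 8 remain
selector = RFE(LR_, n_features_to_select=8, step=1).fit(data.data, data.target)
X_wrapped = selector.transform(data.data)

print(selector.support_)  # boolean mask of the selected columns
print(cross_val_score(LR_, X_wrapped, data.target, cv=5).mean())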
4. Gradient descent: the important parameter max_iter
The mathematical goal of logistic regression is to solve for the parameter values that optimize the model and give the best fit, that is, the values that minimize the loss function. For binary logistic regression there are many methods for solving the parameters, the most common being Gradient Descent, Coordinate Descent, and the Newton-Raphson method, of which gradient descent is the most famous. Each method involves complex mathematics, but the task the computation performs is similar.
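Whichever solver is used, scikit-learn caps its work with max_iter. A small sketch, on the same data as above, with the solver and max_iter values chosen purely for illustration: the fitted model's n_iter_ attribute reports how many iterations were actually run, and scikit-learn raises a ConvergenceWarning when the cap is reached before convergence.

from sklearn.linear_model import LogisticRegression

# Deliberately tight iteration budget; may trigger a ConvergenceWarning
lr = LogisticRegression(solver="liblinear", max_iter=10)
lr.fit(data.data, data.target)
print(lr.n_iter_)  # iterations the solver actually performed

# A generous budget: the solver stops early once it has converged
lr = LogisticRegression(solver="liblinear", max_iter=1000)
lr.fit(data.data, data.target)
print(lr.n_iter_)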
5. The concept of step size, and clearing up a misconception
A core misconception: what exactly is the step size? Many blogs and textbooks describe the step size as "the length of each step taken against the direction of the gradient during gradient descent," as "the length of each step taken along the steepest and easiest direction of descent," as "the amount by which the loss function decreases at each step of gradient descent," or even as the hypotenuse or the opposite side of the familiar derivative triangle in two dimensions.
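To see where the step size actually lives, it helps to write down the standard gradient descent update rule (a textbook formulation, not specific to this article), with parameter vector $\theta$, loss function $J$, and step size $\alpha$:

$$\theta^{(k+1)} = \theta^{(k)} - \alpha \, \nabla_\theta J\big(\theta^{(k)}\big)$$

Here $\alpha$ is not itself a distance on the loss surface: it only scales the gradient vector, so the actual change in the parameters at each step is $\alpha$ times the gradient's magnitude.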