5.5 Summary of logistic regression in scikit-learn
In scikit-learn, three classes are primarily concerned with logistic regression: LogisticRegression, LogisticRegressionCV, and logistic_regression_path. The main difference between LogisticRegression and LogisticRegressionCV is that LogisticRegressionCV uses cross-validation to select the regularization coefficient C, while with LogisticRegression you have to specify C yourself. Apart from cross-validation and the selection of C, the two classes are used in much the same way.
logistic_regression_path is different: it can only fit coefficients along a path of regularization values for the given data and cannot be used directly for prediction, so it is mainly used for model selection. Since it is rarely used in practice, logistic_regression_path is not covered further here.
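As a quick illustration, here is a minimal sketch contrasting the two commonly used classes; the synthetic data set and parameter values are assumptions made only for the example, not part of the original text:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# LogisticRegression: the regularization coefficient C is fixed by hand.
clf = LogisticRegression(C=1.0, penalty="l2", solver="lbfgs", max_iter=1000)
clf.fit(X, y)

# LogisticRegressionCV: Cs=10 tries ten values of C on a log scale and keeps
# the one that scores best under 5-fold cross-validation.
clf_cv = LogisticRegressionCV(Cs=10, cv=5, penalty="l2", solver="lbfgs", max_iter=1000)
clf_cv.fit(X, y)
print(clf_cv.C_)  # the C chosen by cross-validation
```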
The parameters are described as follows:
- penalty: str, either 'l1' or 'l2'; default 'l2'. Specifies the norm used in the penalty term. The newton-cg, sag, and lbfgs solvers only support the L2 penalty. L1 regularization corresponds to assuming that the model parameters follow a Laplace distribution, while L2 corresponds to a Gaussian distribution; either norm adds a constraint on the parameters so that the model is less prone to overfitting. Whether adding the constraint actually improves results on a given problem cannot be answered in general; one can only say that, in theory, a constrained model should generalize better.
- dual: bool; default False. Chooses between the dual and the primal formulation. The dual formulation is only implemented for the L2 penalty with the liblinear solver. When the number of samples exceeds the number of features, dual is usually set to False.
- tol: float; default 1e-4. Tolerance for the stopping criterion, i.e. the point at which the solver stops and treats the current result as the optimal solution.
- C: float; default 1.0. The inverse of the regularization coefficient λ; must be a positive float. As in SVMs, smaller values mean stronger regularization.
- fit_intercept: bool; default True. Whether an intercept (bias) term is included.
- intercept_scaling: float; default 1. Only useful when the solver is 'liblinear' and fit_intercept is True.
- class_weight: weights associated with the classes, given either as a dict or as the string 'balanced'; default None. You can pass 'balanced' and let the library compute the class weights itself, or supply the weight of each class yourself. For example, for a binary 0/1 model, class_weight={0:0.9, 1:0.1} gives class 0 a weight of 90% and class 1 a weight of 10%. If class_weight is 'balanced', the library computes the weights from the training sample counts: the more samples a class has, the lower its weight, and the fewer samples, the higher its weight. With 'balanced', the class weights are computed as n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes is the number of classes, and np.bincount(y) gives the number of samples in each class; for example, with y=[1,0,0,1,1], np.bincount(y)=[2,3].
So what does class_weight do?
In classification models we often run into two kinds of problems. The first is that misclassification is costly in one direction. For example, when separating legitimate users from illegitimate ones, classifying an illegitimate user as legitimate is very costly; we would rather misclassify some legitimate users as illegitimate, since those can be checked manually, than let illegitimate users pass as legitimate. In this case we can appropriately increase the weight of the illegitimate-user class.
The second is a severely imbalanced sample. For example, suppose we have 10,000 binary samples of legitimate and illegitimate users, of which 9,995 are legitimate and only 5 are illegitimate. If we ignore the weights, we could predict every test sample to be a legitimate user and the accuracy would in theory be 99.95%, but such a model is meaningless. In this case we can choose 'balanced' and let the library automatically increase the weight of the illegitimate-user samples. By increasing the weight of a class, more samples are assigned to that high-weight class than would be without weighting, which addresses both kinds of problems above.
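A small sketch of the two ways of setting class_weight; the tiny arrays here are assumptions chosen purely to make the 'balanced' formula visible:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([1, 0, 0, 1, 1])

# 'balanced' weights follow n_samples / (n_classes * np.bincount(y)).
print(np.bincount(y))                 # [2 3]
print(len(y) / (2 * np.bincount(y)))  # [1.25 0.8333...] -> the rarer class 0 gets the larger weight

# Explicit dictionary: class 0 carries 90% of the weight, class 1 only 10%.
clf_dict = LogisticRegression(class_weight={0: 0.9, 1: 0.1}).fit(X, y)

# Or let the library derive the weights from the class frequencies.
clf_bal = LogisticRegression(class_weight="balanced").fit(X, y)
```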
- random_state: int, optional; default None. Random number seed; only used when the solver is 'sag' or 'liblinear'.
- solver: the algorithm used to optimize the logistic regression loss function. There are only five choices: 'newton-cg', 'lbfgs', 'liblinear', 'sag', and 'saga'; the default is 'liblinear'. The algorithms are as follows:
- liblinear: uses the open-source LIBLINEAR library and iteratively optimizes the loss function with a coordinate descent method.
- lbfgs: a quasi-Newton method that iteratively optimizes the loss function using (an approximation of) its second-derivative matrix, the Hessian.
- newton-cg: also a member of the Newton family; it iteratively optimizes the loss function using its second-derivative matrix, the Hessian.
- sag: stochastic average gradient descent, a variant of gradient descent. The difference from ordinary gradient descent is that each iteration uses only a subset of the samples to compute the gradient, which makes it suitable when there are many samples.
- saga: a variant of sag; a stochastic optimization algorithm with a linear convergence rate.
1. liblinear is suitable for small data sets, while sag and saga are faster and suit large data sets.
2. For multiclass problems, only newton-cg, sag, saga, and lbfgs can handle the multinomial loss; liblinear is limited to one-vs-rest (OvR). With liblinear, a multiclass problem has to be split: one class is treated as the positive class and all remaining classes together as the negative class, and this is repeated over all classes.
3. newton-cg, sag, and lbfgs all require the first or second derivative of the loss function to be continuous, so they cannot be used with L1 regularization, which is not continuously differentiable, and only support L2 regularization. liblinear and saga handle both L1 and L2 regularization.
4. At the same time, sag uses only part of the samples for each gradient step, so it should not be chosen when the sample size is small; but when the sample size is very large, say more than 100,000, sag is the first choice. sag cannot be used with L1 regularization, however, so with a very large sample that also needs L1 regularization you have to make a trade-off: either reduce the sample size by subsampling, or fall back to L2 regularization.
Given the limitations of newton-cg, lbfgs, and sag, you might think liblinear would always be the better choice when the sample is not large. Wrong, because liblinear has its own weaknesses. As we know, logistic regression comes in a binary and a multinomial form, and the common strategies for the multinomial case are one-vs-rest (OvR) and many-vs-many (MvM). MvM is generally more accurate than OvR. Unfortunately, liblinear only supports OvR, not MvM, so if we need a relatively accurate multinomial logistic regression we cannot choose liblinear, which in turn means we cannot use L1 regularization either.
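To make the solver/penalty compatibility concrete, here is a hedged sketch; the data set and parameter values are assumptions chosen only so that each combination can be fitted:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# liblinear: small data sets; supports both L1 and L2, but only OvR for multiclass.
LogisticRegression(solver="liblinear", penalty="l1", C=0.5).fit(X, y)

# newton-cg / lbfgs / sag need a smooth loss, so they accept only the L2 penalty.
LogisticRegression(solver="lbfgs", penalty="l2", max_iter=1000).fit(X, y)

# saga handles both penalties and scales to large samples, so it is the usual
# way out when L1 regularization is needed on big data.
LogisticRegression(solver="saga", penalty="l1", max_iter=5000).fit(X, y)

# This combination is rejected at fit time: sag cannot optimize the non-smooth L1 term.
# LogisticRegression(solver="sag", penalty="l1").fit(X, y)  # raises ValueError
```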
- max_iter: int; default 100. The maximum number of iterations allowed for the solver to converge. Only useful for the newton-cg, sag, and lbfgs solvers.
- multi_class: str, either 'ovr' or 'multinomial'; default 'ovr'. Chooses the classification strategy: 'ovr' is one-vs-rest (OvR), and 'multinomial' is many-vs-many (MvM). For binary logistic regression there is no difference between ovr and multinomial; the distinction only matters for multiclass problems.
What is the difference between OvR and MvM?
The idea of OvR is very simple: no matter how many classes the problem has, it is reduced to binary logistic regression. Concretely, for the classification decision of the k-th class, we take all samples of class k as positive examples and all remaining samples as negative examples, then run binary logistic regression to obtain the classifier for class k. The classifiers for the other classes are obtained in the same way.
MvM, on the other hand, is more complex. Here we take a special case of MvM, one-vs-one (OvO). If the model has T classes, each time we pick two classes out of the T, call them T1 and T2, put all samples labelled T1 or T2 together, take T1 as the positive class and T2 as the negative class, run binary logistic regression, and obtain the model parameters. In total we need T(T-1)/2 binary classifiers.
It can be seen that OvR is relatively simple, but its classification quality is usually somewhat worse (for most sample distributions; for some distributions OvR may actually be better), while MvM classification is more accurate but not as fast as OvR. If 'ovr' is selected, the four loss-function optimizers liblinear, newton-cg, lbfgs, and sag can all be used; with 'multinomial', only newton-cg, lbfgs, and sag are available.
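A short sketch contrasting the two strategies on a three-class problem; the iris data is assumed for illustration, and the multi_class argument behaves as described here in the scikit-learn versions this write-up targets (newer releases deprecate it):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# OvR: one binary problem per class; also works with liblinear.
ovr = LogisticRegression(multi_class="ovr", solver="liblinear").fit(X, y)

# Multinomial (MvM-style): a single joint softmax loss; needs newton-cg, lbfgs or sag.
mnl = LogisticRegression(multi_class="multinomial", solver="lbfgs", max_iter=1000).fit(X, y)

print(ovr.score(X, y), mnl.score(X, y))  # multinomial is usually a bit more accurate
```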
- verbose: int; default 0. Controls log verbosity: 0 does not print the training process, 1 prints results occasionally, and values greater than 1 print results for every sub-model.
- warm_start: bool; default False. If True, the next call to fit reuses the solution of the previous call as its initialization instead of starting from scratch.
- n_jobs: int; default 1. Number of CPU cores used in parallel: 1 uses one core, 2 uses two cores, and -1 uses all available cores.
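A brief sketch of these bookkeeping parameters together with max_iter from above; the values are arbitrary assumptions, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = LogisticRegression(
    solver="lbfgs",
    max_iter=200,     # cap on solver iterations; raise it if convergence warnings appear
    verbose=1,        # print solver progress instead of staying silent
    warm_start=True,  # the next fit() starts from the previous solution
    n_jobs=-1,        # use every CPU core when looping over classes under OvR
)
clf.fit(X, y)
clf.fit(X, y)  # with warm_start=True this refit starts from the coefficients above
```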
[Note] If you are unclear about overfitting, regularization, or the L1 and L2 norms, see this blog: blog.csdn.net/zouxy09/art… LogisticRegression also provides the usual estimator methods for us to use, sketched below:
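A minimal sketch of those methods using the standard scikit-learn estimator API; the toy data is assumed for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression().fit(X, y)   # fit(X, y): train the model

print(clf.predict(X[:3]))              # predict(X): predicted class labels
print(clf.predict_proba(X[:3]))        # predict_proba(X): per-class probabilities
print(clf.score(X, y))                 # score(X, y): mean accuracy
print(clf.coef_, clf.intercept_)       # fitted weights and intercept
```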
Reference documents: the scikit-learn documentation (English and Chinese versions).