LOFO


LOFO (currently applicable to all models)

LOFO (Leave One Feature Out) is a scheme for computing feature importance. Compared with other feature importance methods (coefficients of a linear regression model, feature importance of random forest, the plot_importance feature importance of XGBoost and LightGBM, linear correlation with the target, etc.), its main characteristics are:

  • It generalizes well to an unseen test set
  • It assigns a negative importance to features that hurt model performance
  • It can group features, which is especially useful for high-dimensional features such as TF-IDF or one-hot encoded features
  • It can automatically group highly correlated features to prevent their importance from being underestimated
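The point about grouping can be illustrated with a small NumPy sketch. It is not the library's implementation: a plain least-squares fit stands in for the model, importance is taken as the drop in validation score (negative MSE) when columns are removed, and the data, the `holdout_score` helper and the `importance` helper are all hypothetical names made up for this example.

```python
import numpy as np

def holdout_score(X, y, n_train):
    """Fit least squares on the first n_train rows; return negative MSE on the rest."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ coef
    return -float(np.mean(resid ** 2))

rng = np.random.default_rng(0)
n, n_train = 600, 400
category = rng.integers(0, 4, size=n)      # made-up categorical feature with 4 levels
one_hot = np.eye(4)[category]              # columns 0..3: one-hot encoding
numeric = rng.normal(size=(n, 1))          # column 4: an ordinary numeric feature
X = np.hstack([one_hot, numeric])
level_effect = np.array([0.0, 1.0, 2.0, 3.0])
y = level_effect[category] + 0.5 * numeric[:, 0] + rng.normal(scale=0.1, size=n)

base = holdout_score(X, y, n_train)

def importance(drop_cols):
    """Drop in validation score when the given columns are left out."""
    keep = [c for c in range(X.shape[1]) if c not in drop_cols]
    return base - holdout_score(X[:, keep], y, n_train)

single = [importance([c]) for c in range(4)]   # leave out one dummy column at a time
grouped = importance([0, 1, 2, 3])             # leave out the whole one-hot group

print("single columns:", single)
print("whole group:   ", grouped)
```

Left out one at a time, each dummy column looks unimportant because the remaining columns still encode the same information. Left out as a group, the categorical feature shows its real contribution.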

How LOFO (Leave One Feature Out) computes feature importance:

  • Iteratively remove one feature from the feature set, then use the selected validation scheme and metric to evaluate the model on the remaining features. The resulting drop in performance is the importance of the removed feature.

Steps:

    1. Take all features as input and evaluate the model trained on the full feature set;
    2. Remove one feature at a time, retrain the model, and evaluate it on the validation set;
    3. Record the mean and standard deviation of the importance of each feature.

    Note: LOFO runs LightGBM by default if we don’t pass in any models.
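The three steps above can be sketched by hand. This is only an illustration under stated assumptions, not the library's code: a NumPy least-squares fit stands in for the model, negative MSE is the metric, a simple shuffled k-fold split is the validation scheme, and the helper names and the synthetic data are invented for the example.

```python
import numpy as np

def fit_predict(X_train, y_train, X_valid):
    """Ordinary least squares with an intercept; stands in for any model."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_valid)), X_valid]) @ coef

def neg_mse(y_true, y_pred):
    """Higher is better, so a useful feature gets a positive importance."""
    return -float(np.mean((y_true - y_pred) ** 2))

def lofo_importance(X, y, n_folds=4, seed=0):
    """Return (mean, std) of the importance of each column of X across folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    n_features = X.shape[1]
    diffs = np.zeros((n_folds, n_features))
    for k, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Step 1: score the model that uses all features.
        base = neg_mse(y[valid_idx],
                       fit_predict(X[train_idx], y[train_idx], X[valid_idx]))
        for j in range(n_features):
            # Step 2: leave feature j out, retrain, score on the same fold.
            keep = [c for c in range(n_features) if c != j]
            score = neg_mse(y[valid_idx],
                            fit_predict(X[train_idx][:, keep], y[train_idx],
                                        X[valid_idx][:, keep]))
            diffs[k, j] = base - score
    # Step 3: mean and standard deviation of the importances.
    return diffs.mean(axis=0), diffs.std(axis=0)

# Synthetic data: column 0 matters a lot, column 1 a little, column 2 is noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)

mean_imp, std_imp = lofo_importance(X, y)
print(mean_imp, std_imp)
```

The cost is one retraining per feature per fold, which is why the method becomes slow as the number of features grows.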

FastLOFO (currently applicable to all models)

LOFO is relatively time-consuming because it retrains the model once for every removed feature. FastLOFO (FLOFO) gives a quick estimate of feature importance instead.

  • Fast LOFO takes an already trained model and a validation set, randomly perturbs the values of each feature in turn, and uses the trained model to predict on the perturbed data. The FLOFO importance of a feature is the difference between the model's score on the perturbed validation set and its score on the unperturbed one.
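The idea described above can be sketched as follows. This is a deliberately simplified version that shuffles each validation column outright; the library's FLOFOImportance may perturb values in a more refined way. The least-squares model, the helper names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Train once: least squares with an intercept (stands in for any trained model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def neg_mse(y_true, y_pred):
    """Higher is better, so a useful feature gets a positive importance."""
    return -float(np.mean((y_true - y_pred) ** 2))

def fast_importance(coef, X_valid, y_valid, n_repeats=10, seed=0):
    """Score drop after shuffling each column; no retraining is needed."""
    rng = np.random.default_rng(seed)
    base = neg_mse(y_valid, predict_ols(coef, X_valid))
    diffs = np.zeros((n_repeats, X_valid.shape[1]))
    for r in range(n_repeats):
        for j in range(X_valid.shape[1]):
            X_perturbed = X_valid.copy()
            # Perturb feature j only; the other columns stay untouched.
            X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
            diffs[r, j] = base - neg_mse(y_valid, predict_ols(coef, X_perturbed))
    return diffs.mean(axis=0), diffs.std(axis=0)

# Synthetic data: column 0 matters a lot, column 1 a little, column 2 is noise.
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)
X_train, y_train, X_valid, y_valid = X[:400], y[:400], X[400:], y[400:]

coef = fit_ols(X_train, y_train)
mean_imp, std_imp = fast_importance(coef, X_valid, y_valid)
print(mean_imp, std_imp)
```

Because the model is fitted only once, the cost is a few predictions per feature rather than a full retraining per feature.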

Code

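# pip install lofo_importance
from lofo import LOFOImportance
from lofo import FLOFOImportance
from lofo import plot_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import make_scorer

# Inputs: train_data (feature DataFrame), label (target values), test_data
clf = LinearRegression()
clf = clf.fit(train_data, label)

# The lines that prepared the validation frame were garbled in the source.
# This reconstruction assumes the target is stored in a 'label' column.
features = list(train_data.columns)
data = train_data.copy()
data['label'] = label

# MSE is an error (lower is better), so the scorer flips its sign.
LOFO = FLOFOImportance(trained_model=clf, validation_df=data, features=features, target='label', scoring=make_scorer(MSE, greater_is_better=False))  # cv=cv

# get the mean and standard deviation of the importances in pandas format
importances = LOFO.get_importance()
importances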

plot_importance(LOFO.get_importance(), figsize=(8, 8), kind='box')

plot_importance(LOFO.get_importance(), figsize=(8, 8))

Thank you!