Preface
This article introduces the basic concepts and evaluation metrics of multi-label classification, and summarizes several ways to improve the performance of multi-label classification models: modeling techniques, supervised feature selection, unsupervised feature selection, and upsampling.
What is multi-label classification?
As we all know, binary classification divides a given input into two categories, 1 or 0. Multi-label or multi-target classification predicts multiple binary targets simultaneously from a given input. For example, our model can predict whether a given picture is a dog or a cat, and whether its coat is long or short.
In multi-label classification, the targets are not mutually exclusive, which means that an input can belong to more than one class at the same time.
This article will summarize some common methods to improve the performance of multi-label classification models.
Evaluation metrics
Most metrics used for binary classification can be applied to the multi-label case by computing the metric for each label column and then averaging the scores. One metric we can use is logarithmic loss, i.e. binary cross entropy. To deal better with class imbalance, we can use ROC-AUC.
(Figure: ROC-AUC curve)
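As a concrete illustration, here is a minimal sketch of the column-wise averaging described above, using scikit-learn's `log_loss` and `roc_auc_score` (the dummy arrays are placeholders):

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

def multilabel_scores(y_true, y_prob):
    """Average binary metrics over each label column.

    y_true: (n_samples, n_labels) array of 0/1 targets
    y_prob: (n_samples, n_labels) array of predicted probabilities
    """
    losses, aucs = [], []
    for col in range(y_true.shape[1]):
        losses.append(log_loss(y_true[:, col], y_prob[:, col]))
        aucs.append(roc_auc_score(y_true[:, col], y_prob[:, col]))
    return np.mean(losses), np.mean(aucs)

# Dummy data: 100 samples, 5 labels
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, (100, 5))
y_prob = rng.random((100, 5))
print(multilabel_scores(y_true, y_prob))
```

Note that `roc_auc_score` can also handle a multi-label indicator matrix directly with `average="macro"`; the explicit loop above just makes the per-column averaging visible.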
Modeling techniques
Before we get into the fancy tricks with features, let’s share some tips for designing models that are suitable for multi-label classification cases.
For most non-neural-network models, the only option is to train a classifier for each target and then combine the predictions. The scikit-learn library provides a simple wrapper class for this, OneVsRestClassifier.
Although this enables the classifier to handle multi-label tasks, it is not the best approach, for several reasons. First, training takes a long time, since a separate model is trained for each target. Second, the models cannot learn relationships between the different labels, i.e. label correlations.
The second problem can be addressed with two-stage training, in which the target predictions from the first stage are combined with the original features and used as input for the second stage. The downside is that training time increases dramatically, because you now have to train twice as many models as before.
Neural networks are better suited to this situation: the number of labels is simply the number of output neurons in the network. We can then apply any binary classification loss to the model, and it will output all targets simultaneously. This solves both problems of the non-neural-network approach, since we only need to train one model and the network can learn label correlations through its shared layers and output neurons.
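For example, a minimal sketch with a synthetic dataset (the estimator and data here are placeholders, not from the original article):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multi-label data: y is an (n_samples, n_labels) 0/1 indicator matrix.
X, y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)

# One independent LogisticRegression is trained per label column.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)
probs = clf.predict_proba(X)  # shape: (n_samples, n_labels)
```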
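A rough sketch of this two-stage idea (one simple stacking variant, assuming the same synthetic data as above; in practice the stage-1 predictions should be out-of-fold to avoid leakage):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)

# Stage 1: independent per-label models.
stage1 = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
stage1_preds = stage1.predict_proba(X)  # ideally out-of-fold predictions

# Stage 2: append the stage-1 predictions to the original features so the
# second round of models can exploit correlations between labels.
X_stage2 = np.hstack([X, stage1_preds])
stage2 = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_stage2, y)
```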
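A minimal PyTorch sketch of this setup (the layer sizes and dummy batch are placeholder assumptions): one output neuron per label, trained with binary cross entropy on all outputs at once.

```python
import torch
import torch.nn as nn

n_features, n_labels = 20, 5

# One shared network, one output neuron per label.
model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Linear(64, n_labels),  # raw logits, one per label
)

criterion = nn.BCEWithLogitsLoss()  # binary cross entropy over every output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, n_features)                     # dummy batch
targets = torch.randint(0, 2, (32, n_labels)).float()

optimizer.zero_grad()
loss = criterion(model(x), targets)
loss.backward()
optimizer.step()
```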
Supervised feature selection methods
Before starting any feature engineering or selection, features should be normalized or standardized. Using a QuantileTransformer reduces the skewness of the data and maps the features to a normal distribution. Another option is to standardize the features by subtracting the mean and dividing by the standard deviation. Both transforms aim to make the data more robust and comparable, but the QuantileTransformer has a higher computational cost.
Using supervised feature selection here is a bit tricky, because most algorithms are designed for a single target. To work around this, we can convert the multi-label case into a multi-class problem. One popular approach is LabelPowerset, in which each unique label combination in the training data is converted into a single class. The scikit-multilearn library provides tools for this.
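For instance, a minimal sketch of both options with scikit-learn (the skewed dummy data is a placeholder):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler

X = np.random.lognormal(size=(500, 10))  # skewed dummy features

# Map each feature to an approximately normal distribution.
X_quantile = QuantileTransformer(output_distribution="normal",
                                 n_quantiles=100).fit_transform(X)

# Or standardize: subtract the mean, divide by the standard deviation.
X_standard = StandardScaler().fit_transform(X)
```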
Tool link: scikit.ml/api/skmulti…
After the transformation we can use information gain or the chi-squared (chi2) test to select features. This approach works, but things get tricky once there are hundreds or even thousands of unique label combinations; at that point an unsupervised feature selection approach may be better.
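A rough sketch of this idea, assuming scikit-multilearn's LabelPowerset transform together with scikit-learn's SelectKBest (exact API details may vary by version):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from skmultilearn.problem_transform import LabelPowerset

X, y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)
# chi2 requires non-negative features; these synthetic count features already are.

# Convert each unique label combination into a single multi-class target.
y_multiclass = LabelPowerset().transform(y)

# Score features against the combined target and keep the best ones.
X_chi2 = SelectKBest(chi2, k=10).fit_transform(X, y_multiclass)
X_ig = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y_multiclass)
```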
Unsupervised feature selection methods
With unsupervised methods we do not need to account for the multi-label nature of the problem, because unsupervised methods do not rely on the labels.
Here are some algorithms (a combined sketch follows the list):

- Principal component analysis (PCA) or other similar factor analysis methods. These remove redundant information from the features and extract useful insights for the model. An important caveat is to make sure the data is standardized before applying PCA, so that every feature contributes equally to the analysis. Another trick with PCA is to concatenate the reduced features back onto the original data as additional information, letting the model choose between the original features and the reduced ones provided by the algorithm.

- Variance threshold. This is a simple and effective way to reduce feature dimensionality: we discard features with low variance. It can be tuned by searching for a better selection threshold; 0.5 is a common starting point.

- Clustering. We can create a new feature by clustering the input data and then assigning each row its cluster as a new feature column.

(Figure: k-means clustering)
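A combined sketch of the three ideas above, using scikit-learn with a dummy feature matrix (the component counts, threshold, and number of clusters are placeholder assumptions, not values from the article):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

X = np.random.rand(500, 30)  # dummy feature matrix

# PCA on standardized data; concatenate the components back as extra features.
X_std = StandardScaler().fit_transform(X)
pca_features = PCA(n_components=10).fit_transform(X_std)
X_with_pca = np.hstack([X, pca_features])

# Variance threshold: drop features whose variance falls below the threshold
# (the threshold is data-dependent; a small value is used for this dummy data).
X_reduced = VarianceThreshold(threshold=0.05).fit_transform(X)

# Clustering: use each row's assigned cluster id as a new feature column.
cluster_ids = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
X_with_cluster = np.hstack([X, cluster_ids.reshape(-1, 1)])
```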
Upsampling method
When the classification data is highly imbalanced, we can use upsampling methods to generate artificial samples for the rare classes so that the model pays more attention to them. To create new samples in a multi-label setting, we can use MLSMOTE (Multi-Label Synthetic Minority Over-sampling Technique).
MLSMOTE project address:
Github.com/niteshsukhw…
MLSMOTE is a modification of the original SMOTE method. After generating new data points for a minority class and assigning them the corresponding minority label, we also assign the other labels associated with each new point: we count how often each label occurs among the neighboring data points and keep the labels that appear in more than half of them.
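The label-assignment step described above might look roughly like this (a simplified sketch, not the reference MLSMOTE implementation; the neighbor search and sample synthesis are abbreviated, and the dummy data is a placeholder):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def assign_labels_from_neighbors(y, neighbor_indices):
    """Give a synthetic sample every label that appears in more than
    half of its reference point's neighbors.

    y: (n_samples, n_labels) 0/1 label matrix
    neighbor_indices: indices of the k neighbors of the reference point
    """
    neighbor_labels = y[neighbor_indices]      # (k, n_labels)
    counts = neighbor_labels.sum(axis=0)       # occurrences per label
    return (counts > len(neighbor_indices) / 2).astype(int)

# Example: labels for a synthetic sample created near sample 0.
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, (100, 5))
nn = NearestNeighbors(n_neighbors=5).fit(X)
_, idx = nn.kneighbors(X[[0]])
new_labels = assign_labels_from_neighbors(y, idx[0])
```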
By Andy Wang
Translated by: CV Technical Guide
Original link: Andy-wang.medium.com/bags-of-tri…