An Overview of Multi-Label Classification, with Thoughts on Medical Image Classification

Recently I have been working on multi-label classification of fundus images. I read the review The Emerging Trends of Multi-label Learning1 by Liu Weiwei, a professor at Wuhan University, along with a few papers on medical image classification and natural-image multi-label classification. This post summarizes my understanding of multi-label classification (MLC) from that reading, together with some thoughts on the characteristics of multi-label problems in medical images.

Summary of the review

To save effort, I will not list the references for each method here; I summarize the review's content and add a bit of my own understanding. I skimmed the paper, so only parts of it are covered.

Structure of the review

The research focus of MLC includes the following aspects:

  • Extreme MLC (XMLC): an MLC scenario with an extremely large number of categories. With the arrival of the big-data era, studying this setting is of great significance.
    • Most of the work follows SLEEC, and is mainly based on one-vs-all classifiers, trees, and embeddings.
    • In theory, the long-tail distribution caused by label sparsity needs to be handled.
  • MLC with missing/noisy labels: the non-fully-supervised version of MLC, which deals with label problems.
    • Missing labels: some categories have no annotation at all.
    • Semi-supervised: the transfer of traditional semi-supervised learning; part of the data is labeled and part is not.
    • Partial multi-label: some labels are untrustworthy, i.e., fuzzy labels.
  • Online MLC for streaming data: because of the large amount of real-time streaming data produced on the Web, MLC for online real-time scenarios has attracted a lot of attention.
    • Streaming data cannot be pre-read into memory for global access, and typically each timestamp must be processed in real time.
    • Existing offline MLC models perform only moderately on sequential data.
    • Online MLC currently has very limited experimental and theoretical results.

§4 Deep Learning for MLC

  • BP-MLL

BP-MLL was the first method to use a neural-network structure in MLC; it proposed the following pairwise loss function:


E_{i}=\frac{1}{\left|y_{i}^{1}\right|\left|y_{i}^{0}\right|} \sum_{(p, q) \in y_{i}^{1} \times y_{i}^{0}} \exp \left(-\left(F\left(x_{i}\right)^{p}-F\left(x_{i}\right)^{q}\right)\right)

where p and q index the categories labeled 1 and 0 respectively. The penalty term of the form e^{-x} pushes the scores of positive and negative categories as far apart as possible. The overall idea is a ranking loss.
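As a concrete reference, here is a minimal numpy sketch of this pairwise loss for a single instance (the function name `bpmll_loss` and the toy scores are my own illustration, not from the paper):

```python
import numpy as np

def bpmll_loss(scores, labels):
    """BP-MLL pairwise exponential loss for one instance.

    scores: (C,) real-valued outputs F(x_i) for each of C categories.
    labels: (C,) binary ground-truth vector y_i.
    Averages exp(-(F(x_i)^p - F(x_i)^q)) over all pairs (p, q) with
    label p = 1 and label q = 0, pushing positive scores above negatives.
    """
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0  # undefined without both positive and negative labels
    diffs = scores[pos][:, None] - scores[neg][None, :]  # all pairwise gaps
    return float(np.exp(-diffs).mean())

# Well-separated scores give a small loss; flipping the labels blows it up.
scores = np.array([2.0, -1.0, 1.5, -2.0])
labels = np.array([1, 0, 1, 0])
loss = bpmll_loss(scores, labels)
```

Note that the exponential grows without bound when a negative category outscores a positive one, which is what makes the loss a hard ranking penalty.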

Subsequent studies found that replacing the BP-MLL loss with cross-entropy, plus a few tricks such as ReLU/Dropout/AdaGrad, achieves new SOTA performance in large-scale text classification scenarios where classic BP-MLL cannot be applied.

  • C2AE

Classical embedding methods can only capture the semantic dependency of the labels themselves and cannot capture higher-order connections. C2AE (Canonical-Correlated AutoEncoder) is the first deep embedding-based MLC method. It extracts features with an autoencoder and uses DCCA (Deep Canonical Correlation Analysis) to extract relations between labels on top of the features; it belongs to the embedding family of methods.

The overall objective function of C2AE is defined as follows:


\min _{F_{x}, F_{e}, F_{d}} \Phi\left(F_{x}, F_{e}\right)+\alpha \Gamma\left(F_{e}, F_{d}\right)

F_x, F_e, F_d are the feature mapping, encoding function, and decoding function respectively. \alpha is a weight balancing the two penalty terms. \Phi and \Gamma are the losses in the latent space (between feature and encoding) and the output space (between encoding and decoding), respectively.

Following the idea of CCA, C2AE makes the association between instances and labels as strong as possible by minimizing the gap between their latent representations:


\begin{aligned} \min _{F_{x}, F_{e}} &\left\|F_{x}(X)-F_{e}(Y)\right\|_{F}^{2} \\ \text { s.t. } \quad & F_{x}(X) F_{x}(X)^{T}=F_{e}(Y) F_{e}(Y)^{T}=I \end{aligned}

The autoencoder uses a ranking loss similar to the one above to separate the decoded scores of positive and negative labels as much as possible:


\begin{array}{l} \Gamma\left(F_{e}, F_{d}\right)=\sum_{i=1}^{N} E_{i} \\ E_{i}=\frac{1}{\left|y_{i}^{1}\right|\left|y_{i}^{0}\right|} \sum_{(p, q) \in y_{i}^{1} \times y_{i}^{0}} \exp \left(-\left(F_{d}\left(F_{e}\left(x_{i}\right)\right)^{p}-F_{d}\left(F_{e}\left(x_{i}\right)\right)^{q}\right)\right) \end{array}
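To make the two-term objective concrete, here is a toy numpy sketch that evaluates \Phi + \alpha\Gamma with fixed random linear maps standing in for the learned encoders/decoder (all names and dimensions are illustrative assumptions, not C2AE's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N instances, d-dim features, C labels, k-dim latent space.
N, d, C, k = 8, 5, 4, 3
X = rng.normal(size=(N, d))
Y = rng.integers(0, 2, size=(N, C)).astype(float)
W_x = rng.normal(size=(d, k))   # stand-in for feature mapping F_x
W_e = rng.normal(size=(C, k))   # stand-in for label encoder F_e
W_d = rng.normal(size=(k, C))   # stand-in for label decoder F_d

L_x = X @ W_x                   # F_x(X): instances embedded in latent space
L_e = Y @ W_e                   # F_e(Y): labels embedded in latent space

# Latent-space term Phi(F_x, F_e): squared Frobenius gap between embeddings.
phi = np.sum((L_x - L_e) ** 2)

# Output-space term Gamma(F_e, F_d): BP-MLL-style pairwise loss on F_d(F_e(Y)).
out = L_e @ W_d
gamma = 0.0
for i in range(N):
    pos = np.where(Y[i] == 1)[0]
    neg = np.where(Y[i] == 0)[0]
    if len(pos) and len(neg):
        diffs = out[i, pos][:, None] - out[i, neg][None, :]
        gamma += np.exp(-diffs).mean()

alpha = 0.5                     # trade-off weight between the two terms
objective = phi + alpha * gamma
```

In the real method both terms are minimized jointly over the network parameters, with the orthonormality constraint handled as in CCA; here the maps are frozen purely to show how the objective is assembled.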

The subsequent works DCSPE and DBPC further improved the SOTA performance and inference speed for text classification.


  • Partial and weakly-supervised MLC

Interactive Multi-label CNN Learning with Partial Labels (CVPR 2020) and T. Durand et al.'s Learning a Deep ConvNet for Multi-label Classification with Partial Labels (CVPR 2019) both study multi-label classification with partial labels (hereafter referred to as D and T, after their first authors).

T trains on the labeled part with BCE loss, and then uses a GNN to extract associations between labels. Experiments show that a large partially-labeled dataset works better than a small fully-labeled one, which further demonstrates the research value of partial-label MLC.
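The "train only on the annotated part with BCE" idea can be sketched as a masked BCE, where unannotated labels simply contribute nothing to the loss (a minimal numpy illustration; the function name and data are my own):

```python
import numpy as np

def partial_bce(probs, labels, mask):
    """BCE averaged only over labels that were actually annotated.

    probs:  (N, C) predicted probabilities.
    labels: (N, C) 0/1 annotations; entries where mask == 0 are unknown.
    mask:   (N, C) 1 where a label was annotated, 0 where it is missing.
    """
    eps = 1e-12  # numerical guard against log(0)
    ce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return float((ce * mask).sum() / mask.sum())

probs  = np.array([[0.9, 0.1, 0.01]])
labels = np.array([[1.0, 0.0, 1.0]])
mask   = np.array([[1.0, 1.0, 0.0]])   # third label was never annotated
loss = partial_bce(probs, labels, mask)
```

Because the third label is masked out, the (badly predicted) unknown entry does not affect the loss at all, which is what lets a partially labeled dataset be trained with ordinary BCE.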

Building on T, D adopts the idea of manifold learning, treating the manifold smoothness of labels and features as a cost term alongside the BCE loss, and then uses a semi-supervised approach to learn the CNN and the similarity simultaneously (I did not read this article; the description resembles the Π-model or a teacher-student structure).

  • Advances in SOTA MLC

Classifier chains: ADIOS partitions the labels into a Markov blanket chain, which can extract relationships between labels, and feeds them into DNN training.

CRNN: two papers treat the categories as a sequence, using a CRNN or C-LSTM. Later work used attention/RL to learn the ordering of categories and find the optimal sequence. CVPR 2020 and AAAI 2020 each published a new idea ("optimal completion" + multi-task learning / minimal alignment), both attempting to adjust the order of the label sequence dynamically (order-free).

Graph-related methods

  • ML-GCN2 builds a directed graph between categories and then trains it with a GCN.
  • SSGRL3 uses embeddings to perform semantic decoupling, then uses a GNN to learn semantic features composed of label + feature, strengthening both instance and label features so as to learn connections between higher-order labels.
  • Label graph superimposing4 adds connections between some layers of the GCN and the CNN, realizing label-aware classification learning.
  • Non-local-attention GCN5 uses a GCN to obtain rich semantic information, then uses non-local attention to capture long-range semantic associations.
  • MLDF (Multi-Label Deep Forest)6 uses deep forest, a tree-ensemble approach that does not rely on backpropagation. It is reported to handle over-fitting better, achieved SOTA results on six metrics, and is an exploration of lightweight design.
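The label-graph idea in the first item can be sketched roughly as follows: propagate label embeddings over a normalized co-occurrence graph with one graph-convolution layer, then use the result as per-class classifiers applied to the CNN feature (a toy numpy sketch of the ML-GCN-style pipeline; the adjacency matrix and all dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
C, e, d = 4, 6, 6                     # labels, embedding dim, feature dim

A = np.array([[0, 1, 0, 0],           # assumed label co-occurrence adjacency
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
A_hat = A + np.eye(C)                 # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization

E = rng.normal(size=(C, e))           # initial label embeddings (e.g. word vectors)
W = rng.normal(size=(e, d))           # one graph-convolution layer's weights
classifiers = np.maximum(A_norm @ E @ W, 0)  # ReLU(A_norm @ E @ W): per-class weights

x = rng.normal(size=(d,))             # CNN feature of one image
scores = classifiers @ x              # one score per label
```

The point of the propagation step is that each class's classifier is influenced by the embeddings of co-occurring classes, which is how label correlations enter the prediction.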

Thoughts on MLC in medical imaging

In the earlier article on medical image segmentation (DeepIGeoS), the special characteristics of medical images were summarized as follows:

  1. Low contrast, high noise, and the presence of cavities
  2. Huge differences in scale and appearance among patients
  3. Heterogeneous presentation of diseases
  4. Differing judgments among doctors may make ground-truth annotations inconsistent

That list is mainly aimed at segmentation, where the CT and MRI images are typically high-intensity gray-scale images; points 1 and 2 basically do not apply to MLC scenarios.

3. In MLC, the features of different categories are not uniform. For example, some diseases may have observable symptoms covering a large area, while others show observable symptoms in only a small region. The concrete impact needs more experiments in specific scenarios.

4. This can be related to the partial-label problem in MLC. If the diagnosis is uncertain, for example the doctor has identified several possible symptoms for a patient but no further examination was performed, then one could design a method to predict a confidence for each label. Unfortunately, the scenario and data requirements feel somewhat demanding.

Also worth mentioning is class imbalance. Because some diseases have very few cases, the collected data may contain only a single-digit number of positives, in which case there may be nothing to learn for that category at all.
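A common mitigation is to upweight the rare positives in the loss, e.g. a per-class positive weight in BCE (a minimal numpy sketch; the weights here are illustrative, and in practice they might be estimated as negatives/positives per class on the training set):

```python
import numpy as np

def weighted_bce(probs, labels, pos_weight):
    """BCE with a per-class positive weight to counter class imbalance.

    probs:      (N, C) predicted probabilities.
    labels:     (N, C) 0/1 ground truth.
    pos_weight: (C,) multiplier on the positive term of each class,
                making mistakes on rare positives cost more.
    """
    eps = 1e-12  # numerical guard against log(0)
    loss = -(pos_weight * labels * np.log(probs + eps)
             + (1 - labels) * np.log(1 - probs + eps))
    return float(loss.mean())

probs  = np.array([[0.2, 0.9]])     # the rare positive class is mispredicted
labels = np.array([[1.0, 0.0]])
plain    = weighted_bce(probs, labels, np.array([1.0, 1.0]))
weighted = weighted_bce(probs, labels, np.array([5.0, 1.0]))
```

With the weight applied, the mispredicted rare positive dominates the loss, so gradient descent spends more capacity on that class; this does not help, of course, when there are only a handful of positives to learn from at all.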

Finally, semi-supervision is popular for medical images: combining some unlabeled data with labeled data can improve performance to some extent. Although not limited to medical images, semi-supervision is applied particularly widely there because medical annotations are hard to obtain.

References


  1. W. Liu, X. Shen, H. Wang, and I. W. Tsang, "The Emerging Trends of Multi-label Learning," arXiv:2011.11197 [cs], accessed Jan. 08, 2021. [Online]. Available: arxiv.org/abs/2011.11197. ↩
  2. Z. Chen, X. Wei, P. Wang, and Y. Guo, "Multi-Label Image Recognition with Graph Convolutional Networks," in CVPR, 2019, pp. 5177–5186. ↩
  3. T. Chen, M. Xu, X. Hui, H. Wu, and L. Lin, "Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition," in ICCV, 2019, pp. 522–531. ↩
  4. Y. Wang, D. He, F. Li, X. Long, Z. Zhou, J. Ma, and S. Wen, "Multi-Label Classification with Label Graph Superimposing," in AAAI, 2020, pp. 12265–12272. ↩
  5. P. Tang, M. Jiang, B. N. Xia, J. W. Pitera, J. Welser, and N. V. Chawla, "Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network," in AAAI, 2020. ↩
  6. L. Yang, X. Wu, Y. Jiang, and Z. Zhou, "Multi-Label Learning with Deep Forest," CoRR, vol. abs/1911.06557, 2019. ↩