• Article reposted from the WeChat official account "Machine Learning Alchemy"
  • Author: Brother Lian Dan (with authorization)
  • Author's contact: CYX645016617
  • Context Prior for Scene Segmentation

0 Review

CPNet seems to handle some difficult samples better:

The key points mentioned: a Context Prior Map (CPMap) is constructed, then fused into the network through a Context Prior Layer (CPLayer); meanwhile, an Affinity Loss is used to supervise the CPLayer.

Keep the above sentence in mind as an open question for now.

1 Context Prior

The author proposes two concepts:

  • Intra-context: the relationship between pixels of the same category;
  • Inter-context: the relationship between pixels of different categories.

In image segmentation tasks, each image has its own ground truth, but the ground truth only gives the category each pixel belongs to. It is difficult for a model to learn contextual information from isolated pixels, so there is a need to explicitly regularize the network.

In this paper, the author uses Affinity Loss to explicitly regulate the network.

For each pixel in the image, this loss forces the network to consider both pixels of the same category (intra-context) and pixels of different categories (inter-context).

In other words, the model learns to relate each pixel to the other same-class pixels and to the other different-class pixels.

2 Build the Ideal Affinity Map

Before using the Affinity Loss, we first need to build the Ideal Affinity Map:

  • Now we have an image, Image, and its ground truth GT.
  • A feature map of spatial size H×W is obtained from the image through a fully convolutional network.
  • Downsample GT to the same H×W size to get $\widetilde{GT}$, then one-hot encode $\widetilde{GT}$ to obtain a tensor of size H×W×C, where C is the number of segmentation categories; call this tensor $\widehat{GT}$.
  • Reshape $\widehat{GT}$ to N×C (where N = H×W), then compute $\widehat{GT}\,\widehat{GT}^{\mathrm{T}}$ to get an N×N matrix, which we call A;
  • A is the Ideal Affinity Map we want.

A is our desired Affinity Map with size N × N, which encodes which pixels belong to the same category. We employ the Ideal Affinity Map to supervise the learning of Context Prior Map.

Each 1 in A indicates that the corresponding pair of pixels belongs to the same category. The CPMap is trained under the supervision of this Ideal Affinity Map; a minimal sketch of the construction follows below.
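Here is a minimal PyTorch sketch of this construction; the nearest-neighbor downsampling, the helper name `ideal_affinity_map`, and the toy sizes are my assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def ideal_affinity_map(gt, feature_size, num_classes):
    """Build the N x N Ideal Affinity Map A from a ground-truth label map.

    gt:           (H_img, W_img) long tensor of per-pixel class ids
    feature_size: (H, W) spatial size of the feature map
    num_classes:  C, the number of segmentation categories
    """
    # Downsample GT to the feature-map size (nearest keeps labels hard).
    gt_small = F.interpolate(gt[None, None].float(), size=feature_size,
                             mode="nearest").long().view(-1)   # (N,)
    # One-hot encode to N x C.
    onehot = F.one_hot(gt_small, num_classes).float()          # (N, C)
    # A[i, j] = 1 iff pixels i and j share a category.
    return onehot @ onehot.t()                                 # (N, N)

# Toy example: an 8x8 label map with 3 classes, 4x4 features -> A is 16x16.
gt = torch.randint(0, 3, (8, 8))
A = ideal_affinity_map(gt, feature_size=(4, 4), num_classes=3)
print(A.shape)  # torch.Size([16, 16])
```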

3 Affinity Loss

We use a fully convolutional network to extract features from the image, obtaining a feature map of spatial size H×W, as mentioned above; but the number of channels in this feature map has not been mentioned yet. See the following figure:

The number of channels is H×W, which is exactly the N mentioned above, so that for each pixel its feature vector is exactly 1×N. Doesn't this correspond exactly to a row of the Ideal Affinity Map we built?

So here we can use a simple binary cross entropy between the predicted Prior Map and the Ideal Affinity Map as the cost:

$$\mathcal{L}_u = -\frac{1}{N^2}\sum_{n=1}^{N^2}\big(a_n \log p_n + (1-a_n)\log(1-p_n)\big)$$

where $p_n$ ranges over the elements of the predicted prior map P and $a_n$ over the corresponding elements of A.

Is that the end of it? No:

However, such a unary loss only considers the isolated pixel in the prior map ignoring the semantic correlation with other pixels.

Intuitively, the unary term above only considers each pair of pixels in isolation, ignoring richer semantic relationships, so another loss term is needed.

A denotes the elements of the Ideal Affinity Map, and P the N×N matrix predicted from X. What formula (2) says has a bit of the flavor of object-detection metrics.

For example, suppose one row of the maps covers a total of 5 pixels:

  • A = [0,0,1,0,1], indicating that the third and the fifth pixel are of the same class as the current pixel;
  • P = [0.1,0.2,0.8,0.2,0.9], the predictions: the first pixel is judged same-class with probability 0.1, the third with probability 0.8, and so on;
  • $\frac{\sum AP}{\sum A}$ is exactly true positives / actual positives, i.e., the recall;
  • $\frac{\sum AP}{\sum P}$ is exactly true positives / predicted positives, i.e., the precision.

This is only a rough account, because confusion matrices, PR curves, recall, and precision are things you should already know; if you don't, go read the notes I've written before (107 of them by now; I'm too lazy to dig up the links, sorry lol).
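To sanity-check the numbers in the example above (a quick throwaway computation, not code from the paper):

```python
import torch

A = torch.tensor([0., 0., 1., 0., 1.])
P = torch.tensor([0.1, 0.2, 0.8, 0.2, 0.9])

tp = (A * P).sum()           # "soft" true positives = 1.7
print(tp / A.sum())          # recall    = 1.7 / 2.0 = 0.85
print(tp / P.sum())          # precision = 1.7 / 2.2 ≈ 0.77
```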

The author's original text:

For formula (4), 1−a simply swaps 0 and 1, so a 1 in 1−a marks two pixels of different categories, thereby capturing the inter-class relationships.

Finally, the Affinity Loss is the combination of the unary term and the global term: $\mathcal{L}_p = \mathcal{L}_u + \mathcal{L}_g$.
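Putting the pieces together, here is a minimal PyTorch sketch of the whole Affinity Loss following the description above; the `eps` stabilizer, the log form of the global term, and the mean reductions are my assumptions, not details taken verbatim from the paper:

```python
import torch
import torch.nn.functional as F

def affinity_loss(P, A, eps=1e-8):
    """Sketch of the Affinity Loss: unary BCE plus a global term.

    P: (N, N) predicted prior map, values in (0, 1)
    A: (N, N) Ideal Affinity Map, values in {0, 1}
    """
    # Unary term: binary cross entropy over every pair of pixels.
    L_u = F.binary_cross_entropy(P, A)

    # Global term: per-row (per-pixel) precision, recall, specificity.
    tp = (A * P).sum(dim=1)                        # "soft" true positives
    precision = tp / (P.sum(dim=1) + eps)
    recall = tp / (A.sum(dim=1) + eps)
    tn = ((1 - A) * (1 - P)).sum(dim=1)            # "soft" true negatives
    specificity = tn / ((1 - A).sum(dim=1) + eps)
    L_g = -(torch.log(precision + eps)
            + torch.log(recall + eps)
            + torch.log(specificity + eps)).mean()

    return L_u + L_g
```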

4 Context Prior Layer

The structure of the CPLayer is as follows, and it suddenly gets quite complex:

  • The input to the CPLayer is a feature map X with $shape=H\times W\times C_0$;
  • X becomes $\widetilde{X}$ with $shape=H\times W\times C_1$ through an Aggregation Module;

Here’s how the Aggregation Module integrates spatial information:

It looks like two parallel branches of 1×k and k×1 convolutions; OK, that's not hard. A sketch follows below.
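A minimal PyTorch sketch of such an aggregation block, assuming two parallel asymmetric-convolution branches summed at the end; the kernel size `k`, the channel counts, and the omission of BN/activation layers are simplifications of mine:

```python
import torch
import torch.nn as nn

class AggregationModule(nn.Module):
    """Two parallel branches of asymmetric (1 x k and k x 1) convolutions."""

    def __init__(self, in_ch, out_ch, k=11):
        super().__init__()
        pad = k // 2
        # Branch 1: k x 1 followed by 1 x k.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(pad, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, pad)),
        )
        # Branch 2: 1 x k followed by k x 1.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, pad)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        return self.branch1(x) + self.branch2(x)
```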

  • $\widetilde{X}$ passes through a 1×1 convolution layer and a Sigmoid layer to give our Prior Map P, whose $shape=H\times W\times N$;
  • under the supervision of the Affinity Loss, P here reflects the relationships between pixels in the ideal case;
  • $\widetilde{X}$ is reshaped to $N\times C_1$;
  • the intra-class context is obtained as $Y=P\widetilde{X}$;
  • likewise, the inter-class context is obtained as $\bar{Y}=(1-P)\widetilde{X}$.

What does intra-class mean? Each row of P gives, for one pixel, the predicted probability that every other pixel belongs to the same class. Therefore each pixel's feature in Y is the combination of the $\widetilde{X}$ features of the other pixels predicted to share its category, weighted by those probabilities; this is how intra-class context is taken into account.


  • $F = \mathrm{concat}(X, Y, \bar{Y})$
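Here is a minimal sketch of that data flow, reusing the `AggregationModule` sketch from above; the shapes follow the bullets, but all layer hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ContextPriorLayer(nn.Module):
    """Sketch of the CPLayer data flow. The spatial size (h, w) must be
    fixed so the 1x1 conv can output N = h * w channels."""

    def __init__(self, c0, c1, h, w, k=11):
        super().__init__()
        self.n = h * w
        self.agg = AggregationModule(c0, c1, k)      # X -> X_tilde
        self.prior_conv = nn.Conv2d(c1, self.n, 1)   # 1x1 conv to N channels

    def forward(self, x):
        b, _, h, w = x.shape
        x_t = self.agg(x)                                  # (B, C1, H, W)
        p = torch.sigmoid(self.prior_conv(x_t))           # (B, N, H, W)
        p = p.flatten(2)                                   # prior map: (B, N, N)
        x_flat = x_t.flatten(2).transpose(1, 2)            # X_tilde as (B, N, C1)
        y = p @ x_flat                                     # intra-class: P X~
        y_bar = (1 - p) @ x_flat                           # inter-class: (1-P) X~
        y = y.transpose(1, 2).reshape(b, -1, h, w)
        y_bar = y_bar.transpose(1, 2).reshape(b, -1, h, w)
        return torch.cat([x, y, y_bar], dim=1)             # F = concat(X, Y, Y_bar)
```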

5 Details

The Affinity Loss computed above is written as $\mathcal{L}_p$ in the paper.

  • $\mathcal{L}_u$ refers to the unary loss, that is, the binary cross entropy;
  • $\mathcal{L}_g$ is the global loss, that is, the loss over the map as a whole.

Then the loss of the whole model is $\mathcal{L} = \lambda_s\mathcal{L}_s + \lambda_a\mathcal{L}_a + \lambda_p\mathcal{L}_p$:

  • $\mathcal{L}_p$ is the Affinity Loss we discussed, with weight $\lambda_p = 1$;

  • $\mathcal{L}_a$ is the Auxiliary Loss, with weight $\lambda_a = 0.4$;

  • $\mathcal{L}_s$ is the main segmentation loss, with weight $\lambda_s = 1$.
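As a trivial sketch of that combination (the variable names are mine, not from the paper):

```python
def total_loss(L_s, L_a, L_p, lambda_s=1.0, lambda_a=0.4, lambda_p=1.0):
    """Whole-model loss: main segmentation + auxiliary + affinity terms."""
    return lambda_s * L_s + lambda_a * L_a + lambda_p * L_p
```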