
1. Common attention mechanisms: explain the principles of channel attention and self-attention

A:

Common attention mechanisms include self-attention, channel attention, spatial attention, and multi-head attention (as used in the Transformer).

The self-attention mechanism is a variant of the attention mechanism that reduces the dependence on external information and is better at capturing the internal dependencies within the data or features: each position computes queries, keys, and values from the input itself and attends to every other position, so long-range relationships are modelled directly. Channel attention (e.g., Squeeze-and-Excitation) instead learns a weight for each feature channel: spatial information is squeezed by global pooling, a small bottleneck network produces per-channel weights, and the feature map is re-scaled so that informative channels are emphasized. A minimal sketch of both is given below.
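As a concrete illustration (not from the original post), here is a minimal PyTorch sketch of both mechanisms; the single-head setup, module names, and reduction ratio of 16 are illustrative assumptions rather than a definitive implementation.

```python
import torch
from torch import nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention over a sequence."""
    def __init__(self, dim):
        super().__init__()
        # queries, keys, and values are all projections of the same input
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):                                        # x: (batch, seq_len, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)  # (batch, L, L)
        attn = scores.softmax(dim=-1)        # each position attends to all positions
        return attn @ v                      # weighted sum of values

class SEChannelAttention(nn.Module):
    """Squeeze-and-Excitation style channel attention for a CNN feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                                # x: (batch, C, H, W)
        w = x.mean(dim=(2, 3))                           # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)       # excitation: per-channel weights
        return x * w                                     # re-scale the channels
```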

2. Write the formula for CE (cross-entropy) loss, then the formula for BCE (binary cross-entropy) loss; just state the formulas.

A:

See: "Sigmoid and Softmax, BCE and CE Loss functions" (Abcat, CSDN blog).
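The standard formulas, written out here for convenience (one-hot target y and prediction ŷ over C classes for CE; a single sigmoid output ŷ for BCE):

```latex
\mathrm{CE}(y, \hat{y})  = -\sum_{i=1}^{C} y_i \log \hat{y}_i
\qquad
\mathrm{BCE}(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\bigr]
```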

3. What should we pay attention to when training with Triplet Loss?

A:

Construct triplets with large intra-class differences and small inter-class differences, i.e., mine hard examples: choose positives from the same class that are far from the anchor and negatives from other classes that are close to it; otherwise most triplets already satisfy the margin and contribute no gradient. A usage sketch follows.
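A minimal PyTorch usage sketch; the batch size, embedding dimension, margin, and the random tensors (which stand in for embeddings produced by a real network and a hard-mining sampler) are illustrative assumptions.

```python
import torch
from torch import nn

# Triplet loss: pull anchor and positive (same class) together, push the
# negative (different class) away by at least a margin.
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

batch, embed_dim = 32, 128
anchor   = torch.randn(batch, embed_dim, requires_grad=True)  # anchor embeddings
positive = torch.randn(batch, embed_dim, requires_grad=True)  # same class as anchor
negative = torch.randn(batch, embed_dim, requires_grad=True)  # different class

loss = triplet_loss(anchor, positive, negative)
loss.backward()

# In real training the positives/negatives come from hard mining:
# positives that are far from the anchor and negatives that are close to it,
# otherwise most triplets already satisfy the margin and give zero gradient.
```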

4. Derive the derivative of Softmax.

A:
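The standard derivation, with s_i = softmax(z)_i:

```latex
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}},
\qquad
\frac{\partial s_i}{\partial z_j} =
\begin{cases}
  s_i (1 - s_i), & i = j \\[4pt]
  -\, s_i \, s_j, & i \neq j
\end{cases}
\;=\; s_i (\delta_{ij} - s_j)
```

Combined with cross-entropy loss, this collapses to the simple gradient ∂L/∂z_i = s_i − y_i, which is one reason softmax and CE are used together.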

5. KL divergence

A:

KL divergence measures how much one probability distribution differs from another: the closer the two distributions are, the smaller the KL divergence, and it is zero only when they are identical. Note that it is not symmetric, so it is not a true distance metric. The definition is given below.
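The standard (discrete) definition:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \;\ge\; 0,
\qquad
D_{\mathrm{KL}}(P \,\|\, Q) = 0 \iff P = Q
```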

6. Why is Smooth L1 used for bounding-box regression in detection models?

A:

The gradient of L2 loss is proportional to (f(x) − y), so when the predicted value f(x) differs greatly from the target y, the gradient becomes very large and gradient explosion can occur. The gradient of L1 loss is constant, so using L1 for large errors prevents gradient explosion; however, its constant gradient near zero makes convergence around the optimum unstable. Smooth L1 combines the two: it behaves like L2 when the error is small and like L1 when the error is large (see the formula below).
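The standard definition (as used in Fast R-CNN), with d = f(x) − y:

```latex
\mathrm{SmoothL1}(d) =
\begin{cases}
  0.5\, d^2,   & |d| < 1 \\[4pt]
  |d| - 0.5,   & |d| \ge 1
\end{cases}
\qquad
\frac{\partial\, \mathrm{SmoothL1}}{\partial d} =
\begin{cases}
  d,       & |d| < 1 \\[4pt]
  \pm 1,   & |d| \ge 1
\end{cases}
```

So the gradient is bounded by 1 for large errors (like L1) and shrinks smoothly to 0 near the optimum (like L2).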

7. Cutting-edge detection paradigms: DETR, Transformers, and so on.

A:

Unlike traditional computer vision pipelines, DETR treats object detection as a direct set-prediction problem. It consists of a set-based global loss that forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations between the objects and the global image context and outputs the final set of predictions directly in parallel. Thanks to this parallel decoding, DETR is conceptually simple and efficient. A minimal sketch follows.
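A minimal sketch loosely following the simplified demo implementation in the DETR paper; the tiny convolutional backbone, learned 2-D positional embeddings, and default layer counts are simplifications for illustration (the bipartite-matching loss is not shown), not the full model.

```python
import torch
from torch import nn

class MinimalDETR(nn.Module):
    """DETR-style sketch: backbone -> transformer encoder-decoder with
    learned object queries -> class and box prediction heads."""
    def __init__(self, num_classes, hidden_dim=256, nheads=8,
                 num_encoder_layers=6, num_decoder_layers=6, num_queries=100):
        super().__init__()
        # tiny conv backbone just for illustration (the paper uses ResNet-50)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, hidden_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.transformer = nn.Transformer(hidden_dim, nheads,
                                          num_encoder_layers, num_decoder_layers)
        # learned object queries: one embedding per output slot
        self.query_embed = nn.Parameter(torch.rand(num_queries, hidden_dim))
        # learned 2-D positional embeddings for the encoder input
        self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        # prediction heads: class logits (+1 for "no object") and normalized boxes
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_head = nn.Linear(hidden_dim, 4)

    def forward(self, x):
        feat = self.backbone(x)                              # (B, C, H, W)
        B, C, H, W = feat.shape
        pos = torch.cat([
            self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
            self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
        ], dim=-1).flatten(0, 1).unsqueeze(1)                # (H*W, 1, C)
        src = pos + feat.flatten(2).permute(2, 0, 1)         # encoder input (H*W, B, C)
        tgt = self.query_embed.unsqueeze(1).repeat(1, B, 1)  # decoder queries (Q, B, C)
        hs = self.transformer(src, tgt)                      # (Q, B, C), all slots in parallel
        return self.class_head(hs), self.bbox_head(hs).sigmoid()

# usage: one prediction per query slot, decoded in parallel
model = MinimalDETR(num_classes=91)
logits, boxes = model(torch.randn(1, 3, 128, 128))           # (100, 1, 92), (100, 1, 4)
```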