- Paper title: “Deformable Convolutional Networks”
- Paper link: arxiv.org/abs/1703.06…
0 Foreword
First, a few things to understand:
- Deformable convolution modifies the convolution operation itself. Therefore, dilated convolution and 3D convolution can both be made deformable.
- This article covers the theory and the paper. I haven’t tested deformable convolution myself yet, because PyTorch doesn’t seem to ship this convolution as a built-in, which is a bit of a hassle. So in my next article I plan to run a simple test with a deformable convolution implementation already reproduced on GitHub.
- Originally I was studying contour detection algorithms, and I came across a SOTA method called Deep Snake. After reading its code for a long time, I found that it builds on several components such as DCN and DLA, so I decided to learn them from scratch.
1 Overview
The authors’ main contributions in this paper are:
- A deformable convolutional network, Deformable ConvNet (DCN).
- A deformable pooling layer built on the same principle, called deformable RoI pooling (see the equation after this list).
- Both modules can be dropped into other network architectures very easily and add few parameters, yet work well. (The paper applies them to mainstream models.)
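For reference, deformable RoI pooling follows the same recipe as deformable convolution: the RoI is split into k × k bins, a small branch (an fc layer in the paper) predicts one offset $\Delta \mathbf{p}_{ij}$ per bin, and each bin is average-pooled at its shifted position, roughly:

$$y(i, j) = \sum_{\mathbf{p} \in \mathrm{bin}(i,j)} \frac{x(\mathbf{p}_0 + \mathbf{p} + \Delta \mathbf{p}_{ij})}{n_{ij}}$$

where $n_{ij}$ is the number of sampling points in bin $(i, j)$.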
The core question behind these contributions is: why must a convolution kernel sample a fixed square grid? Detection targets come in all kinds of shapes, so why should the sampling pattern be square?
Thus, the kernel’s sampling positions here are no longer a fixed square grid; they are shifted by offsets learned via gradient descent:
In the figure, (a) is the standard convolution sampling grid, (b) is the deformable convolution sampling grid, and (c) and (d) are special cases of deformable convolution (covering transformations such as scaling and rotation). It may sound complicated, but the principle is very simple.
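Concretely, a standard convolution samples the input feature map $x$ over a fixed grid $\mathcal{R}$ (for a 3×3 kernel, the nine positions $\{(-1,-1), (-1,0), \dots, (1,1)\}$); deformable convolution simply adds a learned offset $\Delta \mathbf{p}_n$ to each sampling position (Eqs. (1)–(2) in the paper):

$$y(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} w(\mathbf{p}_n)\, x(\mathbf{p}_0 + \mathbf{p}_n + \Delta \mathbf{p}_n)$$

Since the offsets are typically fractional, $x$ is evaluated by bilinear interpolation (more on that below).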
2 Implementation Principles
The figure above shows the deformable convolution process. A glance at the diagram shows that the structure looks somewhat similar to SENet. We’ll work out how to implement this process in the next code exercise.
In broad terms: the input feature map passes through an extra convolution layer that generates an offset field, and the main convolution then samples the feature map at positions shifted by those offsets.
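As a concrete sketch of that flow: newer torchvision releases expose a `deform_conv2d` operator, so a minimal module can pair an offset-predicting conv with the deformable sampling like this (the module and its names are my own illustration, not the paper’s code, which predates this API):

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableConv2d(nn.Module):
    """Minimal deformable conv sketch: a plain conv predicts per-location
    offsets, which shift the sampling grid of the main convolution."""

    def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1):
        super().__init__()
        self.stride, self.padding = stride, padding
        # One (dy, dx) pair per kernel element, per output location.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k,
                                     stride=stride, padding=padding)
        # Zero-init the offsets so training starts from a regular conv.
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x):
        offset = self.offset_conv(x)          # (N, 2*k*k, H_out, W_out)
        return deform_conv2d(x, offset, self.weight,
                             stride=self.stride, padding=self.padding)


# Quick shape check.
x = torch.randn(1, 64, 32, 32)
print(DeformableConv2d(64, 128)(x).shape)    # torch.Size([1, 128, 32, 32])
```

Zero-initializing the offset branch is the trick the paper also uses: at the start of training the module behaves exactly like an ordinary convolution, and the grid deforms only as gradients flow into the offsets.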
3 Experimental Results
The paper mentions that deformable convolution works better when applied in the last three layers of the feature extraction network.
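For a sense of how little surgery that takes, here is a hypothetical swap (untested, assuming the `DeformableConv2d` sketch above and a torchvision ResNet-50; `layer4` and `conv2` are torchvision’s attribute names):

```python
import torchvision

# Hypothetical illustration, not the authors' code: ResNet-50's last stage
# (layer4) contains exactly three bottlenecks, so replacing each one's 3x3
# conv yields three deformable layers, mirroring the paper's setup.
backbone = torchvision.models.resnet50(weights=None)
for block in backbone.layer4:
    c = block.conv2                          # the 3x3 conv in the bottleneck
    block.conv2 = DeformableConv2d(c.in_channels, c.out_channels, k=3,
                                   stride=c.stride[0], padding=c.padding[0])
```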
The sampling points learned by Deformable ConvNets on a real task are shown above, which I think is a very interesting demonstration of interpretability.
The last three rows of the table above show the effect of deformable convolution: it is a real improvement. So does deformable convolution noticeably increase the parameter count or the computation?
As you can see, the increase in parameters is very small, and the running time is almost the same. I’ve decided that once I’ve reproduced deformable convolution, I’ll try it in all of my models to see whether it helps. (One more trick for the bag of tricks.)
In theory, deformable convolution is not difficult; the key is how to implement it. I hope the road to reproducing it won’t be too bumpy.
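The implementation detail that matters most: the sampling position $\mathbf{p} = \mathbf{p}_0 + \mathbf{p}_n + \Delta \mathbf{p}_n$ is fractional, so $x(\mathbf{p})$ is computed by bilinear interpolation over the integer grid positions $\mathbf{q}$ (Eqs. (3)–(4) in the paper), which is exactly what makes the whole operation differentiable with respect to the offsets:

$$x(\mathbf{p}) = \sum_{\mathbf{q}} G(\mathbf{q}, \mathbf{p})\, x(\mathbf{q}), \qquad G(\mathbf{q}, \mathbf{p}) = \max(0, 1 - |q_x - p_x|) \cdot \max(0, 1 - |q_y - p_y|)$$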
Reference articles:
- arxiv.org/abs/1703.06…
- littletomatodonkey.github.io/2018/12/02/…
- zhuanlan.zhihu.com/p/52476083