Abstract: This article introduces the network structure of YOLOv3 in detail.
Yolov3 network structure
In the blog “Yolo Development history and Network Structure”, we explained the network structure of YOLOv1 in detail and briefly mentioned the improvements made to the network structure in YOLOv2 and YOLOv3. This blog covers the network structure of YOLOv3 in detail; the content is relatively simple.
Figure 1: YOLOv3 network structure diagram
As can be seen from the figure, Yolov3 is mainly composed of the following parts:
- Input
- Basic network (backbone): it can be chosen according to specific needs. In the original paper, the author used Darknet-53, which he designed himself.
- Three output branches: Y1, Y2, and Y3
Network Components
DBL: Darknetconv2d_BN_Leaky in the code, shown in the bottom left of Figure 1, is the basic component of YOLOv3. It consists of convolution + BN (batch normalization) + Leaky ReLU. In YOLOv3, BN and Leaky ReLU are inseparable from the convolution layer (except for the last convolution layer), and together they form the smallest component.
Resn: n is a number. There are res1, res2, …, res8, and so on, where n indicates how many res_units the res_block contains. This is the large component of YOLOv3. YOLOv3 borrows the residual structure of ResNet, which allows the network to be deeper (going from v2's Darknet-19, which has no residual structure, to v3's Darknet-53). An intuitive illustration of res_block is shown in the lower right corner of Figure 1; its basic component is also DBL.
Concat: tensor concatenation. The upsampled output of a later layer is concatenated with the output of an intermediate Darknet layer. Concatenation is different from the add operation in the residual layer: concatenation enlarges the tensor along the channel dimension, while add sums the tensors element-wise and leaves the tensor dimensions unchanged.
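The "53" in Darknet-53 can be checked with a little arithmetic. The sketch below assumes the block sequence res1, res2, res8, res8, res4 from the published Darknet-53 layout, and assumes each res_block is one downsampling DBL followed by n res_units of two DBLs each. The names `RES_BLOCKS` and `count_conv_layers` are illustrative and not taken from any code base.

```python
# Assumed Darknet-53 layout: number of res_units in each res_block.
RES_BLOCKS = [1, 2, 8, 8, 4]


def count_conv_layers(res_blocks):
    """Count the convolutional layers in the backbone under the stated assumptions."""
    total = 1  # the first DBL, before any res_block
    for n in res_blocks:
        total += 1      # stride-2 DBL that downsamples at the start of the block
        total += 2 * n  # each res_unit holds two DBLs
    return total


conv_layers = count_conv_layers(RES_BLOCKS)
print(conv_layers)      # 52 convolutional layers
print(conv_layers + 1)  # 53 once the classifier's final connected layer is counted
```

The five stride-2 layers also explain the coarsest grid: each halves the resolution, so the total downsampling factor is 2^5 = 32.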
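The difference between add and concat is easiest to see by tracking tensor shapes. The following is a minimal sketch in plain Python that works on (height, width, channels) tuples only; the helper names and the channel counts (256 and 512) are assumptions chosen for illustration, not values read from the figure.

```python
def add_shape(a, b):
    """Element-wise add: both shapes must match, and the result keeps that shape."""
    if a != b:
        raise ValueError(f"add needs identical shapes, got {a} and {b}")
    return a


def concat_shape(a, b):
    """Channel-wise concat: height and width must match, channels are summed."""
    if a[:2] != b[:2]:
        raise ValueError(f"concat needs matching height/width, got {a} and {b}")
    return (a[0], a[1], a[2] + b[2])


def upsample_shape(a, factor=2):
    """Upsampling enlarges height and width and leaves channels unchanged."""
    return (a[0] * factor, a[1] * factor, a[2])


# Residual add inside a res_unit: the shape is unchanged.
print(add_shape((26, 26, 512), (26, 26, 512)))  # (26, 26, 512)

# Concat: an upsampled deep feature map joined with an intermediate one.
deep = upsample_shape((13, 13, 256))             # (26, 26, 256)
print(concat_shape(deep, (26, 26, 512)))         # (26, 26, 768)
```

This is also why upsampling is needed before concatenation: the two feature maps must agree in height and width, and only the channel counts are allowed to differ.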
Three branches of the YOLOv3 network
Multi-scale detection - Y1
Applicable targets: large objects
Path: marked with the green line
Output dimensions: 13×13×255
Explanation of output dimensions: 13×13: size of the output feature map (grid); 255 = (80+5)×3; 80: number of object classes; 5: x, y, w, h, and c (confidence); 3: number of bounding boxes predicted at each grid cell.
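The channel depth of 255 follows directly from the formula above. As a small sketch, the function below (the name `head_channels` is hypothetical) recomputes it and shows how the depth would change for a dataset with a different number of classes.

```python
def head_channels(num_classes=80, boxes_per_cell=3):
    """Channel depth of one detection output: (classes + 5) * boxes per cell."""
    box_attrs = 5  # x, y, w, h, and confidence
    return (num_classes + box_attrs) * boxes_per_cell


print(head_channels())                # 255 for 80 classes
print(head_channels(num_classes=20))  # 75 for a 20-class dataset
```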
Multi-scale detection - Y2
Applicable targets: medium objects
Path: marked with the yellow line
Output dimensions: 26×26×255
Explanation of output dimensions: 26×26: size of the output feature map (grid); 255 = (80+5)×3; 80: number of object classes; 5: x, y, w, h, and c (confidence); 3: number of bounding boxes predicted at each grid cell.
Multi-scale detection - Y3
Applicable targets: small objects
Path: marked with the purple line
Output dimensions: 52×52×255
Explanation of output dimensions: 52×52: size of the output feature map (grid); 255 = (80+5)×3; 80: number of object classes; 5: x, y, w, h, and c (confidence); 3: number of bounding boxes predicted at each grid cell.
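The three grid sizes can be related to one another with a short calculation. The sketch below assumes a 416×416 input and downsampling strides of 32, 16, and 8 for Y1, Y2, and Y3; neither value is stated in this article, but together they reproduce the 13, 26, and 52 grids listed above. All names are illustrative.

```python
INPUT_SIZE = 416                         # assumed input resolution
STRIDES = {"Y1": 32, "Y2": 16, "Y3": 8}  # assumed downsampling factor per branch
BOXES_PER_CELL = 3


def grid_size(input_size, stride):
    """Side length of the output grid for a given downsampling stride."""
    return input_size // stride


def total_boxes(input_size, strides, boxes_per_cell):
    """Number of bounding boxes predicted across all branches."""
    return sum(grid_size(input_size, s) ** 2 * boxes_per_cell for s in strides.values())


for name, stride in STRIDES.items():
    g = grid_size(INPUT_SIZE, stride)
    print(name, f"{g}x{g}", g * g * BOXES_PER_CELL)
# Y1 13x13 507
# Y2 26x26 2028
# Y3 52x52 8112

print(total_boxes(INPUT_SIZE, STRIDES, BOXES_PER_CELL))  # 10647
```

The finest grid (Y3) contributes most of the candidate boxes, which fits its role of covering small objects.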