Contents

Abstract

1. Model structure

Detailed description of parameters of each layer:

1. INPUT layer

2. C1 layer – convolution layer

3. S2 layer – pooling layer (down-sampling layer)

4. C3 layer – convolution layer

5. S4 layer – pooling layer (down-sampling layer)

6. C5 Layer – convolution layer

7. F6 layer – fully connected layer

8. Output layer – fully connected layer

Summary of parameters of each layer

2. Model characteristics

Code reproduction


Abstract

LeNet-5 is a convolutional neural network (CNN) proposed by LeCun et al. in 1998 for recognizing handwritten digits and machine-printed characters. A pioneering work in convolutional neural networks, it is named after its author LeCun, with the 5 marking it as the fifth version in the LeNet series. LeNet-5 showed that the correlations between pixel features in an image can be extracted by convolution with shared parameters, and its combined structure of convolution, subsampling (pooling), and nonlinear mapping is the basis of most of today's popular deep image-recognition networks.

1. Model structure

As shown in the architecture diagram from the original paper, LeNet-5 contains 7 layers in total (the input layer is not counted as part of the network structure): 3 convolution layers (C1, C3, C5), 2 down-sampling layers (S2, S4), and 2 fully connected layers (F6 and the output layer).

Detailed description of parameters of each layer:

1. INPUT layer

First is the data INPUT layer: the input image is resized to 32×32.

Note: this layer does not count toward LeNet-5's depth; traditionally, the input layer is not considered part of the network structure.

2. C1 layer – convolution layer

For the general output-size formula of a convolution layer, refer to: CNN Basics — Composition of Convolutional Neural Networks (CSDN blog). In short, output size = (input size - kernel size + 2 × padding) / stride + 1; for C1 this gives (32 - 5 + 0)/1 + 1 = 28.

Input image: 32×32

Convolution kernel size: 5×5

Number of convolution kernels: 6

Output FeatureMap size: 28×28 (32-5+1 = 28)

Number of neurons: 28×28×6

Trainable parameters: (5×5+1)×6 = 156 (each filter has 5×5 = 25 weight parameters plus one bias parameter, and there are 6 filters)

Number of connections: (5×5+1)×6×28×28 = 122304

Detailed description: the first convolution is performed on the input image with 6 kernels of size 5×5, producing the 6 feature maps of C1 (each 28×28, since 32-5+1 = 28). How many parameters are needed? Each kernel is 5×5, so there are 6×(5×5+1) = 156 parameters in total, where the +1 is each kernel's bias. Each pixel of C1 is connected to a 5×5 patch of the input image plus 1 bias, so there are 156×28×28 = 122304 connections in total. Although there are 122,304 connections, only 156 parameters need to be learned, thanks to weight sharing.
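
These counts are easy to verify with PyTorch (the same library used in the code reproduction at the end of this post). A minimal sketch:

import torch
from torch import nn

c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # 6 kernels of size 5x5
x = torch.randn(1, 1, 32, 32)                                 # one 32x32 single-channel image
print(c1(x).shape)                                            # torch.Size([1, 6, 28, 28])
print(sum(p.numel() for p in c1.parameters()))                # (5*5+1)*6 = 156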

3. S2 layer – pooling layer (down-sampling layer)

Input: 28×28

Sampling area: 2×2

Sampling method: the 4 inputs are added, multiplied by a trainable coefficient, and a trainable bias is added; the result is passed through sigmoid

Number of feature maps: 6

FeatureMap size: 14×14 (28/2 = 14)

Number of neurons: 14×14×6

Number of connections: (2×2+1)×6×14×14 = 5880

Each feature map in S2 is 1/4 the size of its counterpart in C1.

The first convolution is followed by a pooling operation with 2×2 windows, giving S2: six 14×14 feature maps (28/2 = 14). Each unit of S2 is the sum of the pixels in a 2×2 area of C1, multiplied by a weight coefficient, plus a bias, with the result passed through sigmoid. This gives (2×2+1)×14×14×6 = 5880 connections.
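
Below is a minimal sketch of this sampling method as a custom PyTorch module. The class name LeNetSubsample is our own; modern reproductions (including the code at the end of this post) usually substitute plain max or average pooling.

import torch
from torch import nn
from torch.nn import functional as F

class LeNetSubsample(nn.Module):
    # Sum each 2x2 window, scale by one trainable coefficient per map,
    # add one trainable bias per map, then apply sigmoid.
    def __init__(self, channels):
        super().__init__()
        self.coeff = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        window_sum = F.avg_pool2d(x, kernel_size=2, stride=2) * 4  # average * 4 = sum
        return torch.sigmoid(window_sum * self.coeff + self.bias)

s2 = LeNetSubsample(channels=6)
print(s2(torch.randn(1, 6, 28, 28)).shape)      # torch.Size([1, 6, 14, 14])
print(sum(p.numel() for p in s2.parameters()))  # (1+1)*6 = 12 trainable parameters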

4. C3 layer – convolution layer

Input: combinations of several or all of the 6 feature maps in S2

Convolution kernel size: 5×5

Number of convolution kernels: 16

Output FeatureMap size: 10×10 (14-5+1 = 10)

Each feature map in C3 is connected to all or some of the 6 feature maps in S2, meaning that the feature maps of this layer are different combinations of the feature maps extracted by the previous layer.

In the scheme LeNet-5 uses, the first six feature maps of C3 take subsets of 3 adjacent feature maps in S2 as input; the next six take subsets of 4 adjacent feature maps as input; the next three take non-adjacent subsets of 4 feature maps as input; and the last one takes all the feature maps in S2 as input.

Trainable parameters: 6×(3×5×5+1) + 6×(4×5×5+1) + 3×(4×5×5+1) + 1×(6×5×5+1) = 1516

Number of connections: 10×10×1516 = 151600

After the first pooling comes the second convolution, whose output C3 consists of 16 feature maps of size 10×10, with kernel size 5×5. S2 has only 6 feature maps of size 14×14, so how do we get 16 feature maps from 6? The 16 feature maps of C3 are computed from particular combinations of the feature maps of S2. Details are as follows:

The first 6 feature maps of C3 are each connected to 3 feature maps of S2, the next 6 are each connected to 4 feature maps of S2, the following 3 are each connected to 4 non-adjacent feature maps of S2, and the last one is connected to all 6 feature maps of S2 (these groups are marked by the red boxes in the connection table of the original paper). The kernel size is still 5×5, so there are 6×(3×5×5+1) + 6×(4×5×5+1) + 3×(4×5×5+1) + 1×(6×5×5+1) = 1516 parameters, and since each output map is 10×10, there are 151,600 connections.

Taking the first group as an example: each of the first six C3 maps convolves 3 feature maps of S2, so it needs 3×5×5+1 parameters, and the six maps together need 6×(3×5×5+1) parameters. Why this particular combination? The paper gives two reasons: 1) it reduces the number of parameters; 2) this asymmetric combination of connections helps extract multiple different combined features.
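
A quick arithmetic check of the C3 parameter and connection counts, following the grouping described above:

# 6 maps see 3 S2 maps each, 6 see 4, 3 see 4 (non-adjacent), 1 sees all 6
fan_ins = [3] * 6 + [4] * 6 + [4] * 3 + [6] * 1
params = sum(n * 5 * 5 + 1 for n in fan_ins)  # one bias per output map
print(params)            # 1516
print(params * 10 * 10)  # 151600 connections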

5. S4 layer – pooling layer (down-sampling layer)

Input: 10×10

Sampling area: 2×2

Sampling method: same as S2 (the 4 inputs are added, multiplied by a trainable coefficient, plus a trainable bias; the result is passed through sigmoid)

Number of feature maps: 16

FeatureMap size: 5×5 (10/2 = 5)

Number of neurons: 5×5×16 = 400

Number of connections: (2×2+1)×16×5×5 = 2000

Each feature map in S4 is 1/4 the size of its counterpart in C3.

S4 is a pooling layer with a window size of 2×2 and 16 feature maps: the 16 10×10 maps of C3 are pooled in 2×2 units to obtain 16 5×5 feature maps. There are (2×2+1)×5×5×16 = 2000 connections. The connection pattern is the same as in S2.

6. C5 Layer – convolution layer

Input: all 16 feature maps of S4 (fully connected to S4)

Convolution kernel size: 5×5

Number of convolution kernels: 120

FeatureMap size: 1×1 (5-5+1 = 1)

Trainable parameters/connections: 120×(16×5×5+1) = 48120

C5 is a convolution layer. Since the 16 maps of S4 are 5×5, the same size as the convolution kernel, each map formed after convolution is 1×1. There are 120 such convolution outputs, each connected to all 16 maps of the previous layer, so there are (5×5×16+1)×120 = 48120 parameters and just as many connections.
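
A sketch showing that a 5×5 convolution over a 5×5 input behaves exactly like a fully connected layer, which is why the reproduction code at the end of this post implements C5 with nn.Linear:

import torch
from torch import nn

c5 = nn.Conv2d(16, 120, kernel_size=5)          # kernel covers the whole 5x5 input
print(c5(torch.randn(1, 16, 5, 5)).shape)       # torch.Size([1, 120, 1, 1])
print(sum(p.numel() for p in c5.parameters()))  # (16*5*5+1)*120 = 48120

fc = nn.Linear(16 * 5 * 5, 120)                 # same mapping on the flattened 400-dim input
print(sum(p.numel() for p in fc.parameters()))  # also 48120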

7. F6 layer – fully connected layer

Input: the 120-dimensional vector output by C5

Calculation method: the dot product of the input vector and a weight vector, plus a bias; the result is passed through the sigmoid function

Trainable parameters: 84×(120+1) = 10164

Layer F6 is a fully connected layer with 84 nodes, corresponding to a 7×12 bitmap in which -1 represents white and 1 represents black, so that the black-and-white pattern of each symbol's bitmap corresponds to a code. The number of trainable parameters and connections of this layer is (120+1)×84 = 10164.
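
A minimal check of the F6 computation and parameter count in PyTorch:

import torch
from torch import nn

f6 = nn.Linear(120, 84)                         # dot product plus bias for each of the 84 nodes
y = torch.sigmoid(f6(torch.randn(1, 120)))      # squashed by sigmoid as described above
print(sum(p.numel() for p in f6.parameters()))  # (120+1)*84 = 10164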

8. Output layer – fully connected layer

The output layer is also a fully connected layer, with 10 nodes representing the digits 0 to 9. If the value of node i is closest to 0, the network's recognition result is the digit i. Radial basis function (RBF) units are used for the connections. With x the input from the previous layer, the output y_i of RBF unit i is computed as:

y_i = Σ_j (x_j - w_ij)²

The values w_ij in the formula above are determined by the bitmap encoding of digit i, where i ranges from 0 to 9 and j ranges from 0 to 7×12-1 = 83. The closer the RBF output y_i is to 0, the closer the input is to the bitmap encoding of i, meaning the network recognizes the current input as the character i. This layer has 84×10 = 840 parameters and connections.

[Figure: LeNet-5 recognizing the digit 3.]
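
A minimal sketch of the RBF output computation. The weights below are random stand-ins; in LeNet-5 they are the fixed 7×12 bitmap codes of the ten digits.

import torch

w = torch.randn(10, 84)                     # one 84-dim code vector per digit class
x = torch.randn(84)                         # the 84-dim F6 output
y = ((x.unsqueeze(0) - w) ** 2).sum(dim=1)  # y_i = sum_j (x_j - w_ij)^2
print(y.argmin().item())                    # predicted digit: the class whose output is closest to 0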

Summary of parameters of each layer

Layer | Input size | Kernel size/stride (count) | Output size | Trainable parameters
Convolution layer C1 | 32×32×1 | 5×5×1/1 (6) | 28×28×6 | (5×5×1+1)×6
Down-sampling layer S2 | 28×28×6 | 2×2/2 | 14×14×6 | (1+1)×6
Convolution layer C3 | 14×14×6 | 5×5×6/1 (16) | 10×10×16 | 1516
Down-sampling layer S4 | 10×10×16 | 2×2/2 | 5×5×16 | (1+1)×16
Convolution layer C5 | 5×5×16 | 5×5×16/1 (120) | 1×1×120 | (5×5×16+1)×120
Fully connected layer F6 | 1×1×120 | 120×84 | 1×1×84 | (120+1)×84
Output layer | 1×1×84 | 84×10 | 1×1×10 | (84+1)×10
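
Summing the trainable-parameter column gives roughly 60,000 trainable parameters in total; a quick check:

# Per-layer trainable parameter counts from the table above
layers = {
    "C1": (5 * 5 * 1 + 1) * 6,     # 156
    "S2": (1 + 1) * 6,             # 12
    "C3": 1516,
    "S4": (1 + 1) * 16,            # 32
    "C5": (5 * 5 * 16 + 1) * 120,  # 48120
    "F6": (120 + 1) * 84,          # 10164
    "Output": (84 + 1) * 10,       # 850 per the table; 84*10 = 840 if the RBF layer is counted without biases
}
print(sum(layers.values()))        # 60850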

2. Model characteristics

  • Convolutional networks use a three-layer sequence: convolution, down-sampling (pooling), and nonlinear mapping (the most important feature of LeNet-5, which laid the foundation for today's deep convolutional networks).
  • Convolution is used to extract spatial features.
  • The spatial average of a feature map is used for down-sampling.
  • tanh or sigmoid is used for the final classifier.
  • A sparse connection matrix between layers avoids huge computational overhead.

Code reproduction:

import torch
from torch import nn
from torch.nn import functional as F
from torchsummary import summary


# The input size is 32x32
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)        # C1: 1 input channel, 6 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)       # C3: 6 -> 16 channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # C5, implemented as a fully connected layer
        self.fc2 = nn.Linear(120, 84)          # F6
        self.fc3 = nn.Linear(84, 10)           # output layer
        self.pool = nn.MaxPool2d(2, 2)         # S2/S4: max pooling instead of the original subsampling

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))   # 14x14 -> 10x10 -> 5x5
        x = x.view(x.size(0), -1)              # flatten to a 16*5*5 = 400-dim vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


if __name__ == '__main__':
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = LeNet()
    model.to(device)
    summary(model, (1, 32, 32))

Reference article: Network analysis (1): LeNet-5 details (cuijiahua.com)