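The output size of a convolution layer is calculated as follows:

N = (W - F + 2P)/S + 1

where
N: output size
W: input size
F: convolution kernel size
P: padding size
S: stride (step size)

If the division is not exact, nn.Conv2d rounds the quotient down before adding 1.

A small example: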

nn.Conv2d(in_channels=3,out_channels=96,kernel_size=12,stride=4,padding=2)

in_channels=3: the number of input channels. The input is an RGB image, so the number of channels is 3.
out_channels=96: the number of output channels, set to 96 here (choose this according to your own needs).
kernel_size=12: the convolution kernel is 12×12. This is F above, so F=12.
stride=4: the step size is 4. This is S above, so S=4.
padding=2: the padding value is 2. This is P above, so P=2.

If your input image is 256×256, then N = (256 - 12 + 2×2)/4 + 1 = 248/4 + 1 = 63, so the output is 63×63 with 96 channels (63×63×96).
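
The arithmetic above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name conv_output_size is made up for illustration and is not part of PyTorch, and it assumes a square input, a square kernel, and no dilation.

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size along one spatial dimension: floor((W - F + 2P) / S) + 1.

    Hypothetical helper for illustration; assumes dilation = 1.
    """
    if f > w + 2 * p:
        raise ValueError("kernel is larger than the padded input")
    # Integer division rounds down, which matches the floor in the formula.
    return (w - f + 2 * p) // s + 1


# The example from the text: W=256, F=12, P=2, S=4
n = conv_output_size(256, 12, p=2, s=4)
print(n)  # 63

# A case where the division is not exact: (224 - 11 + 4) / 4 = 54.25 -> 54, plus 1
print(conv_output_size(224, 11, p=2, s=4))  # 55
```

With out_channels=96, the first result corresponds to an output of 63×63×96 for a single 256×256 RGB image.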