
Following the previous article, this one walks through the process of reproducing the RepVGG model. The code is open-sourced at github.com/Asthestarsf… .

Read the source code

Before starting to reproduce, let’s take a quick look at the model code, open repvgg.py in the official repository, and sort out the composition and hierarchy of the network:

ConvBn

The code first defines conv_bn, which is one of the cores of RepVGG: after training, the BN is fused (“sucked”) into the Conv. For the derivation, please refer to my earlier paper-reading notes. The code is as follows:

 import torch.nn as nn

 def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
     result = nn.Sequential()
     result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                                   kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=False))
     result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
     return result
 ​

RepVGGBlock

RepVGGBlock consists of three branches: a 3×3 convolution + BN, a 1×1 convolution + BN, and an Identity + BN; a minimal sketch of the forward pass follows the list below.

RepVGGBlock also takes a use_se parameter, which decides whether to use the channel-attention module SEBlock. A few of its attributes, briefly:

  1. self.deploy indicates the state of the network: False during training, True after reparameterization;
  2. padding_11 is the padding used by the 1×1 convolution branch, which is in fact 0;
  3. the Identity branch exists only when the number of input channels equals the number of output channels and the stride is 1; in all other cases it is absent.
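
For intuition, here is a minimal sketch of the training-time forward pass (the branch names rbr_dense, rbr_1x1 and rbr_identity follow the official repository; self.se is an identity mapping when use_se is False):

 def forward(self, inputs):
     # the identity branch only exists when in_ch == out_ch and stride == 1
     id_out = 0 if self.rbr_identity is None else self.rbr_identity(inputs)
     # sum the three branches, then apply SE (optional) and the nonlinearity
     return self.nonlinearity(
         self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))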

RepVGGBlock itself has several important methods:

  1. _fuse_bn_tensor: performs the so-called “BN sucking”, i.e. fusing BN into the conv. See the paper-reading notes for the derivation of the formula; one important point is that the groups > 1 case is special and will be explained later (a sketch of the math follows this list);
  2. get_equivalent_kernel_bias: first calls _fuse_bn_tensor to obtain the fused convolution kernel weights and biases, then zero-pads the 1×1 kernel to 3×3; the equivalent kernel and bias are obtained by simply summing the weights and biases of all branches;
  3. switch_to_deploy: reparameterizes the block after training.
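
For reference, a minimal NumPy sketch of what the fusion computes (my own illustration, not the repo’s code). BN computes y = gamma * (x - mean) / sqrt(var + eps) + beta, which folds into the preceding convolution as a per-output-channel rescale of the kernel plus a bias:

 import numpy as np

 def fuse_bn(kernel, mean, var, gamma, beta, eps=1e-5):
     # scale each output channel of the kernel by gamma / std,
     # and absorb the remaining shift into a bias term
     std = np.sqrt(var + eps)
     t = (gamma / std).reshape(-1, 1, 1, 1)
     return kernel * t, beta - mean * gamma / std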

RepVGG

RepVGG consists of five stages. The first stage contains only one RepVGGBlock and is used to adjust the number of channels. Some important parameters and attributes are as follows (a simplified sketch of _make_stage follows the list):

  1. width_multiplier: controls the width of each stage; the base channel counts of the 4 subsequent stages are 64, 128, 256 and 512, each multiplied by its width multiplier;
  2. num_blocks: controls the number of blocks in each stage;
  3. override_groups_map: controls the number of groups used in specific blocks;
  4. _make_stage: builds each stage from the given parameters.
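
A simplified sketch of _make_stage in the spirit of the official PyTorch code (attribute names such as in_planes and cur_layer_idx are taken from my reading of the repository; details may differ):

 def _make_stage(self, planes, num_blocks, stride):
     # only the first block of a stage downsamples; the rest use stride 1
     strides = [stride] + [1] * (num_blocks - 1)
     blocks = []
     for s in strides:
         cur_groups = self.override_groups_map.get(self.cur_layer_idx, 1)
         blocks.append(RepVGGBlock(self.in_planes, planes, stride=s,
                                   groups=cur_groups, deploy=self.deploy))
         self.in_planes = planes
         self.cur_layer_idx += 1
     return nn.Sequential(*blocks)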

Reparameterization

Reparameterization is handled by a single function:

 import copy
 import torch

 def repvgg_model_convert(model: torch.nn.Module, save_path=None, do_copy=True):
     if do_copy:
         model = copy.deepcopy(model)
     for module in model.modules():
         if hasattr(module, 'switch_to_deploy'):
             module.switch_to_deploy()
     if save_path is not None:
         torch.save(model.state_dict(), save_path)
     return model
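
Typical usage would look like the following sketch (create_RepVGG_A0 is one of the factory functions in the official repository; the training loop is elided):

 train_model = create_RepVGG_A0(deploy=False)
 # ... train the model ...
 deploy_model = repvgg_model_convert(train_model, save_path='RepVGG-A0-deploy.pth')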

Reproducing the model

With a general understanding of the source code, let’s start to reproduce the model.

Folder structure

Traditionally, for simpler projects, I put the model code in a folder named model. As you can see from the official code, the model should consist of the following files:

 |- model
  |-  __init__.py
  |-  repvgg.py
  |-  repvggplus.py
  |-  se_block.py

Let’s start with the easy part — se_block.py

SEBlock

Since we’re new to the MegEngine framework, we consult the official documentation at every step to make sure everything is correct. The result is as follows:

 import megengine.module as M

 class SEBlock(M.Module):
 ​
     def __init__(self, input_channels, ratio: int = 16):
         super(SEBlock, self).__init__()
         internal_neurons = input_channels // ratio
         assert internal_neurons > 0
         self.gap = M.AdaptiveAvgPool2d((1, 1))
         self.down = M.Conv2d(
             in_channels=input_channels,
             out_channels=internal_neurons,
             kernel_size=1,
             bias=True
         )
         self.relu = M.ReLU()
         self.up = M.Conv2d(
             in_channels=internal_neurons,
             out_channels=input_channels,
             kernel_size=1,
             bias=True
         )
         self.sigmoid = M.Sigmoid()
 ​
     def forward(self, inputs):
         x = self.sigmoid(self.up(self.relu(self.down(self.gap(inputs)))))
         return inputs * x
 ​

Each time we write a section, we need to test it to make sure the model is built correctly:

 if __name__ == "__main__":
     import megengine as mge
     import numpy as np

     se = SEBlock(64, 16)
     a = mge.tensor(np.random.random((2, 64, 9, 9)))
     a = se(a)
     print(a.shape)

ConvBn

Originally, I wanted to use M.Sequential as in the source code, but it has no add_module method; the documentation shows it can instead be given an OrderedDict of names and their corresponding modules.

Then MegEngine released its official RepVGG version, and I found I could simply use M.ConvBn2d; PyTorch actually ships a fused ConvBn2d as well (under torch.nn.intrinsic). It pays to read the API documentation and save some effort.
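
A minimal sketch of both options, assuming M.Sequential accepts an OrderedDict as the documentation describes:

 from collections import OrderedDict
 import megengine.module as M

 def conv_bn(in_ch, out_ch, kernel_size, stride, padding, groups=1):
     # option 1: named submodules via an OrderedDict (M.Sequential has no add_module)
     return M.Sequential(OrderedDict([
         ('conv', M.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                           padding=padding, groups=groups, bias=False)),
         ('bn', M.BatchNorm2d(out_ch)),
     ]))

 # option 2: the fused module MegEngine provides out of the box
 # conv_bn = M.ConvBn2d(in_ch, out_ch, kernel_size, stride, padding, groups=groups, bias=False)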

RepVGGBlock

MegEngine then turned out to have no padding operator, but luckily the padding here is simple: create a kernel tensor whose spatial size is 3×3 and assign the 1×1 weight to its center. The later version 1.6 added F.nn.pad. The code is as follows:

 def _zero_padding(self, weight):
     if weight is None:
         return 0
     else:
         # alternative for versions below 1.6 (F.nn.pad was added in 1.6):
         # kernel = F.zeros((*weight.shape[:-2], 3, 3), device=weight.device)
         # kernel[..., 1:2, 1:2] = weight
         kernel = F.nn.pad(
             weight,
             [*[(0, 0) for i in range(weight.ndim - 2)], (1, 1), (1, 1)])
         return kernel
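
A quick sanity check of the padding helper (standalone, assuming MegEngine ≥ 1.6 for F.nn.pad):

 import numpy as np
 import megengine as mge
 import megengine.functional as F

 w = mge.tensor(np.random.random((16, 16, 1, 1)).astype("float32"))
 padded = F.nn.pad(w, [(0, 0), (0, 0), (1, 1), (1, 1)])
 print(padded.shape)  # (16, 16, 3, 3), with the 1x1 weight at the center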

Once the rest of the code is copied, it needs to be validated. Create a new verify.py in the project root just to validate that the model builds correctly. The trickiest configuration is use_se=True, groups=2, in_ch=out_ch; I also built a small classifier head to verify that switch_to_deploy is correct. The code is as follows:

 import megengine as mge
 import megengine.functional as F
 import numpy as np

 import model as repvgg


 class Classifier(mge.module.Module):
     def __init__(self, planes):
         super(Classifier, self).__init__()
         self.downsample = mge.module.Conv2d(
             in_channels=planes,
             out_channels=planes,
             kernel_size=3,
             stride=2,
             padding=1,
         )
         self.gap = mge.module.AdaptiveAvgPool2d((1, 1))
         self.fc = mge.module.Linear(planes, 1000)

     def forward(self, inputs):
         out = self.downsample(inputs)
         out = self.gap(out)
         out = F.flatten(out, 1)
         out = self.fc(out)
         return out


 def calDiff(out1, out2):
     # used to verify that the two outputs match
     print('___________test diff____________')
     print(out1.shape)
     print(out2.shape)
     print(F.argmax(out1, axis=1))
     print(F.argmax(out2, axis=1))
     print(((out1 - out2) ** 2).sum())


 def verifyBlock():
     print('___________RepVGGBlock____________')
     inputs = mge.tensor(np.random.random((8, 16, 224, 224)))
     block = repvgg.RepVGGBlock(in_ch=16, out_ch=16, stride=1,
                                groups=2, deploy=False, use_se=True)
     classifier = Classifier(16)
     classifier.eval()
     block.eval()
     out1 = classifier(block(inputs))
     print(block)

     print('___________RepVGGBlock switch to deploy____________')
     block.switch_to_deploy()
     block.eval()
     out2 = classifier(block(inputs))
     print(block)

     calDiff(out1, out2)


 if __name__ == '__main__':
     verifyBlock()

Unexpectedly, it errored as soon as it ran. The error message showed that the BN fusion was done in the wrong place and that the convolution kernel had the wrong shape.

Don’t panic when this happens: the official documentation shows that MegEngine’s grouped convolution stores its kernel in a different shape, (groups, out_channels // groups, in_channels // groups, kh, kw), whereas PyTorch’s documentation gives (out_channels, in_channels // groups, kh, kw).
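
A quick way to see the difference (my own check, not from the article’s repository):

 import megengine.module as M

 conv = M.Conv2d(16, 16, 3, groups=2, bias=False)
 # MegEngine stores grouped weights as (groups, out // groups, in // groups, kh, kw)
 print(conv.weight.shape)  # expected: (2, 8, 8, 3, 3)

So when building the equivalent kernel for the identity branch, the value has to be reshaped into this grouped layout, which is what the fixed code below does: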

 # self.groups_channel = in_ch // groups, and here in_ch == out_ch
 assert isinstance(branch, M.BatchNorm2d)
 # build the identity branch's equivalent kernel "self.bn_identity" once
 if not hasattr(self, 'bn_identity'):
     # grouped convolution kernel shape:
     # [groups, out_channels // groups, in_channels // groups, kernel_size, kernel_size]
     kernel_value = np.zeros(
         (self.groups_channel * self.groups, self.groups_channel, 3, 3),
         dtype=np.float32)
     for i in range(self.groups_channel * self.groups):  # out_channels
         kernel_value[i, i % self.groups_channel, 1, 1] = 1
     if self.groups > 1:
         kernel_value = kernel_value.reshape(
             self.groups, self.groups_channel, self.groups_channel, 3, 3)
     self.bn_identity = mge.Parameter(kernel_value)

There was one more problem: after switch_to_deploy, the classifier output differed. After repeated debugging, the assignment code turned out to be wrong:

 self.reparam.weight.data = kernel
 self.reparam.bias.data = bias

.data is not supported in MegEngine, so it was changed to an in-place copy via slice assignment:

 self.reparam.weight[:] = kernel
 self.reparam.bias[:] = bias

RepVGG

The rest of the code can be copied almost directly following the official repository; there is nothing special to watch out for. After finishing the reproduction, write verification code again by adding the following to verify.py:

 def verifyRepVGG(model_name, state_dict=None):
     print(f'___________{model_name}____________')
     inputs = mge.tensor(np.random.random((2, 3, 224, 224)))
 ​
     model = repvgg.__dict__[model_name](False)
     if state_dict is not None:
         model.load_state_dict(state_dict)
     model.eval()
     out1 = model(inputs)
 ​
     print(f'___________{model_name} switch to deploy____________')
     model._switch_to_deploy_and_save('./ckpt', 'test')
     model.eval()
     out2 = model(inputs)
 ​
     calDiff(out1, out2)
 ​

repvgg.py and se_block.py are now reproduced; repvgg_plus.py will be covered in the next article.