SOLOv2: a new instance segmentation framework with MMDetection 2.0

This article is original work by Lin; please credit the source when reproducing. Front-line AI algorithm engineers from Tencent, Alibaba, and other companies are in our QQ group, and you are welcome to join: 1037662480

Recently I have been porting some instance segmentation models to TensorRT, and found that many two-stage instance segmentation algorithms are very troublesome to convert to another inference platform, mainly because of the RPN operations inside them. Even for a model like CenterMask, the boxes must be obtained first, and then ROIAlign or ROIPool is applied to get the masks. This process is not only computationally tedious but also hard to export to the desired model format; for example, ONNX does not support many of the ops in such models. Looking at the benchmark chart, SOLOv2's accuracy is roughly on par with Mask R-CNN while its inference time is roughly halved, and it is both faster and more accurate than BlendMask. So today we are implementing SOLOv2 with MMDetection 2.0.
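To make the export pain concrete, here is a deliberately naive NumPy stand-in for that crop step. `naive_roi_crop` is a hypothetical helper for illustration only; real ROIAlign uses bilinear sampling, but the data flow is the same:

```python
import numpy as np

def naive_roi_crop(feature, boxes, out_size=7):
    """Crude stand-in for ROIAlign: crop each box out of a (C, H, W)
    feature map and resize it to (C, out_size, out_size) with nearest
    sampling. Real ROIAlign interpolates bilinearly; this only shows
    the data flow of a two-stage mask head."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        ys = np.linspace(y1, y2, out_size).round().astype(int)
        xs = np.linspace(x1, x2, out_size).round().astype(int)
        crops.append(feature[:, ys][:, :, xs])
    # The output shape depends on len(boxes), which is only known at
    # runtime -- exactly the kind of dynamic shape that complicates
    # ONNX export of two-stage pipelines.
    return np.stack(crops)

feature = np.arange(200, dtype=float).reshape(2, 10, 10)
rois = naive_roi_crop(feature, [(0, 0, 6, 6), (2, 2, 9, 9)], out_size=4)
print(rois.shape)  # (2, 2, 4, 4)
```

Because the first dimension of the result varies with the detector's output, the exported graph cannot have a fixed shape; SOLOv2 avoids this box-dependent stage entirely.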

At present, the problems and drawbacks of many instance segmentation algorithms are as follows:

  • They are too slow. Mask R-CNN, for example, has a strong reputation, but it is hard to deploy in real time;
  • Accuracy is not high enough. YOLACT, and even YOLACT++, delivers accuracy that can only be called passable and cannot match Mask R-CNN's mask quality; no matter how fast a model is, weak masks limit the scenarios where it can be used;
  • BlendMask, CenterMask, and similar algorithms are all built on FCOS and are essentially the same: they keep the overall Mask R-CNN pipeline and only swap the detector, which still makes deployment troublesome.

Of course, this article is not about deployment; it is to show you that there is a faster, more accurate model, for which deployment should also be comparatively easier. So, as a first step, let's implement it in Python.

SOLOv2

For the SOLO series of algorithms, I will not repeat the background here. Let's start with a comparison of metrics:

For V2, the biggest change is the dynamic mask head:

import torch
import torch.nn.functional as F
from torch import nn
from mmdet.core import multi_apply

class SOLOV2Head(nn.Module):

    def forward(self, feats, eval=False):
        new_feats = self.split_feats(feats)
        featmap_sizes = [featmap.size()[-2:] for featmap in new_feats]
        upsampled_size = (feats[0].shape[-2], feats[0].shape[-1])
        kernel_pred, cate_pred = multi_apply(self.forward_single, new_feats,
                                             list(range(len(self.seg_num_grids))),
                                             eval=eval)
        # add normalized coordinate channels for P5 (CoordConv style)
        x_range = torch.linspace(-1, 1, feats[-2].shape[-1], device=feats[-2].device)
        y_range = torch.linspace(-1, 1, feats[-2].shape[-2], device=feats[-2].device)
        y, x = torch.meshgrid(y_range, x_range)
        y = y.expand([feats[-2].shape[0], 1, -1, -1])
        x = x.expand([feats[-2].shape[0], 1, -1, -1])
        coord_feat = torch.cat([x, y], 1)
        # fuse all FPN levels into a unified mask feature map
        feature_add_all_level = self.feature_convs[0](feats[0])
        for i in range(1, 3):
            feature_add_all_level = feature_add_all_level + self.feature_convs[i](feats[i])
        feature_add_all_level = feature_add_all_level + self.feature_convs[3](
            torch.cat([feats[3], coord_feat], 1))

        feature_pred = self.solo_mask(feature_add_all_level)
        N, c, h, w = feature_pred.shape
        # flatten the batch into channels so one grouped conv handles all images
        feature_pred = feature_pred.view(-1, h, w).unsqueeze(0)
        ins_pred = []

        for i in range(5):
            # predicted dynamic kernels become 1x1 conv weights, one group per image
            kernel = kernel_pred[i].permute(0, 2, 3, 1).contiguous().view(-1, c).unsqueeze(-1).unsqueeze(-1)
            ins_i = F.conv2d(feature_pred, kernel, groups=N).view(N, self.seg_num_grids[i]**2, h, w)
            if not eval:
                ins_i = F.interpolate(ins_i, size=(featmap_sizes[i][0] * 2, featmap_sizes[i][1] * 2), mode='bilinear')
            if eval:
                ins_i = ins_i.sigmoid()
            ins_pred.append(ins_i)
        return ins_pred, cate_pred
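The `coord_feat` construction in the head above (CoordConv-style coordinates normalized to [-1, 1] and concatenated to the deepest feature level) can be sketched standalone. A minimal NumPy version, with `make_coord_feat` as a hypothetical helper name:

```python
import numpy as np

def make_coord_feat(n, h, w):
    """Build CoordConv-style coordinate channels, normalized to [-1, 1],
    shaped (N, 2, H, W) so they can be concatenated to a feature map."""
    x_range = np.linspace(-1, 1, w)
    y_range = np.linspace(-1, 1, h)
    # 'ij' indexing matches torch.meshgrid(y_range, x_range): each is (H, W)
    y, x = np.meshgrid(y_range, x_range, indexing="ij")
    # broadcast to (N, 1, H, W) and stack along the channel axis
    x = np.broadcast_to(x, (n, 1, h, w))
    y = np.broadcast_to(y, (n, 1, h, w))
    return np.concatenate([x, y], axis=1)

coord = make_coord_feat(2, 4, 6)
print(coord.shape)  # (2, 2, 4, 6)
```

These two extra channels give the mask branch explicit position information, which a plain convolution (being translation-invariant) cannot recover on its own.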

That is roughly how the mask head can be written (code credit: @epiphqny).
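The core trick in the loop above is worth spelling out: flattening the batch into channels and calling `F.conv2d` with `groups=N` applies each image's predicted 1x1 kernels only to that image's features. A NumPy sketch of the same computation (hypothetical shapes, `dynamic_conv1x1` is an illustrative name), checked against an explicit per-image loop:

```python
import numpy as np

def dynamic_conv1x1(feature, kernels):
    """Apply per-image predicted 1x1 kernels to a feature map.
    feature: (N, C, H, W) mask features; kernels: (N, K, C), one set of
    K 1x1 kernels per image. This is what the grouped conv computes:
    out[n, k, h, w] = sum_c kernels[n, k, c] * feature[n, c, h, w]."""
    return np.einsum("nkc,nchw->nkhw", kernels, feature)

rng = np.random.default_rng(0)
N, C, K, H, W = 2, 8, 4, 5, 5
feat = rng.normal(size=(N, C, H, W))
kern = rng.normal(size=(N, K, C))
out = dynamic_conv1x1(feat, kern)

# sanity check against an explicit per-image loop
ref = np.stack([np.tensordot(kern[n], feat[n], axes=([1], [0])) for n in range(N)])
print(out.shape, np.allclose(out, ref))  # (2, 4, 5, 5) True
```

Because the kernels are predicted per grid cell, there is no box stage and no ROIAlign: one grouped convolution produces all candidate instance masks at once, which is what makes SOLOv2 so much friendlier to export.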

In the paper, SOLOv2 reaches up to 42.6 AP, and our own training gets close to 40 AP, which for the same backbone is much better than other instance segmentation methods.

Results

We migrated some open-source SOLO code to the latest MMDetection 2.0. The latest test results are as follows:

Overall the results are very good. We have published the code on our platform; if you are interested in instance segmentation, you can download it and run it, and we hope you will join our community to discuss the most cutting-edge computer vision topics.

t.manaai.cn

manaai.cn/aicodes_det…

If you want to learn artificial intelligence and are interested in cutting-edge AI technology, you can join our Knowledge Planet to get first-hand information, the latest academic trends, industry news, and more! Your support will encourage us to create more often, and we will help you go deeper on your deep learning journey!