Machine learning algorithm engineer


Object detection and segmentation are among the most common, and most interesting, tasks in computer vision. Compared with image classification, however, detection and segmentation are harder problems, and their code implementations are also more complex. There are several popular open-source projects for object detection and segmentation:

  • Detectron: from FAIR, based on Caffe2;

  • maskrcnn-benchmark: from FAIR, based on PyTorch, an upgrade of Detectron;

  • MMDetection: from SenseTime's MMLab, based on PyTorch, with a fairly complete model zoo;

  • SimpleDet: from TuSimple, based on MXNet;

  • TensorFlow Object Detection API (TensorFlow 1.x).

Each of these projects includes implementations of the R-CNN series (Faster R-CNN, Mask R-CNN, etc.), and each has its own strengths that we will not go into here. Today we are going to introduce FAIR's newer offering, Detectron2, the successor to maskrcnn-benchmark. Detectron2's advantages are mainly reflected in three points:

  • The strongest implementation of the R-CNN series; after all, the R-CNN line of work comes from FAIR itself. New features such as panoptic segmentation and rotated boxes are also supported.

  • Detectron2 is highly modular and extensible. In practice, Detectron2 can be treated as a base library: when implementing a new model, you import it rather than modify it.

  • Faster training speed (MMDetection v2.0 is now comparable to Detectron2 in speed).


Although Detectron2's model zoo is not as extensive as MMDetection's, this is in line with its design philosophy: only the most core and general models live in the framework itself, and other customized projects simply depend on it. You can see these under the projects directory of Detectron2.

This article is a brief introduction to the main logic behind Detectron2, including an official example.

The installation of Detectron2

Detectron2 is very simple to install and has few pitfalls; please refer to the official installation instructions. First install PyTorch 1.3+ and the corresponding TorchVision:

pip install -U torch==1.5 torchvision==0.6

Then install cython and pycocotools:

pip install cython; pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'


Finally, install Detectron2 itself (gcc & g++ >= 5 are required):

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'


Detectron2 overview

To see what Detectron2 is all about, check out the DefaultTrainer in the engine submodule. Here are the core parts of its constructor:

model = self.build_model(cfg)                  # build the model
optimizer = self.build_optimizer(cfg, model)   # build the optimizer
data_loader = self.build_train_loader(cfg)     # build the training dataloader

# For training, wrap with DDP. But don't need this for inference.
if comm.get_world_size() > 1:
    model = DistributedDataParallel(
        model, device_ids=[comm.get_local_rank()], broadcast_buffers=False
    )
super().__init__(model, data_loader, optimizer)

self.scheduler = self.build_lr_scheduler(cfg, optimizer)  # learning rate scheduler
# Assume no other objects need to be checkpointed.
# We can later make it checkpoint the stateful hooks
self.checkpointer = DetectionCheckpointer(
    # Assume you want to save checkpoints together with logs/statistics
    model,
    cfg.OUTPUT_DIR,
    optimizer=optimizer,
    scheduler=self.scheduler,
)
self.start_iter = 0
self.max_iter = cfg.SOLVER.MAX_ITER
self.cfg = cfg

self.register_hooks(self.build_hooks())

First of all, Detectron2's parameter configuration is based on YAML and yacs, and a global cfg object is passed around the code. The advantage of this is that the code stays cleaner, and all parameter settings can easily be changed through a configuration file.
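A minimal sketch of this config workflow (the YAML file name below is just a placeholder): start from the defaults, merge a YAML file on top, and override individual keys in code if needed:

from detectron2.config import get_cfg

cfg = get_cfg()                          # default config
cfg.merge_from_file("my_config.yaml")    # hypothetical YAML file overriding some defaults
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1      # individual keys can also be set in code
print(cfg.dump())                        # the full config can be dumped back to YAML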

The build_model interface is implemented in the modeling submodule. In Detectron2, when adding a model you need to register a meta_arch (registering means adding the model to a maintained model dictionary, i.e. the model zoo). Taking the RetinaNet model as an example:


@META_ARCH_REGISTRY.register()
class RetinaNet(nn.Module):
    pass


The solver submodule contains the implementations of build_optimizer and build_lr_scheduler. The optimizer is momentum SGD, and the LR scheduler has two implementations, WarmupMultiStepLR and WarmupCosineLR, which are the two most commonly used LR schedules.
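A minimal sketch of how these helpers fit together, assuming a model and cfg have already been built:

from detectron2.solver import build_lr_scheduler, build_optimizer

optimizer = build_optimizer(cfg, model)         # momentum SGD configured from cfg.SOLVER
scheduler = build_lr_scheduler(cfg, optimizer)  # WarmupMultiStepLR or WarmupCosineLR, per cfg.SOLVER.LR_SCHEDULER_NAME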

The dataset construction and dataloader implementation are contained in the data submodule, which also contains commonly used transforms. Detectron2 currently includes common object detection and segmentation datasets, such as VOC and COCO. You can list the supported datasets as follows:

from detectron2.data import MetadataCatalog
MetadataCatalog.list()


However, to use these data sets, you need to download and configure them according to the structure requirements, as described in the documentation.

Detectron2 also uses hooks to implement some of the control logic during training, such as model saving and learning-rate adjustment; hooks are similar to Keras callbacks.
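As a rough sketch of the idea (this particular hook is a made-up example, not part of Detectron2), a custom hook subclasses HookBase and gets access to the trainer once registered:

from detectron2.engine import HookBase

class SimpleLogHook(HookBase):
    def after_step(self):
        # self.trainer is available once the hook has been registered via register_hooks
        if self.trainer.iter % 100 == 0:
            print("finished iteration", self.trainer.iter)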

In addition to the above, another important submodule of Detectron2 is structures, which contains the common infrastructure for detection and segmentation, such as Boxes, Instances and masks. These components are shared across models.

Custom data sets

With Detectron2, you can easily add your own dataset: just provide the image paths and annotations in the required data format, and then register the dataset's metadata. Detectron2 registers datasets as follows:

def get_dicts():
  ...
  return list[dict] 

from detectron2.data import DatasetCatalog
DatasetCatalog.register("my_dataset", get_dicts)


Each dict above corresponds to one image and its annotations, and should follow the annotation format; it mainly contains the following fields (refer to Standard Dataset Dicts for details):

  • file_name: the absolute path of the image file;

  • height, width: the height and width of the image;

  • image_id (str or int): a unique id of the image;

  • annotations (list[dict]): each dict holds the annotations of one instance, such as its box and mask.

For each instance in annotations, you need to include the following fields (a minimal example record is sketched after this list):

  • bbox (list[float]): the object's bounding box, 4 floats, in either XYXY or XYWH format;

  • bbox_mode (int): the format of bbox, BoxMode.XYXY_ABS or BoxMode.XYWH_ABS;

  • category_id (int): the category label, an integer in the range [0, num_categories); Detectron2 reserves the value num_categories for the "background" category;

  • segmentation (list[list[float]] or dict): if list[list[float]], it is a set of polygons (lists of polygon points); if dict, it is a pixel-level segmentation mask in COCO's RLE format.
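For illustration, a single record might look like the following sketch (paths and numbers are made up):

from detectron2.structures import BoxMode

record = {
    "file_name": "/path/to/images/0001.jpg",
    "height": 480,
    "width": 640,
    "image_id": 1,
    "annotations": [
        {
            "bbox": [100.0, 120.0, 200.0, 260.0],
            "bbox_mode": BoxMode.XYXY_ABS,
            "segmentation": [[100.0, 120.0, 200.0, 120.0, 200.0, 260.0, 100.0, 260.0]],
            "category_id": 0,
            "iscrowd": 0,
        }
    ],
}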

If your data is already in COCO format, you can register it even more quickly with Detectron2's built-in helper:

from detectron2.data.datasets import register_coco_instances
register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")


We can also register our own dataset in the above format. Here we use the balloon instance segmentation dataset, which has only one category, balloon. The implementation is as follows:

from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
                "iscrowd": 0
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

from detectron2.data import DatasetCatalog, MetadataCatalog
for d in ["train", "val"]:
    # register the dataset
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    # add meta information to the dataset, mainly the class names
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
    # use COCO-style evaluation
    MetadataCatalog.get("balloon_" + d).evaluator_type = "coco"
balloon_metadata = MetadataCatalog.get("balloon_train")

Two data sets have been added, balloon_train and balloon_val, for training and validation, respectively. With the visualization tools in Detectron2, you can visualize samples in a dataset:

dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 1):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])

Custom Dataloader

Once the dataset is registered, we can use Detectron2's default dataloader to load the data. The detectron2.data module provides the build_detection_train_loader and build_detection_test_loader methods to build the training and test dataloaders, respectively. Here is how to build the training dataloader:

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2.data import build_detection_train_loader

cfg = get_cfg()
# here the Mask R-CNN R50_FPN config for the COCO dataset is used
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
train_loader = build_detection_train_loader(cfg, mapper=None)

## This prints basic statistics about the dataset and the data processing and sampling methods used by the dataloader
[04/20 15:26:26 d2.data.build]: Removed 0 images with no usable annotations. 61 images left.
[04/20 15:26:26 d2.data.build]: Distribution of instances among all 1 categories:
|  category  | #instances   |
|:----------:|:-------------|
|  balloon   | 255          |
|            |              |
[04/20 15:26:26 d2.data.common]: Serializing 61 elements to byte tensors and concatenating them all ...
[04/20 15:26:26 d2.data.common]: Serialized dataset takes 0.17 MiB
[04/20 15:26:26 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[04/20 15:26:26 d2.data.build]: Using training sampler TrainingSampler

Further, you can look at the batch data format that the dataloader loads:

for batch_data in train_loader:
  print(batch_data[0])
  break

{'file_name': 'balloon/train/4543126482_92254ef046_b.jpg', 'image_id': 34, 'height': 1024, 'width': 679, 'image': tensor(...), 'instances': Instances(num_instances=11, image_height=1062, image_width=704, fields=[gt_boxes: Boxes(tensor([[111., 352., 204., 456.], ...])), gt_classes: tensor(...), gt_masks: ...])}


This output format is exactly what is required for the model input, as discussed later.

Detectron2 also makes it easy to customize the dataloader: you just need to define your own mapper and pass it to the mapper argument of build_detection_train_loader.

from detectron2.data import build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.data import detection_utils as utils

def mapper(dataset_dict):
    # implement a custom mapper
    dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified below
    # read the image as a numpy array
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    # apply the transforms (here a simple resize)
    image, transforms = T.apply_transform_gens([T.Resize((800, 800))], image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    # apply the same transforms to the annotations
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    # convert the annotations to an Instances object
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

data_loader = build_detection_train_loader(cfg, mapper=mapper)

In fact, the mapper simply converts the raw dict from the dataset into the specific format the model expects as input. This step can also perform data augmentation, and since it operates on numpy arrays, you can use various data augmentation libraries here. One thing to note: if you build the dataloader without a mapper, the default mapper is used (see dataset_mapper), so make sure the mapper fits your dataset. Detectron2 normalizes the image inside the model, so the mapper only needs to convert the image to a float32 tensor.
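For example, the single T.Resize in the mapper above could be replaced by a richer (hypothetical) augmentation list built from Detectron2's own transforms:

from detectron2.data import transforms as T

augs = [
    T.ResizeShortestEdge(short_edge_length=(640, 800), max_size=1333, sample_style="range"),
    T.RandomFlip(horizontal=True),
    T.RandomBrightness(0.8, 1.2),
]
# inside the mapper: image, transforms = T.apply_transform_gens(augs, image)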

If you need to check that your Dataloader works as expected, you can use tools/visualize_data.py to visualize the loaded data, which is a great debug tool.

Model usage and creation

The model is the core part of object detection and segmentation. Detectron2's models are also modular and mainly consist of four core parts:

  • backbone: the CNN feature extractor of the model; currently only ResNet is supported. Note that Detectron2 also implements FPN as part of the backbone.

  • proposal_generator: the proposal generator; currently only RPN is supported, used as part of the Faster R-CNN family.

  • roi_heads: the detection heads of the Faster R-CNN family, including the ROIPooler, box_head, mask_head, etc. The ROIPooler implements the RoIPool and RoIAlign methods.

  • meta_arch: defines the final model. It is not a standalone module; rather, it combines backbone, proposal_generator and roi_heads to build the complete model.

The advantage of modularity is reuse: you can combine different backbones, proposal_generators and roi_heads to build different models. Because Detectron2 officially only supports the R-CNN family, the implemented modules may not be generic enough. A more general module division is given in YOLOv4:

This division is more general and covers all mainstream detection models. The current MMDetection framework is designed with a similar module partition, so more models are supported.

As mentioned earlier, model building is done with a unified interface:

from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module

You need to know the input and output formats of a Detectron2 model, whether you want to use an existing model or customize your own. The input of the model is a list[dict], where each dict holds one sample image and its annotation information, with the following fields:

  • "image": a (C, H, W) Tensor; cfg.INPUT.FORMAT determines the channel format of the image (BGR by default), and cfg.MODEL.PIXEL_{MEAN,STD} will be used to normalize the image;

  • "instances": an Instances object containing the fields gt_boxes, gt_classes, gt_masks, gt_keypoints;

  • "proposals": an Instances object containing the proposal_boxes and objectness_logits fields, used as input by models that consume precomputed RPN proposals;

  • "height", "width": the desired output size of the model, which may differ from the size of the image field (e.g. when the image was resized); the model uses them to rescale its outputs back to the original resolution;

  • "sem_seg": an (H, W) Tensor, the semantic segmentation ground truth with category labels starting from 0 (Detectron2 supports semantic segmentation as well, which is nice).

The input to the model mainly consists of the fields shown above, but the required fields vary depending on the model. As mentioned above, the output of mapper is used as the input of the model, so the output format of Mapper should also meet the above requirements. The case of instance segmentation has been given in dataloader.
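For inference, this input format can be fed to a built model directly. The following is a minimal sketch of roughly what DefaultPredictor does internally, assuming cfg is configured and im is a BGR numpy image:

import torch
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer

model = build_model(cfg)
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # load trained weights
model.eval()

with torch.no_grad():
    inputs = [{
        "image": torch.as_tensor(im.astype("float32").transpose(2, 0, 1)),  # (C, H, W)
        "height": im.shape[0],   # desired output size
        "width": im.shape[1],
    }]
    outputs = model(inputs)  # list[dict], one dict per input image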

During training, the model outputs a dict[str -> ScalarTensor] containing the losses, which are used to train the model. During inference, the output is a list[dict], one dict per image:

  • "instances": an Instances object containing pred_boxes, scores, pred_classes, pred_masks, pred_keypoints;

  • "sem_seg": a (num_categories, H, W) Tensor, the semantic segmentation prediction;

  • "proposals": an Instances object containing proposal_boxes and objectness_logits, the output of RPN;

  • "panoptic_seg": a (Tensor, list[dict]) tuple, the panoptic segmentation result.

Similarly, different types of models have different output fields. For example, the output format of instance segmentation is as follows:

{'instances': Instances(num_instances=15, image_height=480, image_width=640, fields=[pred_boxes: Boxes(tensor), scores: tensor, pred_classes: tensor, pred_masks: tensor])}


Further, we sometimes need to customize new models, which takes advantage of the registration mechanism mentioned earlier. Because Detectron2 is modular, you can also customize just one module of a model. For example, to implement a new backbone, a simple case looks like this:

from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

# register the new backbone
@BACKBONE_REGISTRY.register()
class ToyBackBone(Backbone):
    def __init__(self, cfg, input_shape):
        super().__init__()
        # create your own backbone
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=16, padding=3)

    def forward(self, image):
        return {"conv1": self.conv1(image)}

    # this method is required because the detector head needs to know the backbone's output channels
    def output_shape(self):
        return {"conv1": ShapeSpec(channels=64, stride=16)}

To use this new backbone, just set cfg.MODEL.BACKBONE.NAME = 'ToyBackBone' in the configuration file, and build_model(cfg) will then use it. Note that a custom backbone should conform to certain interface conventions; the custom MobileNetV2 and VoVNet backbones are good references for this.

In fact, to learn how to create your own model, you should on the one hand first study the implementations of the core models in Detectron2; on the other hand, the projects under Detectron2 also provide many high-quality models for reference.

Model training and evaluation

We only need the DefaultTrainer from the engine submodule to run training. The balloon dataset was registered earlier; here the mask_rcnn_R_50_FPN model is used for training, and to speed up convergence we finetune from a model pretrained on the COCO dataset:

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
# finetune from the COCO-pretrained model
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300    # number of iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

After running this, you can see the model information, dataset information, and the training log. When training is complete, you can test on your images using DefaultPredictor:

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
cfg.DATASETS.TEST = ("balloon_val", )
predictor = DefaultPredictor(cfg)

from detectron2.utils.visualizer import ColorMode
dataset_dicts = get_balloon_dicts("balloon/val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata,
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(v.get_image()[:, :, ::-1])

If you want to evaluate the trained model, that is, compute the mAP, you only need the following:

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("balloon_val", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "balloon_val")
inference_on_dataset(trainer.model, val_loader, evaluator)
------------------------------------------
[06/07 09:30:02 d2.evaluation.coco_evaluation]: Evaluation results for segm:
|   AP   |  AP50  |  AP75  |  APs  |  APm   |  APl   |
|:------:|:------:|:------:|:-----:|:------:|:------:|
| 77.251 | 93.479 | 85.083 | 5.959 | 60.356 | 84.743 |

DefaultTrainer already includes the learning-rate schedule, logging, model saving and evaluation, so in most cases you can train your own model with DefaultTrainer directly. Detectron2 provides tools/train_net.py, a thin wrapper around DefaultTrainer; we can train by executing the following command:

./train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml --num-gpus $N

For configuration parameters, in addition to directly modifying the YAML file, you can also modify the configuration parameters when executing commands, such as adjusting the learning rate:

./train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml --num-gpus $N SOLVER.BASE_LR 0.0025

If you want to customize some of the training logic, you can override some of DefaultTrainer's methods, as train_net.py does. For example, if you want to train with a custom mapper, just override the build_train_loader method:

@classmethod
def build_train_loader(cls, cfg):
    return build_detection_train_loader(cfg, mapper=your_mapper)

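Similarly, to get evaluation on cfg.DATASETS.TEST during training, you can override build_evaluator, as tools/train_net.py does. A minimal sketch:

import os
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        return COCOEvaluator(dataset_name, cfg, True, output_folder)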

If you find DefaultTrainer inadequate for your training needs, you can go a step further and implement your own training logic by referring to tools/plain_train_net.py.

Model deployment

Model deployment is also very important. Currently, Detectron2 only supports conversion to Caffe2; the official conversion script is tools/deploy/caffe2_converter.py. Converting to ONNX is also possible, but not all ops are supported, so expect to run into some pitfalls there. Another nice piece of work is the support for converting Detectron2 models to TensorFlow via TensorPack FasterRCNN, currently covering the Faster R-CNN and Mask R-CNN models.

Overall, Detectron2 is a very good open source project for object detection and segmentation, both in terms of code quality and framework flexibility. The only drawback is that it supports fewer models than MMDetection, but it can be extended.

References

  1. facebookresearch/detectron2

  2. Detectron2 Beginner’s Tutorial 


