Turbine Cloud (gpushare.com) | RTX 3090 exclusive training record

(Source: Turbine Cloud community, a deep learning GPU cloud platform focused on artificial intelligence. Official trial URL: gpushare.com)

Before getting to the substance, let me introduce myself briefly. I work mainly in the computer vision (CV) area of deep learning. Over the past year, out of both work needs and personal interest, I have spent a lot of time doing research and reading papers on object detection and instance segmentation.

Along the way I have run countless experiments and taken part in competitions large and small. Some ended in first place, some in a regrettable second place, some in elimination in the preliminary round, and some were abandoned halfway for various reasons. I would be glad to chat about those another time.

Today I want to share a training record for MMDetection V2 + ResNeSt + RTX 3090, three names that CV practitioners will know well. They come from the framework, algorithm and hardware sides respectively, and all three are new releases from 2020. So far, no public experiment combining all three appears to have been shared.

I recently rented a machine with two 24 GB GeForce RTX 3090 cards and upgraded MMDetection to V2.7.0, released in late November 2020 (updates come so quickly that I had been stuck on V2.2.0 for the second half of the year). On seeing that ResNeSt had been added as a supported backbone, I immediately decided to run an experiment to test the performance, hoping to give you a useful reference.

Next, a brief introduction:

-MMDetection

MMDetection is a PyTorch-based detection and segmentation framework developed by OpenMMLab at CUHK and SenseTime. The team open-sourced it after taking part in the 2018 MS COCO Detection Challenge and first released V0.5.1 in October 2018. V1.0.0 followed in January 2020, and the major update V2.0.0 in June 2020.

Compared with similar open source frameworks, such as Facebook's maskrcnn-benchmark and Detectron2 or Baidu's PaddleDetection, **MMDetection is currently the most popular framework and the one attracting the most attention.** The main reasons are its broad coverage, high performance and fast pace of updates.

-ResNeSt

Billed as the strongest improved version of ResNet, “ResNeSt: Split-Attention Networks” is a paper by Mu Li and Hang Zhang of Amazon. It was uploaded to arXiv in April 2020. It has not yet been published at a conference or in a journal, so it is presumably aimed at CVPR or ICCV in 2021.

On the one hand, it brings significant improvements on image classification, object detection, instance segmentation, semantic segmentation and other tasks.

On the other hand, some doubts have been raised, mainly about its comparative experiments. In ResNeSt-50 vs. ResNet-50, for example, ResNeSt uses many recently published training and data augmentation strategies that did not exist when ResNet-50 was proposed in 2015, so the fairness of the comparison has been questioned.

However, ResNeSt’s ability to generalize is evident in its frequent appearances in major competitions.

-RTX 3090

The GeForce RTX 3090 belongs to NVIDIA's latest GeForce RTX 30 series, released in September 2020, which outdoes the previous GeForce RTX 2080 Ti generation on both performance and price.

In addition, because of the pandemic, the card has been in short supply in the domestic market for a long time since launch, with panic buying and price hikes. Even in the United States it is hard to find one, which only makes deep learning enthusiasts want it more.

With the background covered, on to the main topic. The configuration for this experiment is as follows:

Python 3.8.7

PyTorch 1.7.1

Torchvision 0.8.2

CUDA 11.0

CuDNN 8.0.5

GCC 7.3

MMDetection 2.7.0

MMCV 1.
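
Before launching a long run, it can help to confirm that the installed Python packages match the list above. The following is a minimal sketch, not part of the original experiment: it compares installed distribution versions against expected version prefixes. The distribution names (`torch`, `torchvision`, `mmdet`, `mmcv-full`) are assumptions about how the packages were installed, and the MMCV entry keeps the truncated `1.` from the list as a prefix. CUDA, cuDNN and GCC are not Python packages and are left out.

```python
from importlib import metadata

# Expected version prefixes, taken from the configuration list above.
# Distribution names are assumptions; adjust them to your environment.
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "mmdet": "2.7.0",
    "mmcv-full": "1.",  # truncated in the list, so only the major version is checked
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to (installed_version, matches_expected_prefix)."""
    report = {}
    for name, prefix in expected.items():
        found = lookup(name)
        report[name] = (found, found is not None and found.startswith(prefix))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:12s} expected {EXPECTED[name]:8s} found {found} -> {status}")
```

Prefix matching is used so that local build suffixes such as `+cu110` on the PyTorch version do not cause a false mismatch.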

For data, the classic MS COCO 2017 is used, with about 118K images in the training set (train), 5K in the validation set (val) and 20K in test-dev.

For the algorithm, ResNeSt-101 + FPN + SyncBN + Cascade Mask R-CNN is chosen as the detector. The newer HTC and DetectoRS are not used, mainly so that the results can be compared directly with those in the ResNeSt paper.

Training and testing details are as follows:

The training length is the “2x schedule”, that is, 24 epochs, with learning rate steps at step=[16,22]

Multi-scale training at 1600×[400,1200], following the scale in the HTC paper rather than the 1333×[640,800] used in the ResNeSt paper

Single-scale testing at 1600×1000, for the same reason, instead of the traditional 1333×800

Two-card training with 2 images per card, for a total batch size of 4

The initial learning rate is set to 0.01, higher than the 0.005 given by the linear scaling rule used in conventional object detection

Following the recommendation in the ResNeSt paper, both the backbone and the head use SyncBN

Other settings and hyperparameters remain unchanged
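
The settings above can be sketched as MMDetection 2.x style config overrides. This is a hedged reconstruction, not the author's actual config file: the key names follow MMDetection 2.x config conventions, the SGD momentum and weight decay are the usual MMDetection defaults rather than values stated in this article, and the reference point for the linear scaling rule (learning rate 0.02 at a total batch size of 16) is likewise the common MMDetection default, assumed here for illustration.

```python
# Hedged sketch of the training settings as MMDetection 2.x style config dicts.
# Not the author's actual file; values not stated in the article are marked.

norm_cfg = dict(type='SyncBN', requires_grad=True)  # for backbone and head

# Multi-scale training: images are resized to fit within 1600 x s,
# with s sampled from the range [400, 1200].
train_resize = dict(
    type='Resize',
    img_scale=[(1600, 400), (1600, 1200)],
    multiscale_mode='range',
    keep_ratio=True)

# Single-scale testing at 1600 x 1000.
test_img_scale = (1600, 1000)

num_gpus = 2
data = dict(samples_per_gpu=2)  # 2 images per card
total_batch_size = num_gpus * data['samples_per_gpu']

optimizer = dict(
    type='SGD',
    lr=0.01,              # stated in the article
    momentum=0.9,         # assumed default, not stated in the article
    weight_decay=0.0001)  # assumed default, not stated in the article

# "2x schedule": 24 epochs, learning rate decayed at epochs 16 and 22.
lr_config = dict(policy='step', step=[16, 22])
total_epochs = 24


def linear_scaled_lr(batch_size, base_lr=0.02, base_batch_size=16):
    """Linear scaling rule: learning rate proportional to total batch size."""
    return base_lr * batch_size / base_batch_size


rule_lr = linear_scaled_lr(total_batch_size)
print(f"batch size {total_batch_size}: rule gives {rule_lr:.4f}, "
      f"experiment uses {optimizer['lr']}")
```

Under these assumptions, a total batch size of 4 gives 0.005 from the rule, so the 0.01 used in this experiment is twice the rule's value.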

The whole training run took about 11.5 days, with video memory almost full throughout, as shown in the figure below. Because this was a two-card run of the “2x schedule”, it corresponds to about 34.5 hours (11.5 days × 24 h ÷ 8) under the 8-card “1x schedule” configuration commonly used in papers, which is relatively fast. For comparison, in my own tests the same configuration usually takes more than 2 days (49 to 50 hours) to run the “1x schedule” on 8 RTX 2080 Ti cards.
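
The conversion to an 8-card “1x schedule” equivalent can be checked with a few lines of arithmetic. The sketch below assumes that wall-clock time scales inversely with the number of cards and linearly with the schedule length, which ignores communication overhead. It takes the run as 11.5 days of wall-clock time, since that is the reading under which the 34.5-hour equivalent follows. The function name and parameters are illustrative only.

```python
def equivalent_hours(measured_hours, gpus, schedule_x, ref_gpus=8, ref_schedule_x=1):
    """Convert a measured run to a reference setup, assuming ideal scaling.

    Assumes time is inversely proportional to GPU count and proportional
    to schedule length; real multi-GPU scaling is usually somewhat worse.
    """
    gpu_hours_per_1x = measured_hours * gpus / schedule_x
    return gpu_hours_per_1x * ref_schedule_x / ref_gpus


# 11.5 days on 2x RTX 3090 with the 2x schedule (assumed reading of the figure).
measured = 11.5 * 24
rtx3090_equiv = equivalent_hours(measured, gpus=2, schedule_x=2)
print(f"8-GPU 1x equivalent: {rtx3090_equiv:.1f} h")

# Reported time for the 1x schedule on 8x RTX 2080 Ti: 49 to 50 hours.
for ti_hours in (49, 50):
    print(f"vs {ti_hours} h on 8x 2080 Ti: {ti_hours / rtx3090_equiv:.2f}x")
```

Under these idealized assumptions, the per-card throughput of the RTX 3090 comes out at roughly 1.4 times that of the RTX 2080 Ti for this workload.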

Due to resource constraints, the 1600×[400,1400] multi-scale training from HTC was not used. Likewise, the backbone does not include the recently popular DCNv2, which can raise mAP by 1 to 2 percentage points, as shown in the following figure (from Table 12 in the ResNeSt paper).

From previous experience, multi-scale testing can improve mAP by 1.5 to 2.5 percentage points. Since this experiment is not for a competition or a leaderboard, single-scale testing is used to save time. The results are as follows:

The two figures above show the model's detection (bbox) mAP and segmentation (segm) mAP on the validation set. Compared with the results in the following figure (from Table 6 in the ResNeSt paper), the experimental results are slightly better (bbox 49.4% vs. 48.3%, segm 43.1% vs. 41.6%), which may come from the larger scale and longer training schedule. In any case, a dual-card RTX 3090 setup can reproduce the authors' 8-card result, which is quite satisfying.

The following three figures show the bbox mAP and segm mAP on test-dev (the JSON result files have to be uploaded to COCO's official website) and Table 10 from the ResNeSt paper. Note that the results in the paper use DCNv2, while this experiment matches or exceeds them without any extra enhancement to the backbone (bbox 50.0% vs. 50.0%, segm 43.7% vs. 43.0%).
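
The comparisons quoted above can be put side by side. The snippet below only restates the numbers from this article and computes the differences in percentage points. It runs nothing from MMDetection, and the dictionary layout is purely illustrative.

```python
# (split, metric) -> (this experiment, ResNeSt paper), all in mAP %.
results = {
    ("val", "bbox"): (49.4, 48.3),
    ("val", "segm"): (43.1, 41.6),
    ("test-dev", "bbox"): (50.0, 50.0),  # paper result uses DCNv2
    ("test-dev", "segm"): (43.7, 43.0),  # paper result uses DCNv2
}

# Difference in percentage points, rounded to one decimal place.
deltas = {key: round(ours - paper, 1) for key, (ours, paper) in results.items()}

for (split, metric), delta in deltas.items():
    ours, paper = results[(split, metric)]
    print(f"{split:8s} {metric:4s} {ours:.1f} vs {paper:.1f} ({delta:+.1f} points)")
```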

To sum up, this test of the GeForce RTX 3090 produced results that are quite satisfactory.
