Author | PRATEEK JOSHI   Compiled by | VK   Source | Analytics Vidhya

Introduction

Every so often, a machine learning framework or library changes the landscape of the field. Facebook has now open-sourced one such framework: DETR (DEtection TRansformer).

In this article, we'll take a quick look at the concept of object detection and then dive straight into DETR and its benefits.

Object detection

In computer vision, object detection is the task of distinguishing objects from the background and predicting the location and category of each object present in an image. Current deep learning approaches treat object detection as a classification problem, a regression problem, or both.

For example, in the R-CNN algorithm, several regions of interest are identified in the input image. These regions are then classified as object or background. Finally, a regression model generates bounding boxes for the identified objects.

The YOLO framework (You Only Look Once), on the other hand, handles object detection differently. It takes the entire image in a single pass and predicts the bounding box coordinates and the class probabilities for those boxes.

To learn more about object detection, see the following articles:

  • A step-by-step introduction to basic object detection algorithms

    www.analyticsvidhya.com/blog/2018/1…

  • A practical guide to object detection using the popular YOLO framework

    www.analyticsvidhya.com/blog/2018/1…

Facebook AI introduces DETR

As described in the previous section, current deep learning algorithms perform object detection in multiple steps. They also suffer from the problem of near-duplicate predictions, which show up as false positives. To simplify things, Facebook AI researchers have come up with DETR, an innovative and efficient approach to object detection.
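To make the near-duplicate problem concrete, here is a minimal sketch of non-maximum suppression (NMS), the post-processing step that classical detection pipelines commonly use to discard overlapping boxes. The article does not name this step, so treat it as background rather than as part of DETR; the function names, the 0.5 threshold, and the sample boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

Here the second box is dropped as a near-duplicate of the first. A hand-tuned step like this is the kind of extra stage that DETR's set-based formulation is meant to make unnecessary.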

Paper: arxiv.org/pdf/2005.12…

Source code: github.com/facebookres…

Colab Notebook: colab.research.google.com/github/face…

The new model is quite simple, and you don't need to install any extra libraries to use it. DETR treats object detection as a set prediction problem, using a Transformer-based encoder-decoder architecture. The "set" here is the set of bounding boxes. The Transformer is a relatively new deep learning architecture that has excelled in NLP.
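One way to picture set prediction is as a one-to-one matching between predicted boxes and ground-truth boxes, where the order of the predictions does not matter. The sketch below is only an illustration under simplifying assumptions: it uses a plain L1 distance between boxes as the cost and tries every assignment by brute force, whereas the DETR paper describes a Hungarian-algorithm matching whose cost also includes class predictions. All names and sample values are hypothetical.

```python
from itertools import permutations


def l1_cost(pred, target):
    """L1 distance between two boxes in normalized (cx, cy, w, h) form."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def match_sets(preds, targets):
    """Find the cheapest one-to-one assignment of predictions to targets.

    Returns (assignment, cost), where assignment[i] is the index of the
    prediction matched to target i. Brute force, so only for tiny sets.
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Three predictions, two ground-truth objects (made-up numbers).
preds = [
    (0.80, 0.80, 0.20, 0.20),
    (0.25, 0.25, 0.30, 0.30),
    (0.50, 0.50, 0.10, 0.10),
]
targets = [
    (0.20, 0.20, 0.30, 0.30),
    (0.75, 0.80, 0.20, 0.25),
]
assignment, cost = match_sets(preds, targets)
print(assignment, round(cost, 3))
```

In this example the third prediction is left unmatched. In a set-prediction setup, such leftover predictions would be trained to say "no object", which is how duplicates are discouraged without a separate suppression step.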

The authors of the paper evaluated DETR on COCO, one of the most popular object detection datasets, and compared it against Faster R-CNN.

DETR achieved performance comparable to Faster R-CNN. More precisely, it performs significantly better on large objects but does less well on small objects. I'm sure researchers will address this soon.

Architecture of DETR

In fact, the entire DETR architecture is easy to understand. It contains three main components:

  • CNN backbone
  • Encoder-decoder Transformer
  • A simple feedforward network

First, the CNN backbone generates feature maps from the input image.

Then, the output of the CNN backbone is flattened into a one-dimensional sequence of feature vectors, which is passed to the Transformer encoder as input. The Transformer works with N fixed-length embeddings (vectors), where N is the number of objects the model assumes an image can contain; in the paper, these correspond to learned object queries fed to the decoder.

The Transformer decoder decodes these embeddings into bounding box coordinates using self-attention and encoder-decoder attention.

Finally, the feedforward neural network predicts the normalized center coordinates, height, and width of each bounding box, while a linear layer predicts the class labels using a softmax function.
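The last step can be sketched as a small post-processing routine: apply softmax to the class scores of one prediction, and convert its normalized (center x, center y, width, height) box into pixel corner coordinates. This is a hedged illustration in plain Python, not code from the DETR repository; the function names, the label list (including a "no object" entry), and the input numbers are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode(class_logits, box, labels, img_w, img_h):
    """Return (label, probability, pixel box) for a single prediction."""
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best], cxcywh_to_xyxy(box, img_w, img_h)


# Hypothetical label set and one made-up prediction for a 640x480 image.
labels = ["cat", "dog", "no object"]
label, score, pixel_box = decode(
    [2.0, 0.5, 0.1], (0.5, 0.5, 0.25, 0.5), labels, 640, 480
)
print(label, pixel_box)
```

Predicting normalized coordinates keeps the box outputs independent of image resolution; the conversion to pixels only needs the image width and height at the end.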

Thoughts

This is a very exciting framework for all deep learning and computer vision enthusiasts. Many thanks to Facebook for sharing its approach with the community.

Original link: www.analyticsvidhya.com/blog/2020/0…
