CV - Object detection: letterbox
- When reposting, please credit the original source. Thank you: blog.csdn.net/pentiumCM/a…
I. Related concepts
- Letterbox:
- Concept:
Most object detection models expect a fixed-size, usually square, input image (convolution kernels are typically square as well, although rectangular kernels exist). Most dataset images, however, are rectangular, and resizing them directly to a square distorts them: objects in a long, narrow image, for example, come out stretched.
The letterbox operation: when resizing, scale the image uniformly so that the aspect ratio of the original image is preserved. Once the long side reaches the target length, the remaining space along the short side is filled with gray.
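The scale-then-pad arithmetic described above can be sketched in a few lines. This is only an illustration: the helper name `letterbox_geometry` and the 500x333 source size are assumptions made for the example, not part of the original post.

```python
def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return (scale, new_h, new_w, top, bottom, left, right) for a letterbox resize.

    Hypothetical helper for illustration only.
    """
    # One uniform scale factor: the smaller ratio makes the long side fit exactly.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = int(round(src_h * scale))
    new_w = int(round(src_w * scale))

    # Split the leftover space as evenly as possible between the two sides.
    top = (dst_h - new_h) // 2
    bottom = dst_h - new_h - top
    left = (dst_w - new_w) // 2
    right = dst_w - new_w - left
    return scale, new_h, new_w, top, bottom, left, right


# Assumed example: a 500x333 (w x h) image letterboxed into 640x640.
print(letterbox_geometry(333, 500, 640, 640))
# -> (1.28, 426, 640, 107, 107, 0, 0)
```

Here the width is the long side, so it is scaled to exactly 640 and the 214 leftover rows are split into 107 gray rows above and below.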
- Additional note:
- In object detection, whenever the letterbox operation is applied to a dataset image, the same transformation must also be applied to its annotation boxes.
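As a rough sketch of what transforming an annotation box involves: each corner is multiplied by the same scale factor as the image and then shifted by the padding on the left and top. The function `letterbox_box` and the 500x333 image size below are assumptions for illustration; this works directly in pixel coordinates and is not necessarily how the author's own code does it.

```python
def letterbox_box(box, src_size, dst_size):
    """Map an xyxy pixel box from the original image into the letterboxed image.

    Hypothetical helper: src_size and dst_size are both (h, w).
    """
    src_h, src_w = src_size
    dst_h, dst_w = dst_size

    # Same uniform scale that is applied to the image itself.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = int(round(src_h * scale))
    new_w = int(round(src_w * scale))

    # Padding added before the image content (left and top borders).
    left = (dst_w - new_w) // 2
    top = (dst_h - new_h) // 2

    x1, y1, x2, y2 = box
    return (x1 * scale + left, y1 * scale + top,
            x2 * scale + left, y2 * scale + top)


# Assumed example: one box on a 500x333 (w x h) image, target 640x640.
print(letterbox_box((156, 97, 351, 270), (333, 500), (640, 640)))
```

With these assumed sizes the scale is 1.28 and only the y coordinates are shifted (by 107 pixels), so the box lands at roughly (199.7, 231.2, 449.3, 452.6).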
- Related algorithms:
Letterbox is used in the image preprocessing of YOLO, SSD, and other algorithms.
II. Code implementation
(1) Python code
- Example description:
- Direct resize:
In the figure for this case, the original image is on the left and the directly resized image is on the right. The car in the right-hand image is visibly distorted.
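The distortion can also be seen numerically. The sketch below, with hypothetical helper names and an assumed 500x333 source image, resizes a box with independent x and y factors (which is what a direct resize to a square does) and compares the box's width-to-height ratio before and after.

```python
def direct_resize_box(box, src_size, dst_size):
    """Map an xyxy box through a plain resize that ignores aspect ratio.

    Hypothetical helper: src_size and dst_size are both (h, w).
    """
    src_h, src_w = src_size
    dst_h, dst_w = dst_size
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor, generally different from sx
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def aspect_ratio(box):
    """Width divided by height of an xyxy box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)


# Assumed example: a box on a 500x333 (w x h) image resized directly to 640x640.
box = (156, 97, 351, 270)
resized = direct_resize_box(box, (333, 500), (640, 640))
print(aspect_ratio(box), aspect_ratio(resized))
```

Under these assumptions the box goes from wider than tall (ratio about 1.13) to taller than wide (about 0.75), which is the stretching visible in the resized image.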
- Letterbox operation:
In the image on the right, the aspect ratio of the original image is preserved during the resize, and the missing area at the top and bottom is padded with gray.
The green annotation box on the left is expressed in the coordinates of the original image, while the blue annotation box on the right is the box after letterboxing: the box coordinates have to be transformed together with the image.
- Complete code:
#!/usr/bin/env python
# encoding: utf-8
'''
@Author  : pentiumCM
@Email   : [email protected]
@Software: PyCharm
@File    : util.py
@Time    : 2021/7/17 1:58
@desc    :
'''

import cv2
import torch
import numpy as np


def letterbox_image(image_src, dst_size, pad_color=(114, 114, 114)):
    """
    Resize the image while keeping its aspect ratio, then pad it to dst_size.
    :param image_src: original image (numpy array)
    :param dst_size: (h, w)
    :param pad_color: fill color, gray by default
    :return: padded image, x_offset, y_offset (padding as a fraction of dst_w / dst_h)
    """
    src_h, src_w = image_src.shape[:2]
    dst_h, dst_w = dst_size
    scale = min(dst_h / src_h, dst_w / src_w)
    pad_h, pad_w = int(round(src_h * scale)), int(round(src_w * scale))

    # shape is (h, w), while cv2.resize takes its target size as (w, h)
    if image_src.shape[0:2] != (pad_h, pad_w):
        image_dst = cv2.resize(image_src, (pad_w, pad_h), interpolation=cv2.INTER_LINEAR)
    else:
        image_dst = image_src

    top = int((dst_h - pad_h) / 2)
    down = int((dst_h - pad_h + 1) / 2)
    left = int((dst_w - pad_w) / 2)
    right = int((dst_w - pad_w + 1) / 2)

    # add border
    image_dst = cv2.copyMakeBorder(image_dst, top, down, left, right, cv2.BORDER_CONSTANT, value=pad_color)

    x_offset, y_offset = max(left, right) / dst_w, max(top, down) / dst_h
    return image_dst, x_offset, y_offset


def letterbox_label(bounding_box, dst_size=(640, 640), x_offset=0, y_offset=0, normalize=False, src_size=None):
    """
    Apply the letterbox transform to the labels.
    :param bounding_box: boxes in xyxy format
    :param dst_size: (tuple) size of the padded image, (h, w)
    :param x_offset: (float) left/right padding, normalized by the padded width
    :param y_offset: (float) top/bottom padding, normalized by the padded height
    :param normalize: (bool) whether bounding_box is already normalized
    :param src_size: (tuple) size of the original image, (h, w)
    :return: boxes in pixel coordinates of the padded image
    """
    if not normalize:
        # pixel coordinates: normalize by the original image size first
        assert src_size, 'src_size is None'
        h = src_size[0]
        w = src_size[1]
        # np.float was removed in NumPy 1.24; np.float64 is the equivalent dtype
        bounding_box = bounding_box.astype(np.float64)
        bounding_box[:, 0] = bounding_box[:, 0] / w  # top left x
        bounding_box[:, 1] = bounding_box[:, 1] / h  # top left y
        bounding_box[:, 2] = bounding_box[:, 2] / w  # bottom right x
        bounding_box[:, 3] = bounding_box[:, 3] / h  # bottom right y

    y = bounding_box.clone() if isinstance(bounding_box, torch.Tensor) else np.copy(bounding_box)

    pad_h = dst_size[0]
    pad_w = dst_size[1]
    # size of the area actually covered by the resized image
    inner_w = pad_w * (1 - 2 * x_offset)
    inner_h = pad_h * (1 - 2 * y_offset)

    y[:, 0] = inner_w * bounding_box[:, 0] + pad_w * x_offset  # top left x
    y[:, 1] = inner_h * bounding_box[:, 1] + pad_h * y_offset  # top left y
    y[:, 2] = inner_w * bounding_box[:, 2] + pad_w * x_offset  # bottom right x
    y[:, 3] = inner_h * bounding_box[:, 3] + pad_h * y_offset  # bottom right y
    return y


def plot_one_box(box, image, label=None, color=(0, 255, 0), line_thickness=3):
    """
    Plots one bounding box on image using OpenCV.
    :param box: bounding box, xyxy. Type: list
    :param image:
    :param label:
    :param color:
    :param line_thickness:
    :return:
    """
    assert image.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to plot_on_box() input image.'
    tl = line_thickness or round(0.002 * (image.shape[0] + image.shape[1]) / 2) + 1  # line/font thickness
    c1, c2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
    cv2.rectangle(image, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(image, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(image, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)


def letterbox_test():
    """Compare a direct resize with the letterbox operation on one image."""
    # h, w
    dst_size = (640, 640)
    image_path = 'F:/develop_code/python/ssd-pytorch/VOCdevkit/VOC2007/JPEGImages/000012.jpg'
    labels = [156, 97, 351, 270]

    image = cv2.imread(image_path)

    # box: xyxy
    box = np.array(labels, dtype=np.float64)
    box = np.reshape(box, (-1, 4))

    cv2.imshow('org_image', image)

    # direct resize; cv2.resize expects (w, h), dst_size is (h, w)
    image_directresize = image
    image_directresize = cv2.resize(image_directresize, (dst_size[1], dst_size[0]))
    cv2.imshow('image_directresize', image_directresize)

    # visualize the label on the original image (green)
    for i in range(box.shape[0]):
        plot_one_box(box=box[i], image=image, line_thickness=2)

    letter_image, x_offset, y_offset = letterbox_image(image, dst_size)
    letter_box = letterbox_label(box, dst_size, x_offset, y_offset, False, image.shape[:-1])

    # visualize the letterboxed label (blue, since OpenCV colors are BGR)
    for i in range(letter_box.shape[0]):
        plot_one_box(box=letter_box[i], image=letter_image, line_thickness=2, color=(255, 0, 0))

    cv2.imshow('org_label', image)
    cv2.imshow('letter_image', letter_image)
    cv2.waitKey()


if __name__ == '__main__':
    letterbox_test()