
Main problems in image processing

If we think of an image as a two-dimensional view of the three-dimensional world, then a digital image is a 2D image represented by a finite set of numbers called pixels (the concept of pixels is explained in detail in the section "Pixels, colors, channels, images, and color spaces"). We can define the goal of computer vision as converting this 2D data into one of the following:

  1. A new data representation (for example, a new image)
  2. A decision (for example, performing a specific decision task)
  3. A target result (for example, the classification of an image)
  4. Extracted information (for example, object detection)

During image processing, the following problems are often encountered:

  1. Images are inherently ambiguous: a change of perspective changes the visual appearance of a scene. For example, looking at the same object from different angles produces different images;
  2. Images are influenced by many natural factors, such as light, weather, reflections, and motion;
  3. Objects in an image may be occluded by other objects, making the occluded objects difficult to detect or classify. As the degree of occlusion increases, image processing tasks (for example, image classification) become very challenging.

To better illustrate these problems, suppose we need to develop a face detection system. The system should be robust enough to cope with changes in lighting or weather conditions. In addition, the system should be able to handle head movement: the user's head can move a certain amount along each axis of the coordinate system (nodding up and down, turning from side to side, tilting), and the user can also move a little closer to or farther from the camera. Many face detection algorithms perform well when the face is near-frontal, but fail to detect a face that is not frontal (for example, a face in profile to the camera). In addition, the algorithm may need to detect the face even when the user is wearing glasses or sunglasses (even though these occlude the eye area). To sum up, when developing a computer vision project, we must take all these factors into account. One way is to validate the algorithm with a large number of test images; we can also classify the test images by difficulty level, so as to detect the weaknesses of the algorithm and improve its robustness.
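As a concrete illustration of the near-frontal limitation described above, the following minimal sketch uses OpenCV's bundled Haar cascade, one classical face detector (not necessarily what a production system would use); the file name `test.png` is a placeholder for your own test image:

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade; as a classical
# detector, it works best on near-frontal faces.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("test.png")  # placeholder test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans the image at several scales; scaleFactor and
# minNeighbors trade detection rate against false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Running this on images of increasing difficulty (profile views, sunglasses, poor lighting) is exactly the kind of test that exposes an algorithm's weaknesses.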

Image processing flow

A complete image processing program can usually be divided into the following three steps:

  1. Read the image. Images can come from many different sources (a camera, a video stream, disk, online resources), so reading an image may involve several functions to support these different sources;
  2. Process the image, applying image processing techniques to achieve the desired function (for example, detecting the cat in the image);
  3. Display the result, rendering the processed image in a human-readable way (for example, drawing bounding boxes on the image, and sometimes saving it to disk).
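Putting the three steps together, a minimal sketch might look like the following (the file name `cat.png` and the grayscale conversion used as the "processing" step are placeholder assumptions):

```python
import cv2

# Step 1: read the image from disk (it could equally come from a
# camera or a video stream via cv2.VideoCapture).
img = cv2.imread("cat.png")
if img is None:
    raise FileNotFoundError("cat.png could not be read")

# Step 2: process the image; grayscale conversion stands in for
# whatever technique the application actually needs.
processed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 3: display the result (and optionally save it to disk).
cv2.imshow("Result", processed)
cv2.imwrite("cat_gray.png", processed)
cv2.waitKey(0)
cv2.destroyAllWindows()
```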

In addition, the image processing in step 2 above can be further divided into three different processing levels:

  1. Low-level processing (often called preprocessing, when this causes no ambiguity) usually takes one image as input and outputs another image. Techniques applied at this level include, but are not limited to: noise removal, image sharpening, illumination normalization, and perspective correction;
  2. Mid-level processing extracts the main features of the preprocessed image (for example, the image features obtained from a DNN model) and outputs some form of image representation for further processing;
  3. High-level processing accepts the image features obtained from mid-level processing and outputs the final result. For example, the output could be a detected face.
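As a concrete example of low-level processing, the following sketch applies two of the techniques listed above, noise removal and sharpening; the file name `noisy.png` and the kernel values are illustrative choices, not prescribed by any particular application:

```python
import cv2
import numpy as np

img = cv2.imread("noisy.png")  # placeholder input image

# Noise removal: Gaussian blur smooths high-frequency noise.
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# Sharpening: a simple Laplacian-style kernel re-emphasizes edges.
sharpen_kernel = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]])
sharpened = cv2.filter2D(denoised, -1, sharpen_kernel)

# Both input and output are images, which is what characterizes
# low-level processing.
cv2.imwrite("preprocessed.png", sharpened)
```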

Pixels, colors, channels, images, and color spaces

There are many different color models for representing images, but the most common is the red, green, and blue (RGB) model. The RGB model is an additive color model, in which the primary colors (red R, green G, and blue B) are mixed together to represent a wide range of colors. Each primary color (R, G, B) usually represents one channel, whose values are integers in the range [0, 255]. Each channel therefore has 256 possible discrete values, corresponding to the number of bits used to represent a channel value ($2^8 = 256$). Since there are three channels, an image represented with the RGB model is called a 24-bit color depth image.

The "additive color" property of the RGB color space means that mixing the primaries produces new colors:

  1. Red plus green gives yellow
  2. Blue plus red gives magenta
  3. Blue plus green gives cyan
  4. The three primary colors add together to make white

Therefore, as mentioned above, in the RGB color model a specific color is represented by a combination of red, green, and blue components, and a pixel value is written as an RGB triple (R, G, B); this is exactly how a typical RGB color picker represents a color.
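The additive mixing rules above can be verified directly by building single-color images as NumPy arrays and adding their channels; note that OpenCV stores channels in BGR order, a detail this sketch accounts for:

```python
import cv2
import numpy as np

h, w = 100, 100

# Pure primary colors; OpenCV stores channels in BGR order.
red = np.zeros((h, w, 3), dtype=np.uint8)
red[:, :, 2] = 255
green = np.zeros((h, w, 3), dtype=np.uint8)
green[:, :, 1] = 255
blue = np.zeros((h, w, 3), dtype=np.uint8)
blue[:, :, 0] = 255

# Additive mixing; cv2.add saturates at 255 instead of wrapping around.
yellow = cv2.add(red, green)
magenta = cv2.add(blue, red)
cyan = cv2.add(blue, green)
white = cv2.add(yellow, blue)

# Each print shows the BGR triple of one pixel.
print(yellow[0, 0])   # [  0 255 255] -> yellow
print(magenta[0, 0])  # [255   0 255] -> magenta
print(cyan[0, 0])     # [255 255   0] -> cyan
print(white[0, 0])    # [255 255 255] -> white
```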

An image with a resolution of 800×1200 is a grid with 800 columns and 1200 rows, so it contains 800×1200 = 960,000 pixels; each cell of the grid is called a pixel. It should be noted that the number of pixels in an image does not indicate its physical size (one pixel is not equal to one millimeter). Instead, the physical size depends on the number of pixels per inch (PPI) set for the image; the PPI of an image is generally set in the range [200, 400]. The basic formulas for calculating PPI are:

  1. PPI = width (pixels) / image width (inches)
  2. PPI = height (pixels) / image height (inches)

For example, a 4×6 inch image with a resolution of 800×1200 has a PPI of 200 (800/4 = 200 and 1200/6 = 200).
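A tiny helper makes the calculation explicit (a hypothetical function, shown only to express the formula in code):

```python
def ppi(pixels: int, inches: float) -> float:
    """Pixels per inch along one dimension."""
    return pixels / inches

# A 4 x 6 inch print at 800 x 1200 pixels:
print(ppi(800, 4))   # 200.0 (width)
print(ppi(1200, 6))  # 200.0 (height)
```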

Image description

An image can be described as a two-dimensional function $f(x, y)$, where $(x, y)$ are spatial coordinates and $f(x, y)$ is the brightness, grayscale, or color value of the image at the point $(x, y)$. In addition, when the values of $f(x, y)$ and $(x, y)$ are finite discrete quantities, the image is called a digital image, in which case:

  1. $x \in [0, H-1]$, where $H$ is the height of the image
  2. $y \in [0, W-1]$, where $W$ is the width of the image
  3. $f(x, y) \in [0, L-1]$, where $L = 256$ (for an 8-bit grayscale image)

A color image can be represented in the same way, except that we need three functions to represent the red, green, and blue values. Each of the three functions follows the same formulation as the $f(x, y)$ function defined for grayscale images; we denote them with the subscripts R, G, and B as $f_R(x, y)$, $f_G(x, y)$, and $f_B(x, y)$. Similarly, a black-and-white (binary) image can be represented in the same form with a single function, where $f(x, y)$ can take only two values: conventionally, 0 means black and 1 means white. This gives three different types of digital image: color, grayscale, and black and white.

A digital image can be considered an approximation of a real scene, because the values of $f(x, y)$ are finite discrete quantities. In addition, grayscale and black-and-white images have a single value per point, while color images require three values per point, corresponding to the red, green, and blue components of the image.
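In NumPy terms (the representation OpenCV uses), these three image types are simply arrays of different shapes and value ranges; the following sketch builds one of each with arbitrary dimensions and random contents:

```python
import numpy as np

H, W = 4, 6  # image height and width

# Grayscale: one value per point, f(x, y) in [0, 255] for 8-bit images.
gray = np.random.randint(0, 256, size=(H, W), dtype=np.uint8)

# Color: three values per point, f_R, f_G, f_B each in [0, 255].
color = np.random.randint(0, 256, size=(H, W, 3), dtype=np.uint8)

# Black and white (binary): one value per point, 0 (black) or 1 (white).
bw = np.random.randint(0, 2, size=(H, W), dtype=np.uint8)

x, y = 2, 3
print(gray[x, y])   # a single intensity
print(color[x, y])  # a triple of channel values
print(bw[x, y])     # 0 or 1
```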

Image file types

Although images can be processed in OpenCV as matrices of RGB triples (in the case of RGB image models), they are not necessarily created, stored, or transmitted in this format. There are many different file formats, such as GIF, PNG, bitmap, or JPEG, that use different forms of compression (lossless or lossy) to represent images more efficiently. The following table lists the file formats supported by OpenCV and their associated file extensions:

| File format | File extensions |
| --- | --- |
| Windows bitmaps | *.bmp, *.dib |
| JPEG files | *.jpeg, *.jpg, *.jpe |
| JPEG 2000 files | *.jp2 |
| Portable Network Graphics | *.png |
| Portable image formats | *.pbm, *.pgm, *.ppm |
| TIFF files | *.tiff, *.tif |

Applying lossless or lossy compression algorithms to images produces images that occupy less storage space than uncompressed images. With a lossless compression algorithm, the image obtained after decompression is exactly equivalent to the original image (identical); with a lossy compression algorithm, the resulting image is not identical to the original, which means some details of the image are lost. In many lossy compression algorithms, the compression level can be adjusted.
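In OpenCV, the compression level can be controlled when writing the file; the sketch below saves the same image as JPEG at two quality settings and as PNG (the file names are placeholders):

```python
import cv2

img = cv2.imread("input.png")  # placeholder input image

# Lossy: JPEG quality ranges from 0 to 100 (higher means better
# quality and a larger file); lowering it discards more detail.
cv2.imwrite("out_q90.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 90])
cv2.imwrite("out_q10.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 10])

# Lossless: the PNG compression level (0-9) trades file size against
# encoding time, but the decoded image is always identical.
cv2.imwrite("out.png", img, [cv2.IMWRITE_PNG_COMPRESSION, 9])
```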