HOG: Histogram of Oriented Gradients
HOG (Histogram of Oriented Gradients) is an image descriptor for human detection proposed by Dalal and Triggs at CVPR 2005. The feature is built from histograms of gradient direction over local regions of the image, and it is fairly robust to small changes in angle and to changes in illumination.
Principle
- The main idea is that the distribution of gradient directions in a local region of the image captures the edge and shape characteristics of that region. In practice, the image is divided into small cells, and a histogram of gradient directions (or edge directions) is computed for each cell. To achieve illumination invariance, the histograms are contrast-normalized: cells are grouped into larger blocks, and all cells within a block are normalized together. A normalized block is called a HOG descriptor. The final feature vector is formed by concatenating the HOG descriptors of all blocks in the detection window.
Steps
- Pre-process the image: scaling, gamma correction, and so on.
- Compute the gradient at each pixel to obtain the gradient map. In a digital image, the gradient is the difference between adjacent pixel values. For example, in the figure below, the pixel with value 136 has a vertical gradient of 139 - 133 = 6 and a horizontal gradient of 139 - 134 = 5. From these two values, the gradient magnitude g and the gradient direction θ are obtained.
- Compute a gradient histogram for each 8×8 cell. As shown in the figure below, the histogram is built from the gradient magnitude g and the gradient direction θ of the pixels in the cell: the range of 180° is divided into 9 bins, and each pixel votes for the bin that contains its direction θ, with its magnitude g as the weight. The result is the distribution of gradient directions in the cell.
- Normalize each 16×16 block, that is, a window of 2×2 cells slid across the image, to reduce the effects of lighting (pixel mean) and contrast (pixel variance).
- Assemble the HOG feature vector. Each block yields a feature vector of 4 cells × 9 bins = 36 values. The block vectors are concatenated across the width and height of the image, and the result can be passed to a classifier such as an SVM or a CNN.
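The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the OpenCV implementation: the function names (`gradient`, `cell_histogram`, `l2_normalize`) are hypothetical, each pixel votes for a single bin without interpolation between neighboring bins, and the block normalization is assumed to be a plain L2 norm with a small epsilon.

```python
import math

def gradient(gx, gy):
    """Return gradient magnitude g and unsigned direction theta in degrees."""
    g = math.hypot(gx, gy)
    theta = math.degrees(math.atan2(gy, gx)) % 180.0
    return g, theta

def cell_histogram(magnitudes, directions, nbins=9):
    """Magnitude-weighted vote of each pixel into one of nbins direction bins."""
    bin_width = 180.0 / nbins
    hist = [0.0] * nbins
    for g, theta in zip(magnitudes, directions):
        # The modulo guards against theta landing exactly on 180 degrees.
        hist[int(theta // bin_width) % nbins] += g
    return hist

def l2_normalize(block, eps=1e-6):
    """Scale the concatenated histograms of one block to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in block) + eps ** 2)
    return [v / norm for v in block]

# Worked example from the text: horizontal difference 5, vertical difference 6.
g, theta = gradient(5, 6)
print(round(g, 2), round(theta, 1))  # 7.81 50.2

# One block = 2x2 cells, each with a 9-bin histogram -> 36 values.
cells = [cell_histogram([g], [theta]) for _ in range(4)]
block = l2_normalize([v for cell in cells for v in cell])
print(len(block))  # 36
```

With 20° bins, the direction of about 50.2° falls into the third bin (40° to 60°), so the whole magnitude of about 7.81 is added there.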
Pros and cons
- When a low-resolution display such as a computer screen is re-photographed, interference effects produce moiré patterns in the captured image (as shown in the figure below). Moiré patterns have strong directional features, which the orientation histograms computed by HOG can capture.
- Performance is mediocre on high-end screens and phones.
Usage
Use the HOG method to extract texture features from the image.
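import cv2
import numpy as np

# img_paths, i, gamma and the HOG parameters (winSize, blockSize, blockStride,
# cellSize, nbins, winStride, padding) are defined elsewhere.
# IMREAD_COLOR guarantees a 3-channel image, so BGR2GRAY is always valid.
img = cv2.imdecode(np.fromfile(img_paths[i], dtype=np.uint8), cv2.IMREAD_COLOR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.resize(img, (207, 194))  # (width, height)
# Gamma transform on intensities scaled to [0, 1], then back to 8-bit
img = (np.power(img / 255.0, gamma) * 255).astype(np.uint8)
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)
hist = hog.compute(img, winStride, padding).reshape(-1)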
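The constructor arguments in the snippet above are not given in this article. As a hedged illustration of how they relate to each other, the sketch below computes the descriptor length for one detection window. The helper name `hog_descriptor_size` is hypothetical, and the example values (64×128 window, 16×16 blocks, 8×8 block stride, 8×8 cells, 9 bins) are assumed for demonstration only; they are not the settings used in this article.

```python
def hog_descriptor_size(win_size, block_size, block_stride, cell_size, nbins):
    """Number of HOG values produced for one detection window.

    All sizes are (width, height) tuples in pixels.
    """
    for axis in range(2):
        # A block must be tiled exactly by cells, and the window by block strides.
        if block_size[axis] % cell_size[axis] != 0:
            raise ValueError("block size must be a multiple of cell size")
        if (win_size[axis] - block_size[axis]) % block_stride[axis] != 0:
            raise ValueError("window size minus block size must be a multiple of block stride")
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    return blocks_x * blocks_y * cells_per_block * nbins

# Assumed example values: 7 x 15 blocks, 4 cells per block, 9 bins per cell.
size = hog_descriptor_size((64, 128), (16, 16), (8, 8), (8, 8), 9)
print(size)  # 3780
```

When the window is slid over a larger image with `winStride`, the flattened result holds one such descriptor per window position, so its length is a multiple of this value.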
Visualization
LBP: Histogram of Local Binary Patterns
LBP (Local Binary Pattern) is a simple and effective feature extraction algorithm for texture classification. The LBP operator was proposed by Ojala et al. in 1996. The main paper is “Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns”, IEEE TPAMI, vol. 24, no. 7, July 2002.
Principle
- The original LBP operator is defined on a 3×3 window. The center pixel of the window is used as the threshold, and the gray values of its 8 neighbors are compared with it. If a neighbor's value is greater than or equal to the center value, that position is marked 1; otherwise it is marked 0. Reading the 8 neighbors of the 3×3 neighborhood in clockwise order therefore produces an 8-bit binary number (usually converted to decimal, giving 256 possible LBP codes). This number is the LBP value of the center pixel, and it reflects the texture of the region.
- Improvement: circular LBP operator
To handle texture at different scales and to achieve gray-scale and rotation invariance, Ojala et al. improved the LBP operator by extending the 3×3 neighborhood to a neighborhood of arbitrary size and by replacing the square neighborhood with a circular one. The improved operator allows any number of sampling points on a circle of radius R.
To make LBP rotation-invariant, the binary string is rotated circularly and the smallest value over all rotations is kept. For example, if the initial LBP pattern is 10010000, circular rotation turns it into 00001001, which is the minimum value among all of its rotations. Because every rotated version of a pattern maps to the same minimum, the resulting LBP value does not change when the image is rotated.
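The basic 3×3 operator and the rotation-invariant mapping can be sketched as follows. This is a minimal illustration, not the scikit-image implementation: the function names `lbp_code` and `rotation_invariant` are hypothetical, and starting the clockwise walk at the top-left neighbor is an assumed convention (the text above only specifies clockwise order).

```python
def lbp_code(patch):
    """8-bit LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner (assumed convention).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for row, col in offsets:
        # A neighbor contributes 1 if it is >= the center, else 0.
        code = (code << 1) | (1 if patch[row][col] >= center else 0)
    return code

def rotation_invariant(code, bits=8):
    """Smallest value among all circular bit rotations of code."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)
print(format(code, "08b"))  # 10001111

# The example from the text: 10010000 maps to 00001001.
print(format(rotation_invariant(0b10010000), "08b"))  # 00001001
```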
- In practice, the histogram of LBP values is most commonly used as the feature vector of the image. To retain the spatial location of features, the image is divided into several small regions and a histogram is computed within each region, similar to the blocks in HOG. Finally, the histograms of all regions are concatenated in order to form the feature vector that is passed to the next stage of processing.
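The region-wise histogram described above can be sketched in plain Python. The function name `region_histograms`, the even grid split, and the normalization of each histogram to sum to 1 are assumptions made for illustration; the article does not specify them.

```python
def region_histograms(lbp_map, grid=(2, 2), n_codes=256):
    """Concatenate normalized LBP histograms computed over a grid of regions.

    lbp_map is a 2-D list of integer LBP codes; grid is (rows, cols).
    """
    height, width = len(lbp_map), len(lbp_map[0])
    feature = []
    for gr in range(grid[0]):
        for gc in range(grid[1]):
            # Pixel bounds of the current region.
            r0, r1 = gr * height // grid[0], (gr + 1) * height // grid[0]
            c0, c1 = gc * width // grid[1], (gc + 1) * width // grid[1]
            hist = [0.0] * n_codes
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[lbp_map[r][c]] += 1.0
            total = sum(hist) or 1.0  # avoid division by zero for empty regions
            feature.extend(v / total for v in hist)
    return feature

# Toy 4x4 map with codes in [0, 4), split into 2x2 regions.
lbp_map = [[0, 1, 2, 2],
           [1, 1, 2, 3],
           [0, 0, 3, 3],
           [0, 0, 3, 3]]
feature = region_histograms(lbp_map, grid=(2, 2), n_codes=4)
print(len(feature))  # 16
print(feature[:4])   # [0.25, 0.75, 0.0, 0.0]
```

The feature length is the number of regions times the number of codes, so a finer grid keeps more location information at the cost of a longer vector.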
Pros and cons
- LBP reduces the effect of illumination changes to a certain extent and is rotation-invariant.
- The texture feature has low dimensionality and is fast to compute.
- When the illumination changes unevenly, the ordering of neighboring pixel intensities is disturbed, so the corresponding LBP values change as well.
- Rotation invariance makes the LBP operator more robust, but it also discards direction information.
- For classifying re-photographed (recaptured) images, LBP performs better than HOG.
Usage
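from skimage.feature import local_binary_pattern
import numpy as np
import cv2

# img_paths, i, gamma, n_point, radius and train_imgs are defined elsewhere.
# IMREAD_COLOR guarantees a 3-channel image, so BGR2GRAY is always valid.
img = cv2.imdecode(np.fromfile(img_paths[i], dtype=np.uint8), cv2.IMREAD_COLOR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# img = cv2.resize(img, (512, 512))
img = img / 255  # scale intensities to [0, 1]
# Gamma transform
img = np.power(img, gamma)
train_imgs.append(img)
# n_point sampling points on a circle of the given radius, 'uniform' method
lbp = local_binary_pattern(img, n_point, radius, 'uniform')
# Flattened per-pixel LBP code map (not yet binned into a histogram)
hist = lbp.flatten()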
Visualization
About us
- S300 Cloud is a leading independent third-party SaaS provider for auto trading and auto finance in China. It is built on artificial intelligence, and its core products are standardized auto transaction pricing and risk control for auto finance.
- The S300 Cloud AI team provides high-precision AI recognition services for industry users, based on industry-leading machine learning technology and massive automotive industry data. The team's main research areas include image classification, semantic segmentation, object detection, recommendation algorithms, and natural language processing. Building on key techniques in these fields, it has developed services such as vehicle classification, structured recognition of various cards and certificates, and instrument-panel mileage recognition, which it continuously optimizes and improves according to the needs of industry users.
- The S300 Cloud AI team is a diverse, actively growing team. Internship and full-time positions are continuously open. Whether you are a student majoring in artificial intelligence or an experienced algorithm expert, you can find the opportunity to show your talents here.
- Website: www.sanbaiyun.com/ Resume: [email protected], please indicate from 😁