Convolution kernel and convolution layer
Input ===> an image
Step 1: An image is fed into the network.
Step 2: The machine divides the image into small regions.
Step 3: Each region is matched against the parameters of the convolution kernel (an element-wise multiply-and-sum) to extract a feature; see the sketch below.
Output ===> regions and their corresponding features
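A minimal NumPy sketch of steps 2-3 (the 6x6 image and the vertical-edge kernel are made-up examples, not anything from the post): the kernel slides over the image, and every region it covers yields one feature value.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each covered region yields one feature.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    features = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]        # one region of the image
            features[i, j] = (region * kernel).sum()  # that region's feature
    return features

image = np.random.rand(6, 6)                 # a made-up 6x6 grayscale picture
edge_kernel = np.array([[1., 0., -1.],       # a classic vertical-edge kernel
                        [1., 0., -1.],
                        [1., 0., -1.]])
print(conv2d(image, edge_kernel).shape)      # (4, 4): one feature per region
```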
Pooling layer
Input ===> regions and their corresponding features
Step 1: To make each feature more robust, the exact location of the region it came from is sacrificed (example: the network no longer knows where in the picture the cat is; it only knows that something cat-like is present).
Step 2: Downsampling: each region of the feature map is shrunk, reducing its resolution while keeping its most distinctive responses, so the features become easier for the machine to use; see the sketch below.
Output ===> features that are easier for the machine to distinguish, but whose exact positions are no longer known
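A minimal 2x2 max-pooling sketch in NumPy (the 4x4 feature map is a made-up example): within each window only the strongest response survives, so the map shrinks and the position inside the window is discarded.

```python
import numpy as np

def max_pool2d(features, size=2):
    # Keep only the strongest response in each size x size window (downsampling).
    h, w = features.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - h % size, size):
        for j in range(0, w - w % size, size):
            out[i // size, j // size] = features[i:i + size, j:j + size].max()
    return out

features = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(features))   # the 4x4 map shrinks to 2x2
```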
Fully connected layer
Input ===> position-free features that are easier for the machine to distinguish
Step 1: The trained network matches these position-free features against the patterns it learned during training (here is the cat's face, here are the cat's legs, here is the cat's tail...).
Step 2: The match results for each part are combined to produce the final verdict: this is a cat (see the sketch below).
Output ===> "this is a cat"
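A minimal sketch of that final combination step, assuming a tiny pooled feature map and hypothetical trained weights W and b for two classes (a real network learns these values during training):

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((2, 2))           # pooled feature map from the previous stage
x = pooled.flatten()                  # positions are gone; only feature values remain

# hypothetical trained parameters for two classes: "cat" and "not cat"
W = rng.random((2, x.size))           # one row of weights per class
b = rng.random(2)

scores = W @ x + b                                # combine all part-matches
probs = np.exp(scores) / np.exp(scores).sum()     # softmax: scores -> probabilities
print(["cat", "not cat"][int(np.argmax(probs))])  # the final verdict
```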
End-to-end training
With traditional training methods, the machine cannot reliably extract and detect features on its own, so manual intervention is needed: a human identifies (engineers) the features before feeding them back to the machine.
End-to-end training is:
Data ------> model ------> results
(No manual feature extraction and no hand-set features: the machine learns the features itself for detection, extraction, and matching. From data input to result output the pipeline is uninterrupted; everything happens inside the model, as in the sketch below.)
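A minimal PyTorch sketch of that uninterrupted pipeline (the layer sizes assume a 1-channel 28x28 input, and the label is hypothetical): raw pixels go in, class scores come out, and one backward pass trains every layer together, with no hand-crafted features anywhere.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # convolution: features are learned, not hand-set
    nn.ReLU(),
    nn.MaxPool2d(2),                  # pooling: downsample, discard exact positions
    nn.Flatten(),
    nn.Linear(8 * 13 * 13, 2),        # fully connected: combine features into a verdict
)

x = torch.randn(1, 1, 28, 28)         # raw pixels go in...
print(model(x))                       # ...class scores come out; nothing manual between

# one gradient step: the whole pipeline is trained together, end to end
y = torch.tensor([0])                 # hypothetical label: class 0 = "cat"
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                       # gradients flow through every layer at once
```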