Author: Zheng Shanyou, backend development engineer at Tencent MIG
Before CNNs and other more advanced neural networks, the naive idea was to use a multilayer perceptron (MLP) for image classification. The harsh reality is that MLPs don't do this very well. This article uses MLP-based image classification as a guided thought experiment; it is really the lead-in to the CNN notes that come next. Documentation and code for this article, portal: GitHub project address
One. Using an MLP for image classification?
- Before CNNs and more advanced neural networks existed, the naive idea was to use a multilayer perceptron (MLP) for image classification, and that was fine
- As a continuation of the previous note and a bridge to the next article on CNNs, using an MLP for image classification makes a good transitional example. The example naturally raises a series of questions. I won't keep you in suspense; I'll ask and answer them right away:
- Can an MLP do image classification? --> Yes. In the last article we fitted a nonlinear classification function; here we fit image features. Mathematically there is no essential difference.
- How effective is the MLP at this? --> It works just fine on this task.
- Are there any drawbacks? --> There are, and they are described in detail below.
- Is there a better solution? --> Yes: CNNs and other more advanced neural networks. But back in the days when none of these existed, inventing this yourself would have been seriously impressive.
Two. Get in the car first
1. The data source
- The data source is, of course, images, but they have been preprocessed and stored in H5 files. Put simply, an H5 file freezes data under named indexes. It is simple enough that I won't explain it here; a quick search will do --> introductory h5py tutorial
- We have three H5 files of non-overlapping image data:
- train_catvnoncat.h5 (for training the model; 209 pictures in total, cats and non-cats, 64*64 pixels each)
- test_catvnoncat.h5 (for testing the model's accuracy; 50 pictures in total, cats and non-cats, 64*64 pixels each)
- my_cat_misu.h5 (just for fun: one photo of my master, i.e. my cat Misu, 64*64 pixels)
2. Data structure
- Take train_catvnoncat.h5 as an example. The file has two indexes:
- train_set_x: an array of length 209, one entry per image, where each entry is a 64*64*3 matrix. 64*64 is the pixel size of the image. And what is the 3? Remember, these are color images: 3 is for the RGB color channels.
- train_set_y: the image label array, also of length 209, holding the labels of the 209 pictures. A value of 1 at a given index means that picture is a cat; 0 means it is not.
- Likewise, test_catvnoncat.h5 contains test_set_x and test_set_y, and my_cat_misu.h5 contains mycat_set_x and mycat_set_y. A quick way to inspect these files is sketched below.
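To make the structure concrete, here is a minimal sketch (my illustration, not part of the original project) that opens one of the files and prints what it contains; it assumes the files sit under datasets/ exactly as in the full code below:

```python
# Minimal sketch: list the datasets stored in train_catvnoncat.h5
import h5py

with h5py.File('datasets/train_catvnoncat.h5', 'r') as f:
    for name in f.keys():                      # the index labels in the file
        print(name, f[name].shape, f[name].dtype)
# Expected output, approximately:
#   list_classes (2,)             <- class names, per the loading code below
#   train_set_x  (209, 64, 64, 3) <- 209 images of 64*64 RGB pixels
#   train_set_y  (209,)           <- 209 labels, 1 = cat, 0 = not
```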
3. How to build an H5 file from pictures (this will come in handy later, e.g. when training models such as CNNs)
- Take my master as an example:
- The original photo:
- You could of course write your own image-processing code; I'm lazy, so here is a ready-made implementation:
- Python code using the h5py library:
```python
def save_imgs_to_h5file(h5_fname, x_label, y_label, img_paths_list, img_label_list):
    data_imgs = np.random.rand(len(img_paths_list), 64, 64, 3).astype('int')
    label_imgs = np.random.rand(len(img_paths_list), 1).astype('int')
    # plt.imread loads each image as an n*n*3 matrix, which we store as-is
    for i in range(len(img_paths_list)):
        data_imgs[i] = np.array(plt.imread(img_paths_list[i]))
        label_imgs[i] = np.array(img_label_list[i])
    # Save to file under the given index labels
    f = h5py.File(h5_fname, 'w')
    f.create_dataset(x_label, data=data_imgs)
    f.create_dataset(y_label, data=label_imgs)
    f.close()
    return data_imgs, label_imgs

# A label of 1 means the picture is a cat, 0 means it is not
save_imgs_to_h5file('datasets/my_cat_misu.h5', 'mycat_set_x', 'mycat_set_y', ['misu.jpg'], [1])
```
4. Take a look at my data source
- The collection of 209 images used for training:
- The collection of 50 images used to verify model accuracy:
- And the one glamour shot of my master, just for fun:
Three. Off we go
1. How to design the model:
- Input layer: our pictures are 64*64 pixels, so counting the three RGB channels we pull the 3D matrix out into a "noodle" of length 64*64*3 = 12288. The input layer therefore has 12288 units.
- Hidden layers: you can use multiple hidden layers and try different structures yourself. Here I use three hidden layers, with 20, 7, and 5 neurons respectively.
- Output layer: the goal is to judge whether a picture is a cat, so the output layer is a single neuron: an output probability greater than 0.5 is classified as a cat, less than or equal to 0.5 as not a cat.
Aside: one might wonder whether the first hidden layer should have as many neurons as the input layer. In theory that would be better, but it exposes a flaw of the MLP: with full connectivity, the first layer's weight matrix alone would have 12288 squared, roughly 150 million, parameters. And what if the picture were bigger? The parameter count explodes; imagine the consequences. (See the quick calculation below.)
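A quick back-of-the-envelope sketch makes the contrast concrete (count_weights is just an illustrative helper, not part of the project code):

```python
# Count the fully connected weight parameters between consecutive layers.
def count_weights(layer_dims):
    return sum(a * b for a, b in zip(layer_dims, layer_dims[1:]))

# The architecture used in this article: perfectly manageable.
print(count_weights([12288, 20, 7, 5, 1]))          # 245,940 weights
# With a first hidden layer as large as the input: ~151 million weights.
print(count_weights([12288, 12288, 20, 7, 5, 1]))   # 151,240,884 weights
```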
2. How to train the model
- In short: feed all 209 images through the network to complete one iteration, then run 10,000 iterations. Feel free to experiment with different iteration counts and observe the effect.
3. How to measure the model’s accuracy
- One way to measure how well the model is trained, as Andrew Ng teaches, is to split the data into separate sets: one for training, one for validation. So we train on the 209 images and test the model with the other 50 (the accuracy computation itself is sketched after this list).
- Just for fun, you can also run other pictures through the model and see how they are classified.
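The measure itself is simple: correct predictions divided by total images. A minimal sketch of the idea (my illustration; the 0.5 threshold and the computation mirror predict_by_modle in the code below):

```python
import numpy as np

# Accuracy = number of correct predictions / total number of images.
def accuracy(pred, labels):
    return np.mean(pred == labels) * 100.0

pred   = np.array([1, 0, 1, 1, 0])   # model outputs after thresholding at 0.5
labels = np.array([1, 0, 0, 1, 0])   # ground truth: 1 = cat, 0 = not a cat
print("Recognition rate: %.0f%%" % accuracy(pred, labels))  # 80%
```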
Four. Old rule: Dump the code
Let’s explain the code flow:
- The NeuralNetwork class used here is the BP neural network implemented in my previous note; import it and it can be used directly.
- What the code does is:
- Load image data from an H5 file
- Display the original image and save it as an image file
- Train the neural network model
- Verify model accuracy
- The recognition result is annotated to the original picture and also saved as a picture file
```python
#coding:utf-8
import h5py
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt
import numpy as np
from NeuralNetwork import *
font = fm.FontProperties(fname='/System/Library/Fonts/STHeiti Light.ttc')
def load_Cat_dataset():
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
train_set_x_orig = np.array(train_dataset["train_set_x"][:])
train_set_y_orig = np.array(train_dataset["train_set_y"][:])
test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
test_set_x_orig = np.array(test_dataset["test_set_x"][:])
test_set_y_orig = np.array(test_dataset["test_set_y"][:])
mycat_dataset = h5py.File('datasets/my_cat_misu.h5', "r")
mycat_set_x_orig = np.array(mycat_dataset["mycat_set_x"][:])
mycat_set_y_orig = np.array(mycat_dataset["mycat_set_y"][:])
classes = np.array(test_dataset["list_classes"][:])
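    # Reshape the label arrays into row vectors of shape (1, m)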
train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
mycat_set_y_orig = mycat_set_y_orig.reshape((1, mycat_set_y_orig.shape[0]))
return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, mycat_set_x_orig, mycat_set_y_orig,classes
def predict_by_modle(x, y, nn):
m = x.shape[1]
p = np.zeros((1,m))
output, caches = nn.forward_propagation(x)
for i in range(0, output.shape[1]):
if output[0,i] > 0.5:
p[0,i] = 1
else:
p[0,i] = 0
    # Compare the predictions with the expected labels to see the accuracy.
    # E.g. if 50 of 100 images are cats but only 40 of them are recognized,
    # the recognition rate for cats is 80%.
    print(u"Recognition rate: " + str(np.sum((p == y)/float(m))))
return np.array(p[0], dtype=np.int), (p==y)[0], np.sum((p == y)/float(m))*100
def save_imgs_to_h5file(h5_fname, x_label, y_label, img_paths_list, img_label_list):
data_imgs = np.random.rand(len(img_paths_list), 64, 64, 3).astype('int')
label_imgs = np.random.rand(len(img_paths_list), 1).astype('int')
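    # plt.imread loads each image as a 64*64*3 RGB matrix; store it as-is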
for i in range(len(img_paths_list)):
data_imgs[i] = np.array(plt.imread(img_paths_list[i]))
label_imgs[i] = np.array(img_label_list[i])
f = h5py.File(h5_fname, 'w')
f.create_dataset(x_label, data=data_imgs)
f.create_dataset(y_label, data=label_imgs)
f.close()
return data_imgs, label_imgs
if __name__ == "__main__":
    # An image label of 1 means it is a picture of a cat, 0 means it is not
#save_imgs_to_h5file('datasets/my_cat_misu.h5', 'mycat_set_x', 'mycat_set_y', ['misu.jpg'],[1])
train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, mycat_set_x_orig, mycat_set_y_orig, classes = load_Cat_dataset()
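    # Flatten each 64*64*3 image into a column vector of length 12288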
train_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
mycat_x_flatten = mycat_set_x_orig.reshape(mycat_set_x_orig.shape[0], -1).T
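    # Normalize RGB values from [0, 255] into [0, 1]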
train_set_x = train_x_flatten / 255.
test_set_x = test_x_flatten / 255.
mycat_set_x = mycat_x_flatten / 255.
    print(u"Number of training images: %d" % len(train_set_x_orig))
    print(u"Number of test images: %d" % len(test_set_x_orig))
plt.figure(figsize=(10, 20))
plt.subplots_adjust(wspace=0,hspace=0.15)
for i in range(len(train_set_x_orig)):
plt.subplot(21,10, i+1)
plt.imshow(train_set_x_orig[i],interpolation='none',cmap='Reds_r',vmin=0.6,vmax=.9)
plt.xticks([])
plt.yticks([])
plt.savefig("cat_pics_train.png")
plt.show()
plt.figure(figsize=(8, 8))
plt.subplots_adjust(wspace=0, hspace=0.1)
for i in range(len(test_set_x_orig)):
ax = plt.subplot(8, 8, i + 1)
im = ax.imshow(test_set_x_orig[i], interpolation='none', cmap='Reds_r', vmin=0.6, vmax=.9)
plt.xticks([])
plt.yticks([])
plt.savefig("cat_pics_test.png")
plt.show()
plt.figure(figsize=(2, 2))
plt.subplots_adjust(wspace=0, hspace=0)
for i in range(len(mycat_set_x_orig)):
ax = plt.subplot(1, 1, i + 1)
im = ax.imshow(mycat_set_x_orig[i], interpolation='none', cmap='Reds_r', vmin=0.6, vmax=.9)
plt.xticks([])
plt.yticks([])
plt.savefig("cat_pics_my.png")
plt.show()
    # Train the model on the training image set
layers_dims = [12288, 20, 7, 5, 1]
nn = NeuralNetwork(layers_dims, True)
nn.set_xy(train_set_x, train_set_y_orig)
nn.set_num_iterations(10000)
nn.set_learning_rate(0.0075)
nn.training_modle()
    # How the results are displayed:
    # [Identified correctly]:
    #   1. Original is a cat, identified as a cat       --> shown as-is
    #   2. Original is not a cat, identified as not cat --> shown dimmed
    # [Identified incorrectly]:
    #   1. Original is a cat, identified as not a cat   --> tinted red
    #   2. Original is not a cat, identified as a cat   --> tinted red
    # Run the training images back through the model and observe the recognition rate
plt.figure(figsize=(10, 20))
plt.subplots_adjust(wspace=0, hspace=0.15)
pred_train, true, accuracy = predict_by_modle(train_set_x, train_set_y_orig, nn)
for i in range(len(train_set_x_orig)):
ax = plt.subplot(21, 10, i + 1)
x_data = train_set_x_orig[i]
if pred_train[i] == 0 and train_set_y_orig[0][i] == 0:
x_data = x_data/5
if true[i] == False:
x_data[:, :, 0] = x_data[:, :, 0] + (255 - x_data[:, :, 0])
im = plt.imshow(x_data,interpolation='none',cmap='Reds_r',vmin=0.6,vmax=.9)
plt.xticks([])
plt.yticks([])
    plt.suptitle(u"Num Of Pictures: %d\n Accuracy: %.2f%%" % (len(train_set_x_orig), accuracy), y=0.92, fontsize=20)
plt.savefig("cat_pics_train_predict.png")
plt.show()
    # Run the test images, which are not part of the training set, through the model and observe the recognition rate
plt.figure(figsize=(8, 8))
plt.subplots_adjust(wspace=0, hspace=0.1)
pred_test, true, accuracy = predict_by_modle(test_set_x, test_set_y_orig, nn)
for i in range(len(test_set_x_orig)):
ax = plt.subplot(8, 8, i + 1)
x_data = test_set_x_orig[i]
if pred_test[i] == 0 and test_set_y_orig[0][i] == 0:
x_data = x_data/5
if true[i] == False:
x_data[:, :, 0] = x_data[:, :, 0] + (255 - x_data[:, :, 0])
im = ax.imshow(x_data, interpolation='none', cmap='Reds_r', vmin=0.6, vmax=.9)
plt.xticks([])
plt.yticks([])
    plt.suptitle(u"Num Of Pictures: %d\n Accuracy: %.2f%%" % (len(test_set_x_orig), accuracy), fontsize=20)
plt.savefig("cat_pics_test_predict.png")
plt.show()
    # Run my master's glamour shot through the model; with a single image the recognition rate is either 100% or 0%
plt.figure(figsize=(2, 2.6))
plt.subplots_adjust(wspace=0, hspace=0.1)
pred_mycat, true, accuracy = predict_by_modle(mycat_set_x, mycat_set_y_orig, nn)
for i in range(len(mycat_set_x_orig)):
ax = plt.subplot(1, 1, i+1)
x_data = mycat_set_x_orig[i]
if pred_mycat[i] == 0 and mycat_set_y_orig[0][i] == 0:
x_data = x_data/5
if true[i] == False:
x_data[:, :, 0] = x_data[:, :, 0] + (255 - x_data[:, :, 0])
im = ax.imshow(x_data, interpolation='none', cmap='Reds_r', vmin=0.6, vmax=.9)
plt.xticks([])
plt.yticks([])
if pred_mycat[i] == 1:
            plt.suptitle(u"Me: 'Is my master a cat?'\nAI: 'Yep.'", fontproperties = font)
else:
            plt.suptitle(u"Me: 'Is my master a cat?'\nAI: 'Nope~ Nope~'", fontproperties = font)
plt.savefig("cat_pics_my_predict.png")
plt.show()
```
Five. Conclusion
1. The neural network model's outputs are annotated on the pictures and displayed as follows:
How the results are displayed:
【Identified correctly】:
- Original is a cat, identified as a cat --> shown as-is
- Original is not a cat, identified as not a cat --> shown with reduced brightness
【Identified incorrectly】:
- Original is a cat, identified as not a cat --> tinted red
- Original is not a cat, identified as a cat --> tinted red
Each figure's title shows the Accuracy, computed as: number of correctly identified images / total number of images.
2. After training, run the 209 training images back through the trained model. The result: with 10,000 iterations, the model recognizes the training set with 100% accuracy:
3. After training, run the test set (images the model has never seen) through the trained model and observe the results. Even with 10,000 iterations, the model recognizes the test set with only 78% accuracy:
4. Finally, check whether the model recognizes my master as a cat. It looks like it does:
Six. Further analysis of the results raises a series of questions
- An obvious question: why does the model reach only 78% accuracy when validated on the test set? After trying different network structures and parameter tunings, I still could not push the recognition rate higher. Why?
- Partial answers:
- Maybe my skill is limited and my technique is off? But even elegant technique is not omnipotent; we need to analyze the problem at the level of principles.
- Some say the training data is too small, and there is some truth to that. Feeding the model more image features, such as rotations, zoomed-in or zoomed-out content, shifted positions and so on, is indeed a good idea. But as Andrew Ng has also pointed out, endlessly chasing ever more training data is a dead end. With the same training set, is there a better way? Which brings us to the next point.
- Getting to the root of it: to understand why the MLP's recognition rate is hard to improve, set aside network architecture, tuning, and training data for a moment and find fault with the MLP itself. Our goal is better image classification; the question is whether the MLP has inherent defects that stand in the way. Identify those defects, and the path to a solution reveals itself.
- The MLP's defects for image classification:
- The neurons are wired together by full connection. Under full connectivity, assume the image is 1K*1K pixels, the first hidden layer is the same size as the input layer, and RGB channels are ignored (single channel). The number of weight parameters w is then:
$(10^3 \times 10^3)^2 = 10^{12}$
That is one trillion (if I counted the zeros correctly). With larger pictures the parameter count swells to unimaginable levels, with direct negative effects:
- Too many parameters means an enormous amount of computation
- Under full connectivity, an overly deep network easily suffers from vanishing gradients, making the model hard to train
- A fully connected MLP cannot handle image deformation. Take the handwritten digit 8: everyone writes it differently. Some write it upright, some slanted, some with a small top loop and a big bottom loop, and so on. The MLP's weakness is that once the same image is rotated or slightly shifted, it can no longer recognize it. You can feed the model more feature variants, but that only optimizes training; it does not fix the defect itself. (Both defects are demonstrated in the sketch below.)
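Both defects are easy to demonstrate in a few lines of numpy (a standalone sketch, not part of the project code):

```python
import numpy as np

# Defect 1: parameter explosion under full connectivity.
# A first hidden layer as large as a 1K*1K single-channel input:
n = 1000 * 1000
print("first-layer weights: %.0e" % float(n * n))   # 1e+12, one trillion

# Defect 2: no built-in tolerance to translation.
# Shift an image one pixel to the right: to a fully connected layer the
# flattened input now differs in almost every coordinate.
img = np.arange(64 * 64).reshape(64, 64)   # stand-in image with unique pixels
shifted = np.roll(img, 1, axis=1)
changed = np.mean(img.ravel() != shifted.ravel())
print("fraction of input coordinates changed: %.2f" % changed)   # 1.00
```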
Seven. Summing up the problems to solve; the next pit is not far off
Some of the problems to be solved have been listed above. Here’s a summary:
- Solve the heavy computation caused by the explosion in parameter count
- After trimming the parameter count, extract more features while keeping the same training set
- Recognize the input correctly even when it is somewhat rotated, translated, or scaled
The solution, as we all know today, is the CNN and the many more advanced neural network models that followed. This article is the introduction and the fuse for the CNN article. Don't believe the parameter problem? Try it yourself: make the first hidden layer as large as the input layer, i.e. layers_dims = [12288, 12288, 20, 7, 5, 1]. Even on small 64*64 images, it was unbearably slow for me and my humble laptop. That is one of the reasons the pioneers invented the CNN.
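As a one-number preview of why convolution escapes the trap (a rough sketch under my own assumptions; the real explanation is the subject of the next article): a convolutional layer slides one small kernel across the whole image, so its weight count does not depend on image size.

```python
# Hypothetical comparison, single channel, bias terms ignored.
conv_weights = 3 * 3 * 16            # sixteen 3x3 kernels: 144 weights, any image size
dense_weights = (1000 * 1000) ** 2   # same-size dense layer on a 1K*1K image: 10^12
print(conv_weights, dense_weights)
```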
Published by the Tencent Cloud + community with the author's authorization. Original link: https://cloud.tencent.com/developer/article/1150162?fromSource=waitui