Deep learning has become very popular recently, and I decided to study it. As a software engineer, though, my goal is not algorithm research but engineering application, so this article does not analyze the details and mathematical principles in depth.

This article uses a typical application scenario of deep learning, image classification, as its starting point.

Primer

It is well known that a straight line divides a plane into two halves. Suppose I have a line Ax + By + C = 0 and I give you a point (x1, y1). You can easily figure out which side of the line the point is on: substitute the point into Ax + By + C and look at the sign of the result.
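As a minimal sketch of that check, the function below evaluates the left-hand side of the line equation at a point and reports the sign. The function name and the example line x + y - 1 = 0 are made up for illustration.

```python
def side_of_line(a, b, c, x, y):
    """Return 1, -1 or 0 depending on where (x, y) lies relative to a*x + b*y + c = 0."""
    value = a * x + b * y + c
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0   # the point is exactly on the line


# Example line: x + y - 1 = 0
print(side_of_line(1, 1, -1, 2, 2))   # the value is 3, so this prints 1
print(side_of_line(1, 1, -1, 0, 0))   # the value is -1, so this prints -1
```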

Now let's flip the problem. I give you a bunch of points and tell you which side of the line each point belongs to, and I ask you to figure out the line.

This is an interesting problem, and it is not obvious how to solve it. The usual question has the form formula + parameters => answer; this one is the opposite: parameters + answer => formula. This problem is actually a simplified version of deep learning.
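To make the reversed problem concrete, here is one possible way to recover a line from labeled points. It uses the classic perceptron update rule, which is a stand-in chosen for brevity and not the method used later in this article; the function name, the sample points, and the learning rate are all illustrative assumptions.

```python
def fit_line(points, labels, epochs=100, lr=0.1):
    """Perceptron rule: look for (a, b, c) so that the side given by a*x + b*y + c matches each label."""
    a = b = c = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            predicted = 1 if a * x + b * y + c > 0 else -1
            if predicted != label:
                # Nudge the line toward classifying this point correctly.
                a += lr * label * x
                b += lr * label * y
                c += lr * label
                mistakes += 1
        if mistakes == 0:
            break   # every point is already on its labeled side
    return a, b, c


points = [(2, 2), (3, 1), (0, 0), (-1, 1)]
labels = [1, 1, -1, -1]   # 1 = one side of the line, -1 = the other side
a, b, c = fit_line(points, labels)
predictions = [1 if a * x + b * y + c > 0 else -1 for x, y in points]
print(predictions == labels)
```

The coefficients are not unique: many different lines separate the same points, and the procedure simply stops at the first one it finds.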

Handwriting recognition

You may be confused by the example above. What? That is deep learning? Why don't I understand anything? Take your time, and let me give you a more intuitive example: handwriting recognition.

As shown in the figure above, our image is made up of n pixels. For simplicity, we only consider recognizing the 10 digits from 0 to 9, and we treat a white pixel as 1 and a black pixel as 0.

The simplest idea is to give each pixel a weight: adding up the weights of all the white pixels tells us which digit the image corresponds to.

We can then write 10 equations, one for each recognition result from 0 to 9.


$$
\begin{cases}
0: a_{1}x_{1} + a_{2}x_{2} + \dots + a_{n}x_{n} = result_0 & \text{(1 if the picture shows 0, otherwise 0)} \\
1: b_{1}x_{1} + b_{2}x_{2} + \dots + b_{n}x_{n} = result_1 & \text{(1 if the picture shows 1, otherwise 0)} \\
\dots \\
9: c_{1}x_{1} + c_{2}x_{2} + \dots + c_{n}x_{n} = result_9 & \text{(1 if the picture shows 9, otherwise 0)}
\end{cases}
$$

So for each handwritten image we know $x_1, x_2, \dots, x_n$ and the results $result_0, \dots, result_9$, and the ultimate goal is to work out all the coefficients. You can see that this problem has the same form as the line problem above: given the parameters and the results, find the coefficients.
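Once the coefficients are known, recognition is just evaluating every weighted sum and picking the largest. The sketch below shows this on a toy 4-pixel "image" with made-up weights for only three digits; all the numbers are invented for illustration.

```python
def score_digits(pixels, weights):
    """Evaluate one weighted sum per digit: result_k = sum(weights[k][i] * pixels[i])."""
    return [sum(w * x for w, x in zip(row, pixels)) for row in weights]


# 1 = white pixel, 0 = black pixel
pixels = [1, 0, 1, 1]
weights = [
    [0.5, 0.1, -0.2, 0.0],   # made-up coefficients for digit 0
    [0.1, 0.9, 0.3, 0.4],    # made-up coefficients for digit 1
    [-0.3, 0.2, 0.1, 0.1],   # made-up coefficients for digit 2
]
results = score_digits(pixels, weights)
best_digit = max(range(len(results)), key=lambda k: results[k])
print(best_digit)   # digit 1 has the largest weighted sum here
```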

PS: If you draw the 10 equations above as a graph, you get a classic network model in deep learning, the fully connected network, as shown in the diagram below:

Summary

From the introduction above, the basic flow of deep learning follows naturally:

  1. Build the formulas (build the network)
  2. Use the training set to work out the coefficients of each formula
  3. Use the test set to check how accurate these coefficients are

How to solve for the coefficients involves the gradient descent and backpropagation algorithms. This article does not explain them in detail; you can look them up online. It uses the off-the-shelf interfaces provided by PyTorch directly.
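For rough intuition only, here is a minimal sketch of gradient descent on a single coefficient: it fits w in y = w * x by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, and step count are arbitrary assumptions, and real networks apply the same idea to many coefficients at once, with backpropagation supplying the gradients.

```python
def fit_coefficient(xs, ys, lr=0.01, steps=500):
    """Find w that minimises the mean squared error of w * x against y by gradient descent."""
    w = 0.0
    for _ in range(steps):
        # The derivative of mean((w*x - y)^2) with respect to w is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad   # move a small step downhill
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated from y = 2 * x
w = fit_coefficient(xs, ys)
print(round(w, 3))
```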

AI framework: using PyTorch

For a quick start, the obvious choice of language is Python, and the AI framework is PyTorch. In PyTorch, the network training process is basically as follows:

  1. Get data (training set and test set)
  2. Build the network
  3. Train with the training set (find the coefficients)
  4. Test with the test set

Installation

The PyTorch website provides installation commands, so I won't go into details here. Just pay attention to choosing between the CPU and GPU (CUDA) versions when installing.

Getting the data

In the field of AI there are many public data sets, such as MNIST (handwritten digits), CIFAR10 and CIFAR100 (image recognition), and ImageNet (image recognition). PyTorch provides a ready-made API for loading these public data sets (it also supports custom data sets, of course). For simplicity, we will use CIFAR10 as the example. CIFAR10 contains 10 kinds of pictures: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks.

Next comes the PyTorch code for loading the data:

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

BATCH_SIZE = 8    # Small batch size for quick execution

# Define image conversion rules
transform_rule = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load the training set
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_rule)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True)

# Load the test set
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_rule)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, drop_last=True)

As you can see, PyTorch (through torchvision) directly provides torchvision.datasets.CIFAR10 for downloading and loading the CIFAR10 data set.

Looking at the top of the code, there is an image conversion rule. In PyTorch everything is based on the tensor type, which is simply a multidimensional array that can also run on a GPU. To train on the data, we have to convert each downloaded image into tensor format, so we call ToTensor(). The next step, Normalize(), is data preprocessing; we won't go into it here.
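Although Normalize() is not covered here, its arithmetic is simple enough to sketch in plain Python. ToTensor() scales 8-bit pixel values from 0..255 to 0.0..1.0, and Normalize() then subtracts a mean and divides by a standard deviation per channel; with 0.5 for both, values end up in the range -1.0..1.0. The helper names below are made up for illustration and are not torchvision functions.

```python
def to_unit_range(byte_value):
    """Mimic what ToTensor does to one pixel value: scale 0..255 to 0.0..1.0."""
    return byte_value / 255


def normalize(value, mean=0.5, std=0.5):
    """Mimic what Normalize does per channel: subtract the mean, divide by the std."""
    return (value - mean) / std


for byte_value in (0, 255):
    print(byte_value, normalize(to_unit_range(byte_value)))   # 0 maps to -1.0, 255 maps to 1.0
```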

After that, a DataLoader is used to load the data. (PS: normally we don't feed the whole training set into the neural network at once but train in batches; after all, the data is too large for the machine to handle in one go.)
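The batching idea itself can be sketched without PyTorch. The generator below is a simplified stand-in for what a data loader does with batch_size and drop_last; it leaves out shuffling and parallel workers, and the function name is hypothetical.

```python
def iterate_batches(items, batch_size, drop_last=True):
    """Yield consecutive batches; optionally drop a final batch smaller than batch_size."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break
        yield batch


samples = list(range(20))   # stand-in for 20 training images
batches = list(iterate_batches(samples, batch_size=8))
print(len(batches), [len(b) for b in batches])   # the last 4 samples are dropped
```

Dropping the incomplete last batch keeps every batch the same size, which matters for code that assumes a fixed batch size.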

Building the network

Now that we have the data, we need to build the network. In PyTorch, building a network simply means inheriting from torch.nn.Module and implementing the forward() method, as follows:

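import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # One fully connected layer: 3072 inputs (3 x 32 x 32 values), 10 outputs (classes)
        self.model = nn.Sequential(
            nn.Linear(3072, 10),
        )

    def forward(self, x):
        # Flatten each image in the batch into a one-dimensional vector
        x = x.view(BATCH_SIZE, -1)
        x = self.model(x)
        return x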

First, x = x.view(BATCH_SIZE, -1) flattens each picture into one dimension, because the network used here is fully connected and its input and output are one-dimensional.
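The number 3072 comes from the shape of a CIFAR10 image: 3 color channels of 32 x 32 pixels each. A plain-Python sketch of the flattening, using nested lists in place of a tensor:

```python
CHANNELS, HEIGHT, WIDTH = 3, 32, 32   # shape of one CIFAR10 image

# A dummy image as nested lists: image[channel][row][column]
image = [[[0.0] * WIDTH for _ in range(HEIGHT)] for _ in range(CHANNELS)]

# Flattening lays all values out in a single list.
flat = [value for channel in image for row in channel for value in row]
print(len(flat))   # 3 * 32 * 32 = 3072
```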

nn.Linear(x, y) represents a fully connected layer, where x is the number of inputs and y is the number of outputs.

Similarly we can deepen the layers as follows:

self.model = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.Linear(1024, 10),
)

So we’ve built a simple neural network.

Training with the training set

Next comes the code that trains the network on the data set.

import torch
import torch.optim as optim

# Set the execution device, CPU or GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = Net().to(device)
criterion = nn.CrossEntropyLoss().to(device)   # Loss function; CrossEntropyLoss here, look up the alternatives yourself
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # Optimizer; SGD here, look up the alternatives yourself

for epoch in range(10):   # Train for 10 epochs
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data    # inputs are the images, labels are their classes
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()    # Reset the gradients
        outputs = net(inputs)    # Run the network on the training data
        loss = criterion(outputs, labels)   # Compute the loss (deviation from the labels)
        loss.backward()    # Compute the gradients by backpropagation
        optimizer.step()   # Adjust the coefficients according to the gradients

This is the template code for training a network. It first resets the gradients to 0, then runs the network, computes the loss, calculates the gradients through backpropagation, and finally adjusts the coefficients. This article does not explain the specific function and principle of each algorithm in detail.
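As a rough illustration of what the loss measures, the sketch below computes a cross-entropy loss for a single sample in plain Python: the raw scores are turned into probabilities with softmax, and the loss is the negative log of the probability given to the correct class. This is a simplified stand-in for nn.CrossEntropyLoss, which also averages over the batch by default; the scores are made up.

```python
import math


def cross_entropy(scores, target):
    """Softmax the raw scores, then return -log(probability of the target class)."""
    peak = max(scores)   # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return -math.log(exps[target] / total)


scores = [2.0, 0.5, -1.0]   # made-up network outputs for three classes
low_loss = cross_entropy(scores, target=0)    # the top-scoring class is correct
high_loss = cross_entropy(scores, target=2)   # the lowest-scoring class is correct
print(low_loss < high_loss)
```

The loss is small when the network scores the correct class highly and large when it does not, which is what gives the optimizer a direction to move in.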

With just a few lines of code, the training is complete.

Testing with the test set

Finally, we take the trained model and test its accuracy.

correct = 0
total = 0
with torch.no_grad():   # Gradients are not needed when testing; skipping them avoids wasted work
    for data in test_loader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)   # Run the trained model on the test images
        _, predicted = torch.max(outputs.data, 1)  # The network outputs one score per class; take the class with the highest score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

This is basically template code too, so I won't go into detail.
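The accuracy calculation can be sketched in plain Python: for every sample, take the index of the highest score as the prediction and count how often it matches the label. The scores and labels below are invented for illustration.

```python
def accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=lambda k: scores[k])   # index of the top score
        correct += int(predicted == label)
    return correct / len(labels)


score_rows = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
print(accuracy(score_rows, labels))   # 3 of the 4 predictions match, so 0.75
```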

Summary

From the code above we can get a basic understanding of how a neural network is built, trained, and tested. Of course, many concepts have been left out. For example, besides fully connected layers, a network may contain convolution layers, activation layers, and so on. For more, search Bilibili for tutorials with the keywords "Getting Started with PyTorch".

Simple engineering applications

From a software engineer's point of view, we usually don't need to design and tune a network model ourselves; the goal is simply to call an off-the-shelf model and use it in production. PyTorch (through torchvision) provides many well-known network models, such as VGG, GoogLeNet, and ResNet. This article uses ResNet (Residual Network) as an example; one line of code is enough:

# Newer torchvision releases prefer the weights= argument over pretrained=True
resnet = torchvision.models.resnet152(pretrained=True, progress=True)

So what we need to do is simply call this network to recognize an image:

import torch
import torchvision.models
from PIL import Image
from torchvision import transforms

# The pretrained ResNet was trained on the ImageNet data set; imagenet_classes.txt stores the ImageNet labels,
# see https://blog.csdn.net/LegenDavid/article/details/73335578
with open('./imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

# Read the image and preprocess it (the conversion parameters were found online and are not explained here)
image = Image.open('./img.png').convert('RGB')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
image_preprocessed = preprocess(image)
batch_image_preprocessed = torch.unsqueeze(image_preprocessed, 0)   # Add a batch dimension

# Execute the network and print the result
resnet = torchvision.models.resnet152(pretrained=True, progress=True)
resnet.eval()
out = resnet(batch_image_preprocessed)
_, index = torch.max(out, 1)
print(labels[index.item()])

As you can see, the actual call is very simple: just out = resnet(batch_image_preprocessed). If you want to use it in production, there are many ways; the simplest is to wrap it in a web framework such as Django.
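To give a feel for what such a wrapper looks like, here is a minimal sketch that uses Python's built-in http.server instead of Django, purely to stay dependency-free. The classify_image function is a hypothetical placeholder where the preprocessing and the resnet call would go, and the handler name, JSON shape, and port are arbitrary assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_image(image_bytes):
    """Hypothetical placeholder: a real service would preprocess the bytes and run the network here."""
    return "unknown"


def build_response(label):
    """Encode the predicted label as a JSON body."""
    return json.dumps({"label": label}).encode("utf-8")


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded image from the request body.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = build_response(classify_image(image_bytes))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the server; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Calling run() starts the server, after which an image can be sent in the body of a POST request. In a real deployment the model would be loaded once at startup, not once per request.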
