Make writing a habit together! This is my first day participating in the "Gold Digging Day New Plan · April More Text Challenge". Tip: this is a multi-class classification article that takes you from the problem, to the solution, to practice. All of it comes from my own learning and understanding process; if you are interested, please read on patiently.

Multi-class classification problem

Deep learning has a classic dataset, the MNIST dataset, which is a collection of handwritten digit images and a classic multi-class classification dataset.

The set consists of the ten digits 0–9. Our task is to work out the probability that an input image corresponds to each real digit, which means ten labels, i.e. ten categories, are required.

Solving this seems simple: extend the binary classification model to 10 outputs, one output per digit, so that each digit gets a corresponding probability value. Each output is then a sigmoid binary classification (is it this digit: 1 or 0), so as long as one output is 1, all the outputs other than that one are set to 0, and the judgment is made.

But this causes a problem: every sigmoid output is independent, so when the output probability of one category is high, the probabilities of the other categories can still be high. In other words, after digit 1 gets a high output probability, the output probability for digit 2 is not affected by it at all, so the sum of all the output probabilities can be greater than 1. What we really want is to normalize the outputs so that they satisfy the properties of a probability distribution:

1. Each category has a probability greater than zero

2. The sum of the probabilities of all categories is 1

In other words, each output of a multi-class classification problem needs to compete with the others.
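To make this concrete, here is a minimal sketch (with made-up scores for three categories, not taken from the article) showing that independent sigmoid outputs do not form a distribution, while Softmax outputs do:

import torch

logits = torch.tensor([2.0, 1.5, -0.3])   # hypothetical scores for three categories

sig = torch.sigmoid(logits)          # each output judged independently
soft = torch.softmax(logits, dim=0)  # outputs compete with each other

print(sig, sig.sum())    # the sigmoid probabilities sum to more than 1
print(soft, soft.sum())  # the Softmax probabilities sum to exactly 1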

Add a Softmax layer

To meet these two requirements, the neural network needs to be improved. Adding a Softmax layer as the final output layer of the network solves both problems nicely.

Softmax

Definition:

Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.

According to the definition, the Softmax outputs lie in the range [0, 1] and sum to 1, and these two points solve our problem nicely.

Function:
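In formula form (a standard way of writing it, where z_i is the i-th output of the linear layer and K is the number of categories):

$$\mathrm{Softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$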

Analysis of the steps of the Softmax function:

1. Raise e to the power of each output received from the linear layer, which ensures that all outputs are positive

2. Sum all the exponentiated outputs and divide each exponentiated output by that sum. This ensures that the outputs sum to 1 and that they compete with each other

These two steps meet the two requirements above perfectly!
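As a quick sanity check, here is a minimal sketch (with made-up linear-layer outputs) of those two steps, compared against the built-in torch.softmax:

import torch

z = torch.tensor([0.2, 1.0, -0.5])   # hypothetical outputs of the last linear layer

exp_z = torch.exp(z)             # step 1: exponentiate, every value becomes positive
softmax_z = exp_z / exp_z.sum()  # step 2: divide by the sum so the outputs add up to 1

print(softmax_z, softmax_z.sum())   # sums to 1
print(torch.softmax(z, dim=0))      # the built-in function gives the same result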

What about the loss function for classification?

Cross Entropy Loss Function

Loss function for binary classification:

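For reference, the commonly used binary cross-entropy loss for a single sample, with true label y ∈ {0, 1} and predicted probability ŷ, is:

$$\ell(y, \hat{y}) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]$$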

The multi-class case is an extension of the binary case:

It simply adds a sum over the categories to the binary formula.
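Written out (a standard form, where y_i is the one-hot label for category i, ŷ_i is the Softmax output, and K = 10 here):

$$\ell(Y, \hat{Y}) = -\sum_{i=1}^{K} y_i \log \hat{y}_i$$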

PyTorch encapsulates LogSoftmax and NLLLoss together inside the CrossEntropyLoss function.

Code implementation:

import torch
y = torch.LongTensor([0])   # y must be a tensor of long integer type
z = torch.Tensor([[0.99, 0.1, -0.1]])
criterion = torch.nn.CrossEntropyLoss()
loss1 = criterion(z, y)
print(loss1)

Everything is wrapped up inside it, so the data fed in should not be activated; the last layer only needs to do the linear processing.
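A minimal sketch (reusing the tensors above) illustrating this: CrossEntropyLoss applied to the raw linear outputs gives the same value as LogSoftmax followed by NLLLoss.

import torch

z = torch.Tensor([[0.99, 0.1, -0.1]])
y = torch.LongTensor([0])

# CrossEntropyLoss applied directly to the raw (unactivated) outputs
loss_ce = torch.nn.CrossEntropyLoss()(z, y)

# The same computation by hand: LogSoftmax first, then NLLLoss
log_probs = torch.nn.LogSoftmax(dim=1)(z)
loss_nll = torch.nn.NLLLoss()(log_probs, y)

print(loss_ce, loss_nll)   # the two values match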

Practice

The MNIST dataset is composed of many images with 28*28 = 784 pixels. They are grayscale images, so each one can be treated as one-dimensional data after flattening.

[Original image]

The computer cannot understand the input image by itself, so we use torchvision to map the image into N-dimensional tensor data:

from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
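As a small sketch of what this transform produces (using a randomly generated 28x28 grayscale image as a stand-in for a real MNIST sample), the result is a normalized float tensor of shape [1, 28, 28]:

import numpy as np
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

# A random 28x28 grayscale image standing in for one MNIST sample
img = Image.fromarray(np.random.randint(0, 256, (28, 28), dtype=np.uint8))

x = transform(img)
print(x.shape, x.dtype)   # torch.Size([1, 28, 28]) torch.float32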

Data preparation and preprocessing

PyTorch provides a direct interface to the MNIST dataset, which can be called from the code.

Importing the packages:

import torch
# Packages used for preparing the dataset
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
# For the ReLU activation function
import torch.nn.functional as F
# For constructing the optimizer
import torch.optim as optim

Data preparation and preprocessing

# 1. Data preparation and preprocessing
batch_size = 64
# Preprocess the images into tensors; the first value is the mean and the second is the
# standard deviation of the MNIST data, both calculated by predecessors.
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

# Load the training data through the MNIST interface
train_datast = datasets.MNIST(root='./', train=True, download=True, transform=transform)
train_loader = DataLoader(train_datast, shuffle=True, batch_size=batch_size)   # wrap the training set in a loader

# The test dataset is defined here for evaluation
test_datast = datasets.MNIST(root='./', train=False, download=True, transform=transform)
test_loader = DataLoader(test_datast, shuffle=False, batch_size=batch_size)
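As a quick check of what the loader produces (a minimal sketch, assuming the code above has been run), each batch holds 64 images of shape [1, 28, 28] and 64 labels; this is the shape that gets flattened to 784 in the model below:

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 1, 28, 28])
print(labels.shape)   # torch.Size([64])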

Build the model

The model only uses simple linear transformations, and ReLU is chosen as the activation function.

# 2. Build the model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # The input size is the total number of pixels of one MNIST image: 28*28 = 784
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)   # flatten each image into a 784-dimensional vector
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        # The last layer has no nonlinear activation because Softmax is wrapped inside
        # CrossEntropyLoss, which expects the raw linear outputs.
        return self.l5(x)

model = Net()
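A minimal sanity-check sketch (with a random batch standing in for real data): the network maps a [64, 1, 28, 28] batch to [64, 10] scores, one per digit.

dummy = torch.randn(64, 1, 28, 28)   # random batch standing in for MNIST images
out = model(dummy)
print(out.shape)   # torch.Size([64, 10]) -- one raw score per digit class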

Loss functions and optimizers

# 3. Construct the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()   # cross entropy loss
# The momentum term helps the optimizer push through saddle points and local optima
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
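Roughly speaking (ignoring the dampening and weight-decay options), SGD with momentum keeps a running velocity v of the gradients g and updates the parameters θ with it, where μ is the momentum factor:

$$v_t = \mu\, v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - lr \cdot v_t$$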

Training and testing

# 4. Design the training and testing loops
def train(epoch):
    '''training'''
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data               # unpack the batch
        optimizer.zero_grad()               # clear the gradients
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, target)   # compute the loss
        loss.backward()                     # backpropagation
        optimizer.step()                    # update the parameters
        # Accumulate the loss; item() is needed here, otherwise the computation graph is kept alive
        running_loss += loss.item()
        if batch_idx % 300 == 299:
            # Output the loss every 300 batches
            print('[%d,%5d] loss:%.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0

def test():
    '''testing'''
    correct = 0
    total = 0
    with torch.no_grad():   # no gradients are computed, we only need the predictions
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            # Take the maximum of each row (dim=1): _ is the max value, pred is its index
            _, pred = torch.max(outputs.data, dim=1)
            total += labels.size(0)                    # labels has shape [N]
            correct += (pred == labels).sum().item()   # count predictions that equal the true labels
    print('Accuracy on test set: %d %%' % (100 * correct / total))

Run:

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()

Running results:

The accuracy reaches 97%, which is only just acceptable; the model still needs to be improved.

The focus of this article is to understand the multi-class classification problem and its solution, learn the principles of the Softmax function and cross entropy loss (CrossEntropyLoss), and finally put them into practice. I hope it helps you.

If you have read this far, please give me a small like 👍