Model description

LeNet performed very well on the MNIST dataset, but not nearly as well on larger, real-world datasets. That changed in 2012 with AlexNet, which won that year's ImageNet competition and sparked a wave of deep learning research.

ImageNet is a large dataset created by Fei-Fei Li's team for image-processing research, containing more than 14 million labeled images. Since 2010, ImageNet has hosted an annual image classification and object detection contest, the ILSVRC. The image classification competition covers 1000 different categories, and each category has roughly 200~1000 images drawn from different sources.

AlexNet consists mainly of 5 convolution layers and 3 fully connected layers. The output of the last fully connected layer is passed through a Softmax, which produces the input image's scores over the 1000 categories.

Model structure

The figure below is the architecture diagram from the AlexNet paper.

The input layer takes an image of shape (227, 227, 3).

Layer 1: convolution layer 1. The input is the 227 × 227 × 3 image (the paper writes 224 × 224 × 3, but the arithmetic below only works out with 227). The number of convolution kernels is 96; in the paper, the two GPUs compute 48 kernels each. The size of each convolution kernel is 11 × 11 × 3; stride = 4 (the step length), and pad = 0 (the edges are not padded).

What is the size of the convolution output?

width = (227 + 2 × padding − kernel_size) / stride + 1 = (227 + 0 − 11) / 4 + 1 = 55

height = (227 + 2 × padding − kernel_size) / stride + 1 = 55

depth (number of channels) = 96

Then Local Response Normalization (LRN) is applied, followed by max pooling with pool_size = (3, 3), stride = 2, pad = 0.

The output of the first convolution is therefore 55 × 55 × 96; after the 3 × 3 pooling with stride 2 it becomes 27 × 27 × 96.
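
As a quick sanity check of the formula above, the short sketch below (an illustration, not code from the paper) computes the layer 1 output sizes for the 227 × 227 input:

# A minimal sketch of the output-size formula above; the parameters
# (kernel 11, stride 4, padding 0, then 3 x 3 pooling with stride 2) come from layer 1.
def conv_out_size(in_size, kernel_size, stride, padding):
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(227, 11, 4, 0))   # 55 -> the 55 x 55 x 96 convolution output
print(conv_out_size(55, 3, 2, 0))     # 27 -> the 27 x 27 x 96 map after pooling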

Layer 2: convolution layer 2. The input is the feature map produced by the previous layer, and the number of convolution kernels is 256; the two GPUs in the paper hold 128 kernels each. The size of each convolution kernel is 5 × 5 × 48; pad = 2, stride = 1. LRN is then applied, followed by max_pooling with pool_size = (3, 3), stride = 2.

Layer 3: convolution layer 3. The input is the output of layer 2, the number of convolution kernels is 384, kernel_size = (3 × 3 × 256), padding = 1. Layer 3 applies neither LRN nor pooling.

Layer 4: convolution layer 4. The input is the output of layer 3, the number of convolution kernels is 384, kernel_size = (3 × 3), padding = 1. As in layer 3, there is no LRN and no pooling.

Layer 5: convolution layer 5. The input is the output of layer 4, the number of convolution kernels is 256, kernel_size = (3 × 3), padding = 1. It is followed by max_pooling with pool_size = (3, 3), stride = 2.

Layers 6, 7 and 8 are fully connected layers with 4096 neurons each, and the final Softmax output has 1000 units because, as described above, the ImageNet competition has 1000 classes. ReLU and Dropout are used in the fully connected layers. The sketch after this paragraph traces the feature-map shapes through the whole stack.
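
To see how these shapes fit together, here is a small sketch (built from the layer sizes listed above, not the paper's code) that pushes a dummy 227 × 227 × 3 image through the five convolution/pooling stages and confirms that 256 × 6 × 6 = 9216 features reach the first fully connected layer:

import torch
import torch.nn as nn

# Convolution/pooling stack with the kernel sizes, strides and paddings described above.
features = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(True), nn.MaxPool2d(3, 2),      # -> 96 x 27 x 27
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(True), nn.MaxPool2d(3, 2),    # -> 256 x 13 x 13
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(True),                       # -> 384 x 13 x 13
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(True),                       # -> 384 x 13 x 13
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(3, 2))   # -> 256 x 6 x 6

x = torch.randn(1, 3, 227, 227)   # dummy input image
out = features(x)
print(out.shape)                  # torch.Size([1, 256, 6, 6])
print(out.view(1, -1).shape)      # torch.Size([1, 9216]) -> input to the first FC layer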

Use ReLU as the activation function

To speed up network training, AlexNet uses ReLU as its activation function.
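
For reference, ReLU simply clips negative activations to zero, f(x) = max(0, x). The tiny sketch below only illustrates this; it is not part of the original code:

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(x))   # the negative entries are zeroed out, 1.5 passes through unchanged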

Use a variety of methods to avoid overfitting

Data augmentation and dropout are used to avoid overfitting.
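
As a small illustration (not taken from the paper's code), dropout randomly zeroes activations during training and is switched off at evaluation time, which is why the training script later calls model.train() and model.eval():

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()      # training mode: roughly half the activations are zeroed, the rest are scaled by 2
print(drop(x))

drop.eval()       # evaluation mode: dropout is a no-op
print(drop(x))    # all ones again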

Use multiple GPUs for training

In the architecture diagram there are two parallel rows, top and bottom, which indicates that the network is split across two GPUs during training.
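
The original model splits its kernels across the two GPUs by hand. In modern PyTorch a rough equivalent (a sketch only, assuming more than one GPU is visible; this is not how the original AlexNet code distributed its work) is nn.DataParallel, which replicates the model and splits each batch across devices:

import torch
import torch.nn as nn

# `net` here is a stand-in for any model, such as the Alexnet class defined in the code section below.
net = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)   # replicate the model and split each input batch across GPUs
net = net.to('cuda' if torch.cuda.is_available() else 'cpu')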

Code implementation

The data set

The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 categories, with 6,000 images per category. There are 50,000 training images and 10,000 test images.

The dataset covers 10 categories such as airplanes and automobiles:

import torchvision.transforms as transforms

train_tf = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])

valid_tf = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])
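
The normalization constants are simply the per-channel mean and standard deviation of the CIFAR-10 training images. A small sketch (assuming the dataset is downloaded to /ml/cifar as in the next snippet) that reproduces them:

import torchvision.datasets as dsets

raw = dsets.CIFAR10(root='/ml/cifar', train=True, download=True)
data = raw.data / 255.0            # uint8 images of shape (50000, 32, 32, 3), scaled to [0, 1]
print(data.mean(axis=(0, 1, 2)))   # approximately [0.4914, 0.4822, 0.4465]
print(data.std(axis=(0, 1, 2)))    # approximately [0.2470, 0.2435, 0.2616]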
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as transforms   # torchvision.transforms defines a series of transforms for PIL Image, numpy and Tensor data

batch_size = 64

# CIFAR-10 dataset
train_dataset = dsets.CIFAR10(root='/ml/cifar',
                              train=True,
                              transform=train_tf,    # training-time preprocessing and augmentation
                              download=True)          # download the images from the Internet

test_dataset = dsets.CIFAR10(root='/ml/cifar',
                             train=False,
                             transform=valid_tf,      # no augmentation for the test set
                             download=True)

# load the data
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)               # shuffle the training data

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
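
A quick way to check that the pipeline is wired up correctly (a sketch, not part of the original code) is to pull one batch and inspect its shape:

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 227, 227]) -> a batch of resized CIFAR-10 images
print(labels.shape)   # torch.Size([64])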
import torch
import torch.nn as nn

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

class Alexnet(nn.Module):
    def __init__(self, in_dim, n_class):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_dim, 96, 11, stride=4, padding=0),            # -> 96 x 55 x 55
            nn.ReLU(True),
            nn.MaxPool2d(3, 2),                                         # -> 96 x 27 x 27
            nn.Conv2d(96, 256, 5, stride=1, padding=2),                 # -> 256 x 27 x 27
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2),                                         # -> 256 x 13 x 13
            nn.Conv2d(256, 384, 3, stride=1, padding=1),                # -> 384 x 13 x 13
            # nn.BatchNorm2d(384),
            nn.ReLU(True),
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),    # -> 384 x 13 x 13
            # nn.BatchNorm2d(384),
            nn.ReLU(True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),    # -> 256 x 13 x 13
            # nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.MaxPool2d(3, 2))                                         # -> 256 x 6 x 6

        self.fc = nn.Sequential(
            nn.Linear(9216, 4096),        # 9216 = 256 * 6 * 6
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, n_class))

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)         # flatten (batch, 256, 6, 6) -> (batch, 9216)
        output = self.fc(x)
        return output
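
Before training, it can be worth verifying the forward pass with a dummy batch (a sketch, assuming the Alexnet class above):

net = Alexnet(3, 10)                  # 3 input channels, 10 CIFAR-10 classes
dummy = torch.randn(2, 3, 227, 227)   # two random 227 x 227 RGB images
print(net(dummy).shape)               # torch.Size([2, 10]) -> one score per class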
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = Alexnet(3, 10)   # 3 input channels, 10 classes
model.to(device)
import numpy as np

learning_rate = 1e-2   # learning rate
num_epoches = 20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)   # stochastic gradient descent with momentum

model.train()          # switch to training mode
for epoch in range(num_epoches):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        # print(images.shape)               # [batch, channel, height, width]
        outputs = model(images)             # forward pass through the network
        loss = criterion(outputs, labels)   # compute the loss
        optimizer.zero_grad()               # clear the gradients from the previous step
        loss.backward()                     # backpropagation
        optimizer.step()                    # update the parameters
        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())

print('finished training')
model.eval()   # switch to evaluation mode
correct = 0
total = 0
for images, labels in test_loader:
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
    _, predicts = torch.max(outputs.data, 1)   # index of the highest score = predicted class
    total += labels.size(0)
    correct += (predicts == labels).cpu().sum()

print(total)
print('Accuracy = %.2f' % (100 * correct / total))
The results were as follows:

10000
Accuracy = 83.83

References

AlexNet paper: papers.nips.cc/paper/4824-…

CIFAR dataset: www.cs.toronto.edu/~kriz/cifar…