Model description
VGGNet is a deep convolutional neural network developed by the Visual Geometry Group at the University of Oxford together with researchers from Google DeepMind. It explores the relationship between the depth of a convolutional neural network and its performance: by repeatedly stacking 3×3 convolutional kernels and 2×2 max pooling layers, the authors successfully built networks 16 to 19 layers deep. VGGNet was the runner-up in the ILSVRC 2014 classification task and the winner of the localization task, with a top-5 error rate of 7.3%. VGGNet is still used today to extract features from images. Its core idea is to increase network depth while using only small convolution kernels, which keeps the network structure clear and concise. Although the paper was published many years ago, the architecture still sees very wide use.
The common VGGNet variants are VGG16 and VGG19.
VGG16 has 13 convolutional layers with 3×3 kernels, 5 max pooling layers, and 3 fully connected layers. VGG19 has 16 convolutional layers with 3×3 kernels, 5 max pooling layers, and 3 fully connected layers.
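As a quick sanity check on these counts, the minimal sketch below (assuming torchvision ≥ 0.13, which takes a weights= argument) instantiates the stock torchvision versions of both models and counts their layer types:

import torch.nn as nn
import torchvision.models as models

for name, builder in [('VGG16', models.vgg16), ('VGG19', models.vgg19)]:
    m = builder(weights=None)   # random weights; we only inspect the structure
    convs = sum(isinstance(layer, nn.Conv2d) for layer in m.features)
    pools = sum(isinstance(layer, nn.MaxPool2d) for layer in m.features)
    fcs = sum(isinstance(layer, nn.Linear) for layer in m.classifier)
    print(name, convs, 'convs,', pools, 'pools,', fcs, 'FC layers')
    # VGG16: 13 convs, 5 pools, 3 FC layers
    # VGG19: 16 convs, 5 pools, 3 FC layers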
Model structure
The table below summarizes the network configurations from the VGGNet paper (its Table 1); columns D and E correspond to the VGG16 and VGG19 models respectively.
VGG16 (column D): 2 × conv3-64, maxpool, 2 × conv3-128, maxpool, 3 × conv3-256, maxpool, 3 × conv3-512, maxpool, 3 × conv3-512, maxpool, FC-4096, FC-4096, FC-1000, softmax
VGG19 (column E): 2 × conv3-64, maxpool, 2 × conv3-128, maxpool, 4 × conv3-256, maxpool, 4 × conv3-512, maxpool, 4 × conv3-512, maxpool, FC-4096, FC-4096, FC-1000, softmax
Innovations of the model
All convolutional layers in VGGNet use 3×3 kernels, because three stacked 3×3 convolutions have the same receptive field as a single 7×7 convolution.
Per input-output channel pair, a 7×7 kernel has 7 × 7 = 49 parameters.
Three stacked 3×3 kernels have only 3 × (3 × 3) = 27 parameters.
This reduces the number of parameters and speeds up training, and the extra activation layers between the stacked convolutions add non-linearity.
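The saving scales with the channel count: for C input and C output channels, one 7×7 layer has 49C² weights while three 3×3 layers have 27C². A minimal sketch of this arithmetic (the channel width 256 is just an illustrative assumption):

def conv_weights(k, c_in, c_out):
    # weights of a single k x k convolution layer (bias ignored)
    return k * k * c_in * c_out

c = 256                                  # assumed channel width for illustration
one_7x7 = conv_weights(7, c, c)          # 49 * c * c = 3,211,264
three_3x3 = 3 * conv_weights(3, c, c)    # 27 * c * c = 1,769,472
print(one_7x7, three_3x3, three_3x3 / one_7x7)   # ratio 27/49, i.e. ~45% fewer weights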
The data set
Both the VGG16 and VGG19 models below are trained and evaluated on this dataset.
The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 categories, with 6,000 images per category. There are 50,000 training images and 10,000 test images.
The 10 categories are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
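If you want to verify the categories yourself, a small sketch (the root path './cifar' is just a placeholder) prints the class names straight from the torchvision dataset:

import torchvision.datasets as dsets

ds = dsets.CIFAR10(root='./cifar', train=True, download=True)
print(ds.classes)   # ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
print(len(ds), ds[0][0].size)   # 50000 images, each a 32 x 32 PIL image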
VGG16 code implementation
import torchvision.transforms as transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),          # VGG expects 224 x 224 inputs
    transforms.RandomHorizontalFlip(0.5),   # simple data augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])   # CIFAR-10 channel means / stds
])

valid_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
# torchvision.transforms (imported above) converts between PIL Image, numpy array and Tensor

batch_size = 64

# CIFAR-10 dataset (downloaded from the Internet on first run)
train_dataset = dsets.CIFAR10(root='/ml/cifar',
                              train=True,
                              transform=train_tf,
                              download=True)

test_dataset = dsets.CIFAR10(root='/ml/cifar',
                             train=False,
                             transform=valid_tf,
                             download=True)

# Load the data
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)    # shuffle the training data

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=12,
                         shuffle=False)
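As a quick check that the pipeline produces what the network expects, you can pull one batch and inspect its shape (a sketch assuming the loaders above have been created):

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 3, 224, 224]) after Resize and ToTensor
print(labels.shape)   # torch.Size([64]), one integer class label per image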
from torch import nn

class VGG16(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_dim, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 224 / 2 = 112
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 112 / 2 = 56
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 56 / 2 = 28
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 28 / 2 = 14
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)    # 14 / 2 = 7
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)   # flatten to (batch, 512*7*7)
        x = self.classifier(x)
        return x
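Before training, it is worth confirming that the shapes line up end to end. A minimal sketch feeding one random fake image through a freshly built model:

import torch

net = VGG16(in_dim=3, num_classes=10)
dummy = torch.randn(1, 3, 224, 224)   # one fake RGB image at the resized resolution
print(net(dummy).shape)               # torch.Size([1, 10]): one logit per CIFAR-10 class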
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = VGG16(3, 10)   # 3 input channels (RGB), 10 output classes
net.to(device)
learning_rate = 1e-2   # learning rate
num_epoches = 20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)   # stochastic gradient descent with momentum

net.train()   # switch to training mode
for epoch in range(num_epoches):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)   # shape: [batch, channel, height, width]
        labels = labels.to(device)
        outputs = net(images)               # forward pass through the network
        loss = criterion(outputs, labels)   # compute the loss
        optimizer.zero_grad()               # clear accumulated gradients
        loss.backward()                     # backpropagate the loss
        optimizer.step()                    # update the parameters
        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())
print('finished training')
net.eval()   # switch to evaluation mode
total = 0
correct = 0
with torch.no_grad():   # no gradients needed for evaluation
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicts = torch.max(outputs.data, 1)   # index of the largest logit = predicted class
        total += labels.size(0)
        correct += (predicts == labels).sum().item()
print(total)
print('Accuracy = %.2f' % (100 * correct / total))
The code above has been debugged and runs as-is; the final test accuracy is about 88%.
VGG19 code implementation
import torchvision.transforms as transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),          # VGG expects 224 x 224 inputs
    transforms.RandomHorizontalFlip(0.5),   # simple data augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])   # CIFAR-10 channel means / stds
])

valid_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets

batch_size = 64

# CIFAR-10 dataset (downloaded from the Internet on first run)
train_dataset = dsets.CIFAR10(root='I:/datasets/cifar',
                              train=True,
                              transform=train_tf,
                              download=True)

test_dataset = dsets.CIFAR10(root='I:/datasets/cifar',
                             train=False,
                             transform=valid_tf,
                             download=True)

# Load the data
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)    # shuffle the training data

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
from torch import nn

class VGG19(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_dim, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 224 / 2 = 112
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 112 / 2 = 56
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 56 / 2 = 28
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 28 / 2 = 14
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)    # 14 / 2 = 7
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)   # flatten to (batch, 512*7*7)
        x = self.classifier(x)
        return x
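To see what the three extra convolutional layers cost, a small sketch (assuming the VGG16 class from the previous section is still in scope) compares the total parameter counts of the two models:

def count_params(model):
    # total number of trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('VGG16: %.1fM parameters' % (count_params(VGG16(3, 10)) / 1e6))
print('VGG19: %.1fM parameters' % (count_params(VGG19(3, 10)) / 1e6))
# VGG19 is only a few million parameters larger: the fully connected
# layers dominate the total, not the extra conv layers.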
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = VGG19(3, 10)   # 3 input channels (RGB), 10 output classes
net.to(device)
learning_rate = 1e-2   # learning rate
num_epoches = 20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)   # stochastic gradient descent with momentum

net.train()   # switch to training mode
for epoch in range(num_epoches):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)   # shape: [batch, channel, height, width]
        labels = labels.to(device)
        outputs = net(images)               # forward pass through the network
        loss = criterion(outputs, labels)   # compute the loss
        optimizer.zero_grad()               # clear accumulated gradients
        loss.backward()                     # backpropagate the loss
        optimizer.step()                    # update the parameters
        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())
print('finished training')
net.eval()   # switch to evaluation mode
total = 0
correct = 0
with torch.no_grad():   # no gradients needed for evaluation
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicts = torch.max(outputs.data, 1)   # index of the largest logit = predicted class
        total += labels.size(0)
        correct += (predicts == labels).sum().item()
print(total)
print('Accuracy = %.2f' % (100 * correct / total))
References
VGGNet paper: Simonyan, K. and Zisserman, A., "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015.