Author: Dr. Vaibhav Kumar | Analytics India Magazine

With deep learning models successfully deployed in a variety of applications, the goal is now to obtain results that are not only accurate but also produced quickly.

The size of the dataset matters for accuracy, but large datasets also drive up the training time of machine learning models, which is always a concern.

To overcome the training-time issue, we use the TPU runtime environment to speed up training. To this end, PyTorch supports running machine learning workloads on state-of-the-art hardware accelerators.

PyTorch’s support for Cloud TPUs is achieved through integration with XLA (Accelerated Linear Algebra), a compiler for linear algebra that can target many types of hardware, including CPUs, GPUs, and TPUs.
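As a quick illustration (a minimal sketch, assuming the PyTorch/XLA wheels installed later in this article are already available; the tensor and its shape are only illustrative), a TPU core is exposed to PyTorch as an ordinary device that tensors and models can be moved to:

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # XLA device backed by a TPU core
t = torch.randn(2, 2).to(device)  # tensors on this device are executed through XLA
print(t.device)                   # e.g. xla:1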

This article demonstrates how to implement a deep learning model with PyTorch on a TPU to speed up the training process.

Here, we define a convolutional neural network (CNN) model using PyTorch and train it in a PyTorch/XLA environment.

XLA connects the CNN model to the Google Cloud TPU (Tensor Processing Unit) in a distributed multiprocessing environment. In this implementation, all eight TPU cores are used for multiprocessing.
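As a rough sketch of how that multiprocessing environment is created (the _check_core function below is only illustrative), xla_multiprocessing.spawn forks one process per TPU core, and each process obtains its own XLA device; this is the same mechanism used for training later in the article:

import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _check_core(index, flags):
  # Each of the eight forked processes sees one TPU core as its XLA device.
  print('process', index, 'ordinal', xm.get_ordinal(), 'device', xm.xla_device())

# Requires the TPU runtime and the PyTorch/XLA installation shown below.
xmp.spawn(_check_core, args=({},), nprocs=8, start_method='fork')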

Using this PyTorch setup, we will run a fashion-image classification task and observe the training time and accuracy.

Implement CNN with PyTorch and TPU

We will implement this in Google Colab because it provides a free cloud TPU (tensor processing unit).

Before proceeding to the next step, in the Colab notebook go to Edit, then Notebook settings, and select “TPU” as the “Hardware Accelerator”.

Run the following code to verify that the TPU is available.

import os
assert os.environ['COLAB_TPU_ADDR']

If the TPU is enabled, this will execute successfully; otherwise it will raise KeyError: 'COLAB_TPU_ADDR'. You can also check the TPU by printing its address.

TPU_Path = 'grpc://'+os.environ['COLAB_TPU_ADDR']
print('TPU Address:', TPU_Path)

With the TPU enabled, we will install the compatible wheels and dependencies to set up the PyTorch/XLA environment using the following code.

VERSION = "20200516"
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION

Once the installation is successful, we will proceed to define the methods for loading the dataset, initializing the CNN model, training, and testing. First, we import the required libraries.

import numpy as np
import os
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp
import torch_xla.utils.utils as xu
from torchvision import datasets, transforms

After that, we will further define the required hyperparameters.

# define parameters
FLAGS = {}
FLAGS['datadir'] = "/tmp/mnist"
FLAGS['batch_size'] = 128
FLAGS['num_workers'] = 4
FLAGS['learning_rate'] = 0.01
FLAGS['momentum'] = 0.5
FLAGS['num_epochs'] = 50
FLAGS['num_cores'] = 8
FLAGS['log_steps'] = 20
FLAGS['metrics_debug'] = False

The following code snippet defines the CNN model as a PyTorch nn.Module subclass, along with the functions to load the data, train the model, and test it.

SERIAL_EXEC = xmp.MpSerialExecutor()

class FashionMNIST(nn.Module):

  def __init__(self):
    super(FashionMNIST, self).__init__()
    self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
    self.bn1 = nn.BatchNorm2d(10)
    self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
    self.bn2 = nn.BatchNorm2d(20)
    self.fc1 = nn.Linear(320, 50)
    self.fc2 = nn.Linear(50, 10)

  def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = self.bn1(x)
    x = F.relu(F.max_pool2d(self.conv2(x), 2))
    x = self.bn2(x)
    x = torch.flatten(x, 1)
    x = F.relu(self.fc1(x))
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)

# Only instantiate the model weights once in memory.
WRAPPED_MODEL = xmp.MpModelWrapper(FashionMNIST())

def train_mnist():
  torch.manual_seed(1)
 
  def get_dataset():
    norm = transforms.Normalize((0.1307,), (0.3081,))
    train_dataset = datasets.FashionMNIST(
        FLAGS['datadir'],
        train=True,
        download=True,
        transform=transforms.Compose(
            [transforms.ToTensor(), norm]))
    test_dataset = datasets.FashionMNIST(
        FLAGS['datadir'],
        train=False,
        download=True,
        transform=transforms.Compose(
            [transforms.ToTensor(), norm]))

  
    return train_dataset, test_dataset


  # Use the serial executor to avoid multiple processes downloading the same dataset.
  train_dataset, test_dataset = SERIAL_EXEC.run(get_dataset)

  train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset,
    num_replicas=xm.xrt_world_size(),
    rank=xm.get_ordinal(),
    shuffle=True)

  train_loader = torch.utils.data.DataLoader(
      train_dataset,
      batch_size=FLAGS['batch_size'],
      sampler=train_sampler,
      num_workers=FLAGS['num_workers'],
      drop_last=True)

  test_loader = torch.utils.data.DataLoader(
      test_dataset,
      batch_size=FLAGS['batch_size'],
      shuffle=False,
      num_workers=FLAGS['num_workers'],
      drop_last=True)

  # Scale the learning rate by the number of TPU cores (world size).
  lr = FLAGS['learning_rate'] * xm.xrt_world_size()

  # Get the device, model, optimizer, and loss function.
  device = xm.xla_device()
  model = WRAPPED_MODEL.to(device)
  optimizer = optim.SGD(model.parameters(), lr=lr, momentum=FLAGS['momentum'])
  loss_fn = nn.NLLLoss()

  def train_fun(loader):
    tracker = xm.RateTracker()
    model.train()
    for x, (data, target) in enumerate(loader):
      optimizer.zero_grad()
      output = model(data)
      loss = loss_fn(output, target)
      loss.backward()
      xm.optimizer_step(optimizer)
      tracker.add(FLAGS['batch_size'])
      if x % FLAGS['log_steps'] == 0:
        print('[xla:{}]({}) Loss={:.5f} Time={}'.format(
            xm.get_ordinal(), x, loss.item(), time.asctime()), flush=True)

  def test_fun(loader):
    total_samples = 0
    correct = 0
    model.eval()
    data, pred, target = None, None, None
    for data, target in loader:
      output = model(data)
      pred = output.max(1, keepdim=True)[1]
      correct += pred.eq(target.view_as(pred)).sum().item()
      total_samples += data.size()[0]

    accuracy = 100.0 * correct / total_samples
    print('[xla:{}] Accuracy={:.2f}%'.format(
        xm.get_ordinal(), accuracy), flush=True)
    return accuracy, data, pred, target

  # Training and evaluation cycle
  accuracy = 0.0
  data, pred, target = None, None, None
  for epoch in range(1, FLAGS['num_epochs'] + 1):
    para_loader = pl.ParallelLoader(train_loader, [device])
    train_fun(para_loader.per_device_loader(device))
    xm.master_print("Finished training epoch {}".format(epoch))

    para_loader = pl.ParallelLoader(test_loader, [device])
    accuracy, data, pred, target  = test_fun(para_loader.per_device_loader(device))
    if FLAGS['metrics_debug']:
      xm.master_print(met.metrics_report(), flush=True)

  return accuracy, data, pred, target

Now, to plot the results with the predicted and actual labels for the test images, the following function will be used.

# Result visualization
import math
from matplotlib import pyplot as plt

M, N = 5, 5
RESULT_IMG_PATH = '/tmp/test_result.png'

def plot_results(images, labels, preds):
  images, labels, preds = images[:M*N], labels[:M*N], preds[:M*N]
  inv_norm = transforms.Normalize((-0.1307/0.3081,), (1/0.3081,))

  num_images = images.shape[0]
  fig, axes = plt.subplots(M, N, figsize=(12, 12))
  fig.suptitle('Predicted Labels')

  for i, ax in enumerate(fig.axes):
    ax.axis('off')
    if i >= num_images:
      continue
    img, label, prediction = images[i], labels[i], preds[i]
    img = inv_norm(img)
    img = img.squeeze() # [1,Y,X] -> [Y,X]
    label, prediction = label.item(), prediction.item()
    if label == prediction:
      ax.set_title(u'Actual {}/ Predicted {}'.format(label, prediction), color='blue')
    else:
      ax.set_title(
          'Actual {}/ Predicted {}'.format(label, prediction), color='red')
    ax.imshow(img)
  plt.savefig(RESULT_IMG_PATH, transparent=True)

Now we are ready to train the model on the FashionMNIST dataset. Before training we record the start time, and after training we record the end time and print the total training time for the 50 epochs.

# Start the training process.
def train_cnn(rank, flags):
  global FLAGS
  FLAGS = flags
  torch.set_default_tensor_type('torch.FloatTensor')
  accuracy, data, pred, target = train_mnist()
  if rank == 0:
    # Retrieve the tensors from TPU core 0 and plot them
    # (argument order matches the plot_results signature: images, labels, preds).
    plot_results(data.cpu(), target.cpu(), pred.cpu())

# Record the start time, then launch one training process per TPU core.
start_time = time.time()
xmp.spawn(train_cnn, args=(FLAGS,), nprocs=FLAGS['num_cores'],
          start_method='fork')

Once the training is successfully completed, we print the total time spent on the training.

end_time = time.time()
print('Total Training time = ', end_time - start_time)

As we can see above, this run took about 269 seconds, or roughly 4.5 minutes, meaning the PyTorch model was trained for 50 epochs in under 5 minutes. Finally, we will visualize the predictions made by the trained model.

from google.colab.patches import cv2_imshow
import cv2
img = cv2.imread(RESULT_IMG_PATH, cv2.IMREAD_UNCHANGED)
cv2_imshow(img)

Therefore, we can conclude that implementing a deep learning model on a TPU enables fast training, as we saw above.

In less than 5 minutes, the CNN model was trained for 50 epochs on 40,000 training images, and it reached an accuracy of more than 89 percent.

Training deep learning models on TPUs can therefore be very beneficial in terms of both training time and accuracy.

References:

  1. Joe Spisak, “Get Started with PyTorch, Cloud TPUs, and Colab.”
  2. PyTorch on XLA Devices, PyTorch Release.
  3. “Training PyTorch Models on Cloud TPU Pods”, Google Cloud Guides.

Original article: analyticsindiamag.com/how-to-impl…