Aryansh Omray is a data science engineer at Microsoft and a technology blogger on Medium

One of the fundamental problems in machine learning is how to learn representations of complex data.

The importance of this task lies in the vast amount of unstructured and unlabelled data that exists, which can only be understood through unsupervised learning. Density estimation, anomaly detection, text summarization, data clustering, bioinformatics, DNA modeling, and many other applications all depend on this capability.

Over the years, researchers have developed many methods to learn the probability distribution of large datasets, including generative adversarial networks (GANs), variational autoencoders (VAEs), and Normalizing Flow.

In this article, we introduce Normalizing Flow, which aims to overcome the shortcomings of GANs and VAEs.

Sample output of Glow model (Source)

GANs and VAEs have shown an amazing ability to learn very complex data distributions while keeping inference relatively simple.

However, neither GANs nor VAEs allow exact evaluation and inference of the probability density, which often results in blurry, low-quality samples from VAEs, while training also faces challenges such as mode collapse and posterior collapse.

Therefore, Normalizing Flow came into being, trying to solve many of the problems of GANs and VAEs by using invertible functions.

Normalizing Flow

Simply put, a Normalizing Flow is a chain of invertible functions, that is, functions whose analytic inverse can be computed. For example, f(x) = x + 2 is invertible because every input maps to one and only one output, and vice versa, while f(x) = x² is not (the two inputs x and −x map to the same output). Such functions are also called bijective functions.
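For intuition, here is a tiny, purely illustrative sketch of that interface in code: every flow layer exposes a forward direction, its inverse, and the log-determinant of its Jacobian (zero for f(x) = x + 2, whose Jacobian is the identity):

import torch

class AddTwo:
    # Toy bijection f(x) = x + 2; its Jacobian is the identity, so log|det J| = 0.
    def forward(self, x):
        return x + 2, torch.zeros(x.shape[0])

    def inverse(self, y):
        return y - 2, torch.zeros(y.shape[0])

f = AddTwo()
x = torch.randn(4, 3)
y, _ = f.forward(x)
x_recovered, _ = f.inverse(y)
assert torch.allclose(x, x_recovered, atol=1e-6)   # the inverse recovers every input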

(Image source: author)

As can be seen from the figure above, Normalizing Flow can transform complex data points (such as MNIST images) into a simple Gaussian distribution and vice versa. Unlike a GAN, which takes a random vector and outputs an image, a flow-based model maps data points into a simple distribution. In the MNIST case shown above, we can draw random samples from the Gaussian distribution and recover their corresponding MNIST images.

Flow-based models are trained with a negative log-likelihood loss, where p(z) is the density of the base distribution. The loss below follows from the change-of-variables formula in statistics.

(Source)
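Written out, this is just the standard change-of-variables identity and the resulting training objective (stated here for reference, in the same notation as above):

log p_X(x) = log p_Z(f(x)) + log |det( ∂f(x) / ∂x )|

Loss = − E_x [ log p_Z(f(x)) + log |det( ∂f(x) / ∂x )| ]

Here f is the flow that maps a data point x to its latent code z = f(x), and p_Z is the simple base distribution (a standard Gaussian in our case).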

Advantages of Normalizing Flow

Normalizing Flow has various advantages over GAN and VAE, including:

  • Normalizing Flow models do not need to put noise on the output, and can therefore have much more powerful local variance models.
  • Compared with GANs, the training process of flow-based models is very stable, whereas GANs require careful tuning of the generator and discriminator hyperparameters.
  • Compared with GANs and VAEs, Normalizing Flow converges more easily.

Disadvantages of Normalizing Flow

While flow-based models have their advantages, they also have some disadvantages:

  • The performance of flow-based models on density-estimation benchmarks is still not satisfactory.
  • Flow-based models require volume-preserving (invertible) transformations, which leads to very high-dimensional latent spaces that are usually harder to interpret.
  • The samples produced by flow-based models are generally not as good as those from GANs and VAEs.

To better understand Normalizing Flow, take the Glow architecture as an example. Glow is a flow-based model proposed by OpenAI in 2018. The following image shows Glow’s architecture.

Glow structure (Source)

The Glow architecture is built from a stack of fairly simple layers. First, consider its multi-scale structure: the model consists of a series of repeated blocks called scales. Each scale applies a squeeze function followed by a number of flow steps, where every flow step contains an ActNorm layer, an invertible 1×1 Convolution, and a Coupling Layer; the scale ends with a split function. The split divides its input into two equal parts along the channel dimension: one half goes into the next scale, while the other half goes directly to the loss function (illustrated in the short snippet below). Splitting reduces the vanishing-gradient effect that can occur when the model is trained end to end.
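To make the split concrete, it is simply a chunk along the channel dimension. The snippet below is purely illustrative (the tensor shape is just an arbitrary example of a batch of 4-channel feature maps):

import torch

h = torch.randn(8, 4, 14, 14)                  # an example batch of 4-channel feature maps
to_next_scale, to_loss = torch.chunk(h, 2, dim=1)
print(to_next_scale.shape, to_loss.shape)      # torch.Size([8, 2, 14, 14]) torch.Size([8, 2, 14, 14])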

As shown in the figure below, the squeeze function converts an input tensor of size [C, H, W] into a tensor of size [4C, H/2, W/2] by folding 2×2 spatial blocks into the channel dimension. At test time, the same reshaping is applied in reverse, turning a [4C, H/2, W/2] tensor back into one of size [C, H, W].

(Source)
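For readers who want to verify the shapes, the squeeze can be sketched with plain tensor reshaping. This is an illustrative stand-in, not nflows’ own SqueezeTransform:

import torch

def squeeze(x):
    # Fold each 2x2 spatial block into the channel dimension: [B, C, H, W] -> [B, 4C, H/2, W/2]
    B, C, H, W = x.shape
    x = x.view(B, C, H // 2, 2, W // 2, 2)
    x = x.permute(0, 1, 3, 5, 2, 4)
    return x.reshape(B, C * 4, H // 2, W // 2)

x = torch.randn(1, 1, 28, 28)
print(squeeze(x).shape)                        # torch.Size([1, 4, 14, 14])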

The remaining layers, namely ActNorm, the invertible 1×1 Convolution, and the Affine Coupling layer, are summarized in the table below, which lists the forward and reverse function of each layer. (Source)
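Since the table is reproduced as an image, here is a condensed recap of what each layer computes in the forward direction, following the definitions in the Glow paper (each operation has an analytic inverse and a cheaply computed log-determinant):

  • ActNorm: y = s ⊙ x + b, with a learned per-channel scale and bias initialized from the first batch of data (log-determinant: H · W · Σ log|s|).
  • Invertible 1×1 Convolution: y = W x applied at every spatial position, where W is a learned C×C matrix (log-determinant: H · W · log|det W|).
  • Affine Coupling: split x along the channels into (x_a, x_b), compute (log s, t) = NN(x_b), then output y_a = exp(log s) ⊙ x_a + t and y_b = x_b (log-determinant: Σ log|s|).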

Implementation

After covering the basics of Normalizing Flow and the Glow model, we will now show how to implement the model in PyTorch and train it on the MNIST dataset.

Glow model

First, we will implement the Glow architecture using PyTorch and nflows. To save time, we rely on nflows, which already provides implementations of all the required layers (apart from a small zero-initialized convolution that we define ourselves).

import torch
import torch.nn as nn
from nflows import transforms
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal


class ZeroConv2d(nn.Module):
    # 3x3 convolution whose weights start at zero, so every coupling layer
    # initially behaves close to the identity (a trick from the Glow paper).
    # nflows does not ship this layer, so a minimal version is defined here.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, inp):
        return self.conv(inp)


class Net(nn.Module):
    # Small convolutional network used inside each affine coupling layer.
    def __init__(self, in_channel, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channel, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 1),
            nn.ReLU(inplace=True),
            ZeroConv2d(64, out_channels),
        )

    def forward(self, inp, context=None):
        return self.net(inp)


def getGlowStep(num_channels, crop_size, i):
    # Alternate the channel mask between even and odd flow steps.
    mask = [1] * num_channels
    if i % 2 == 0:
        mask[::2] = [-1] * len(mask[::2])
    else:
        mask[1::2] = [-1] * len(mask[1::2])

    def getNet(in_channel, out_channels):
        return Net(in_channel, out_channels)

    # One step of flow: ActNorm -> invertible 1x1 convolution -> affine coupling.
    return transforms.CompositeTransform([
        transforms.ActNorm(num_channels),
        transforms.OneByOneConvolution(num_channels),
        transforms.coupling.AffineCouplingTransform(mask, getNet),
    ])


def getGlowScale(num_channels, num_flow, crop_size):
    # One scale: a squeeze followed by num_flow steps of flow.
    z = [getGlowStep(num_channels, crop_size, i) for i in range(num_flow)]
    return transforms.CompositeTransform([
        transforms.SqueezeTransform(),
        *z,
    ])


def getGLOW():
    # MNIST images are resized to 32x32 in the data pipeline below so that
    # each of the three squeeze operations halves an even spatial size.
    image_channels = 1
    image_size = 32
    num_channels = image_channels * 4
    num_flow = 32
    num_scale = 3
    crop_size = image_size // 2

    transform = transforms.MultiscaleCompositeTransform(num_scale)
    for i in range(num_scale):
        next_input = transform.add_transform(
            getGlowScale(num_channels, num_flow, crop_size),
            [num_channels, crop_size, crop_size],
        )
        num_channels *= 2
        crop_size //= 2

    # Wrapping the transform and a standard-normal base distribution in a Flow
    # gives us _transform for training and sample() for generation later on.
    return Flow(transform, StandardNormal([image_channels * image_size * image_size]))


Glow_model = getGLOW()
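As a quick sanity check of the model we just built (purely a smoke test, assuming the block above has been run), we can push a dummy batch through the flow’s transform:

Glow_model.eval()                              # avoid triggering ActNorm's data-dependent init on fake data
with torch.no_grad():
    dummy = torch.randn(2, 1, 32, 32)          # two fake 32x32 grayscale images
    z, logabsdet = Glow_model._transform(dummy)
print(z.shape, logabsdet.shape)                # torch.Size([2, 1024]) torch.Size([2])
Glow_model.train()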

We can train the Glow model on various datasets, such as MNIST, CIFAR-10, and ImageNet; the MNIST dataset is used here for demonstration purposes.

Datasets like MNIST can be easily accessed from the Graviti Open Datasets platform, which hosts many open datasets commonly used in machine learning, covering tasks such as classification, density estimation, object detection, and text classification.

To access a dataset, we only need to create an account on Graviti’s platform and fork the desired dataset. We can then download it directly or import it through the pipeline Graviti provides. The basic code and documentation are available on the **TensorBay** support page.

With the Python SDK of TensorBay, we can easily import MNIST datasets into PyTorch:

from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

from tensorbay import GAS
from tensorbay.dataset import Dataset as TensorBayDataset


class MNISTSegment(Dataset):
    """PyTorch dataset that reads one segment of the TensorBay MNIST dataset."""

    def __init__(self, gas, segment_name, transform):
        super().__init__()
        self.dataset = TensorBayDataset("MNIST", gas)
        self.segment = self.dataset[segment_name]
        self.category_to_index = self.dataset.catalog.classification.get_category_to_index()
        self.transform = transform

    def __len__(self):
        return len(self.segment)

    def __getitem__(self, idx):
        data = self.segment[idx]
        with data.open() as fp:
            image_tensor = self.transform(Image.open(fp))

        return image_tensor, self.category_to_index[data.label.classification.category]

Model training

Model training can be started with the following code. It creates a data loader using the pipeline provided by Graviti’s TensorBay; the ACCESS_KEY is available in TensorBay’s account settings.

from nflows.distributions import normal

ACCESS_KEY = "Accesskey-*****"
EPOCH = 100

to_tensor = transforms.ToTensor()
normalization = transforms.Normalize(mean=[0.485], std=[0.229])
# Resize MNIST from 28x28 to 32x32 so that the three squeeze operations in the
# Glow model always halve an even spatial size.
my_transforms = transforms.Compose([transforms.Resize(32), to_tensor, normalization])

train_segment = MNISTSegment(GAS(ACCESS_KEY), segment_name="train", transform=my_transforms)
train_dataloader = DataLoader(train_segment, batch_size=4, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
Glow_model = Glow_model.to(device)
optimizer = torch.optim.Adam(Glow_model.parameters(), 1e-3)

for epoch in range(EPOCH):
    for index, (image, label) in enumerate(train_dataloader):
        if index == 0:
            image_size = image.shape[2]
            channels = image.shape[1]
        image = image.to(device)
        # Map images to latent codes and collect the log-determinant of the Jacobian.
        output, logabsdet = Glow_model._transform(image)
        shape = output.shape[1:]
        log_z = normal.StandardNormal(shape=shape).log_prob(output)
        loss = log_z + logabsdet
        loss = -loss.mean() / (image_size * image_size * channels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch:{epoch+1}/{EPOCH} Loss:{loss.item()}")

The code above uses the MNIST dataset; to train on another dataset, we simply swap in the corresponding data loader.

Sample generation

After training is complete, we can generate samples with the following code:

samples = Glow_model.sample(25)
display(samples)
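The display function is not part of nflows; a simple version, assuming samples comes back as an [N, 1, 32, 32] tensor, can be written with torchvision’s make_grid and matplotlib:

import matplotlib.pyplot as plt
from torchvision.utils import make_grid

def display(samples):
    # Arrange the generated samples into a 5x5 grid and show them.
    grid = make_grid(samples.cpu(), nrow=5, normalize=True)
    plt.figure(figsize=(6, 6))
    plt.imshow(grid.permute(1, 2, 0))
    plt.axis("off")
    plt.show()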

Using the nflows library, we can generate samples in a single line of code, and the display function shows the samples in a grid.

Samples generated after training the model on MNIST

Conclusion

This article introduced the basics of Normalizing Flow, compared it with GANs and VAEs, and walked through how the Glow model works. We also showed how to implement the Glow model and train it on the MNIST dataset. With the help of Graviti’s open dataset platform, accessing datasets becomes very convenient.

[About Graviti] Graviti focuses on building new infrastructure for artificial intelligence. Through its unstructured data platform and open dataset community, it helps machine learning teams and individuals better unlock the potential of unstructured data, making AI application development faster and better performing. The company aims to keep laying a solid foundation for AI to empower a wide range of industries, drive industrial upgrading, and broaden access to technology. It has raised tens of millions of dollars from Sequoia, Yunqi, Zhenge, Fenghe, Yaotu Capital and Qiji Chuangtan.