Spatially Adaptive Normalization (SPADE)

Spatially Adaptive Normalization (SPADE) is a major innovation of GauGAN: a normalization layer for semantic segmentation maps. To understand SPADE, we first need to understand GauGAN's network input, the semantic segmentation map.

One-hot encoded segmentation masks

Consider training GauGAN on the Facades dataset. There, the segmentation map is encoded as an RGB image with a different color for each class, as shown in the figure below. For example, walls are shown in blue and columns in red. This representation is easy for us to read, but it does not help a neural network learn, because colors carry no semantic meaning for a GAN.

Two colors being close in color space does not mean the classes are close semantically. For example, we could draw grass in light green and airplanes in dark green; the two classes are unrelated even though their colors in the segmentation map are similar.

Therefore, we should label pixels with class IDs rather than colors. However, this still does not solve the problem, because class IDs are arbitrarily assigned numbers that carry no semantic meaning either. A better approach is to use one binary segmentation mask per class, set to 1 where a pixel belongs to that class and 0 otherwise. In other words, we one-hot encode the labels in the segmentation map into a segmentation mask of shape (H, W, number of classes).
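
The one-hot encoding described above can be sketched in NumPy as follows. This is a minimal illustration, not the loader used for the dataset: the helper name one_hot_mask and the tiny 2x3 label map are assumptions made for the example.

```python
import numpy as np

def one_hot_mask(labels, n_class):
    """Convert an (H, W) array of integer class IDs into an (H, W, n_class) mask."""
    h, w = labels.shape
    mask = np.zeros((h, w, n_class), dtype=np.float32)
    for i in range(n_class):
        # Channel i is 1 where the pixel belongs to class i and 0 elsewhere.
        mask[labels == i, i] = 1.0
    return mask

# Hypothetical 2x3 label map with three classes (0, 1, 2).
labels = np.array([[0, 0, 1],
                   [2, 1, 1]], dtype=np.uint8)
mask = one_hot_mask(labels, n_class=3)

print(mask.shape)    # (2, 3, 3)
print(mask[..., 1])  # binary mask for class 1
```

Each pixel has exactly one channel set to 1, so taking the argmax over the last axis recovers the original class IDs.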

JPEG encoding discards less important visual information during compression. As a result, pixels that belong to the same class and should have the same color may end up with slightly different values, so we cannot reliably map colors in a JPEG image to classes. To solve this problem, we need to use the uncompressed BMP image format. During image loading and preprocessing, we load the files and convert them from BMP into one-hot encoded segmentation masks.
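
To see why exact color lookup breaks under lossy compression, here is a small NumPy sketch. The palette, the helper colors_to_labels, and the hand-edited pixel are all assumptions for illustration; the edited pixel merely stands in for a real JPEG artifact.

```python
import numpy as np

# Hypothetical palette: exact RGB color -> class ID.
palette = {(0, 0, 255): 0,   # wall (blue)
           (255, 0, 0): 1}   # column (red)

def colors_to_labels(rgb, palette, unknown=255):
    """Map an (H, W, 3) color image to (H, W) class IDs by exact color match."""
    labels = np.full(rgb.shape[:2], unknown, dtype=np.uint8)
    for color, class_id in palette.items():
        matches = np.all(rgb == np.array(color), axis=-1)
        labels[matches] = class_id
    return labels

clean = np.array([[[0, 0, 255], [255, 0, 0]]], dtype=np.uint8)

# Stand-in for a compression artifact: the red pixel drifts by a few levels.
lossy = clean.copy()
lossy[0, 1] = [252, 3, 1]

clean_labels = colors_to_labels(clean, palette)
lossy_labels = colors_to_labels(lossy, palette)
print(clean_labels)  # both pixels are recognized (classes 0 and 1)
print(lossy_labels)  # the drifted pixel falls back to the "unknown" value 255
```

A lossless format keeps every pixel at its exact palette value, so the lookup stays reliable.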

Sometimes TensorFlow's built-in image preprocessing API cannot handle more complex tasks, so we need other Python libraries. tf.py_function lets us run an arbitrary Python function inside the TensorFlow training pipeline:

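import numpy as np
import tensorflow as tf
from PIL import Image

def load(image_file):
    def load_data(image_file):
        # Paths of the photo (.jpg), the color segmentation map (.png), and the label map (.bmp)
        jpg_file = image_file.numpy().decode('utf-8')
        bmp_file = jpg_file.replace('.jpg', '.bmp')
        png_file = jpg_file.replace('.jpg', '.png')
        # Scale pixel values from [0, 255] to [-1, 1]
        image = np.array(Image.open(jpg_file), dtype=np.float32) / 127.5 - 1
        seg_map = np.array(Image.open(png_file), dtype=np.float32) / 127.5 - 1
        labels = np.array(Image.open(bmp_file), dtype=np.uint8)
        h, w, _ = image.shape
        n_class = 12
        # One-hot encode the class IDs into an (H, W, n_class) mask
        mask = np.zeros((h, w, n_class), dtype=np.float32)
        for i in range(n_class):
            mask[labels == i, i] = 1
        return seg_map, image, mask
    seg_map, image, mask = tf.py_function(load_data, [image_file], [tf.float32, tf.float32, tf.float32])
    return seg_map, image, mask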

Now that we know the format of the one-hot encoded semantic segmentation mask, we will implement SPADE in TensorFlow 2.

Implementing SPADE

Instance normalization is very popular in image generation, but it tends to wash away the semantic information in segmentation masks. Suppose the input image contains only one segmentation label; for example, the entire image is sky. Because the input has a uniform value, the output of a convolution layer will also have a uniform value.

Instance normalization computes the mean across the (H, W) dimensions for each channel. The mean of each channel therefore equals that same uniform value, and the normalized activation, after the mean is subtracted, becomes zero. The semantic information is clearly lost. This is an extreme example, but the same logic applies in general: a segmentation mask loses more of its semantic meaning as the area it covers grows.
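
The effect can be checked numerically. The sketch below is a plain NumPy version of instance normalization without learned scale or shift; the array values are hypothetical and chosen only to mimic an "all sky" input and a two-region input.

```python
import numpy as np

def instance_norm(x, epsilon=1e-5):
    """Normalize an (N, H, W, C) array per sample and per channel over (H, W)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

# Hypothetical "all sky" activation: one constant value per channel.
sky = np.ones((1, 4, 4, 3)) * np.array([0.2, 0.5, 0.9])

# Hypothetical two-region activation: the right half differs from the left half.
two_regions = sky.copy()
two_regions[:, :, 2:, :] += 1.0

sky_out = instance_norm(sky)
two_out = instance_norm(two_regions)
print(np.abs(sky_out).max())  # close to 0: the constant input is wiped out
print(np.abs(two_out).max())  # close to 1: the contrast between regions survives
```

Whatever value the uniform input had, the output is (numerically) zero, so the class of the region can no longer be recovered from the activations.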

To solve this problem, SPADE normalizes within the local regions defined by the segmentation mask, rather than over the entire mask. SPADE's architecture is shown below:

As in batch normalization, SPADE computes the mean and standard deviation of each channel across the (N, H, W) dimensions. The difference is that the γ and β of each channel are no longer scalar values but two-dimensional maps of shape (H, W). In other words, every activation has its own γ and β value, learned from the semantic segmentation map, so normalization is applied differently in different segmented regions. These two parameters are produced by two convolution layers, as shown in the figure below:
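
The modulation step can be sketched in NumPy as follows. This is a simplified illustration rather than the layer itself: instead of learned 3x3 convolutions, γ and β are produced by a per-class lookup table (equivalent to a 1x1 convolution on the one-hot mask), and the names spade_modulate, w_gamma, and w_beta are assumptions made for the example.

```python
import numpy as np

def spade_modulate(x, mask, w_gamma, w_beta, epsilon=1e-5):
    """x: (N, H, W, C) activations; mask: (N, H, W, K) one-hot; w_*: (K, C)."""
    # Per-channel statistics over batch and spatial dimensions, as in batch norm.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    normalized = (x - mean) / np.sqrt(var + epsilon)
    # Stand-in for the convolutions: one (gamma, beta) pair per class and channel,
    # selected at every pixel by the one-hot mask.
    gamma = mask @ w_gamma  # (N, H, W, C)
    beta = mask @ w_beta    # (N, H, W, C)
    return gamma * normalized + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 3))

# Hypothetical layout: left half is class 0, right half is class 1.
labels = np.zeros((2, 4, 4), dtype=int)
labels[:, :, 2:] = 1
mask = np.eye(2)[labels]  # (2, 4, 4, 2)

w_gamma = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
w_beta = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])

out = spade_modulate(x, mask, w_gamma, w_beta)
print(out.shape)  # (2, 4, 4, 3)
```

Pixels in the class 0 region keep the plain normalized value, while pixels in the class 1 region are scaled by 2 and shifted by 0.5, so the segmentation layout stays visible in the output.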

SPADE is used not only at the network input but also in the internal layers. We can now implement SPADE as a custom TensorFlow 2 layer.

First, define the convolution layers in the __init__ constructor:

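import tensorflow as tf
from tensorflow.keras import layers

class SPADE(layers.Layer):
    def __init__(self, filters, epsilon=1e-5):
        super(SPADE, self).__init__()
        self.epsilon = epsilon
        # Shared convolution applied to the resized segmentation mask
        self.conv = layers.Conv2D(128, 3, padding='same', activation='relu')
        # Per-pixel scale (gamma) and shift (beta), one map per activation channel
        self.conv_gamma = layers.Conv2D(filters, 3, padding='same')
        self.conv_beta = layers.Conv2D(filters, 3, padding='same')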

Next, record the size of the activation map so that the mask can be resized to it later:

    def build(self, input_shape):
        self.resize_shape = input_shape[1:3]

Finally, connect the layers and operations in call(), as follows:

    def call(self, input_tensor, raw_mask):
        # Resize the one-hot mask to the spatial size of the activation map
        mask = tf.image.resize(raw_mask, self.resize_shape, method='nearest')
        x = self.conv(mask)
        gamma = self.conv_gamma(x)
        beta = self.conv_beta(x)
        # Per-channel statistics across the (N, H, W) dimensions
        mean, var = tf.nn.moments(input_tensor, axes=(0, 1, 2), keepdims=True)
        std = tf.sqrt(var + self.epsilon)
        normalized = (input_tensor - mean) / std
        # Spatially varying scale and shift
        output = gamma * normalized + beta
        return output
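
The mask is resized with method='nearest' in call(). One reason this choice is sensible is that nearest-neighbor sampling only copies existing pixels, so a one-hot mask stays one-hot, whereas any averaging scheme blends classes. The NumPy sketch below illustrates the difference; it assumes an integer downsampling factor and a simplified sampling grid, so it is not a reimplementation of tf.image.resize.

```python
import numpy as np

def downsample_nearest(mask, factor):
    """Keep every factor-th pixel of an (H, W, K) mask (simplified nearest-neighbor)."""
    return mask[::factor, ::factor]

def downsample_average(mask, factor):
    """Average non-overlapping factor x factor blocks of an (H, W, K) mask."""
    h, w, k = mask.shape
    blocks = mask.reshape(h // factor, factor, w // factor, factor, k)
    return blocks.mean(axis=(1, 3))

# Hypothetical 4x4 label map with two classes and a diagonal boundary.
labels = np.array([[0, 0, 0, 1],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [1, 1, 1, 1]])
mask = np.eye(2, dtype=np.float32)[labels]  # (4, 4, 2) one-hot mask

nearest = downsample_nearest(mask, 2)
average = downsample_average(mask, 2)
print(np.unique(nearest))  # only 0 and 1: still a valid one-hot mask
print(np.unique(average))  # fractional values appear along the class boundary
```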

Next, we’ll look at how SPADE can be used.

Applying SPADE in a residual network

Finally, let's look at how SPADE is inserted into a residual block:

The basic building block of the SPADE residual block is the SPADE-ReLU-Conv layer. Each SPADE takes two inputs: the activation of the previous layer and the semantic segmentation map.

As in a standard residual block, there are two convolution-ReLU layers and a skip path. Whenever the number of channels changes between the input and output of the residual block, a learned skip connection is required. When this happens, the activation maps at the inputs of the two SPADE layers in the forward path have different dimensions. That is fine, because each SPADE layer is created with its own number of filters and resizes the mask internally. Here is the code that builds the layers needed for the SPADE residual block:

class Resblock(layers.Layer):
    def __init__(self, filters):
        super(Resblock, self).__init__()
        self.filters = filters

    def build(self, input_shape):
        input_filter = input_shape[-1]
        self.spade_1 = SPADE(input_filter)
        self.spade_2 = SPADE(self.filters)
        self.conv_1 = layers.Conv2D(self.filters, 3, padding='same')
        self.conv_2 = layers.Conv2D(self.filters, 3, padding='same')
        # A learned skip branch is needed only when the channel count changes
        self.learned_skip = False
        if self.filters != input_filter:
            self.learned_skip = True
            self.spade_3 = SPADE(input_filter)
            self.conv_3 = layers.Conv2D(self.filters, 3, padding='same')

Finally, join the layers together in call():

    def call(self, input_tensor, mask):
        x = self.spade_1(input_tensor, mask)
        x = self.conv_1(tf.nn.leaky_relu(x, 0.2))
        x = self.spade_2(x, mask)
        x = self.conv_2(tf.nn.leaky_relu(x, 0.2))
        if self.learned_skip:
            skip = self.spade_3(input_tensor, mask)
            skip = self.conv_3(tf.nn.leaky_relu(skip, 0.2))
        else:
            skip = input_tensor
        output = skip + x
        return output
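
Why the learned skip branch is needed can be seen from the tensor shapes alone. The NumPy sketch below is a simplification under stated assumptions: the skip branch is reduced to a leaky ReLU followed by a channel projection (the SPADE step and the 3x3 convolution are omitted), and skip_branch, w_proj, and the random arrays are hypothetical.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def skip_branch(x, out_channels, w_proj=None):
    """Return the skip tensor for an (N, H, W, C_in) input."""
    if x.shape[-1] == out_channels:
        return x  # identity skip: shapes already match
    # Learned skip (simplified): activation, then a channel projection.
    return leaky_relu(x) @ w_proj  # (N, H, W, C_in) @ (C_in, out_channels)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 64))         # block input with 64 channels
main = rng.normal(size=(1, 8, 8, 32))      # stand-in for the main-path output
w_proj = rng.normal(size=(64, 32)) * 0.1   # hypothetical learned weights

out = skip_branch(x, 32, w_proj) + main
print(out.shape)  # (1, 8, 8, 32)
```

Adding x to main directly would fail because 64 and 32 channels cannot be broadcast together; the projection brings the skip path to the same channel count as the main path.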