
Deep Learning with Python

This article is one of a series of notes I wrote while studying Deep Learning with Python (2nd edition, by François Chollet). Starting with this post, the notes are published in Markdown rather than as Jupyter notebooks; you can still check out the original .ipynb notebooks on GitHub or Gitee.

You can read the original copy of the book online (in English) at this website. The book’s author has also published the accompanying Jupyter notebooks.

This article is one of the notes on Chapter 8, Generative Deep Learning.

8.2 DeepDream

DeepDream is a technique that lets machines modify images, using convolutional neural networks to produce psychedelic pictures:

Since the CNN used by DeepDream is trained on ImageNet, which contains a large number of animal images, the images generated by DeepDream contain many artifacts of animals and parts of animals.

DeepDream’s algorithm is very similar to the convolutional neural network filter-visualization technique. Recall that filter visualization runs the convolutional neural network in reverse: starting from a blank image containing random noise, it performs gradient ascent on the input to maximize the activation of a single filter.
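For contrast, a minimal filter-visualization loop looks roughly like this (a sketch only, assuming the disabled-eager `K.gradients`-based setup and the pre-trained `model` introduced below; the layer name and filter index are arbitrary placeholders):

import numpy as np
from tensorflow.keras import backend as K

# Loss: the mean activation of one filter in one layer
layer_output = model.get_layer('mixed4').output   # placeholder layer name
loss = K.mean(layer_output[:, :, :, 0])           # placeholder filter index

# Gradient of the loss w.r.t. the input image, normalized
grads = K.gradients(loss, model.input)[0]
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
iterate = K.function([model.input], [loss, grads])

# Start from a gray image with random noise and run gradient ascent
img = np.random.random((1, 299, 299, 3)) * 20 + 128.
for _ in range(40):
    loss_value, grads_value = iterate([img])
    img += grads_value  # step size 1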

The main differences between DeepDream and filter visualization are:

  • In DeepDream, we try to maximize the activations of entire layers rather than a single filter, so many visual features get mixed together, producing a more psychedelic image.
  • We start from an existing image rather than random noise, so the result incorporates visual patterns already present in the input, distorting some of its elements in a more psychedelic way.
  • The input image is processed at several different scales, known as octaves, which improves the quality of the output (see the short sketch after this list).
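To make the octave idea concrete, here is a quick sketch of how the successive scales relate (assuming a 400×400 input, 3 octaves, and a scale factor of 1.4, matching the hyperparameters used later):

num_octave = 3      # number of scales
octave_scale = 1.4  # size ratio between successive scales
original_shape = (400, 400)

successive_shapes = [tuple(int(dim / (octave_scale ** i)) for dim in original_shape)
                     for i in range(num_octave)][::-1]
print(successive_shapes)  # [(204, 204), (285, 285), (400, 400)]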

Implement DeepDream with Keras

To begin, we need to disable the eager execution mode of TensorFlow 2.x; see TensorFlow issue #33135.

import tensorflow as tf
tf.compat.v1.disable_eager_execution()

The first step is to choose a convolutional neural network pre-trained on ImageNet: VGG16, Inception, ResNet50, and so on. In practice, Inception has been found to produce better-looking dreams, so here we use the Inception V3 model built into Keras.

Load the pre-trained Inception V3 model:

from tensorflow.keras.applications import inception_v3
from tensorflow.keras import backend as K

# Disable all training-specific operations (we won't train the model)
K.set_learning_phase(0)

# Load Inception V3 without its fully connected classifier on top
model = inception_v3.InceptionV3(weights='imagenet', include_top=False)

Next, define the loss: the quantity to be maximized during gradient ascent. In DeepDream, we want to maximize the activations of all the filters in several layers simultaneously. Concretely, we maximize a weighted sum of the L2 norms of the activations of a set of high-level layers. The choice of layers and their weights has a great influence on the generated result:

  • Layers near the bottom produce basic geometric patterns;
  • Layers near the top produce patterns in which some objects from ImageNet can be recognized (such as birds or dogs).

Print the structure of the Inception V3 model (you can use tf.keras.utils.plot_model(model)) and pick any number of layers; here we use mixed4, mixed5, mixed6 and mixed7.
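If you only need the layer names rather than the full plot, a quick sketch like this will list them (Inception V3's concatenation layers are named mixed0 through mixed10):

for layer in model.layers:
    if layer.name.startswith('mixed'):
        print(layer.name)  # mixed0, mixed1, ..., mixed10 (plus a few intermediate mixed9_* concatenations)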

Write these layers to the DeepDream configuration:

layer_contributions = {
    'mixed4': 0.0,
    'mixed5': 3.0,
    'mixed6': 2.0,
    'mixed7': 1.5,
}

Then we need to compute the loss from the activations of these selected layers. Define the loss to be maximized:

layer_dict = dict([(layer.name, layer) for layer in model.layers])

loss = K.variable(0.)  # the loss to accumulate

for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    activation = layer_dict[layer_name].output

    # Number of elements in the activation tensor, for normalization
    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    # Add the L2 norm of the layer's activation to the loss,
    # cropping a 2-pixel border to avoid border artifacts.
    # Note: use `loss = loss + ...` rather than `loss += ...`.
    # Reference: https://github.com/fchollet/deep-learning-with-python-notebooks/issues/43
    loss = loss + coeff * K.sum(K.square(activation[:, 2: -2, 2: -2, :])) / scaling

Next, set up the gradient-ascent process on this loss:

# The "dream" image is the model's input
dream = model.input

# The gradients of the loss with respect to the dream
grads = K.gradients(loss, dream)[0]

# Normalization trick: divide by the mean magnitude of the gradients
grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)

# Function to fetch the loss and gradient values for a given input image
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)

def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def gradient_ascent(x, iterations, step, max_loss=None):
    # Run gradient ascent for a number of iterations,
    # stopping early if the loss exceeds max_loss
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print(f'   loss value at {i}: {loss_value}')
        x += step * grad_values
    return x

Finally, the DeepDream algorithm itself: define a list of scales (also called octaves) at which to process the image, where each successive scale is larger than the previous one by a constant factor. DeepDream runs gradient ascent at each scale on this list, from smallest to largest, and then upscales the result to the next scale. Upscaling makes the image blurry, so after each upscale we re-inject the lost details back into the image.

Some auxiliary functions:

import numpy as np
import scipy
import imageio

from tensorflow.keras.preprocessing import image

def resize_img(img, size):
    img = np.copy(img)
    factors = (1,
               float(size[0]) / img.shape[1],
               float(size[1]) / img.shape[2],
               1)
    return scipy.ndimage.zoom(img, factors, order=1)

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    # scipy.misc.imsave is removed in recent SciPy; use imageio instead
    # scipy.misc.imsave(fname, pil_img)
    imageio.imsave(fname, pil_img)

Open the image, resize it, and convert it into a tensor that the Inception V3 model can process:

def preprocess_image(image_path):
    # Load the image and convert it to an Inception V3-compatible tensor
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img

Convert a tensor back into a valid image:

def deprocess_image(x):
    # Undo the preprocessing done by inception_v3.preprocess_input
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))

    x /= 2.
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x


Run gradient ascent over multiple successive scales:

import numpy as np

step = 0.01         # Step size of gradient ascent
num_octave = 3      # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between successive scales
iterations = 20     # Number of gradient-ascent steps at each scale

max_loss = 10.      # Interrupt gradient ascent if the loss grows beyond this, to avoid ugly artifacts

base_image_path = './img.png'

img = preprocess_image(base_image_path)

# Compute the shape of each successive scale, from smallest to largest
original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i))
                   for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]

original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    print('Processing image shape', shape)
    # Run gradient ascent at the current scale
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    # Re-inject the details lost when upscaling to the current scale
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img

    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname=f'dream_at_scale_{shape}.png')

save_img(img, fname='final_dream.png')

The final result:

You can see that DeepDream has drawn several dogs into the final_dream.png image 🐶.

Note: Because of the image sizes Inception V3 was originally trained on, the DeepDream implementation here gets better results on images between 300×300 and 400×400. This is not a strict limit, though; input of any size is acceptable.
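If your source image is much larger than that range, one option is to let Keras resize it while loading. This is a variant of the preprocess_image function above (the target_size value here is my own choice, not from the book):

def preprocess_image_resized(image_path, target_size=(350, 350)):
    # Same as preprocess_image, but resizes the image on load
    img = image.load_img(image_path, target_size=target_size)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    return inception_v3.preprocess_input(img)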


Note: It is also easy to implement a better DeepDream using the eager mode of TensorFlow 2. See this official TensorFlow tutorial for more details.
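For reference, the core gradient-ascent step in eager mode boils down to something like the following (a minimal sketch in the spirit of that tutorial, not a drop-in replacement for the code above; compute_loss is a placeholder for a function computing the weighted activation sum defined earlier):

import tensorflow as tf

@tf.function
def gradient_ascent_step(img, step_size):
    with tf.GradientTape() as tape:
        tape.watch(img)              # img is a plain tensor, so watch it explicitly
        loss = compute_loss(img)     # placeholder: weighted sum of layer activations
    grads = tape.gradient(loss, img)
    grads /= tf.math.reduce_std(grads) + 1e-8  # normalize the gradients
    img = img + grads * step_size               # one step of gradient ascent
    return loss, img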