Editor | Debra
AI Front Line introduction: Generative Adversarial Networks (GANs) are widely used in image generation, super-resolution image generation, image compression, image style transfer, data augmentation, text generation, and other scenarios. More and more researchers are working on GANs, and many GAN variants have been proposed, including CGAN, InfoGAN, WGAN, CycleGAN, and so on. To make GAN models easier to apply and experiment with, Google has open-sourced TFGAN, a TensorFlow library for quickly building and training the various GAN models.







This article explains how TFGAN is applied to the original GAN, CGAN, InfoGAN, WGAN, and other scenarios, as shown below:


Among them, the MNIST images generated by the original GAN are not controllable; CGAN can generate digit images for the corresponding labels. InfoGAN can be considered an unsupervised CGAN: in its results, the first two rows show the digit class controlled by a categorical latent variable, the middle two rows show digit thickness controlled by a continuous latent variable, and the last two rows show digit slant controlled by a continuous latent variable. ImageToImage is a type of CGAN that implements image style transfer.

Generative Adversarial Networks and TFGAN

A GAN, first proposed by Goodfellow, is composed of two parts: a Generator (G) and a Discriminator (D). The generator uses noise z to produce samples that resemble the real data, and the more realistic the samples, the better; the discriminator estimates whether a sample comes from the real data or the generated data, and the more accurate its judgment, the better. As shown below:


In the figure above, a real sample x is passed through the discriminator network to produce D(x), a real number in the range 0-1 that gives the probability that the input is real. For real data, D(x) should be as close to 1 as possible. The random noise z is fed into the generator network G, which converts it into generated data; if the task is image generation, the output of the generator network is a generated (fake) image, denoted G(z). The discriminator D should drive D(G(z)) toward 0, that is, recognize the generated image as fake, while the generator G wants D(G(z)) close to 1, that is, to fool the discriminator into judging the fake data G(z) as real. The game between D and G ends when D can no longer tell whether an image is generated or real.

Assuming that P_r and P_g denote the distributions of real data and generated data respectively, the objective function of the discriminator can be expressed as:
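$$ \max_D \; \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{x \sim P_g}[\log(1 - D(x))] $$

(the standard discriminator objective from [1])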


The goal of the generator is to make the discriminator D unable to distinguish real data from generated data, so its optimization objective is:
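$$ \min_G \; \mathbb{E}_{x \sim P_g}[\log(1 - D(x))] $$

(equivalently, the two objectives combine into the minimax game $\min_G \max_D V(D, G)$ of [1])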


The TFGAN library is available at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gan and mainly includes the following components:

  1. Core architecture, which mainly includes creating the TFGAN model, adding the loss, creating the training operations, and running them.

  2. Common operations, which mainly provide gradient clipping, normalization, and conditioning operations.

  3. Loss functions, which mainly provide the losses and penalties commonly used in GANs, such as the Wasserstein loss, gradient penalty, and mutual information penalty.

  4. Model evaluation, which provides the Inception Score and Fréchet Inception Distance metrics for evaluating unconditional generative models.

  5. Sample code: Google has also open-sourced examples of common GAN networks, including unconditional GAN, conditional GAN, InfoGAN, and WGAN, which can be downloaded from https://github.com/tensorflow/models/tree/master/research/gan/.


Training a GAN network with the TFGAN library mainly involves the following steps, which are combined in the sketch after the list:

  1. Determine the GAN network inputs: the real data and the generator noise.



  2. Set the generator and discriminator in GANModel.



  3. Set the loss function in GANLoss.


  4. Set the training operations in GANTrainOps.


  5. Run the model training.

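Below is a minimal end-to-end sketch of the five steps, based on the tf.contrib.gan API; the stub networks, shapes, optimizers, and hyperparameters are illustrative assumptions rather than the article's original code.

import tensorflow as tf

tfgan = tf.contrib.gan
batch_size = 32

# 1. GAN inputs: a batch of real images (a random stand-in for a real
#    input pipeline such as MNIST) and random generator noise.
images = tf.random_uniform([batch_size, 28, 28, 1])
noise = tf.random_normal([batch_size, 64])

# Stub networks; real models would use deeper convolutional nets.
def generator_fn(noise):
    net = tf.layers.dense(noise, 28 * 28, activation=tf.tanh)
    return tf.reshape(net, [-1, 28, 28, 1])

def discriminator_fn(img, unused_conditioning):
    net = tf.layers.flatten(img)
    return tf.layers.dense(net, 1)  # one real/fake logit

# 2. Set the generator and discriminator in GANModel.
gan_model = tfgan.gan_model(generator_fn, discriminator_fn,
                            real_data=images, generator_inputs=noise)

# 3. Set the loss function in GANLoss (here the library defaults).
gan_loss = tfgan.gan_loss(gan_model)

# 4. Set the training operations in GANTrainOps.
train_ops = tfgan.gan_train_ops(
    gan_model, gan_loss,
    generator_optimizer=tf.train.AdamOptimizer(1e-3, beta1=0.5),
    discriminator_optimizer=tf.train.AdamOptimizer(1e-4, beta1=0.5))

# 5. Run the model training.
tfgan.gan_train(train_ops, logdir='/tmp/tfgan',
                hooks=[tf.train.StopAtStepHook(num_steps=1000)])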

CGAN

Conditional Generative Adversarial Nets (CGAN) address the uncontrollable output of GAN by adding supervision information, turning unsupervised training into supervised training and thereby guiding what the GAN generates. For example, given a class label as input, an image of the corresponding label can be generated. The objective function of CGAN thus becomes:
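$$ \min_G \max_D \; \mathbb{E}_{x \sim P_r}[\log D(x|y)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z|y)))] $$

(the conditional minimax objective from [2])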


Here y is the added supervision information: D(x|y) denotes judging the real data x under the condition y, and D(G(z|y)) denotes judging the generated data G(z|y) under the condition y. For example, on the MNIST dataset, images of the corresponding digit can be generated from the digit label; on a face dataset, face images can be generated according to gender, whether the subject is smiling, age, and other attributes. The architecture of CGAN is shown below:


TFGAN provides an API for creating a condition tensor from an input tensor and a one_hot_labels variable, as follows:

tfgan.features.condition_tensor_from_onehot(tensor, one_hot_labels, embedding_size)

Here tensor is the input tensor; one_hot_labels are the one-hot labels, with shape [batch_size, num_classes]; and embedding_size is the embedding size of each label. The return value is the condition tensor.
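A brief usage sketch; the batch size, label count, and embedding size are illustrative assumptions.

import tensorflow as tf

tfgan = tf.contrib.gan

# Condition 64-dimensional generator noise on 10 digit classes.
noise = tf.random_normal([32, 64])
labels = tf.random_uniform([32], maxval=10, dtype=tf.int32)
one_hot_labels = tf.one_hot(labels, depth=10)
conditioned_noise = tfgan.features.condition_tensor_from_onehot(
    noise, one_hot_labels, embedding_size=16)
# conditioned_noise carries the label information and is fed to the generator.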

ImageToImage

Phillip Isola et al. proposed Image-to-Image Translation with Conditional Adversarial Networks, an adversarial network for generating images based on CGAN. The basic idea of the network design is as follows:


Here x is the input line drawing, G(x) is the generated image, and y is the real rendered image corresponding to the line drawing x. The generator G is used to generate images, and the discriminator D is used to judge the authenticity of the generated images. The discriminator network maximizes the probability of judging (x, y) pairs as real and (x, G(x)) pairs as fake, while the generator network tries to make the discriminator judge (x, G(x)) as real, so the generator and discriminator play a game. To make the generator not only fool the discriminator but also produce images close to the real ones, the L1 distance between the real image and the generated image is added to the objective function, as shown below:
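$$ G^* = \arg \min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathbb{E}_{x, y}\left[ \| y - G(x) \|_1 \right] $$

(the combined objective from [5], where λ weights the L1 term)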

The TFGAN library provides the loss-function APIs for the ImageToImage generative adversarial network; example usage is shown below:

# Define the L1 loss between the real data and the generated data
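# One reasonable definition (a sketch; the exact form is an assumption):
l1_pixel_loss = tf.norm(gan_model.real_data - gan_model.generated_data, ord=1)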


# gan_loss is the total loss of the objective function
gan_loss = tfgan.losses.combine_adversarial_loss(
    gan_loss, gan_model, l1_pixel_loss, weight_factor=FLAGS.weight_factor)

InfoGAN

In a GAN, the generator imposes no restriction on how it uses the noise z, so it is hard to tie any dimension of z to a semantic feature. As a result, we cannot control what kind of data a given noise z will produce, which greatly limits the use of GANs. InfoGAN can be considered an unsupervised CGAN: it adds a latent variable c to the noise z so that the data produced by the generator has high mutual information with c, which is what "Info" stands for. Mutual information is defined as the difference of two entropies, I(x; y) = H(x) - H(x|y), where H(x) is the entropy of the prior distribution and H(x|y) is the entropy of the posterior distribution. If x and y are independent variables, the mutual information is 0, indicating that x and y are unrelated; if x and y are correlated, the mutual information is greater than 0, so given y we can better infer the likely values of x. InfoGAN's objective function is thus:
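$$ \min_G \max_D \; V_I(D, G) = V(D, G) - \lambda I(c; G(z, c)) $$

(the information-regularized objective from [3], where V(D, G) is the original GAN objective and λ weights the mutual information term)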

The network structure of InfoGAN is as follows:



The figure above shows the difference between InfoGAN and GAN: in addition to the discriminator output D(x), the network also outputs a variational distribution Q(c|x), which is used to approximate P(c|x) and thereby increase the mutual information between the generated data and the latent variable c. TFGAN provides InfoGAN-related APIs, as shown below:

The model is defined via tfgan.infogan_model, and its loss is computed via tfgan.gan_loss. Relative to the plain GAN loss, the InfoGAN loss adds the mutual information term I(c; G(z, c)); TFGAN computes it with a mutual information penalty whose structured_generator_inputs are the latent variables and whose predicted_distributions are the variational distribution Q(c|x).
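A minimal sketch of these APIs, assuming stub networks and a 2-D continuous latent code (the TFGAN MNIST example also uses a categorical code); all names and shapes here are illustrative.

import tensorflow as tf

tfgan = tf.contrib.gan
ds = tf.contrib.distributions
batch_size = 32

images = tf.random_uniform([batch_size, 28, 28, 1])  # stand-in for real data
unstructured_noise = tf.random_normal([batch_size, 62])
cont_latent = tf.random_normal([batch_size, 2])      # structured latent code c

def generator_fn(inputs):
    # inputs is the list [unstructured_noise, cont_latent].
    net = tf.layers.dense(tf.concat(inputs, axis=1), 28 * 28, activation=tf.tanh)
    return tf.reshape(net, [-1, 28, 28, 1])

def discriminator_fn(img, unused_conditioning):
    net = tf.layers.flatten(img)
    logits = tf.layers.dense(net, 1)
    # Q(c|x): the variational distribution that approximates P(c|x).
    q_c = ds.Normal(loc=tf.layers.dense(net, 2), scale=tf.ones([batch_size, 2]))
    return logits, [q_c]

# Define the InfoGAN model via tfgan.infogan_model.
infogan_model = tfgan.infogan_model(
    generator_fn, discriminator_fn,
    real_data=images,
    unstructured_generator_inputs=[unstructured_noise],
    structured_generator_inputs=[cont_latent])

# tfgan.gan_loss adds the mutual information penalty I(c; G(z, c)).
gan_loss = tfgan.gan_loss(infogan_model, mutual_information_penalty_weight=1.0)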

WGAN

Martin Arjovsky et al. proposed WGAN (Wasserstein GAN), which addresses several problems of the original GAN: training is difficult, the generator and discriminator losses do not indicate training progress, and generated samples lack diversity. WGAN has the following main advantages:

  1. It balances the training of the generator and discriminator, making GAN training stable.

  2. It ensures the diversity of the generated samples.

  3. The Wasserstein distance measures how well the model is trained: the smaller the value, the better the training and the higher the quality of the images produced by the generator.

The differences between the WGAN algorithm and the original GAN algorithm are as follows:

  1. Remove the sigmoid operation from the last layer of the discriminator.

  2. The losses of the generator and the discriminator do not take the log operation.

  3. After each update of the discriminator's parameters, clip their absolute values to a fixed constant c.

  4. Use RMSProp instead of momentum-based optimization algorithms such as Momentum and Adam.

The algorithm structure of WGAN is as follows:


TFGAN provides WGAN-related APIs, as shown below:

# Generator network loss function
generator_loss_fn = tfgan_losses.wasserstein_generator_loss

# Discriminator network loss function
discriminator_loss_fn = tfgan_losses.wasserstein_discriminator_loss
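A sketch of plugging these loss functions into GANLoss and GANTrainOps, with RMSProp as the WGAN recipe prescribes; the gan_model and the 5e-5 learning rate are illustrative assumptions carried over from the earlier sketch.

import tensorflow as tf

tfgan = tf.contrib.gan
tfgan_losses = tfgan.losses

# Build the GANLoss with the Wasserstein loss functions.
gan_loss = tfgan.gan_loss(
    gan_model,  # a GANModel built as in the earlier sketch
    generator_loss_fn=tfgan_losses.wasserstein_generator_loss,
    discriminator_loss_fn=tfgan_losses.wasserstein_discriminator_loss)

# WGAN uses RMSProp rather than momentum-based optimizers.
train_ops = tfgan.gan_train_ops(
    gan_model, gan_loss,
    generator_optimizer=tf.train.RMSPropOptimizer(5e-5),
    discriminator_optimizer=tf.train.RMSPropOptimizer(5e-5))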

Conclusion

This article first introduced generative adversarial networks and TFGAN. Generative adversarial networks are used for image generation, super-resolution image generation, image compression, image style transfer, data augmentation, text generation, and other scenarios, and TFGAN is a TensorFlow library for quickly experimenting with various GAN models. The article then explained the main ideas of the CGAN, ImageToImage, InfoGAN, and WGAN models and analyzed their key techniques, including the objective functions, network architectures, loss functions, and the corresponding TFGAN APIs. With TFGAN, users can quickly build generative adversarial network models and apply them to relevant industrial scenarios.

References

[1] Generative Adversarial Networks.

[2] Conditional Generative Adversarial Nets.

[3] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.

[4] Wasserstein GAN.

[5] Image-to-Image Translation with Conditional Adversarial Networks.

[6] https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gan.

[7] https://github.com/tensorflow/models/tree/master/research/gan.
