Preface
In my last post, I looked at what GANs can do. If you want to learn more about GANs, you can start with these 10 papers.
This article is translated from:
Towardsdatascience.com/must-read-p…
The original article introduces 10 GAN papers and their recent progress, but presents them in a different order. Here I introduce them chronologically, from the first GAN paper to the most recent, as listed below:
- Generative Adversarial Networks, 2014
- Conditional GANs, 2014
- DCGAN, 2015
- Improved Techniques for Training GANs, 2016
- Pix2Pix, 2016
- CycleGAN, 2017
- Progressive Growing of GANs, 2017
- StackGAN, 2017
- BigGAN, 2018
- StyleGAN, 2018
The first paper the original author recommends starting with is DCGAN.
At the end of this article, I will also introduce several GitHub projects that collect GAN papers and implement GAN models in TensorFlow, PyTorch, and Keras.
1. Generative Adversarial Networks
Paper title: Generative Adversarial Nets
Link: arxiv.org/abs/1406.26…
Ian Goodfellow, the father of GANs, published this first paper proposing the GAN; anyone starting to study GANs should read it. The paper proposes the GAN framework, discusses the non-saturating loss function, derives and proves the form of the optimal discriminator, and finally reports experiments on the MNIST, TFD, and CIFAR-10 datasets.
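To make the setup concrete, here is a minimal PyTorch sketch of one training step with the non-saturating generator loss described above. The networks netG and netD, the optimizers, and the noise dimension are placeholders, and it assumes netD returns a single logit per image.

```python
import torch
import torch.nn.functional as F

def gan_step(netG, netD, optG, optD, real, z_dim=100):
    """One adversarial update, assuming netD(x) returns a (batch, 1) logit."""
    batch = real.size(0)
    z = torch.randn(batch, z_dim, device=real.device)
    ones = torch.ones(batch, 1, device=real.device)
    zeros = torch.zeros(batch, 1, device=real.device)

    # Discriminator: push D(real) toward 1 and D(G(z)) toward 0.
    fake = netG(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(netD(real), ones)
              + F.binary_cross_entropy_with_logits(netD(fake), zeros))
    optD.zero_grad(); d_loss.backward(); optD.step()

    # Generator (non-saturating): maximize log D(G(z)) instead of minimizing
    # log(1 - D(G(z))), which gives stronger gradients early in training.
    fake = netG(z)
    g_loss = F.binary_cross_entropy_with_logits(netD(fake), ones)
    optG.zero_grad(); g_loss.backward(); optG.step()
    return d_loss.item(), g_loss.item()
```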
2. Conditional GANs
Paper title: Conditional Generative Adversarial Nets
Link: arxiv.org/abs/1411.17…
If the previous GAN paper is impressive as the origin of the GAN framework, this cGAN paper is one of the key factors that made GAN models so popular. The original GAN is an unsupervised model: its generator needs only random noise as input, but the results were not that good. GANs were proposed in 2014, yet before 2016 there was relatively little research in this area; the wave of related papers came afterwards, driven largely by two works. The first was cGAN, and the second was DCGAN, which is introduced later.
cGAN essentially brings GANs back into the supervised learning setting. As shown in the figure below, it adds a class label as an extra input to the generator, and this change alleviates, to some extent, the instability of GAN training. This way of injecting prior knowledge is used by most of today's best-known GANs: BigGAN, which generates images, and Pix2Pix, which translates images, both follow the same idea. In that sense, the proposal of cGAN was a key step.
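A hedged sketch of the conditioning idea in PyTorch: the class label y is embedded and concatenated with the noise z before entering the generator. The layer sizes, the embedding, and the module name are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)  # label -> dense vector
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        # Condition the generator by concatenating noise and label embedding.
        return self.net(torch.cat([z, self.embed(y)], dim=1))

# Usage sketch: ConditionalGenerator()(torch.randn(8, 100), torch.randint(0, 10, (8,)))
```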
3. DCGAN
Paper title: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Link: arxiv.org/abs/1511.06…
In fact, the first paper the original author recommends reading is this DCGAN paper, published in 2015. It is the first work to implement the GAN model with a CNN architecture: it shows how to use convolutional layers and gives several additional architectural guidelines for the implementation. It also discusses how to visualize GAN features, interpolate in the latent space, train a classifier on the discriminator's features, and evaluate the results. Below is a schematic of the DCGAN generator.
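For reference, here is a minimal DCGAN-style generator in PyTorch that follows the paper's guidelines: transposed convolutions for upsampling, batch normalization, ReLU activations in the generator, and a Tanh output. The channel counts target 64*64 images and are common defaults rather than values dictated by the paper.

```python
import torch.nn as nn

def dcgan_generator(z_dim=100, ngf=64, channels=3):
    # Expects input noise shaped (N, z_dim, 1, 1).
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, ngf * 8, 4, 1, 0, bias=False),   # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 4x4 -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 8x8 -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 16x16 -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, channels, 4, 2, 1, bias=False),     # 32x32 -> 64x64
        nn.Tanh(),  # output pixels in [-1, 1]
    )
```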
4. Improved Techniques for Training GANs
Paper title: Improved Techniques for Training GANs
Link: arxiv.org/abs/1606.03…
One of the authors of this paper is Ian Goodfellow. It gives a great deal of advice on how to build a GAN architecture and helps explain why GAN training is unstable, and it offers many recommendations for training DCGANs stably, such as feature matching, minibatch discrimination, one-sided label smoothing, and virtual batch normalization. Implementing a DCGAN with these recommendations is a great way to learn about GANs.
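As an example of how simple some of these tricks are, here is a hedged illustration of one-sided label smoothing: only the discriminator's targets for real samples are softened (for example from 1.0 to 0.9), while targets for fake samples stay at 0. The function name and the 0.9 value are illustrative conventions, not fixed by the paper.

```python
import torch
import torch.nn.functional as F

def d_loss_with_label_smoothing(d_real_logits, d_fake_logits, smooth=0.9):
    real_targets = torch.full_like(d_real_logits, smooth)  # smoothed real labels
    fake_targets = torch.zeros_like(d_fake_logits)         # fake labels stay at 0
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))
```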
5. Pix2Pix
Paper title: Image-to-Image Translation with Conditional Adversarial Networks
Link: arxiv.org/abs/1611.07…
Pix2Pix targets image-to-image translation applications, as shown below. The model requires paired training data and adopts a different GAN configuration: its discriminator is a PatchGAN, which judges whether 70*70 patches of an image are real or fake rather than looking at the whole image at once.
In addition, the generator uses a U-Net structure with skip connections, similar in spirit to the shortcut connections in ResNet, so that corresponding encoder and decoder layers are connected to each other. It can perform translations such as those shown below: semantic map to street view, colorizing black-and-white photos, sketch to realistic photo, and so on.
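Below is a hedged PyTorch sketch of a 70*70 PatchGAN discriminator: a small convolutional stack whose output is a grid of real/fake logits, each covering roughly a 70*70 receptive field, instead of a single score for the whole image. The filter counts follow the commonly used pix2pix convention and should be treated as defaults.

```python
import torch.nn as nn

def patchgan_discriminator(in_channels=6):  # source and target images concatenated
    def block(cin, cout, stride, norm=True):
        layers = [nn.Conv2d(cin, cout, 4, stride, 1)]
        if norm:
            layers.append(nn.BatchNorm2d(cout))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    return nn.Sequential(
        *block(in_channels, 64, 2, norm=False),
        *block(64, 128, 2),
        *block(128, 256, 2),
        *block(256, 512, 1),
        nn.Conv2d(512, 1, 4, 1, 1),  # per-patch real/fake logit map
    )
```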
6. CycleGAN
Paper title: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Link: arxiv.org/abs/1703.10…
The problem with Pix2Pix, from the previous paper, is that its training data must be paired: each original image needs a corresponding translated image. In reality such data is very hard to find, and for some tasks one-to-one paired data simply does not exist. With CycleGAN, you only need to prepare datasets from the two domains, say photos of horses and photos of zebras, without any pairing between them. The paper proposes a very effective idea, the cycle-consistency loss, as shown in the following figure:
This structure reappears in many later GAN papers on image translation. CycleGAN enables applications such as those shown in the figure below: converting between ordinary horses and zebras, style transfer (photo to oil painting), swapping winter and summer scenery, and so on.
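Here is a hedged sketch of the cycle-consistency loss: G maps domain A to B, F maps B back to A, and an L1 penalty pushes F(G(a)) back toward a and G(F(b)) back toward b. The adversarial losses for the two discriminators would be added separately, and the weight of 10 is the commonly used value; treat the whole function as illustrative.

```python
import torch.nn.functional as F_nn

def cycle_consistency_loss(G, F, real_a, real_b, lam=10.0):
    # a -> G(a) -> F(G(a)) should reconstruct a; likewise b -> F(b) -> G(F(b)).
    rec_a = F(G(real_a))
    rec_b = G(F(real_b))
    return lam * (F_nn.l1_loss(rec_a, real_a) + F_nn.l1_loss(rec_b, real_b))
```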
7. Progressive Growing of GANs
Paper title: Progressive Growing of GANs for Improved Quality, Stability, and Variation
Link: arxiv.org/abs/1710.10…
This paper is a must-read because it achieves very good results with a creative approach to the GAN training problem. It uses a multi-scale structure, growing from 4*4 to 8*8 all the way up to 1024*1024 resolution, as shown below, and it proposes solutions to the instability that arises when generating at such large target resolutions.
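A hedged sketch of the central growing trick: when a new, higher-resolution block is added, its output is blended with an upsampled version of the previous resolution using a weight alpha that ramps from 0 to 1 during training. The module names (new_block, to_rgb_old, to_rgb_new) are placeholders, and new_block is assumed to upsample its input internally.

```python
import torch.nn.functional as F

def grow_step(features, new_block, to_rgb_old, to_rgb_new, alpha):
    # Old path: render the low-resolution features to RGB, then upsample 2x.
    old_img = F.interpolate(to_rgb_old(features), scale_factor=2, mode="nearest")
    # New path: the freshly added higher-resolution block.
    new_img = to_rgb_new(new_block(features))
    # Fade the new block in smoothly as alpha goes from 0 to 1.
    return (1.0 - alpha) * old_img + alpha * new_img
```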
8. StackGAN
Paper title: StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Link: arxiv.org/abs/1612.03…
StackGAN is similar to cGAN and Progressive Growing of GANs in that it also uses prior knowledge and a multi-scale approach. The whole network structure is shown in the figure below: in the first stage, a 64*64 image is generated from the given text description and random noise; that image is then taken as prior knowledge to generate a 256*256 image in the second stage. Compared with the seven papers recommended so far, StackGAN introduces text information through a text embedding vector from which visual features are extracted.
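Purely as a structural illustration, here is a hedged sketch of the two-stage pipeline: stage I produces a coarse 64*64 image from the text embedding and noise, and stage II refines it to 256*256 while conditioning on the same text embedding. The generator modules are placeholders, not the paper's exact architecture.

```python
import torch

def stackgan_forward(stage1_g, stage2_g, text_embedding, z_dim=100):
    z = torch.randn(text_embedding.size(0), z_dim, device=text_embedding.device)
    img_64 = stage1_g(z, text_embedding)        # stage I: coarse 64x64 result
    img_256 = stage2_g(img_64, text_embedding)  # stage II: refined 256x256 result
    return img_64, img_256
```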
9. BigGAN
Paper title: Large Scale GAN Training for High Fidelity Natural Image Synthesis
Link: arxiv.org/abs/1809.11…
BigGAN is probably the best model for image generation on ImageNet at the moment; its results, shown in the figure below, are very realistic. However, the paper is difficult to reproduce on a local machine, because it combines many architectures and techniques, including self-attention and spectral normalization, both of which are well described and illustrated in the paper.
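As a small, hedged example of one of those building blocks, PyTorch ships spectral normalization as a module wrapper; applying it to discriminator layers is the typical use, though BigGAN itself involves far more than this.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrap a layer so that its weight is re-normalized by its largest singular
# value on every forward pass, constraining the layer's Lipschitz constant.
disc_layer = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))
```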
10. StyleGAN
Paper title: A Style-Based Generator Architecture for Generative Adversarial Networks
Link: arxiv.org/abs/1812.04…
StyleGAN borrows from neural style transfer techniques, in particular Adaptive Instance Normalization (AdaIN), to control the latent variable z. Its network structure, shown in the figure below, combines a mapping network with AdaIN-based conditioning throughout the generator. It is not easy to reproduce either, but the paper is still worth reading and contains many interesting ideas.
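A hedged sketch of AdaIN as used for style conditioning: the feature map is instance-normalized, then rescaled and shifted by a per-channel scale and bias derived from the style vector. How that scale and bias are produced (in StyleGAN, by learned affine transforms of the mapped latent) is left out here.

```python
import torch

def adain(features, style_scale, style_bias, eps=1e-5):
    # features: (N, C, H, W); style_scale and style_bias: (N, C)
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True) + eps
    normalized = (features - mean) / std
    return style_scale[:, :, None, None] * normalized + style_bias[:, :, None, None]

# Usage sketch: adain(torch.randn(2, 512, 8, 8), torch.ones(2, 512), torch.zeros(2, 512))
```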
Summary
This article introduced 10 GAN papers worth reading, from the initial paper that proposed the model up to work from 2018, including the influential cGAN and DCGAN, Pix2Pix and CycleGAN, which are very important in the field of image translation, and BigGAN, which has achieved very impressive results recently.
If you want to get started with GANs, begin with these 10 papers. In addition, I recommend a GitHub project that collects a large number of GAN papers and organizes them by application area:
- AdversarialNetsPapers
And three GitHub projects that reproduce multiple GAN models in the current mainstream frameworks, TensorFlow, PyTorch, and Keras:
- Tensorflow-gans: TensorFlow version
- Pytorch-GAN: PyTorch version
- Keras-gan: Keras version
Finally, the 10 papers introduced in this article have been downloaded and packaged. To get them:
- Follow the WeChat public account "Growth of Algorithmic Ape"
- Reply "GAN Paper" in the account's chat window to receive the cloud drive link.
You are welcome to follow my WeChat official account, "Growth of Algorithmic Ape", or scan the QR code below, so that we can communicate, learn, and improve together!