See my column for the previous part of this series:

Technical Memorandum (zhuanlan.zhihu.com)

Part (2) of this series introduced an extremely simple generative model.

In this article we leave it aside for now and ask: can we make things even simpler? Where is the limit of simplicity?

In my opinion, the secret of generative models lies in the Deep Image Prior, not in “integrals of distributions”:

Deep Image Prior (dmitryulyanov.github.io)

What Deep Image Prior demonstrates is simple from an experimental point of view. From a theoretical point of view, however, it is much harder to analyze than “integrals of distributions”, so I have yet to see anyone attempt a theoretical analysis.

Everybody is busy computing integrals of distributions, and that misses the point.

In this article we run some small experiments to see where the limits of generative modeling lie.

1. How simple can a generative model really be?

The most interesting thing about generative models is where P(X) comes from.

No one can say what P(X) is. Even given a bunch of images X, they could obviously still have come from almost any P(X).

But we can add a prior, and then we can estimate P(X).

Convolutional networks are an excellent prior for natural images. This is the heart and soul of deep learning for computer vision.

Consider the following generative model, which can be called DGN, the Direct Generative Network:

You read that right. That is the whole model. Nothing more.
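
The original figure is not reproduced here, but judging from sections 2 and 4, the model presumably amounts to a generator trained by plain reconstruction; in my own notation (not the post’s):

$$\min_\theta \;\sum_{i=1}^{N}\bigl\|G_\theta(z_i)-X_i\bigr\|^2,\qquad z_i \text{ random and fixed (or also optimized; see method 2 below).}$$

No discriminator, no KL term, no noise schedule: just fit each code to its image.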


2. Training the simplest generative model

I call G a generative model because, first of all, it is guaranteed to generate every sample X almost perfectly, without dropping any of them.

Even this basic requirement is not guaranteed by the current GAN/VAE/Flow families. Current generative models are too flawed to even reconstruct the original dataset.

So how do we train a DGN? Here are two methods, for example:

  1. The dumbest way: take a bunch of random z’s and arbitrarily assign each z to an X. Yes, it really is that crude.
  2. A slightly smarter way: every time you update G, also compute the gradient with respect to z, so that z flows as well. (A sketch of both methods follows this list.)
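
A minimal PyTorch sketch of both methods, under my own assumptions (a toy MLP stands in for G; all names and sizes here are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

N, z_dim, x_dim = 64, 50, 3 * 64 * 64
X = torch.rand(N, x_dim)                        # stand-in for the real images
G = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(), nn.Linear(512, x_dim))

# Method 1: assign a random code z_i to each image X_i and never touch it again.
z = torch.randn(N, z_dim)
opt = torch.optim.Adam(G.parameters())

# Method 2 (use instead of the above): make z learnable so it "flows" with G.
# z = nn.Parameter(torch.randn(N, z_dim))
# opt = torch.optim.Adam([*G.parameters(), z])

for step in range(1000):
    loss = ((G(z) - X) ** 2).mean()             # plain reconstruction, nothing else
    opt.zero_grad(); loss.backward(); opt.step()
```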

Method 1 here recalls [1611.03530] Understanding deep learning requires rethinking generalization.

Yes, G really can memorize anything by brute force. We’ll see.


3. What is the so-called sample distribution?

One might ask: how do you know that the distribution G generates fits that mysterious P(X)?

For example, are the images you get by interpolating z this way actually realistic? Are the images from randomly sampled z realistic?

So I ask you: tell me, what is P(X)? What is the criterion?

  • If you say IS, FID, or something like that: why don’t we just add those to the training process as training objectives?
  • What’s the point of reinventing assorted D’s and E’s? In essence they are doing the same thing, except that they satisfy the need to “compute integrals and publish papers”. And they are unstable, collapse all the time, and get patched up endlessly; I’m telling you, it never ends.
  • Yes, I’m telling you: a D that outputs a single number will always have loopholes and can never be fully patched, for the reasons I wrote about more than a year ago:

PENG Bo: The Geometric Picture of GANs: Morse Flow in Sample Space, and Why the Discriminator D Can Never Truly Distinguish Real from Fake (zhuanlan.zhihu.com)

  • For simple toy datasets (e.g., a cluster of Gaussians), you can plug the loopholes, but that doesn’t mean much.
  • If you say D amounts to unsupervised learning: we can do that far better, for example by building a fine-grained model of images directly (like a language model), whose constraints are very high-dimensional, thereby solving the problem I raised in “Geometric Picture”. That would surely be far more universal, far more stable, and far more meaningful than today’s crippled D’s. Wouldn’t it?

If you want to improve DGN, there is no need to train a D tightly coupled to a particular G; train a universal D instead.

Isn’t estimating the sample distribution directly with the deep image prior the simplest, most reasonable, and most accurate method? Why bother with all the rest?


4. Experiment

The BuG10 on my desktop is broken and I can’t be bothered to fix it, so everything runs on a laptop, using an avatar dataset made by an old friend of mine. The network architecture is the most vanilla DCGAN G, with filter counts 512/256/….

To see how well DL memorizes by rote, we use the dumbest possible training: no shuffling; just pick a bunch of avatars, randomly sample a fixed z for each one, and train on the X’s in the same fixed order every pass. To start, z has only 50 dimensions.

Start with 64 images. The batch size is also 64 (so, you could say, everything here is as dumb as possible). The optimizer is Adam with exactly the same hyperparameters as in DCGAN. (A sketch of this setup follows.)
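
Here is a hypothetical reconstruction of that setup, assuming standard PyTorch and stand-in data (the avatar dataset is not public, and details such as BatchNorm/Tanh are my guesses at “the most vanilla DCGAN G”):

```python
import torch
import torch.nn as nn

def up(cin, cout):  # one DCGAN upsampling block
    return [nn.ConvTranspose2d(cin, cout, 4, 2, 1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(True)]

G = nn.Sequential(
    nn.ConvTranspose2d(50, 512, 4, 1, 0, bias=False),  # 50-dim z, 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(True),
    *up(512, 256),                                     # 8x8
    *up(256, 128),                                     # 16x16
    *up(128, 64),                                      # 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),    # 64x64 RGB
    nn.Tanh(),
)

X = torch.rand(64, 3, 64, 64) * 2 - 1      # stand-in images in [-1, 1]
z = torch.randn(64, 50, 1, 1)              # sampled once, then fixed forever
opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))  # DCGAN's Adam

for step in range(901):                    # batch = all 64 images: full-batch GD
    loss = ((G(z) - X) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step in (0, 100, 200, 500, 900):
        print(step, loss.item())
```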

Step 0: (image)

Step 100: (image)

Step 200: (image)

Step 500: (image)

Step 900: (image)

Except for some noise, the results are identical to the original dataset. The network manages to memorize everything.

Notice that this isn’t even SGD: since the batch is the whole dataset, it is plain full-batch GD. Once again, a big enough network doesn’t care about the “local minimum” problem.

Next, increase the data volume to 5×64 = 320 images. The batch size is still 64. Again, no shuffling.

Step 200: (image)

Step 900: (image)

It still reconstructs all the images almost perfectly.

No matter how diverse the data is, the network learns it. Naturally, the more samples there are, the slower and less accurate the learning becomes. We also know how this could be improved; the setup here was deliberately crude anyway.


5. Questions to think about

There are already plenty of questions worth thinking about. I’ll publish the article first; you can experiment and think for yourselves.

For example:

  1. What is the best DGN architecture for common datasets?
  2. How should the right dimensionality of z be chosen for different datasets? In theory, a one-dimensional z could cover all the samples; of course, in practice that cannot be trained.
  3. Does letting z flow (method 2 above) actually help?
  4. Can DGN produce reasonable interpolations and genuinely new samples? What would be the effect of adding the auxiliary network mentioned above? (A sketch of the interpolation check follows this list.)
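
For question 4, a hypothetical interpolation check, reusing the G and z from the sketch in section 4:

```python
import torch

G.eval()                                  # freeze BatchNorm statistics
with torch.no_grad():
    a = torch.linspace(0, 1, 8).view(8, 1, 1, 1)
    z_path = (1 - a) * z[0] + a * z[1]    # 8 codes between two training images
    frames = G(z_path)                    # do the in-betweens look plausible?
```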

To be continued. Discussion and follows are welcome.

There’s more to come, but this is just an appetizer for now.