This is the 14th day of my participation in the First Challenge 2022.

Preface

Today is the third day of the new year, so it is time to get back on topic. Today I will give a simple talk about how the GA algorithm can be optimized, and where it is currently most powerful. I have to say that my previous exploration of the Python field was really one-sided: I spent two or three years in high school and college confining myself to CRUD, crawlers, and security, even though real implementations usually end up in C++, Java, some other static language, or whatever language the business deployment requires.

I’m only going to do a little demo today, with very little code.

Parent optimization

We introduced our genetic algorithm earlier with the simplest model structure, but there are actually many points in it that can be optimized. The first is the step where we decide which offspring survive after variation.

In fact, in this step we simply replaced the parents directly with their children. But as mentioned before, crossover and mutation may produce bad results as well as good ones; for example, two highly intelligent parents can give birth to a “stupid child”. So we actually need a simple filter: if the new individual is not as good as its parent, then obviously we don’t need to replace the parent.

So in our code (from the previous example) we can calculate the fitness and decide whether to replace:

import numpy as np

def CrossOver(Parent, PopSpace):
    """Cross DNA: we just choose one individual from the population to mate with.
    :param Parent: the parent's DNA
    :param PopSpace: the whole population
    :return: the (possibly crossed-over) parent DNA
    """
    if np.random.rand() < Cross_Rate:
        # boolean mask marking which genes come from the mate
        cross_place = np.random.randint(0, 2, size=Dna_Size).astype(bool)
        # pick a mate at random from the population
        cross_one = np.random.randint(0, Population_Size, size=1)
        Parent[cross_place] = PopSpace[cross_one, cross_place]
    return Parent

I’m not going to write out the full code here; it’s very simple and I don’t want to take up too much space.
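A minimal sketch of that filter, assuming the toy fitness function from the earlier examples (the function names here are illustrative, not the article's original code):

```python
import numpy as np

def fitness(x):
    # toy objective (assumed): higher is better
    return np.sin(10 * x) * x + np.cos(2 * x) * x

def keep_better(parent, kid):
    """Replace the parent only if the kid is at least as fit."""
    return kid if fitness(kid) >= fitness(parent) else parent
```

With this guard in place, a generation can never be worse than the one before it.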

Likewise, we can do the same in our mutation step, which ensures the whole population keeps moving in the right direction and avoids degeneration. Of course, this is just one way of thinking: if an offspring is unqualified, we could also let its parents reproduce again and again until the offspring are better than the parents, although the cost in time and space complexity is another matter…

This is also the idea behind another algorithm, the microbial genetic algorithm.
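As a sketch of that idea (my own illustration, not the article's code): in a microbial GA you pick two individuals, compare them, and only modify the loser, so the winner always survives unchanged. Assuming a binary integer DNA:

```python
import numpy as np

DNA_SIZE = 10
CROSS_RATE = 0.6
MUTATE_RATE = 0.02

def microbial_tournament(pop, fitness):
    """Pick two random individuals; the loser copies some genes from the
    winner and mutates, while the winner returns to the population untouched."""
    i, j = np.random.choice(len(pop), size=2, replace=False)
    if fitness(pop[i]) < fitness(pop[j]):
        i, j = j, i                                 # make sure i is the winner
    winner, loser = pop[i], pop[j]
    cross_mask = np.random.rand(DNA_SIZE) < CROSS_RATE
    loser[cross_mask] = winner[cross_mask]          # crossover onto the loser only
    mutate_mask = np.random.rand(DNA_SIZE) < MUTATE_RATE
    loser[mutate_mask] ^= 1                         # flip the loser's bits
    return pop
```

Because the elite individual is never overwritten, the population can only improve over time.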

Dimension problems

Our current algorithm uses binary encoding to represent a single variable X. But what if we face a high-dimensional optimization problem? Would we still encode every variable in binary and treat each group (say, of several dependent variables) as one DNA? With binary encoding, you first have to spend computing power on the transcoding process, and at the same time, because binary encoding compresses the range, you lose precision to a certain extent.

For example, suppose we now have this optimization problem:

This is an optimization problem I got when I simply “took apart” a three-layer neural network.

At this point binary encoding is not appropriate, so a DNA should instead be represented as a group of real numbers: [X1, X2, X3, X4, X…]

So the question is: we still need to cross and mutate. Crossover is easy, just like before, but how do we mutate?
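Crossover on this real-valued DNA can be sketched the same way as before: a boolean mask decides which positions come from the mate (a hypothetical helper, with names assumed):

```python
import numpy as np

DNA_SIZE = 5  # one DNA is now a real vector [X1, X2, X3, X4, X5]

def crossover(parent, mate, cross_rate=0.6):
    """Same idea as the binary version: masked positions are taken from the mate."""
    child = parent.copy()
    mask = np.random.rand(DNA_SIZE) < cross_rate
    child[mask] = mate[mask]
    return child
```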

Mutation optimization

The first question to think about is: why did we convert a decimal number to binary in the first place? Obviously, for binary, mutation is trivial: you just flip a 0 to a 1 and vice versa. So how do you mutate a decimal number? There’s a little bit of math involved here: we use a normal distribution to generate the new numbers.

To do this, our DNA has to carry two things: the values themselves and their mutation strengths (variances). We take each value as the mean, use its strength to construct a normal distribution, and draw a new value from it. At the same time, the carried strength information can itself be crossed and mutated, which helps maintain species diversity.
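A minimal sketch of such a mutation, where each DNA carries its values and their mutation strengths (this mirrors the classic self-adaptive encoding; the names and the 0.1 strength-mutation scale are my own assumptions):

```python
import numpy as np

def mutate(dna):
    """dna carries both the values and their mutation strengths.
    The strength is mutated first (and kept positive), then used as the
    standard deviation of a normal distribution around the current values."""
    values, strength = dna["values"], dna["strength"]
    strength = np.abs(strength + np.random.randn(*strength.shape) * 0.1)
    values = values + strength * np.random.randn(*values.shape)
    return {"values": values, "strength": strength}
```

Because the strength is itself inherited and mutated, individuals that benefit from big jumps keep jumping, while those near an optimum settle down.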

This kind of algorithm is also called an evolutionary algorithm, or EA.

Time complexity optimization

Next, we notice that in this kind of operation we often perform a lot of iterative computation because of the population size. But we can also observe that, given enough computation, each generation’s optimization and purification eventually concentrates the last few individuals into a small area, or even directly at a single point. So it turns out that in some cases we don’t need that many individuals; in fact, we might only need one.

We create a child from a single individual, and then we enter the loop: the child becomes the parent, and that parent produces the next child. Once we do that, obviously we can no longer cross and mutate the way we did before.

Now we only have one individual.

New optimization

At this point we don’t need to change much. In fact, we just need to add or subtract some random numbers from the current value and let the point keep bouncing around.

This was shown in a code screenshot from Morvan Python’s tutorial; we can just do it this way. This algorithm is called ES, or Evolution Strategy.
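The loop can be sketched as a (1+1)-style ES with a single parent; all names and constants below are my own assumptions, not the screenshot's exact code:

```python
import numpy as np

MUT_STRENGTH = 5.0  # step size of the random "bounce" (assumed value)

def F(x):
    # toy objective (assumed): the function we want to maximize
    return np.sin(10 * x) * x + np.cos(2 * x) * x

def make_kid(parent):
    # the child is just the parent plus Gaussian noise
    return parent + MUT_STRENGTH * np.random.randn()

def step(parent):
    kid = make_kid(parent)
    return kid if F(kid) > F(parent) else parent  # keep whichever is better
```

Repeating `step` keeps the point bouncing around, but never lets the fitness drop.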

Evolutionary strategy

There’s only one thing we need to focus on with this approach:

That is, how the mutation strength is adapted over time. If you look at the previous code screenshot, there is a variable called MUT_STRENGTH; that’s what it is. How do we define it?

And then in the code it looks like this.

Then just enter the loop and it’s done ~

Gradient descent optimization

Many of the materials in this tutorial refer to Morvan Python; if you are interested, you can check it out. I will implement it myself and update the code later (mainly because, as a PyTorch user, I have to translate the TensorFlow content).

This part may require a little more grounding in neural networks, or in PyTorch/TensorFlow, but the optimization principle here is very simple: use the gradient to intervene in the purification, steering it toward a better direction. Evolution screens individuals by this principle, but we cannot control the direction of crossover and mutation. By introducing the gradient, we can change the value of X and thereby gain a degree of control over the direction in which X crosses and mutates.

Here, on top of the original code, we mainly define a loss function and use an optimizer. The original code is in TensorFlow, so I need to rewrite it in PyTorch. But the principle is really simple for anyone with the basics.

Similarly, we can use evolutionary algorithms to optimize our neural network itself. The essence of a neural network is optimizing each connection layer: the weight parameters and the bias parameters fed into the activation functions. Gradient descent minimizes the loss function, so what if we treat the loss function as the function to be optimized? Because evolutionary algorithms and GA are very global, they can avoid the problem of local optima, which matters a lot in machine learning and reinforcement learning. As for the final design, will it be something simple and distributed, a fun project built around CNN or NLP, or deep learning plus data-stream monitoring? We will see.
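To make the “loss as fitness” idea concrete, here is a tiny sketch (entirely my own toy example, not the article's code): a linear model whose weights are tuned by a (1+1)-ES instead of gradient descent, using the mean squared error as the fitness to minimize:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # toy inputs
true_w = np.array([1.0, -2.0, 0.5])   # the weights we hope to recover
y = X @ true_w                        # toy targets

def loss(w):
    # the loss function doubles as our (negative) fitness
    return np.mean((X @ w - y) ** 2)

w = np.zeros(3)
sigma = 0.5  # fixed mutation strength (assumed)
for _ in range(500):
    kid = w + sigma * rng.normal(size=3)
    if loss(kid) < loss(w):  # keep the better weights, (1+1)-ES style
        w = kid
```

No gradients are needed here, which is exactly why this approach also works for non-differentiable or highly multimodal objectives.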