Heuristic algorithms

A heuristic algorithm is an algorithm built on intuition or experience: for a specific optimization problem, it produces an approximately optimal solution at an acceptable computational cost (time, space, etc.). In general, how far the approximate solution deviates from the true optimum cannot be predicted.

Heuristic algorithms were born because classical exact algorithms solve NP-hard problems too slowly, or cannot solve them at all. A well-designed heuristic can usually find a near-optimal solution in a relatively short time, and for NP-hard problems it can often produce a good approximate solution in polynomial time.

A heuristic algorithm is not an exact algorithm; rather, it provides a framework for searching for good solutions. As the definition above suggests, a heuristic algorithm needs a set of heuristic rules tailored to the actual application scenario. At the same time, heuristic algorithms share the following problems:

  • At present, there is no unified and complete theoretical system.

  • All heuristic algorithms face the problem of local optima; the difficulty lies in designing an effective mechanism for escaping them.

  • Parameter settings have a great influence on performance, and how to set them effectively is a hard question in its own right.

  • It is unclear how to set effective stopping conditions for the iteration.

Therefore, it is worth noting that a heuristic algorithm cannot guarantee an optimal solution, its performance is relatively unstable, and its results depend on the specific problem and the designer’s experience. But these flaws do not outweigh its merits: heuristic algorithms solve complex problems in a relatively simple way and are easy to program.

Most heuristic algorithms are inspired by biological or physical phenomena, and can be classified as follows:

  • Animal-inspired algorithms: particle swarm optimization, ant colony optimization, fish swarm algorithm, bee colony algorithm, etc.;

  • Plant-inspired algorithms: phototropism algorithm, invasive weed optimization, etc.;

  • Human-inspired algorithms: genetic algorithm, neural networks, harmony search, etc.;

  • Physics-inspired algorithms: simulated annealing, etc.

The sections below introduce the more classical heuristic algorithms from each of these categories.

Simulated annealing algorithm (SA)

In thermodynamics, annealing refers to the physical process of gradually cooling an object. The lower the temperature, the lower the object’s energy state; when the energy state is low enough, the liquid begins to condense and crystallize. In the crystallized state, the system’s energy is at its lowest and most stable.

The physical annealing process consists of three parts:

  • Heating process: the purpose is to strengthen the thermal motion of the particles and free them from their equilibrium positions.

  • Isothermal process: for a closed system that exchanges heat with its surroundings at constant temperature, spontaneous changes of state always proceed in the direction of decreasing free energy.

  • Cooling process: thermal motion weakens, the system’s energy drops, and it settles into the lowest-energy state.

Simulated annealing is a general-purpose heuristic algorithm. Imagine every solution in the problem’s search space as a molecule of the object being annealed. Each solution, like a molecule, has an “energy” that indicates how well it suits the problem. The algorithm takes an arbitrary solution in the search space as the initial solution, randomly generates a new solution at each step, and computes the probability of moving from the current solution to the new one. The heating process corresponds to setting the algorithm’s initial temperature, the isothermal process corresponds to the algorithm’s Metropolis sampling step, and the cooling process corresponds to decreasing the control parameter.

The Metropolis criterion gives the algorithm its global optimization ability: by accepting worse solutions with a certain probability, it can escape local extrema and avoid premature convergence.

The change in energy is the change in the objective function value, and the lowest-energy state is the optimal solution. The Metropolis criterion is the key to SA converging to the global optimum: when a worse solution is generated, the criterion accepts it with a certain probability, enabling the algorithm to jump out of local optima. Assuming the current feasible solution is x and the iteratively updated solution is x_new, the corresponding “energy difference” is defined as:

Δf = f(x_new) − f(x)

A worse solution (Δf > 0) is accepted with probability:

P(Δf) = exp(−Δf / (kT))

Temperature T is one of the controlling factors of the Metropolis criterion: at high temperature, new solutions whose energy differs greatly from the current state can be accepted; at low temperature, only new solutions with small energy differences are accepted.
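This acceptance rule is a one-liner; a minimal sketch (with the Boltzmann constant k absorbed into T, a common simplification):

```python
import math

def accept_probability(delta_f: float, T: float) -> float:
    """Metropolis criterion: always accept improvements (delta_f <= 0),
    accept worse solutions with probability exp(-delta_f / T)."""
    if delta_f <= 0:
        return 1.0
    return math.exp(-delta_f / T)

# The same worsening move is far more likely to be accepted at high temperature.
print(accept_probability(1.0, T=10.0))   # ~0.905
print(accept_probability(1.0, T=0.1))    # ~4.5e-5
```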

The specific flow of the simulated annealing algorithm is shown in the figure below:
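That flow can also be sketched generically in code. Here `f` (the objective to minimize) and `neighbor` (a random perturbation of a solution) are hypothetical placeholders supplied by the user, and the parameter defaults are illustrative assumptions:

```python
import math
import random

def simulated_annealing(f, neighbor, x0, T0=100.0, alpha=0.95,
                        iters_per_temp=50, T_min=1e-3):
    """Minimize f starting from x0 by simulated annealing."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    while T > T_min:                      # cooling loop
        for _ in range(iters_per_temp):   # Metropolis sampling at fixed T
            y = neighbor(x)
            delta = f(y) - fx
            # Accept improvements always; worse moves with prob exp(-delta/T)
            if delta <= 0 or random.random() < math.exp(-delta / T):
                x, fx = y, fx + delta
                if fx < fbest:
                    best, fbest = x, fx
        T *= alpha                        # exponential temperature decay
    return best, fbest

# Toy usage: minimize the 1-D quadratic f(x) = (x - 3)^2
random.seed(0)
sol, val = simulated_annealing(lambda x: (x - 3) ** 2,
                               lambda x: x + random.uniform(-1, 1),
                               x0=-10.0)
```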

The simulated annealing algorithm is widely applicable and handles NP-complete problems such as the traveling salesman problem, the maximum cut problem, and the 0-1 knapsack problem well. However, its parameters are difficult to control, convergence to the global optimum in a single run cannot be guaranteed, and in most cases it still falls into a local optimum. Its behavior is mainly governed by three key parameters:

  • The initial temperature is an important factor in the algorithm’s global search performance. A high initial temperature makes finding the global optimum more likely but costs a lot of computation time; a low one saves time but may hurt global search performance.

  • Global search performance is also closely related to the annealing speed, i.e., the number of iterations per temperature value. A “sufficient” search at each temperature is generally necessary, but the more sufficient the search, the higher the computational cost.

  • Temperature management is another important factor in global search performance. In practical applications, exponential temperature decay is commonly used for cooling because of its low computational overhead:

    T_{k+1} = α · T_k,  α ∈ (0, 1)

    To preserve a large search space, α is generally chosen close to 1, such as 0.95 or 0.9.

🌰 For example: the vegetable-buying problem

The vegetable-buying problem is a typical combinatorial optimization problem: given a set of n vegetables, each with its own volume w_i and price v_i, an auntie with a basket of capacity C must put into the basket a set of vegetables whose total price is as high as possible without exceeding the basket’s capacity.


Given a set of n vegetables {x_1, x_2, …, x_n}, x_i ∈ {0, 1}, where each vegetable x_i has attributes (w_i, v_i) and the basket capacity is a positive integer C, the following optimization problem is solved:

max Σ v_i · x_i,  subject to Σ w_i · x_i ≤ C,  x_i ∈ {0, 1}

Assume the vegetable volumes are size = {2, 2, 3, 4}, the prices are value = {5, 6, 7, 8}, and the basket capacity is 7. The initial scheme is generally S0 = {0, 0, 0, 0}. Since the goal is to maximize the total price of the vegetables, the objective function is the total price f(S) = Σ v_i · x_i, subject to the capacity limit Σ w_i · x_i ≤ C.

If the auntie, blindfolded, randomly picks a vegetable i each time, there are three ways to generate a new solution:

1. If vegetable i is not in the basket and there is room, simply put it in;


2. If vegetable i is not in the basket but the basket is full, blindly take some vegetable j out of the basket to make room for vegetable i;


3. If vegetable i is already in the basket, take it out first, then blindly pick some vegetable j and put it in the basket.

Depending on which of the three moves generated the new solution, the value difference and volume difference of the basket each take one of three forms.

Value difference of the basket: Δvalue = v_i (move 1), v_i − v_j (move 2), or v_j − v_i (move 3).

Volume difference of the basket: Δsize = w_i (move 1), w_i − w_j (move 2), or w_j − w_i (move 3).

Since the vegetable-buying problem is a constrained optimization problem, the Metropolis criterion needs to be adjusted accordingly:

Then, following the rules described above, set the initial temperature T, the temperature decay coefficient α, and the number of iterations K per temperature; an optimal buying scheme is obtained: S = {1, 1, 1, 0}, TotalSize = 7, TotalValue = 18.
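With only four vegetables, the reported scheme can be double-checked by brute-force enumeration of all 2^4 candidate baskets:

```python
from itertools import product

size = [2, 2, 3, 4]
value = [5, 6, 7, 8]
C = 7

best_scheme, best_value = None, -1
for scheme in product([0, 1], repeat=4):   # every possible basket
    total_size = sum(w * x for w, x in zip(size, scheme))
    total_value = sum(v * x for v, x in zip(value, scheme))
    if total_size <= C and total_value > best_value:
        best_scheme, best_value = scheme, total_value

print(best_scheme, best_value)   # (1, 1, 1, 0) 18
```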


Genetic algorithm (GA)

The genetic algorithm borrows from Darwin’s theory of evolution and Mendel’s theory of heredity. It models the problem to be solved as a process of biological evolution: operations such as replication, crossover, and mutation produce the next generation of solutions, solutions with low fitness values are gradually eliminated, and solutions with high fitness values multiply. After N generations of evolution, individuals with very high fitness are likely to emerge.

Let’s start with some biological concepts related to genetics:

1. Population: organisms evolve in groups; such a group is called a population;

2. Individual: a single organism making up a population;

3. Gene: a single genetic factor;

4. Chromosome: a group of genes.

Survival competition, survival of the fittest: individuals with high fitness to the environment get more opportunities to reproduce, so their offspring become more and more numerous; individuals with low fitness get fewer opportunities, so their offspring become fewer.

Heredity and variation: The new individual inherits a portion of each parent’s genes and has a certain probability of genetic variation.

The corresponding genetic algorithm also has the following basic components:

  • Coding

  • Fitness function

  • Genetic operators

  • Operation parameters



Coding

Through some coding mechanism, the genetic algorithm abstracts the objects of the problem into strings arranged in a certain order. Encodings can be divided into the following types:

  • The simplest way of encoding is binary encoding, which is to encode the solution of the problem as a binary array;

  • Permutation encoding is generally used for ordering problems, such as the TSP, where a sequence of genetic codes represents the order in which cities are visited;

  • Tree coding is used for evolutionary programming in genetic programming.

  • Value encoding is used to solve complex numerical problems.

Coding mainly follows three principles:

1. Completeness: every solution in the problem space can be represented under the coding rules;

2. Soundness: every encoding corresponds to a possible solution;

3. Non-redundancy: solutions in the problem space correspond one-to-one with encodings in the expression space.


Fitness function

The genetic algorithm evaluates the quality of an individual (a solution) by its fitness function value: the larger the value, the better the individual. The fitness function is the driving force of evolution in a genetic algorithm and the only criterion for natural selection; it must be designed for the specific problem being solved.

Generally speaking, the fitness function is positively correlated with the objective function, and the fitness function can be obtained by some deformation of the objective function.



Genetic operators

Genetic operators include selection operator, crossover operator and mutation operator.



Selection operation

Selection applies survival of the fittest to individuals: individuals with high fitness are more likely to be passed on to the next generation, while individuals with low fitness are less likely.

The selection operator in the basic genetic algorithm (SGA) is roulette-wheel selection, whose basic idea is that each individual’s probability of being selected is proportional to its fitness value.

The steps for implementing roulette-wheel selection are as follows:

  • Compute the fitness values of all individuals in the population;

  • Compute each individual’s selection probability;

  • Compute the cumulative probabilities;

  • Spin the simulated wheel: generate a random number between 0 and 1 and match it against the cumulative probabilities to determine which individuals are passed on to the next generation.


🌰 for example

Assuming there are chromosomes S1 = 3 (00011), S2 = 10 (01010), S3 = 17 (10001), and S4 = 22 (10110), the fitness of each chromosome can be obtained from the fitness function F(S) = S^2 − 2:

  • F(S1) = 3^2 -2 = 7

  • F(S2) = 10^2 -2 = 98

  • F(S3) = 17^2 -2 = 287

  • F(S4) = 22^2 -2 = 482

Suppose four random numbers r1 = 0.061, r2 = 0.242, r3 = 0.402, r4 = 0.728 are drawn; combined with each chromosome’s fitness, the selected chromosomes are S2, S3, S3, and S4.
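This selection can be reproduced by implementing the roulette wheel with explicit cumulative probabilities; the draw r is passed in as a parameter so the worked numbers above can be checked directly:

```python
from itertools import accumulate

def roulette_pick(fitnesses, r):
    """Return the index chosen by roulette-wheel selection for a draw r in [0, 1)."""
    total = sum(fitnesses)
    cumulative = list(accumulate(f / total for f in fitnesses))
    for i, c in enumerate(cumulative):
        if r <= c:
            return i
    return len(fitnesses) - 1

fitness = [7, 98, 287, 482]              # F(S1)..F(S4) from above
draws = [0.061, 0.242, 0.402, 0.728]     # r1..r4
picked = [roulette_pick(fitness, r) + 1 for r in draws]
print(picked)   # [2, 3, 3, 4] -> S2, S3, S3, S4
```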



Crossover operation

Crossover refers to two paired chromosomes exchanging some of their genes in some way according to the crossover probability Pc, thereby forming two new individuals. Crossover is the feature that distinguishes genetic algorithms from other evolutionary algorithms; it plays a key role and is the main way new individuals are generated. The basic genetic algorithm (SGA) uses a single-point crossover operator; besides this, there are two-point crossover and crossover based on “and/or”.

Single-point crossover (binary encoding) selects one crossover point; the offspring’s genes before the point are taken from one parent and the genes after it from the other parent.

Two-point crossover (binary encoding) selects two crossover points; the portion of the offspring’s genes between the two points is taken from one parent and the remainder from the other.

Crossover based on “and/or” (binary encoding) applies a bitwise AND or OR to the two parents’ genes to obtain the offspring’s genes.
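The two slicing variants can be sketched on bit strings (the AND/OR variant is a plain bitwise operation and is omitted). The single-point test case matches the pairing T1 = 0101 with T4 = 1111 at point 2 used in the vegetable example later in this article:

```python
def single_point_crossover(a: str, b: str, point: int):
    """Exchange the tails of two equal-length bit strings after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a: str, b: str, p1: int, p2: int):
    """Exchange the middle segment between positions p1 and p2."""
    return (a[:p1] + b[p1:p2] + a[p2:],
            b[:p1] + a[p1:p2] + b[p2:])

print(single_point_crossover("0101", "1111", 2))      # ('0111', '1101')
print(two_point_crossover("000000", "111111", 2, 4))  # ('001100', '110011')
```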


Mutation operation

Mutation refers to changing some gene values in an individual’s coding string according to the mutation probability Pm, forming a new individual. Mutation is an auxiliary way of generating new individuals; it determines the genetic algorithm’s local search ability and maintains population diversity.

Crossover and mutation cooperate to carry out the global and local search of the search space. SGA uses the basic bit mutation operator.

The basic bit mutation operator performs a simple flip on one or more randomly designated genes in the individual’s coding string: 1 becomes 0 and 0 becomes 1. Whether a mutation occurs is determined by the mutation probability.
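A sketch of bit-flip mutation; `flip_at` is a deterministic helper (not part of the standard operator) for reproducing worked examples such as U3 = 0100 flipping positions 3 and 4 below:

```python
import random

def bit_flip_mutation(chromosome: str, pm: float, rng=random):
    """Flip each bit independently with mutation probability pm."""
    return "".join(
        ("1" if g == "0" else "0") if rng.random() < pm else g
        for g in chromosome
    )

def flip_at(chromosome: str, positions):
    """Deterministically flip the given 1-based positions."""
    flipped = list(chromosome)
    for p in positions:
        flipped[p - 1] = "1" if flipped[p - 1] == "0" else "0"
    return "".join(flipped)

print(flip_at("0100", [3, 4]))   # 0111
```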


Operation parameters

The operating parameters include population size, crossover rate, mutation rate, maximum evolution algebra, etc.

Population size refers to the number of individuals in a population. Making the population very large does not necessarily improve the genetic algorithm’s results; a size of 15-30 is recommended. The crossover rate should generally be large, typically 85%-95%. The mutation rate should generally be small, typically 0.5%-1%.

As shown in the figure above, the basic flow of genetic algorithm is as follows:

1. Initialization: set the generation counter t = 0, set the maximum number of generations T, and randomly generate M individuals as the initial population P(t);

2. Individual evaluation: compute the fitness value of each individual in P(t);

3. Selection operation: the selection operator is applied to the group;

4. Crossover operation: the crossover operator is applied to the group;

5. Mutation operation: the mutation operator is applied to the population;

6. Through steps 2-5, the next-generation population P(t+1) is obtained;

7. Termination check: if t ≤ T, then t ← t + 1 and go to step 2; if t > T, terminate and output the solution.
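Steps 1-7 can be sketched end to end. The toy fitness F(S) = S^2 − 2 from the roulette example is reused; the parameter values and helper structure are illustrative assumptions, not a canonical implementation:

```python
import random

def evolve(fitness, n_bits=5, pop_size=20, pc=0.9, pm=0.02, max_gen=100):
    """Basic SGA loop: roulette selection, single-point crossover, bit mutation."""
    rng = random.Random(42)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select(pop, fits):
        # Roulette wheel over (clamped) positive fitness weights
        total = sum(fits)
        r = rng.uniform(0, total)
        acc = 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if r <= acc:
                return ind[:]
        return pop[-1][:]

    for _ in range(max_gen):
        fits = [max(fitness(ind), 1e-9) for ind in pop]  # keep weights positive
        nxt = []
        while len(nxt) < pop_size:
            a, b = select(pop, fits), select(pop, fits)
            if rng.random() < pc:                        # crossover
                p = rng.randrange(1, n_bits)
                a, b = a[:p] + b[p:], b[:p] + a[p:]
            for child in (a, b):                         # mutation
                for i in range(n_bits):
                    if rng.random() < pm:
                        child[i] ^= 1
            nxt.extend([a, b])
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

to_int = lambda bits: int("".join(map(str, bits)), 2)
best = evolve(lambda ind: to_int(ind) ** 2 - 2)
```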

Genetic algorithm has two performance optimization schemes: catastrophe and elitism.

The genetic algorithm has strong local search ability but easily falls into local optima. To jump out of a local extremum, the currently excellent individuals must be killed off, giving points far from the current extremum enough room to evolve; this is the idea of catastrophe. On the other hand, when crossover and mutation produce offspring, the optimal solution found so far may be lost at an intermediate step. Copying the current best solution into each new generation prevents the best solutions produced during evolution from being destroyed by crossover and mutation; this is the idea of elitism.

Catastrophe and elitism contradict each other to some extent: catastrophe kills off the excellent individuals produced, while elitism preserves them for the offspring. But the two can coexist: in each generation, copy the best individual to the next generation before crossover; when no better individual has appeared for n consecutive generations, the algorithm may have fallen into a local optimum, and the catastrophe mechanism can then help it jump out. Catastrophe and elitism can be adopted selectively when building a genetic algorithm for a specific scenario.


🌰 Back to the earlier example: buying vegetables

The vegetable-buying problem is a typical combinatorial optimization problem: given a set of n vegetables, each with its own volume w_i and price v_i, an auntie with a basket of capacity C must put into the basket a set of vegetables whose total price is as high as possible without exceeding the basket’s capacity.

Given a set of n vegetables {x_1, x_2, …, x_n}, x_i ∈ {0, 1}, where each vegetable x_i has attributes (w_i, v_i) and the basket capacity is a positive integer C, the following optimization problem is solved:

max Σ v_i · x_i,  subject to Σ w_i · x_i ≤ C,  x_i ∈ {0, 1}

The auntie’s vegetable-buying problem naturally fits binary encoding: represent the set of n vegetables {x_1, x_2, …, x_n} as a binary chromosome of length n.

Assuming the vegetable volumes are size = {2, 2, 3, 4}, the prices are value = {5, 6, 7, 8}, and the basket capacity is 10, the following population is randomly generated (n = 4, population size 4): S1 = 0001, S2 = 0101, S3 = 1000, S4 = 1111.

To maximize the total price of the vegetables, first compute each individual’s total price TotalValue(S) = Σ v_i · x_i and total volume TotalSize(S) = Σ w_i · x_i.

Because of the capacity limit, this factor must be considered in the fitness calculation, so a penalty term is added:

Fitness(S) = TotalValue(S) − α · max(0, TotalSize(S) − C)

where the parameter α is the penalty coefficient, α > 1.0; the larger its value, the greater the penalty.

Assuming that the penalty coefficient is 10, the fitness of each individual is:

S1 = 0001, TotalSize=4, TotalValue=8, Fitness=8;

S2 = 0101, TotalSize=6, TotalValue=14, Fitness=14;

S3 = 1000, TotalSize=2, TotalValue=5, Fitness=5;

S4 = 1111, TotalSize=11, TotalValue=26, Fitness=16.
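The four fitness values above are consistent with a penalty of the form Fitness = TotalValue − α · max(0, TotalSize − C); a sketch assuming that formula:

```python
def fitness(chromosome, size, value, capacity, alpha=10):
    total_size = sum(w for w, x in zip(size, chromosome) if x)
    total_value = sum(v for v, x in zip(value, chromosome) if x)
    # Penalize overweight baskets in proportion to how far they exceed capacity
    return total_value - alpha * max(0, total_size - capacity)

size, value, C = [2, 2, 3, 4], [5, 6, 7, 8], 10
for s in ([0, 0, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1]):
    print(s, fitness(s, size, value, C))   # prints 8, 14, 5, 16 respectively
```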

Suppose four random numbers are generated: r1 = 0.22, r2 = 0.57, r3 = 0.41, r4 = 0.79.

Then the next-generation population is T1 = 0101, T2 = 1000, T3 = 0101, T4 = 1111. The selected individuals are then randomly paired, and crossover points are randomly set according to the crossover rate; the paired chromosomes exchange part of their genes by single-point crossover.

Suppose T1 and T4 are paired with the crossover point after the second position, and T2 and T3 are paired with the crossover point after the third position; the new population after crossover is:

U1 = 0111, TotalSize=9, TotalValue=21, Fitness=21;

U2 = 1001, TotalSize=6, TotalValue=13, Fitness=13;

U3 = 0100, TotalSize=2, TotalValue=6, Fitness=6;

U4 = 1101, TotalSize=8, TotalValue=19, Fitness=19.

Next, suppose U3 mutates in the third and fourth positions; then:

U1 = 0111, TotalSize=9, TotalValue=21, Fitness=21;

U2 = 1001, TotalSize=6, TotalValue=13, Fitness=13;

U3 = 0111, TotalSize=9, TotalValue=21, Fitness=21;

U4 = 1101, TotalSize=8, TotalValue=19, Fitness=19.

At this point, the auntie has found the best buying schemes, U1 and U3.


🌰 Another example: the travel problem (if you already understand, feel free to skim through~)

The auntie sets out from home, visits all the famous cities of the motherland, and then returns home. This is another classic combinatorial optimization problem; this time the task is to choose a route that minimizes the total length of the tour.

As shown in the table above, each city is given a numeric code. A route (i.e., a chromosome) is represented as an array of n city codes; the order of the elements indicates the order of travel, and no element repeats, since each city is visited only once.

Suppose S1 = {17, 3, 21, …, 30}, indicating the travel order Zhengzhou -> Shanghai -> Fuzhou -> … -> Chengdu.

The shorter the auntie’s total trip, the better, so take the reciprocal of a route’s total length as the fitness function. If east longitude is x and north latitude is y, a city’s coordinates are (x, y) and a route is S_i = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, i ∈ [0, m], so:

F(S_i) = 1 / (Σ_{j=1}^{n−1} d((x_j, y_j), (x_{j+1}, y_{j+1})) + d((x_n, y_n), (x_1, y_1)))

where d(·, ·) is the distance between two cities.

Roulette-wheel selection is used again. The routes are then randomly paired, and crossover points are randomly chosen according to the crossover rate. For path sequences, single-point crossover cannot simply swap segments of the parent chromosomes, because that easily produces repeated city codes in the offspring. Instead, take the city codes at the crossover points from the father, keep them in the father’s order to fill the head of the child, and fill the rest of the child with the remaining city codes taken from the mother. For example:

Father: {1, 2, 3, 4, 5, 6, …}

Mother: {31, 2, 3, 11, 1, 5, …}

Children: {1, 3, 4, 5, 6, …, 31, 2, 11, …}

The mutation method here is to randomly select the codes of two cities in a route according to the mutation rate and swap their positions. For example, if S = {1, 3, 4, 5, 6, …, 31, 2, 11, …}, then T = {1, 11, 4, 5, 6, …, 31, 2, 3, …}.
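A sketch of these permutation operators; the crossover keeps a slice of the father in order and fills the remainder from the mother (one common variant of order crossover, which may differ in detail from the description above):

```python
def order_crossover(father, mother, start, end):
    """Child takes father[start:end] in order; the remaining cities come
    from the mother in her order, skipping duplicates."""
    head = father[start:end]
    tail = [city for city in mother if city not in head]
    return head + tail

def swap_mutation(route, i, j):
    """Swap the cities at positions i and j."""
    mutated = route[:]
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

father = [1, 2, 3, 4, 5, 6]
mother = [3, 1, 6, 2, 5, 4]
child = order_crossover(father, mother, 1, 4)   # keeps [2, 3, 4] from father
print(child)                                    # [2, 3, 4, 1, 6, 5]
print(swap_mutation([1, 3, 4, 5], 1, 3))        # [1, 5, 4, 3]
```

Note that the child is always a valid permutation: every city appears exactly once, so no city is visited twice.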

At this point, the auntie can use the genetic algorithm to find the route with the shortest total journey~


Evolutionary strategy (ES)

Evolution strategy (ES) is a method for solving parameter optimization problems. It imitates the principles of biological evolution and assumes that whatever changes the genes undergo, the results (traits) always follow a Gaussian distribution with zero mean and some variance. Evolution strategies have the following characteristics:

  • Selection in an evolution strategy is carried out deterministically, unlike the random selection in genetic algorithms.

  • The recombination operator in an evolution strategy differs from crossover in genetic algorithms: it does not simply exchange parts of individuals, but changes every component of the individual.

Evolution strategies fall into two categories: (μ+λ)-ES and (μ,λ)-ES.


(μ+λ)-ES

Each iteration generates λ new solutions; compared together with the parents, the best μ solutions become the parents of the next iteration, and the rest are discarded.

This method introduces the idea of a population and is easy to parallelize, but it easily falls into local optima; it is mainly used in multi-objective optimization.


(μ,λ)-ES

Each iteration produces λ new solutions (λ > μ), of which the best μ become the parents of the next iteration and the rest are discarded. In this way every solution survives only one generation, which better avoids falling into local optima.

(μ+λ) selection guarantees the survival of the best individuals, so the population’s evolution trends monotonically upward, but preserving old individuals easily causes local-optimum problems. (μ,λ)-ES is usually better than (μ+λ)-ES and is the mainstream of current evolution strategies.


The DNA in an evolution strategy is no longer expressed in binary but in real numbers, which allows it to solve many practical problems over the reals. Evolution strategies produce offspring by crossing genes, much like genetic algorithms, but how should mutation be handled? Because the genes are real numbers, a simple bit flip as in the genetic algorithm is impossible.

Mutation in an evolution strategy is governed by the mutation strength, and the normal distribution plays the key role here.

Take the gene value inherited from the parents as the mean of a normal distribution, attach a standard deviation to that mean to determine the distribution, and draw a number from it. For example, if the mutation strength at the value 8.8 is 2.5, a normal distribution with mean 8.8 and standard deviation 2.5 is determined, and a new value such as 8.7 is drawn from it.

The value of each offspring gene is mutated through its own normal distribution, producing entirely new offspring DNA. The mutation strength itself can thus be regarded as part of the genetic information passed down in the parent’s DNA, and the mutation strength can mutate as well.
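A sketch of this self-adaptive mutation, assuming the common scheme in which each gene carries its own strength sigma and the strengths mutate log-normally before perturbing the genes (the learning-rate heuristic tau is one conventional choice):

```python
import math
import random

def mutate(genes, sigmas, rng=random):
    """Self-adaptive Gaussian mutation: the sigmas mutate first, then each
    gene is drawn from a normal distribution centred on its inherited value."""
    tau = 1.0 / math.sqrt(2.0 * len(genes))      # learning-rate heuristic
    new_sigmas = [s * math.exp(tau * rng.gauss(0, 1)) for s in sigmas]
    new_genes = [g + s * rng.gauss(0, 1) for g, s in zip(genes, new_sigmas)]
    return new_genes, new_sigmas

random.seed(1)
child, child_sigmas = mutate([8.8, -1.2], [2.5, 0.5])
```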

The key steps of an evolution strategy are crossover, mutation, selection, and mutation-strength adaptation.

As in genetic algorithms, crossover swaps genes between two individuals, in three main ways:

1. Discrete recombination: two parent individuals are randomly selected, and their components are randomly exchanged to form the components of the new offspring individual.

2. Intermediate recombination: two parents are randomly selected, and the average of each pair of parental components becomes the corresponding component of the new offspring.

3. Mixed recombination: characterized by how the parent individuals are selected. A fixed first parent is chosen at random, and then, for each component of the offspring, a second parent is randomly drawn from the parent population; that is, the second parent keeps changing. The two parents can be combined using either the discrete or the intermediate method.

Mutation is relatively simple: add a zero-mean Gaussian perturbation with some variance to each component to produce a new individual. This variance is the mutation strength, and it changes over time: it is relatively large at the start of the algorithm and decreases as convergence approaches.

Next, a specific case, (1+1)-ES, is used to understand the mutation-strength formula, as shown in the figure below.

The mutation strength is controlled by the 1/5 success rule: increase the strength while far from convergence and decrease it as convergence approaches. The test is the fraction of mutations that beat the parent: if more than 1/5 of the mutations are better than the parent, the search has not yet converged, so the strength is increased; if fewer than 1/5 are better, convergence is near, so the strength is decreased.
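The rule itself is tiny; a sketch where 0.85 is the classic scaling factor attributed to Rechenberg, and the success rate is assumed to be measured over a recent window of mutations:

```python
def one_fifth_rule(sigma, success_rate, c=0.85):
    """Rechenberg's 1/5 success rule for adapting mutation strength.

    More than 1/5 of recent mutations improved on the parent: still far
    from the optimum, so enlarge sigma. Fewer than 1/5 improved: close
    to convergence, so shrink sigma."""
    if success_rate > 1 / 5:
        return sigma / c       # widen the search
    if success_rate < 1 / 5:
        return sigma * c       # narrow the search
    return sigma

print(one_fifth_rule(1.0, 0.5))   # 1.0 / 0.85 ≈ 1.176
print(one_fifth_rule(1.0, 0.1))   # 0.85
```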