Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.

Introduction to genetic Algorithms

Genetic algorithm is a series of search algorithms inspired by the theory of natural evolution. By mimicking the processes of natural selection and reproduction, genetic algorithms can provide high-quality solutions to a variety of problems involving search, optimization, and learning. At the same time, they resemble natural evolution, so they can overcome some of the obstacles encountered by traditional search and optimization algorithms, especially for problems with a large number of parameters and complex mathematical representations.

Analogical Darwinism

Darwinian evolution theory

Genetic algorithms are simplified versions of Darwinian evolution implementations that mimic nature. The principle of Darwinian evolution is summarized as follows:

  1. Variation: The characteristics (traits, attributes) of individual samples in a population may differ, causing samples to differ to some extent from each other.
  2. Heredity: Certain traits can be passed on to their offspring. As a result, offspring and parental samples have a certain degree of similarity.
  3. Selection: Populations usually compete for resources in a given environment. Better-adapted individuals have an advantage in survival and therefore produce more offspring.

In other words, evolution maintains that samples of individuals in a population differ from each other. Those that adapt are more likely to survive, reproduce and pass on their traits to the next generation. Thus, as generations pass, species become more adapted to their environment. An important driver of evolution is crossover, or recombination, or hybridization — combining the characteristics of both parents to produce offspring. Crossover helps maintain the diversity of the population and melds better features together over time. In addition, mutations, or random mutations of features, can play an important role in evolution by introducing accidental changes.

Genetic algorithms correspond to concepts

Genetic algorithms try to find the best solution to a given problem. While Darwinian evolution preserves the individuals of a population, genetic algorithms retain a set of candidate solutions (also known as individuals) for a given problem. These candidate solutions are evaluated iteratively and used to create the next generation solution. A better solution has a greater chance of being selected and its characteristics passed on to the next generation of candidate solution sets. In this way, with generational updates, the set of candidate solutions can better solve the current problem. In nature, a Genotype characterizes reproduction, reproduction and mutation. A Genotype is a collection of genes that make up a chromosome. In genetic algorithms, each individual is made up of chromosomes that represent a collection of genes. For example, a chromosome can be represented as a binary string, where each bit represents a gene:

PopulationGenetic algorithms keep a large number of individuals — a set of candidate solutions to the problem at hand. Because each individual is represented by a chromosome, the individuals of these races can be regarded as a set of chromosomes:

In each iteration of the algorithm, the Fitness function (also known as the objective function) is used to evaluate individuals. An objective function is a function used for optimization or a problem to be solved. Individuals with higher fitness scores represent better solutions, are more likely to be selected for reproduction and their traits will be expressed in the next generation. With the progress of genetic algorithm, the quality of solution will be improved and the fitness will increase. Once the solution with satisfactory fitness value is found, the genetic algorithm will be terminated.

Selection After the fitness of each individual in a population has been calculated, a Selection process is used to determine which individuals in the population will be used to reproduce and produce the next generation, with individuals with higher values more likely to be selected and pass on their genetic material to the next generation. There is still a chance to select individuals with low fitness values, but the probability is low. That way, its genetic material is not completely discarded. In order to create a new pair of individuals, it is common to swap (Crossover) portions of a parent sample selected from the current generation to create two new chromosomes representing the offspring. This operation is called crossing or regrouping:

The purpose of Mutation operations is to periodically and randomly update populations, introduce new patterns into chromosomes, and encourage searches in unknown regions of the solution space. Mutations can manifest as random changes in genes. Variation is achieved by randomly changing the value of one or more chromosomes. For example, flipping a bit in a binary string:

Genetic algorithm theory

The theory of constructing genetic algorithms assumes that the optimal solution to the current problem is composed of many elements, and when more such elements are combined, they will be closer to the optimal solution of the problem. Individuals in a population contain some of the elements required for optimal solutions, and repeated selection and crossover processes allow individuals to convey these elements to the next generation and possibly combine them with other essential elements of optimal solutions. This creates genetic pressure that leads more and more individuals in the population to contain the elements that make up the best solution.

Schema Theorem

A more formal expression of the factor hypothesis is the Holland schema theorem, also known as the fundamental theorem of genetic algorithms. The theorem states that a schema is a pattern (or template) that can be found within chromosomes. Each schema represents a subset of chromosomes with certain similarities. For example, if a set of chromosomes is represented by a binary string of length 4, schema 101 represents all of these chromosomes, where the leftmost position is 1, the two rightmost positions are 01, and the second position from the left is either 1 or 0, where the wildcard character is represented. For each schema, there are two measures:

  1. Order: The number of fixed numbers
  2. Defining length: The distance between the most distant fixed number

The following table provides several examples of four-digit binary schemas and their metrics:

schema Order Defining Length
1101 4 3
1 * 01 3 3
* 101 3 2
* * 01 2 1
1 * * * 1 0
* * * * 0 0

Each chromosome in the population corresponds to multiple schemas. For example, chromosome 1101 corresponds to each schema that appears in the table because it matches each pattern they represent. If the chromosome has a high fitness, it and all the schemas it represents are more likely to survive the selection operation. When this chromosome crosses or mutates with another chromosome, some schemas remain and others disappear. Low-order schemas and schemas with short definition lengths are more likely to survive. Thus, the schema theorem states that the frequency of low-order schemas with short definition length and higher-than-average fitness increases exponentially over successive generations. In other words, as genetic algorithms evolve, smaller, simpler base blocks of elements that represent attributes of more efficient solutions will increasingly be present in populations.

The difference between genetic algorithm and traditional algorithm

There are some important differences between genetic algorithms and traditional search and optimization algorithms, such as gradient-based algorithms.

  1. Based on the population

The genetic search was conducted for a group of candidate solutions (individuals) rather than a single candidate. During the search, the algorithm preserves a set of individuals from the current generation. Each iteration of the genetic algorithm creates the next generation of individuals. Instead, most other search algorithms maintain a single solution and iteratively modify it to find the best solution. For example, the gradient descent algorithm iterates the current solution along the current steepest descent direction, which is the negative of the gradient of a given function. Genetic algorithms do not run on candidate solutions directly, but on their representations (or codes) (commonly called chromosomes, samples). Chromosomes can take advantage of genetic manipulation of crossover and mutation. The disadvantage of using genetic representation is that it separates the search process from the original problem domain. Genetic algorithms do not know what chromosomes stand for and do not attempt to explain them. 3. Fitness function Fitness function represents the problem to be solved. The purpose of genetic algorithm is to find the individual with the highest score obtained by fitness function. Unlike many traditional search algorithms, genetic algorithms only consider the value obtained by using the fitness function and do not rely on the derivative or any other information. This makes them suitable for dealing with functions that are difficult or impossible to differentiate mathematically. While many traditional algorithms are deterministic in nature, the rules that genetic algorithms use to produce from one generation to the next are probabilistic. For example, the selected individual will be used to create the next generation, and the probability of selecting an individual increases with the individual’s fitness score, but it is still possible to select an individual with a lower score. Although this process is probabilistic, the search based on genetic algorithm is not random. Instead, it uses randomness to direct searches to areas of the search space that have a better chance of improving results.

Advantages and disadvantages of genetic algorithms

advantages

  1. The global optimal

In many cases, optimization problems have local maxima and minima. These values represent solutions that are better than those around them, but not optimal. Most traditional search and optimization algorithms, especially those based on gradients, can easily fall into local maxima instead of finding global maxima. Genetic algorithms are more likely to find global maxima. This is due to the use of a set of candidate solutions rather than a single candidate solution, and in many cases, crossover and mutation operations will cause the candidate solution to differ from the previous solution. As long as we try to maintain the diversity of the population and avoid premature convergence, a global optimal solution may be produced. Since genetic algorithms only require fitness function scores for each individual and are independent of other aspects of fitness functions (such as derivatives), they can be used to solve function problems with complex mathematical representations that are difficult or impossible to differentiate. 3. Dealing with problems that lack mathematical representation Genetic algorithms can be used for problems that lack mathematical representation at all. This is because fitness is artificial. For example, to find the most attractive color combinations, try different color combinations and ask users to rate their attractiveness. Using opinion-based scores as fitness functions, genetic algorithms are applied to search for the best combination of scores. Even if the fitness function lacks a mathematical representation and the score cannot be computed directly from a given color combination, the genetic algorithm can still be run. Genetic algorithms can even handle cases where fitness cannot be obtained for each individual, as long as it is possible to compare two individuals and determine which one is better. For example, using a machine learning algorithm to drive a car in a simulation race, and then using a genetic algorithm-based search can optimize and tune the machine learning algorithm by pitting different versions of the algorithm against each other to determine which one is better. Noise resistance Some problems may exist in the phenomenon of noise. This means that even for similar input values, the output may be different each time. This can happen, for example, when abnormal data is generated from sensors, or when scores are based on a person’s point of view. Although this behavior can interfere with many traditional search algorithms, genetic algorithms are generally robust to it, thanks to repeated crossover and reevaluation of individual operations. 5. Parallelism Genetic algorithm is very suitable for parallelization and distributed processing. Fitness is calculated individually for each individual, meaning that all individuals in the population can be assessed at the same time. In addition, selection, crossover, and mutation can be performed simultaneously on individuals and pairs of individuals in a population, respectively. 6. Continuous learning Evolution never ends, as populations adapt to changing environmental conditions. Genetic algorithms can operate continuously in a constantly changing environment and can obtain and use the current best solution at any point in time. However, the speed of environment change is slower than the search speed of genetic algorithm.

limitations

  1. Special definition required: When genetic algorithms are applied to a given problem, appropriate representations need to be created for them — fitness functions and chromosome structures are defined, along with selection, crossover, and mutation operators suitable for the problem.

  2. Hyperparametric tuning: The behavior of a genetic algorithm is controlled by a set of hyperparameters, such as population size and mutation rate. There are no standard rules for setting hyperparameters when applying genetic algorithms to specific problems.

  3. Computation-intensive: Large populations may require a lot of computation and can be time consuming until good results are achieved. These problems can be mitigated by choosing hyperparameters, parallel processing, and in some cases caching intermediate results.

  4. Premature convergence: If an individual is much more adaptable than the rest of the population, it may be reproducible enough to cover the entire population. This can cause genetic algorithms to fall into local maxima prematurely instead of finding global maxima. To prevent this, diversity of species needs to be ensured.

  5. Quality of solutions that are not guaranteed: The use of genetic algorithms does not guarantee finding a global maximum for the current problem (but almost all search and optimization algorithms do, unless it is an analytical solution for a particular type of problem). In general, genetic algorithms can provide a good solution in a reasonable amount of time when used properly.

Application scenarios of genetic algorithms

  1. Mathematical representation of complex problems: Since genetic algorithms only require the results of fitness functions, they can be used for objective function problems that are difficult or impossible to differentiate, large number of parameter problems, and parameter mixed problem types.
  2. Problems without mathematical expressions: Genetic algorithms do not require a mathematical representation of the problem as long as a score value can be obtained or there is a way to compare two solutions.
  3. Problems involving noisy data: Genetic algorithms can deal with problems where data may be inconsistent, such as data derived from sensor output or based on human ratings.
  4. Issues involved in time-varying environments: Genetic algorithms can respond to slower environmental changes by constantly creating new generations that adapt to change.

But when the problem has known and specialized solutions, using existing traditional methods or analytical methods may be a more effective choice.

Components of genetic algorithms

At the heart of genetic algorithms are loops — genetic operators of selection, crossover and mutation applied in turn, and individuals reevaluated — that continue until a stop condition is met

The basic flow of the algorithm

The following flow chart shows the main stages of the basic genetic algorithm flow:

Graph TD id1([start]) --> ID2 [create initial population] ID2 --> ID3 [calculate fitness of each individual in population] ID3 --> ID4 [select] ID4 --> ID5 [cross] ID5 --> ID6 [mutation] ID6 --> Id7 [calculate the fitness of each individual in the population] ID7 --> ID8 ID8 {Meet the stop condition? } - > | no | id4 id8 - > | yes | id9 / choose the highest fitness individuals id9 - > id10 ([over])

Create an initial population

The initial population is a randomly selected set of valid candidate solutions (individuals). Since genetic algorithms use chromosomes to represent each individual, the initial population is actually a set of chromosomes.

Calculated fitness

The fitness function value is calculated for each individual. This operation will be performed once for the initial population and then again for each new generation after applying genetic operators of selection, crossover, and mutation. Since the fitness of each individual is independent of the others, it can be calculated in parallel. Since the selection phase after fitness calculation usually considers the individual with a higher fitness score as a better solution, genetic algorithms focus on finding the maximum fitness score. If a minimum is required, the fitness calculation should invert the original value, for example, by multiplying it by the value (-1).

Selection, crossover and variation

Applying genetic operators of selection, crossover and mutation to a population produces a new generation based on the better individuals of the current generation. The selection operation is responsible for the selection of dominant individuals in the current population. Crossover (or recombination, recombination) operations create offspring from selected individuals. This is usually done by two selected individuals swapping parts of their chromosomes to create two new chromosomes representing the offspring. A mutation operation can randomly change one or more chromosomal values (genes) of each newly created individual. Mutations usually occur at very low rates.

Algorithm termination condition

There are a number of possible conditions that can be checked to determine whether an algorithm can be stopped. The two most commonly used stop conditions are:

  1. The maximum number of generations has been reached. This is also used to limit the running time and computing resources consumed by the algorithm.
  2. Individuals have not improved significantly in the past few generations. This can be done by storing the best fitness value obtained for each generation and then comparing the current best value to a predetermined best value obtained several generations earlier. If the difference is less than a certain threshold, the algorithm can be stopped.

Other stop conditions:

  1. The scheduled time has been exceeded since the start of the algorithm process.
  2. Consumes a certain cost or budget, such as CPU time and/or memory.
  3. The best solution has taken over a portion of the population that is larger than the preset threshold.

other

Elitism

Although the average fitness of a genetic algorithm population generally increases with generations, it is possible to lose the best contemporary individual at any time. This is because the selection, crossover, and mutation operators change individuals in the process of creating the next generation. In many cases, the loss is temporary, as these individuals (or better ones) will be reintroduced into the population in the next generation. But to ensure that the best individuals always make it into the next generation, an elitist strategy can be used. This means that the first n individuals (n is a predefined parameter) are copied to the next generation before we populate the population with offspring created through selection, crossover, and mutation. The cloned elite individual is still eligible to participate in the selection process and thus can still be used as the parent of the new individual. Elitism’s strategy can sometimes have a significant positive impact on the performance of algorithms, as it avoids the potential time wasted in rediscovering good solutions lost in the genetic process.

Niches and sharing

In nature, any environment can be further divided into subenvironments or niches consisting of a variety of species that make use of the unique resources available within each niche, such as food and shelter. For example, the forest environment consists of treetops, shrubs, forest floors, roots, etc. These can accommodate different species that live in the niche and use its resources. When several different species coexist in the same niche, they all compete for the same resources, creating a tendency to find new, uninhabited ecological niches and fill them. In the field of genetic algorithms, this niche phenomenon can be used to maintain the diversity of populations and to find several optimal solutions, each of which is considered a niche. For example, suppose the genetic algorithm tries to maximize a fitness function with several different peaks:

Since the tendency of genetic algorithms is to find the global maximum, after a period of time most individuals are concentrated near the peak. This is indicated in the diagram by the placement of the x marker, which represents the individual of the current generation. But sometimes we want to find some other partial (or total) peak in addition to the global maximum. To this end, each peak can be treated as a niche, providing resources in proportion to its height. Then, find a way to share (or distribute) those resources among the individuals occupying them, which ideally will encourage a corresponding distribution of species, with the highest summit attracting the most people because it offers the most rewards, and the other peaks correspondingly reducing the number of species because it offers fewer rewards:

One way to implement the sharing mechanism is to divide the original fitness value of each individual by the sum of the distances of all other individuals (related to it). Another option is to divide the original fitness of each individual by the number of other individuals within a surrounding radius.

Genetic algorithm practice

Using DEAP framework to realize the genetic algorithm of Hello World — OneMax problem, code link.