Preface
In this post I use mpi4py, the Python interface to MPI, to add multi-process parallel acceleration to my own genetic algorithm framework GAFT, and run a simple test of the resulting speedup.
Project links:
- GitHub: github.com/PytLab/gaft
- PyPI: pypi.python.org/pypi/gaft
Body
When we use a genetic algorithm to optimize an objective function, the function is usually high-dimensional and its derivative is generally hard to obtain. As a result, evaluating the fitness function is usually time-consuming.
For example, when a genetic algorithm is used to search for an optimal structure, we usually have to call quantum chemistry software to compute the total energy of the structure from first principles, which is a very time-consuming process. Similarly, when optimizing force field parameters, the fitness is the error between the energy computed by the force field and a reference energy, and obtaining that energy requires calling the corresponding force field program, which is also relatively slow.
This leads to a problem: when the population is large, producing the next generation from the fitness information takes a long time for every generation. Fortunately, the selection, crossover and mutation operations are independent for the individuals in a population, so we can process this part in parallel to speed up each iteration of the genetic algorithm.
Using mpi4py
Since the clusters in our lab all run MPI environments, I chose the MPI interface to parallelize the code, using mpi4py, the Python bindings for MPI. I have written a separate blog post about using mpi4py; see "Python Multi-process Parallel Programming Practice: Using mpi4py."
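As a quick reminder of what mpi4py looks like, here is a minimal sketch (not part of GAFT; the script name is arbitrary) that queries the global communicator for the process rank and the total number of processes:

```python
# Minimal mpi4py sketch: query rank/size of the global communicator.
# Run with e.g. `mpiexec -n 4 python hello_mpi.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD      # default communicator containing all processes
rank = comm.Get_rank()     # index of the current process
size = comm.Get_size()     # total number of processes

print('Hello from process {} of {}'.format(rank, size))
```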
Further encapsulating the mpi4py interface
To make the MPI interface more convenient to call inside GAFT, I decided to further wrap mpi4py for the places where the genetic algorithm needs it. For this purpose I wrote a separate MPIUtil class; see the MPIUtil module in GAFT.
Wrapping the commonly used communicator interfaces
For example, process synchronization, obtaining the rank and the number of processes, and determining whether the current process is the master process.
```python
import logging

# mpi4py is optional; fall back to serial behaviour if it is not installed.
try:
    from mpi4py import MPI
    MPI_INSTALLED = True
except ImportError:
    MPI_INSTALLED = False


class MPIUtil(object):
    def __init__(self):
        logger_name = 'gaft.{}'.format(self.__class__.__name__)
        self._logger = logging.getLogger(logger_name)

    # Wrapper for common MPI interfaces.
    def barrier(self):
        if MPI_INSTALLED:
            mpi_comm = MPI.COMM_WORLD
            mpi_comm.barrier()

    @property
    def rank(self):
        if MPI_INSTALLED:
            mpi_comm = MPI.COMM_WORLD
            return mpi_comm.Get_rank()
        else:
            return 0

    @property
    def size(self):
        if MPI_INSTALLED:
            mpi_comm = MPI.COMM_WORLD
            return mpi_comm.Get_size()
        else:
            return 1

    @property
    def is_master(self):
        return self.rank == 0
```
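A single module-level instance can then be shared across the framework. A minimal usage sketch (the instance name `mpi` is the one used in the rest of this post):

```python
# A module-level singleton used by the rest of the framework.
mpi = MPIUtil()

if mpi.is_master:
    print('Running with {} process(es)'.format(mpi.size))

# Wait until all processes reach this point.
mpi.barrier()
```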
Intra-group collective communication interfaces
Since the parallelization happens during population reproduction, I need to split the previous generation into several sub-parts, perform the genetic operations (selection, crossover and mutation) on each sub-part in its own process, and finally gather and merge the subpopulations produced by each sub-part. I wrote several interfaces for this splitting and gathering:
```python
from itertools import chain

# The following are also methods of the MPIUtil class.

def split_seq(self, sequence):
    ''' Split the sequence according to rank and processor number. '''
    starts = [i for i in range(0, len(sequence), len(sequence)//self.size)]
    ends = starts[1:] + [len(sequence)]
    start, end = list(zip(starts, ends))[self.rank]
    return sequence[start: end]

def split_size(self, size):
    ''' Split a size number(int) to sub-size numbers. '''
    if size < self.size:
        warn_msg = ('Splitting size({}) is smaller than process ' +
                    'number({}), more processors would be ' +
                    'superfluous').format(size, self.size)
        self._logger.warning(warn_msg)
        splited_sizes = [1]*size + [0]*(self.size - size)
    elif size % self.size != 0:
        residual = size % self.size
        splited_sizes = [size // self.size]*self.size
        for i in range(residual):
            splited_sizes[i] += 1
    else:
        splited_sizes = [size // self.size]*self.size
    return splited_sizes[self.rank]

def merge_seq(self, seq):
    ''' Gather data in sub-process to root process. '''
    if self.size == 1:
        return seq
    mpi_comm = MPI.COMM_WORLD
    merged_seq = mpi_comm.allgather(seq)
    return list(chain(*merged_seq))
```
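A worked example of what these helpers do, assuming the MPIUtil instance is named `mpi` and the program runs on 4 processes:

```python
# Assumed: running on 4 processes, mpi = MPIUtil().

# split_size distributes 25 jobs as 7/6/6/6 over the 4 ranks.
local_njobs = mpi.split_size(25)            # 7 on rank 0, 6 on ranks 1-3

# split_seq gives each rank its own slice of a sequence.
local_part = mpi.split_seq(list(range(8)))  # e.g. [0, 1] on rank 0

# merge_seq gathers the local lists from all ranks into one flat list.
full_seq = mpi.merge_seq(local_part)        # [0, 1, ..., 7] on every rank
```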
A decorator restricting execution to the master process
Some functions, such as logging and data collection, I only want to run in the master process. For convenience, I wrote a decorator that restricts a function to the master process:
```python
from functools import wraps

def master_only(func):
    ''' Decorator to limit a function to be called
        only in master process in MPI env.
    '''
    @wraps(func)
    def _call_in_master_proc(*args, **kwargs):
        if mpi.is_master:
            return func(*args, **kwargs)
    return _call_in_master_proc
```
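For example, a logging helper decorated this way only prints once, no matter how many processes are running (a minimal sketch, not taken from GAFT):

```python
@master_only
def log_generation(g, best_fitness):
    # Only the master process (rank 0) actually executes this body.
    print('generation {}: best fitness = {}'.format(g, best_fitness))

log_generation(0, 1.0)   # prints once, even with many MPI processes
```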
Adding parallelism to the genetic algorithm main loop
During population reproduction, the population is split according to the number of processes, the genetic operations run in parallel on each part, and the subpopulations are then merged back together. Very little code needs to change; see: github.com/PytLab/gaft…
```python
# Enter evolution iteration.
for g in range(ng):
    # Scatter jobs to all processes.
    local_indvs = []
    local_size = mpi.split_size(self.population.size // 2)

    # Fill the new population.
    for _ in range(local_size):
        # Select father and mother.
        parents = self.selection.select(self.population, fitness=self.fitness)
        # Crossover.
        children = self.crossover.cross(*parents)
        # Mutation.
        children = [self.mutation.mutate(child) for child in children]
        # Collect children.
        local_indvs.extend(children)

    # Gather individuals from all processes.
    indvs = mpi.merge_seq(local_indvs)
    # The next generation.
    self.population.individuals = indvs
```
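To make the splitting concrete: with the population size of 50 used in the tests below, each crossover produces two children, so 25 reproduction jobs are distributed per generation. On 4 processes, `split_size(25)` assigns 7 jobs to rank 0 and 6 to each of the other ranks, and after `merge_seq` every process again holds the full set of 50 new individuals.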
Acceleration tests
One-dimensional search test
Now I run a parallel acceleration test on the one-dimensional optimization example in the project to see the speedup. The example code is in /examples/ex01/.
Since my laptop has only a limited number of cores, I installed GAFT on the laboratory cluster and ran the optimization with MPI on multiple cores. The population size was 50 and the number of generations was 100. Running with different numbers of cores gives different optimization times and speedups (speedup = serial time / parallel time). The results and their visualization are as follows:
Cores | Optimization time (s) | Speedup |
---|---|---|
1 | 1.473 | 1.0 |
2 | 0.877 | 1.68 |
3 | 0.657 | 2.24 |
4 | 0.533 | 2.76 |
5 | 0.467 | 3.15 |
6 | 0.540 | 2.73 |
7 | 0.431 | 3.42 |
8 | 0.382 | 3.86 |
9 | 0.355 | 4.15 |
10 | 0.317 | 4.65 |
(Figure: optimization time vs. number of cores)
(Figure: speedup vs. number of cores)
Force field optimization test
Here I run the acceleration test on the system I actually study. This part of the code is not open source, and the fitness calculation for each individual has to call an external program, so the process is far more time-consuming than evaluating an objective function given directly as an expression.
Again, I look at the effect of MPI acceleration on the cluster for different numbers of cores:
Cores | Optimization time (s) | Optimization time | Speedup |
---|---|---|---|
1 | 2.29e04 | 6 h 21 min | 1.0 |
2 | 1.94e04 | 5 h 23 min | 1.18 |
4 | 1.62e04 | 4 h 30 min | 1.41 |
6 | 1.35e04 | 3 h 45 min | 1.69 |
12 | 8.73e03 | 2 h 25 min | 2.62 |
(Figure: optimization time vs. number of cores)
(Figure: speedup vs. number of cores)
It can be seen that MPI acceleration of the genetic algorithm works quite well in both cases, and the program can simply be thrown onto the cluster to fly.
Conclusion
This post summarized the method and process of parallelizing a genetic algorithm with mpi4py and tested the acceleration effect. The MPI speedup of the genetic algorithm framework GAFT is quite satisfactory. The MPI-parallel version of the framework has been updated and uploaded to GitHub (github.com/PytLab/gaft) []~(~ ▽ ~)~*