Parallelizing Evaluations During Search via MongoDB
Font Tian translated this article on 23 December 2017
Hyperopt is designed to support different kinds of trials databases. The default trials database is implemented with Python lists and dictionaries. That default implementation is easy to work with, but it does not support the asynchronous updates required to evaluate trials in parallel. For parallel search, hyperopt includes a MongoTrials implementation that supports asynchronous updates.
To run a parallel search, you need to do the following (after installing mongodb):
- Start a mongod process somewhere the machine running the search can reach over the network, either locally or remotely.
- Modify your call to hyperopt.fmin to use a MongoTrials backend connected to that mongod process.
- Start one or more hyperopt-mongo-worker processes that also connect to the mongod process and carry out the search while fmin blocks.
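The three steps above amount to a producer/consumer arrangement: a driver queues points, workers evaluate them, and the driver blocks until the results are stored. The following is a toy, in-memory analogy only. It uses Python's queue and threading modules in place of MongoDB, has no search algorithm, and the names worker and driver are made up for illustration; it is not how hyperopt is implemented.

```python
import math
import queue
import threading

def worker(jobs, results, idle_timeout=0.5):
    """Take queued points, evaluate them, and store the losses."""
    while True:
        try:
            tid, x = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            return  # no work appeared for a while, so shut down
        results[tid] = math.sin(x)
        jobs.task_done()

def driver(points, n_workers=2):
    """Queue every point, then block until the workers have evaluated them all."""
    jobs = queue.Queue()  # stands in for the MongoDB jobs collection
    results = {}          # stands in for the stored trial results
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for tid, x in enumerate(points):
        jobs.put((tid, x))
    jobs.join()           # the driver blocks here, much as fmin does
    for t in threads:
        t.join()          # workers exit once they have been idle long enough
    return results
```

Calling driver([0.0, 1.0, 2.0]) returns a dictionary mapping each point's index to math.sin of that point. The driver itself never calls math.sin.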
1. Start a mongod process
Once mongodb is installed, starting a database process (mongod) is as simple as the following. (You can also refer to other documentation.)
mongod --dbpath . --port 1234
# or, storing each db in its own directory is nice:
mongod --dbpath . --port 1234 --directoryperdb --journal --nohttpinterface
# or consider starting mongod as a daemon:
mongod --dbpath . --port 1234 --directoryperdb --fork --journal --logpath log.log --nohttpinterface
Mongo pre-allocates a few GB of space for better performance (you can disable this with --noprealloc), so think about where you want to create the database. Creating the database on a remote (networked) filesystem may give poor performance not only to your database but also to everyone else on the network, so be careful.
Also, if your machine is visible to the Internet, either bind mongod to the loopback interface and connect over SSH, or read the mongodb documentation on password protection.
The rest of this tutorial assumes that mongod is running on localhost at port 1234.
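Before moving on, it can help to confirm that something is actually listening on that port. Below is a minimal sketch using only the standard library; mongod_reachable is a hypothetical helper, and it only checks that a TCP connection is accepted, not that the listener is really mongod.

```python
import socket

def mongod_reachable(host="localhost", port=1234, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

If mongod_reachable() returns False, fmin and the workers will not be able to connect either, so check the mongod command line and port first.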
2. Use MongoTrials
As a simple demonstration, we minimize the math.sin function with hyperopt. For example:
import math
from hyperopt import fmin, tpe, hp
from hyperopt.mongoexp import MongoTrials
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
best = fmin(math.sin, hp.uniform('x', -2, 2), trials=trials, algo=tpe.suggest, max_evals=10)
The first argument to MongoTrials says which mongod process to use and which database within that process ('foo_db' in this case). The second argument (exp_key='exp1') tags an experiment so that multiple experiments can be stored in one database; it is optional.
Note that there is currently a requirement that the database name be followed by '/jobs'.
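To make the shape of that connection string concrete, here is a small sketch that splits it into its parts with the standard library. split_mongo_url is a hypothetical helper written for this illustration; it is not hyperopt's own parser, and it only mirrors the '/jobs' requirement described above.

```python
from urllib.parse import urlsplit

def split_mongo_url(url):
    """Split 'mongo://host:port/db/jobs' into (host, port, db, collection)."""
    parts = urlsplit(url)
    if parts.scheme != "mongo":
        raise ValueError("expected a mongo:// URL, got %r" % url)
    path = parts.path.strip("/").split("/")
    if len(path) != 2 or path[1] != "jobs":
        raise ValueError("expected a path of the form /<db>/jobs, got %r" % parts.path)
    return parts.hostname, parts.port, path[0], path[1]
```

For the string used in this tutorial, the helper yields the host localhost, the port 1234, the database foo_db and the collection jobs.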
Whether you put your experiments in separate databases or use the exp_key mechanism to distinguish them is up to you. In favor of separate databases: they can be manipulated from the shell (they appear as distinct files), and they give experiments greater independence and isolation. In favor of exp_key: hyperopt-mongo-worker processes (see below) poll at the database level, so they can serve multiple experiments that share the same database at the same time.
3. Run hyperopt-mongo-worker
If you run the code snippet above, you will see that it blocks (hangs) at the call to fmin. MongoTrials describes itself to fmin as an asynchronous trials object, so fmin does not actually evaluate the objective function when a new search point is suggested. Instead, it just sits there, patiently waiting for another process to do that work and update MongoDB with the results. The hyperopt-mongo-worker script included in hyperopt's bin directory was written for this purpose. It should have been added to your $PATH when you installed hyperopt.
While the fmin call in the script above is blocked, open a new shell and type:
hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1
The worker takes a work item from MongoDB, evaluates the math.sin function, and stores the result back in the database. Once fmin has tried enough points (max_evals of them), it returns and the script above terminates. The hyperopt-mongo-worker script then waits a few minutes for more work to appear and finally terminates as well.
We set the polling interval explicitly in this case because the default timings are tuned for jobs (search point evaluations) that take at least a minute or two to complete.
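If you want several workers, you can start them from Python instead of opening one shell per worker. The sketch below builds the same command line as above and starts it with subprocess. worker_command and launch_workers are hypothetical helper names, and only the --mongo and --poll-interval options shown in this tutorial are used.

```python
import subprocess

def worker_command(host="localhost", port=1234, db="foo_db", poll_interval=0.1):
    """Build the argv list for one hyperopt-mongo-worker process."""
    return [
        "hyperopt-mongo-worker",
        "--mongo=%s:%d/%s" % (host, port, db),
        "--poll-interval=%s" % poll_interval,
    ]

def launch_workers(n, **kwargs):
    """Start n worker processes and return their Popen handles."""
    return [subprocess.Popen(worker_command(**kwargs)) for _ in range(n)]
```

Calling launch_workers(4) would start four workers against the local database, provided hyperopt-mongo-worker is on your $PATH. Each worker must be able to import the objective function, which is trivially true for math.sin.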
MongoTrials is a persistent object
If you run the example again,
best = fmin(math.sin, hp.uniform('x', -2, 2), trials=trials, algo=tpe.suggest, max_evals=10)
You'll see it return immediately, seemingly without doing any computation. That's because the database you are connecting to already contains enough trials; you computed them when you ran the first experiment. If you want to run a new search, change the database name or the exp_key. If you want to extend the search, call fmin with a higher value for max_evals.
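The arithmetic behind this behavior can be sketched as follows, on the assumption that fmin treats max_evals as a total and counts the trials already stored under the same exp_key. new_evals_requested is a hypothetical name used only for this illustration.

```python
def new_evals_requested(max_evals, n_stored):
    """Number of fresh evaluations a repeated fmin call would still need."""
    return max(0, max_evals - n_stored)
```

With 10 trials already stored, max_evals=10 requests nothing new, while max_evals=25 would request 15 more evaluations.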
Alternatively, you can start other processes that create a MongoTrials object purely to analyze the results already in the database. Those processes do not need to call fmin at all.
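Such an analysis process might look like the sketch below. It assumes that each stored trial is a dictionary with a "result" entry holding "status" and "loss", and a "misc" entry holding the sampled "vals"; treat that layout as an assumption and check it against your own database. best_of is a hypothetical helper that works on any list of such dictionaries.

```python
def best_of(trial_docs):
    """Return (loss, vals) for the lowest-loss finished trial, or None."""
    finished = [
        doc for doc in trial_docs
        if doc.get("result", {}).get("status") == "ok"
    ]
    if not finished:
        return None
    best = min(finished, key=lambda doc: doc["result"]["loss"])
    return best["result"]["loss"], best["misc"]["vals"]

# Possible usage against the tutorial's database (not run here):
# from hyperopt.mongoexp import MongoTrials
# trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
# print(best_of(trials.trials))
```

Because the helper only reads trial documents, it can run while a search is still in progress; unfinished trials are simply skipped.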