FMin

Font Tian translated this article on 22 December 2017

This page is a basic tutorial on hyperopt.fmin(). It describes how to write a function that fmin can optimize, and how to describe a search space that fmin can search.

Hyperopt's job is to find the best value of a scalar-valued, possibly stochastic function over a set of possible arguments to that function (note that "stochastic" and "random" are not exactly the same thing in mathematics). While many optimization packages assume that these inputs are drawn from a vector space, Hyperopt is different in that it encourages you to describe the search space in more detail. By providing more information about where the function is defined and where you think the best values are, you allow the algorithms in Hyperopt to search more efficiently.

Summary of the process of using hyperopt:

  • The objective function to minimize
  • The search space
  • A database in which to store all the point evaluations of the search
  • The search algorithm to use

This (basic) tutorial will show you how to write functions and search spaces, using the default Trials database and the simple random search algorithm. Part (1) is about the different calling conventions for communication between the objective function and hyperopt. Part (2) describes search spaces.

Parallel search is possible by replacing the Trials database with MongoTrials; there is a separate wiki page on using MongoDB for parallel search.

Choosing the search algorithm is as simple as passing algo=hyperopt.tpe.suggest instead of algo=hyperopt.random.suggest. The search algorithm is actually a callable object whose constructor accepts configuration arguments, but that's pretty much all there is to choosing a search algorithm.
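
For example, swapping algorithms is a one-argument change (a sketch, not part of the original article; note that recent hyperopt versions expose random search as hyperopt.rand.suggest rather than hyperopt.random.suggest):

    from hyperopt import fmin, hp, tpe, rand

    space = hp.uniform('x', -10, 10)

    # adaptive (TPE) search
    best_tpe = fmin(fn=lambda x: x ** 2, space=space,
                    algo=tpe.suggest, max_evals=100)

    # random search: only the algo argument changes
    best_rand = fmin(fn=lambda x: x ** 2, space=space,
                     algo=rand.suggest, max_evals=100)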

1. Define a function to minimize

Hyperopt provides a few levels of increasing flexibility/complexity when it comes to specifying the objective function to minimize. As the designer of an objective function, you might consider:

  • Do you need to save additional information beyond the function return value, such as other statistics and diagnostic information collected while computing the objective?
  • Do you want to use optimization algorithms that require more than just the function value?
  • Do you want to communicate between parallel processes (e.g. other workers, or the minimization algorithm itself)?

The following sections describe various ways to implement an objective function for a simple single-variable quadratic. In each section, we will search over a bounded range from -10 to +10, which we can describe with a search space:

    space = hp.uniform('x', -10, 10)

Section 2 below explains how to specify more complex search spaces.

1.1 The simplest case

The simplest protocol for hyperopt’s optimization algorithm to communicate with your objective function is that your objective function receives a valid point from the search space and returns the floating point loss (also known as disutility) associated with that point.

    from hyperopt import fmin, tpe, hp
    best = fmin(fn=lambda x: x ** 2,
        space=hp.uniform('x', -10, 10),
        algo=tpe.suggest,
        max_evals=100)
    print(best)

The advantage of this protocol is that it is readable and easy to use. As you can see, it is just one line of code. The disadvantages of this protocol are (1) that this kind of function cannot return extra information about each evaluation to the trials database, and (2) that this kind of function cannot interact with the search algorithm or other concurrent function evaluations. The examples below will show why you might want to do these things.

1.2 Add additional information through trial objects

If your objective function is complicated and takes a long time to run, you will almost certainly want to save more statistics and diagnostic information than just the floating-point loss that comes out at the end. For such cases, the fmin function is written to handle dictionary return values. The idea is that your loss function can return a nested dictionary with all the statistics and diagnostics you want. The reality is a little less flexible than that, though: when using mongodb, for example, the dictionary must be a valid JSON document. Still, there is lots of flexibility to store domain-specific auxiliary results.

When the objective function returns a dictionary, the fmin function looks for some special key-value pairs in the return value, which it passes along to the optimization algorithm. There are two mandatory key-value pairs:

  • status – one of the keys in hyperopt.STATUS_STRINGS, such as 'ok' on successful completion and 'fail' in cases where the function turned out to be undefined (see the sketch after this list).
  • loss – the float-valued function value that you are trying to minimize; it must be present if the status is 'ok'.
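
As a quick illustration of the status values (a sketch, not part of the original article; the log-based objective is arbitrary), a function can report 'fail' wherever it is undefined and 'ok' with a loss everywhere else:

    import math
    from hyperopt import fmin, tpe, hp, STATUS_OK, STATUS_FAIL

    def objective(x):
        if x <= 0:
            # log(x) is undefined here, so report failure instead of a loss
            return {'status': STATUS_FAIL}
        return {'loss': math.log(x), 'status': STATUS_OK}

    best = fmin(objective,
        space=hp.uniform('x', -10, 10),
        algo=tpe.suggest,
        max_evals=50)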

The fmin function also responds to some optional keys:

  • attachments – a dictionary of key-value pairs whose keys are short strings (like filenames) and whose values are potentially long strings (like file contents) that should not be loaded from the database every time the record is accessed. (Also, MongoDB limits the length of normal key-value pairs, so once your value is in the megabytes, you may have to store it as an attachment.)
  • loss_variance – float – the uncertainty in a stochastic objective function.
  • true_loss – float – when doing hyperparameter optimization, if you store the generalization error of your model under this name, you can sometimes get nicer output from the built-in plotting routines.
  • true_loss_variance – float – the uncertainty in the generalization error.

Since the dictionary is meant to go with a variety of back-end storage mechanisms, you should make sure that it is JSON-compatible: as long as it is a tree of dictionaries, lists, tuples, numbers, strings, and date-times, you will be fine.

Tip: To store numpy arrays, serialize them to a string and consider storing them as attachments.
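
For example, a per-trial numpy array could be shipped like this (a sketch, not part of the original article; the 'diagnostics' array is a made-up illustration):

    import pickle
    import numpy as np
    from hyperopt import STATUS_OK

    def objective(x):
        # hypothetical per-trial array we want to keep but not embed in the result
        diagnostics = np.linspace(-1.0, 1.0, 5)
        return {
            'loss': x ** 2,
            'status': STATUS_OK,
            'attachments': {'diagnostics': pickle.dumps(diagnostics)},
        }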

Written in this dictionary-returning style, the function above looks like this:

    import pickle
    import time
    from hyperopt import fmin, tpe, hp, STATUS_OK

    def objective(x):
        return {'loss': x ** 2, 'status': STATUS_OK }

    best = fmin(objective,
        space=hp.uniform('x', -10, 10),
        algo=tpe.suggest,
        max_evals=100)

    print(best)

1.3 Trials object

To really see the purpose of returning a dictionary, let's modify the objective function to return some more things, and pass an explicit trials argument to fmin.

    import pickle
    import time
    from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

    def objective(x):
        return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
        }
    trials = Trials()
    best = fmin(objective,
        space=hp.uniform('x', -10, 10),
        algo=tpe.suggest,
        max_evals=100,
        trials=trials)

    print(best)

In this case the call to fmin proceeds as before, but by passing in a Trials object directly, we can inspect all of the return values that were calculated during the experiment.

Here’s an example:

  • trials.trials – a list of dictionaries representing everything about the search
  • trials.results – a list of dictionaries returned by the objective during the search
  • trials.losses() – a list of losses (a float for each 'ok' trial)
  • trials.statuses() – a list of status strings

The trials object can be saved, passed to the built-in plotting routines, or analyzed with your own custom code.
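
For example, after the run above (a short sketch; the exact values will differ from run to run):

    print(trials.losses()[:5])      # first few losses (floats for 'ok' trials)
    print(trials.statuses()[:5])    # corresponding status strings
    print(trials.results[0])        # the full dictionary returned by one evaluation
    print(len(trials.trials))       # number of evaluations performed (100 here)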

Attachments are handled by a special mechanism that allows the same code to work with both Trials and MongoTrials.

You can retrieve the trial attachment like this, which retrieves the “time_module” attachment for trial 5:

    msg = trials.trial_attachments(trials.trials[5])['time_module']
    time_module = pickle.loads(msg)

The syntax is a bit involved because attachments are designed to be potentially large strings, and when using MongoTrials we do not want to download more data than necessary. Strings can also be attached globally to the entire trials object via trials.attachments, which behaves like a string-to-string dictionary.
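
For example (a sketch, not part of the original article; the key and value here are arbitrary illustrations):

    # global, trial-independent attachment; behaves like a string-to-string dict
    trials.attachments['experiment_notes'] = 'quadratic toy objective, default settings'
    print(trials.attachments['experiment_notes'])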

N.B. Currently, the trial-specific attachments of a Trials object are placed into the same global trials attachment dictionary, but that may change in the future, and it is not true of MongoTrials.

1.4 The Ctrl object for real-time communication with MongoDB

It is possible for fmin() to give your objective function a handle to the mongodb used by a parallel experiment. This mechanism makes it possible to update the database with partial results, and to communicate with other concurrent processes that are evaluating different points. Your objective function can even add new search points, just like random.suggest does.

The basic technique involves:

  • Using the fmin_pass_expr_memo_ctrl decorator
  • Using pyll.rec_eval in your own function to build the search space point from expr and memo
  • Using ctrl, an instance of hyperopt.Ctrl, to communicate with the live trials object

If this doesn't make much sense to you after this short tutorial, that's normal. I just wanted to give some hints about what is possible with the current code base, and provide some terminology so that you can search effectively through the hyperopt source, unit tests, and example projects, such as hyperopt-convnet. If you would like some help getting up to speed with this part of the code, send me an email or file a GitHub issue.

2. Define a search space

A search space consists of nested function expressions, including stochastic expressions. The stochastic expressions are the hyperparameters. Sampling from this nested stochastic program defines the random search algorithm. The hyperparameter optimization algorithms work by replacing the normal "sampling" logic with adaptive exploration strategies, which make no attempt to actually sample from the distributions specified in the search space.

It is best to think of the search space as a random sampling procedure. For example,

    from hyperopt import hp

    space = hp.choice('a',
        [
            ('case 1', 1 + hp.lognormal('c1', 0, 1)),
            ('case 2', hp.uniform('c2', -10, 10)),
        ])

The result of running this code snippet is a variable space that refers to a graph of expression identifiers and their arguments. Nothing has actually been sampled; it is just a graph describing how to sample a point. The code for handling this kind of expression graph lives in hyperopt.pyll, and I will refer to these graphs as pyll graphs or pyll programs.

If you like, you can draw a sample from this space to see what it looks like:

    import hyperopt.pyll.stochastic
    print(hyperopt.pyll.stochastic.sample(space))

The search space described by space has three hyperparameters:

  • 'a' – selects which case to use
  • 'c1' – a positive-valued parameter used in 'case 1'
  • 'c2' – a bounded real-valued parameter used in 'case 2'

One thing to note is that every optimizable stochastic expression has a label as its first argument. These labels are used to return parameter choices to the caller, and in various ways internally as well.

The second thing to notice is that we used tuples in the middle of the graph (around each of 'case 1' and 'case 2'). Lists, dictionaries, and tuples are all upgraded to "deterministic function expressions" so that they can be part of the search space's stochastic program.

The third thing to note is the numeric expression 1 + hp.lognormal('c1', 0, 1), which is embedded in the description of the search space. As far as the optimization algorithms are concerned, there is no difference between adding the 1 directly in the search space and adding it in the logic of the objective function itself. As the programmer, you can choose where to put this kind of processing to achieve the modularity you want. Note that the intermediate expression results in the search space can be arbitrary Python objects, even when optimizing in parallel with mongodb. It is easy to add new types of non-stochastic expressions to a search space description; see Section 2.3 below for how to do this.
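
As a small illustration of that modularity (a sketch, not from the original article), the two formulations below present the optimizer with the same problem; the "+ 1" simply moves between the search space and the objective:

    from hyperopt import fmin, tpe, hp

    # -- shift applied inside the search space
    space_a = 1 + hp.lognormal('c1', 0, 1)
    best_a = fmin(lambda v: v ** 2, space_a, algo=tpe.suggest, max_evals=50)

    # -- equivalent: shift applied inside the objective instead
    space_b = hp.lognormal('c1', 0, 1)
    best_b = fmin(lambda c1: (1 + c1) ** 2, space_b, algo=tpe.suggest, max_evals=50)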

The fourth thing to note is that 'c1' and 'c2' are examples of what we call conditional parameters. Each of 'c1' and 'c2' only features in a returned sample for a particular value of 'a'. If 'a' is 0, then 'c1' is used but not 'c2'; if 'a' is 1, then 'c2' is used but not 'c1'. Whenever it makes sense, you should encode parameters as conditional ones in this way, rather than simply ignoring them in the objective function. If you expose the fact that 'c1' sometimes has no effect on the objective function (because it has no effect on the arguments of the objective function), hyperopt can search more efficiently.
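
For instance, an objective function for the space defined above might unpack the sampled case like this (a sketch, not part of the original article; the loss formulas are arbitrary):

    from hyperopt import fmin, tpe, hp

    space = hp.choice('a', [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10)),
    ])

    def objective(args):
        case, val = args          # args is whichever ('case ...', value) tuple was sampled
        if case == 'case 1':
            return val ** 2       # only 'c1' influenced this evaluation
        else:
            return abs(val)       # only 'c2' influenced this evaluation

    best = fmin(objective, space, algo=tpe.suggest, max_evals=50)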

2.1 Parameter Expressions

The optimization algorithms in hyperopt currently recognize the following stochastic expressions (a short sampling sketch follows the list):

  • hp.choice(label, options)

    • Returns one of the options, which should be a list or tuple. The elements of options can themselves be nested stochastic expressions. In that case, the stochastic choices that appear only in some of the options become conditional parameters.
  • hp.randint(label, upper)

    • Returns a random integer in the range [0, upper). The semantics of this distribution are that there is no more correlation in the loss function between nearby integer values than between distant integer values. This is, for example, an appropriate distribution for random seeds. If the loss function is probably more correlated for nearby integer values, you should probably use one of the "quantized" continuous distributions, such as quniform, qloguniform, qnormal, or qlognormal.
  • hp.uniform(label, low, high)

    • Returns a value uniformly distributed between low and high.
    • During optimization, this variable is constrained to a two-sided interval.
  • hp.quniform(label, low, high, q)

    • Returns a value like round(uniform(low, high) / q) * q.
    • Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which is bounded both above and below (a two-sided interval).
  • hp.loguniform(label, low, high)

    • Returns a value drawn according to exp(uniform(low, high)), so that the logarithm of the return value is uniformly distributed.
    • During optimization, the variable is constrained to the interval [exp(low), exp(high)].
  • hp.qloguniform(label, low, high, q)

    • Returns a value like round(exp(uniform(low, high)) / q) * q.
    • Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which is bounded both above and below (a two-sided interval).
  • hp.normal(label, mu, sigma)

    • Returns a real value that is normally distributed with mean mu and standard deviation sigma. During optimization, this is an unconstrained variable.
  • hp.qnormal(label, mu, sigma, q)

    • Returns a value like round(normal(mu, sigma) / q) * q.
    • Suitable for a discrete value that probably takes values around mu, but is essentially unbounded.
  • hp.lognormal(label, mu, sigma)

    • Returns a value drawn according to exp(normal(mu, sigma)), so that the logarithm of the return value is normally distributed. During optimization, this variable is constrained to positive values.
  • hp.qlognormal(label, mu, sigma, q)

    • Returns a value like round(exp(normal(mu, sigma)) / q) * q.
    • Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, and which is bounded from one side (a one-sided interval).
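
To get a feel for these distributions, you can sample them directly with hyperopt.pyll.stochastic.sample, as shown earlier (a small sketch; the labels are arbitrary):

    import hyperopt.pyll.stochastic
    from hyperopt import hp

    for expr in [hp.uniform('u', -10, 10),
                 hp.quniform('qu', -10, 10, 2),
                 hp.loguniform('lu', 0, 5),
                 hp.normal('n', 0, 1),
                 hp.randint('r', 10)]:
        print(hyperopt.pyll.stochastic.sample(expr))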

2.2 Search space example: scikit-learn

To see all these possibilities in action, let's look at how the space of hyperparameters of classification algorithms could be described in scikit-learn. (This idea is being developed in hyperopt-sklearn.)

    from hyperopt import hp
    space = hp.choice('classifier_type', [
        {
        'type': 'naive_bayes',
        },
        {
        'type': 'svm',
        'C': hp.lognormal('svm_C', 0, 1),
        'kernel': hp.choice('svm_kernel', [
            {'ktype': 'linear'},
            {'ktype': 'RBF', 'width': hp.lognormal('svm_rbf_width', 0, 1)},
            ]),
        },
        {
        'type': 'dtree',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth',
            [None, hp.qlognormal('dtree_max_depth_int', 3, 1, 1)]),
        'min_samples_split': hp.qlognormal('dtree_min_samples_split', 2, 1, 1),
        },
        ])

2.3 Using pyll to add non-stochastic expressions

You can use such nodes as arguments to pyll functions (see pyll). File a GitHub issue if you would like to know more about this.

    import hyperopt.pyll
    from hyperopt import hp
    from hyperopt.pyll import scope

    @scope.define
    def foo(a, b=0):
        print('running foo', a, b)
        return a + b / 2

    # -- this will print 0, foo is called as usual.
    print(foo(0))

    # In describing search spaces you can use `foo` as you
    # would in normal Python. These two calls will not actually call foo,
    # they just record that foo should be called to evaluate the graph.

    space1 = scope.foo(hp.uniform('a', 0, 10))
    space2 = scope.foo(hp.uniform('a', 0, 10), hp.normal('b', 0, 1))

    # -- this will print a pyll.Apply node
    print(space1)

    # -- this will draw a sample by running foo()
    print(hyperopt.pyll.stochastic.sample(space1))

2.4 Add new hyperparameters

Whenever possible, you should avoid adding new kinds of stochastic expressions for describing parameter search spaces. In order for all search algorithms to work on all spaces, the search algorithms must agree on the kinds of hyperparameters that describe the space. As the maintainer of the library, I am open to the possibility that some kinds of expressions should be added from time to time, but as I said, I would like to avoid it as much as possible. Adding new kinds of stochastic expressions is not one of the ways hyperopt is meant to be extensible.
