neuraxle.metaopt.hyperopt.tpe

Module-level documentation for neuraxle.metaopt.hyperopt.tpe. Here is an inheritance diagram, including dependencies to other base modules of Neuraxle:

Inheritance diagram of neuraxle.metaopt.hyperopt.tpe

Tree Parzen Estimator

Code for the Tree Parzen Estimator (TPE) AutoML algorithm.

Classes

TreeParzenEstimator(…)

This is a Tree Parzen Estimator (TPE) algorithm as found in Hyperopt, which improves on the Random Search algorithm and supports intelligent exploration vs. exploitation of the search space.


class neuraxle.metaopt.hyperopt.tpe.TreeParzenEstimator(number_of_initial_random_step: int = 15, quantile_threshold: float = 0.3, number_good_trials_max_cap: int = 25, number_possible_hyperparams_candidates: int = 100, use_linear_forgetting_weights: bool = False, number_recent_trials_at_full_weights: int = 25)[source]

Bases: neuraxle.metaopt.data.vanilla.BaseHyperparameterOptimizer

This is a Tree Parzen Estimator (TPE) algorithm as found in Hyperopt, which improves on the Random Search algorithm and supports intelligent exploration vs. exploitation of the search space over time, using Neuraxle hyperparameters.

Here, the algorithm is modified compared to the original one: it uses a neuraxle.metaopt.automl.GridExplorationSampler instead of a random search to pick the first exploration samples, so as to explore the space more systematically at the beginning.

__init__(number_of_initial_random_step: int = 15, quantile_threshold: float = 0.3, number_good_trials_max_cap: int = 25, number_possible_hyperparams_candidates: int = 100, use_linear_forgetting_weights: bool = False, number_recent_trials_at_full_weights: int = 25)[source]

Initialize the TPE with some configuration.

Parameters
  • number_of_initial_random_step (int) – Number of random steps to take before starting the optimization.

  • quantile_threshold (float) – Threshold between 0 and 1 representing the proportion of good trials to keep.

  • number_good_trials_max_cap (int) – Maximum number of good trials to keep; this caps the quantile_threshold.

  • number_possible_hyperparams_candidates (int) – Number of hyperparameter candidates to explore in the good / bad trials posterior.

  • use_linear_forgetting_weights (bool) – If True, the weights will decrease linearly for trials older than number_recent_trials_at_full_weights.

  • number_recent_trials_at_full_weights (int) – Number of recent trials kept at full weight before linear forgetting applies, when use_linear_forgetting_weights is set to True.
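As an illustration of how quantile_threshold and number_good_trials_max_cap interact, here is a minimal sketch of the good/bad trial split (a hypothetical helper written for this documentation, not Neuraxle's actual implementation; it assumes lower scores are better):

```python
import math

def split_good_and_bad(scores, quantile_threshold=0.3, number_good_trials_max_cap=25):
    """Split trial indices into good and bad groups by score (lower is better).

    The number of good trials is the quantile_threshold proportion of all
    trials, capped at number_good_trials_max_cap.
    """
    n_good = min(math.ceil(quantile_threshold * len(scores)), number_good_trials_max_cap)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:n_good], ranked[n_good:]
```

With 100 trials and the defaults, ceil(0.3 * 100) = 30 would exceed the cap, so only the 25 best trials would be treated as good.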

find_next_best_hyperparams(round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Find the next best hyperparams using previous trials.

Return type

HyperparameterSamples

Parameters

round_scope (Round) – round scope

Returns

next best hyperparams

_sample_next_hyperparams_from_gaussians_div(hp_keys: List[str], divided_good_and_bad_distrs: List[_DividedTPEPosteriors]) → neuraxle.hyperparams.space.HyperparameterSamples[source]
_sample_one_hp_from_gaussians_div(hp_key: List[str], good_bad_posterior_div: neuraxle.metaopt.hyperopt.tpe._DividedTPEPosteriors) → Any[source]
_abc_impl = <_abc_data object>
class neuraxle.metaopt.hyperopt.tpe._DividedMixturesFactory(quantile_threshold: float, number_good_trials_max_cap: int, use_linear_forgetting_weights: bool, number_recent_trials_at_full_weights: int)[source]

Bases: object

__init__(quantile_threshold: float, number_good_trials_max_cap: int, use_linear_forgetting_weights: bool, number_recent_trials_at_full_weights: int)[source]

Initialize self. See help(type(self)) for accurate signature.

create_from(round_scope: neuraxle.metaopt.data.aggregates.Round) → Tuple[List[str], List[neuraxle.metaopt.hyperopt.tpe._DividedTPEPosteriors]][source]
_split_good_and_bad_trials(round_scope: neuraxle.metaopt.data.aggregates.Round) → Tuple[List[neuraxle.metaopt.data.aggregates.Trial], List[neuraxle.metaopt.data.aggregates.Trial]][source]
_create_posterior(flat_hp_space_tuples: List[Tuple[str, neuraxle.hyperparams.distributions.HyperparameterDistribution]], trials: List[neuraxle.metaopt.data.aggregates.Trial]) → List[neuraxle.hyperparams.distributions.HyperparameterDistribution][source]
_reweights_categorical(discrete_distribution: neuraxle.hyperparams.distributions.DiscreteHyperparameterDistribution, trial_hyperparameters: List[Any]) → Union[neuraxle.hyperparams.distributions.Choice, neuraxle.hyperparams.distributions.PriorityChoice][source]
_create_gaussian_mixture(continuous_distribution: neuraxle.hyperparams.distributions.HyperparameterDistribution, trial_hyperparameters: List[Any]) → neuraxle.hyperparams.distributions.DistributionMixture[source]
_adaptive_parzen_normal(hyperparam_distribution: neuraxle.hyperparams.distributions.HyperparameterDistribution, distribution_trials: List[Any])[source]
_compute_distributions_means_stds(hyperparam_distribution: neuraxle.hyperparams.distributions.HyperparameterDistribution, distribution_trials: List[Any])[source]
_generate_linear_forget_weights(number_samples: int) → numpy.ndarray[source]
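The linear forgetting scheme can be sketched as follows (a hypothetical re-implementation mirroring Hyperopt-style linear forgetting, under assumptions; not Neuraxle's actual _generate_linear_forget_weights):

```python
def linear_forget_weights(number_samples: int, number_recent_trials_at_full_weights: int) -> list:
    """Per-trial weights, ordered from oldest to newest.

    The most recent trials keep full weight 1.0; older trials are linearly
    down-weighted toward 1 / number_samples.
    """
    n_old = number_samples - number_recent_trials_at_full_weights
    if n_old <= 0:
        return [1.0] * number_samples
    lo = 1.0 / number_samples
    step = (1.0 - lo) / n_old  # evenly spaced ramp from lo up to 1.0
    return [lo + step * (i + 1) for i in range(n_old)] + [1.0] * number_recent_trials_at_full_weights
```

For example, with 4 samples and 2 recent trials at full weight, the two older trials get weights below 1.0 while the two most recent keep weight 1.0.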
class neuraxle.metaopt.hyperopt.tpe._DividedTPEPosteriors(good_trials: neuraxle.hyperparams.distributions.HyperparameterDistribution, bad_trials: neuraxle.hyperparams.distributions.HyperparameterDistribution)[source]

Bases: object

Sample possible new hyperparameters from the good_trials distribution.

Note: it remains to be verified whether to use the ratio directly or, as Hyperopt does, the log-likelihood of b_post under both distributions.

The ratio of the good PDF to the bad PDF is evaluated for every possible new hyperparameter.

The gist of it is that good probabilities are divided by bad probabilities.

As described in the article, the quantity to maximize is the ratio (gamma + g(x) / l(x) * (1 - gamma))^-1.

Since it is raised to the power -1, this amounts to maximizing l(x) / g(x).

Only the candidate with the best ratio is kept and becomes the next best hyperparameters.

Hyperopt appears to take the log of the PDF rather than the PDF directly, so their implementation likely uses subtraction (-) instead of division (/). Either way, the logarithm is monotonic, so max comparisons give the same result in regular or log space.
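Since the logarithm is monotonic, both formulations select the same candidate. A generic sketch of this equivalence (hypothetical density functions written for this documentation, not Neuraxle code):

```python
import math

def best_by_ratio(candidates, good_pdf, bad_pdf):
    """Pick the candidate maximizing the density ratio l(x) / g(x)."""
    return max(candidates, key=lambda x: good_pdf(x) / bad_pdf(x))

def best_by_log_diff(candidates, good_pdf, bad_pdf):
    """Same selection done in log space: maximize log l(x) - log g(x)."""
    return max(candidates, key=lambda x: math.log(good_pdf(x)) - math.log(bad_pdf(x)))
```

For example, with a good density peaked at 0 and a bad density peaked at 2, both selectors pick the candidate closest to 0.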

# TODO: Maybe they use the likelihood to sum over all possible parameters to find the max, so that it becomes a joint distribution of all hyperparameters; that would make sense.

# TODO: Verify whether, for quantized distributions, we should use cdf(upper value) - cdf(lower value) instead of the pdf.

__init__(good_trials: neuraxle.hyperparams.distributions.HyperparameterDistribution, bad_trials: neuraxle.hyperparams.distributions.HyperparameterDistribution)[source]

Initialize self. See help(type(self)) for accurate signature.

rvs_good_with_pdf_division_proba() → Tuple[Any, float][source]

Sample a hyperparameter from the good distribution and return the probability ratio.

Returns

sampled good hyperparameter and probability ratio

rvs_good() → Any[source]

Sample a hyperparameter from the good distribution.

Returns

sampled good hyperparameter

proba_ratio(possible_new_hyperparm: Any) → float[source]

Return the probability ratio (good PDF over bad PDF) of the given candidate hyperparameter.

Returns

probability ratio