neuraxle.metaopt.auto_ml

Module-level documentation for neuraxle.metaopt.auto_ml. Here is an inheritance diagram, including dependencies to other base modules of Neuraxle:


Neuraxle’s AutoML Classes

Classes used to build Automatic Machine Learning (AutoML) pipelines. Hyperparameter selection strategies are used to optimize the hyperparameters of the given pipelines.

Classes

AutoML(pipeline, validation_splitter, …[, …])

This class provides a convenient interface to the ControlledAutoML class and to the metaopt module in general.

BaseControllerLoop(trainer, n_trials, …)

ControlledAutoML(pipeline, loop, …)

A step to execute Automated Machine Learning (AutoML) algorithms.

DefaultLoop(trainer, n_trials, hp_optimizer, …)

Trainer(validation_splitter, callbacks, n_epochs)

Class used to train a pipeline using various data splits and callbacks for evaluation purposes.

class neuraxle.metaopt.auto_ml.Trainer(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]

Bases: neuraxle.base.BaseService

Class used to train a pipeline using various data splits and callbacks for evaluation purposes.

__init__(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]

Initialize self. See help(type(self)) for accurate signature.

train(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial, return_trained_pipelines: bool = False) → Optional[List[neuraxle.base.BaseStep]][source]

Train the pipeline using the validation splitter, and track training and validation metrics for each epoch. Note: this method is just a shortcut to the execute_trial method, with less boilerplate code needed. Refer to execute_trial for full flexibility.
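
For orientation, here is a minimal sketch of assembling a Trainer; the splitter and callback constructor arguments shown are assumptions that may vary slightly between Neuraxle versions, and the trainer is normally handed to a controller loop rather than driven directly:

    from sklearn.metrics import mean_squared_error

    from neuraxle.metaopt.auto_ml import Trainer
    from neuraxle.metaopt.callbacks import MetricCallback
    from neuraxle.metaopt.validation import ValidationSplitter

    # Assumed arguments: a 20% validation split and an MSE metric callback.
    trainer = Trainer(
        validation_splitter=ValidationSplitter(0.20),
        callbacks=[MetricCallback('mse', metric_function=mean_squared_error,
                                  higher_score_is_better=False)],
        n_epochs=10,
    )

    # A controller loop (e.g. DefaultLoop) supplies the Trial scope and calls
    # trainer.train(pipeline, dact, trial_scope) once per trial.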

train_split(pipeline: neuraxle.base.BaseStep, train_dact: neuraxle.data_container.DataContainer, val_dact: Optional[neuraxle.data_container.DataContainer], trial_split_scope: neuraxle.metaopt.data.aggregates.TrialSplit) → neuraxle.base.BaseStep[source]

Train the pipeline on a single split. You probably want to use self.train instead, which applies the validation splitter. If the validation DACT is None, the evaluation metrics will not record validation results.

refit(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial) → neuraxle.base.BaseStep[source]

Refit the pipeline on the whole dataset (without any validation technique).

Returns

fitted pipeline

class neuraxle.metaopt.auto_ml.BaseControllerLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]

Bases: neuraxle.base.TruncableService

__init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]

Initialize self. See help(type(self)) for accurate signature.

trainer
run(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round)[source]

Run the controller loop.

Parameters
  • pipeline – the pipeline to train

  • dact – the data container to train on

  • round_scope (Round) – the round aggregate in which the trials are run

Returns

the ID of the round that was executed (either created or continued from a previous optimization).

loop(round_scope: neuraxle.metaopt.data.aggregates.Round) → Iterator[neuraxle.metaopt.data.aggregates.Trial][source]

Loop over all trials.

Parameters

round_scope (Round) – the round aggregate over which to loop

Returns

an iterator over the trials to execute.

next_trial(round_scope: neuraxle.metaopt.data.aggregates.Round) → AbstractContextManager[neuraxle.metaopt.data.aggregates.Trial][source]

Get the next trial to be executed.

Parameters

round_scope (Round) – round scope

Returns

the next trial to be executed.
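
As the return type indicates, next_trial yields a context manager, so that the trial's status can be finalized on exit even if training raises. A minimal sketch of the pattern, assuming that loop, pipeline, dact and round_scope already exist:

    # Sketch only: loop, pipeline, dact and round_scope are assumed to exist.
    with loop.next_trial(round_scope) as trial_scope:
        loop.trainer.train(pipeline, dact, trial_scope)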

refit_best_trial(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.base.BaseStep[source]

Refit the pipeline on the whole dataset (without any validation technique).

for_refit_only() → neuraxle.metaopt.auto_ml.BaseControllerLoop[source]

Create a controller loop configured with zero trial iterations, so that only refit_best_trial can be performed.
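
A hedged sketch of the refit-only pattern, using only the two documented methods; the surrounding names are assumed to exist:

    # Sketch only: loop, pipeline, dact and round_scope are assumed to exist.
    refit_loop = loop.for_refit_only()  # configured to run zero trials
    best_pipeline = refit_loop.refit_best_trial(pipeline, dact, round_scope)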

class neuraxle.metaopt.auto_ml.DefaultLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]

Bases: neuraxle.metaopt.auto_ml.BaseControllerLoop

__init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]

Initialize self. See help(type(self)) for accurate signature.

class neuraxle.metaopt.auto_ml.ControlledAutoML(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]

Bases: neuraxle.base.ForceHandleMixin, neuraxle.base._HasChildrenMixin, neuraxle.base.BaseStep

A step to execute Automated Machine Learning (AutoML) algorithms. This step automatically splits the data into train and validation splits, and runs a hyperparameter optimization on those splits to find the best hyperparameters.

The controller loop makes it possible to split the execution across multiple threads, or even multiple machines.

The Trainer is responsible for training the pipeline on the train and validation splits produced by the splitter.

If desired, the step with the chosen best hyperparameters is then refitted on the full, unsplit data.
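
To make the composition concrete, here is a hedged sketch of wiring a Trainer into a DefaultLoop and then into ControlledAutoML; pipeline and trainer are assumed to be built as in the Trainer sketch above, and the metric name 'mse' is an illustrative assumption matching that sketch:

    from neuraxle.metaopt.auto_ml import ControlledAutoML, DefaultLoop

    # Sketch only: `pipeline` and `trainer` come from the Trainer example above.
    automl = ControlledAutoML(
        pipeline=pipeline,
        loop=DefaultLoop(trainer=trainer, n_trials=20),
        main_metric_name='mse',
        refit_best_trial=True,
    )
    # automl = automl.fit(data_inputs, expected_outputs)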

__init__(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]

Note

Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied, and so are the AutoML loop and the DACTs.

Parameters
  • pipeline – The pipeline, or BaseStep, which will be used by the AutoML loop

  • loop (BaseControllerLoop) – The loop, or BaseControllerLoop, which will be used by the AutoML loop

  • flow – The flow, or Flow, which will be used by the AutoML loop

  • refit_best_trial (bool) – A boolean indicating whether to perform, after a fit call, a refit on the best trial.

get_children() → List[neuraxle.base.BaseStep][source]

Get the list of all the children of that step or service.

Returns

every child step

wrapped
_fit_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → Tuple[neuraxle.base.BaseStep, neuraxle.data_container.DataContainer][source]

Fit transform data container.

Parameters
  • data_container – the data container to fit and transform

  • context – the execution context

Returns

(fitted self, data container)

to_force_refit_best_trial() → neuraxle.metaopt.auto_ml.ControlledAutoML[source]
_fit_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Run the AutoML loop. Find the best hyperparams using the hyperparameter optimizer. Evaluate the pipeline on each trial using a validation technique.

Parameters
  • data_container – the data container to fit on

  • context – the execution context

Returns

self

get_automl_context(context: neuraxle.base.ExecutionContext, with_loc=True) → neuraxle.metaopt.context.AutoMLContext[source]
round_number
report
_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Transform data container.

Return type

DataContainer

Parameters
  • data_container – the data container to transform

  • context – the execution context

Returns

data container

class neuraxle.metaopt.auto_ml.AutoML(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]

Bases: neuraxle.metaopt.auto_ml.ControlledAutoML

This class provides a convenient interface to the ControlledAutoML class and to the metaopt module in general.

Parameters
  • pipeline – pipeline to copy and use for training

  • validation_splitter – validation splitter to use

  • refit_best_trial – whether to refit the best model on the whole dataset after the optimization

  • scoring_callback – main callback to use for scoring; this argument is deprecated

  • hyperparams_optimizer – hyperparams optimizer to use

  • hyperparams_repository – hyperparams repository to use

  • n_trials – number of trials to run

  • epochs – number of epochs to train the model for each validation split

  • callbacks – callbacks to use for training; additional metrics can be added here

  • n_jobs – number of jobs to use for parallelization; the default is None for no parallelization

  • continue_loop_on_error – whether to continue the main optimization loop on error or not

Returns

AutoML object ready to use with fit and transform.
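
A hedged usage sketch of the convenience constructor on a toy regression pipeline; the splitter, distribution and callback arguments are illustrative assumptions and may differ slightly between versions:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error

    from neuraxle.hyperparams.distributions import LogUniform
    from neuraxle.hyperparams.space import HyperparameterSpace
    from neuraxle.metaopt.auto_ml import AutoML
    from neuraxle.metaopt.callbacks import MetricCallback
    from neuraxle.metaopt.validation import ValidationSplitter
    from neuraxle.pipeline import Pipeline
    from neuraxle.steps.sklearn import SKLearnWrapper

    # Toy pipeline with one tunable hyperparameter.
    pipeline = Pipeline([
        SKLearnWrapper(Ridge(), HyperparameterSpace({'alpha': LogUniform(0.01, 10.0)})),
    ])

    auto_ml = AutoML(
        pipeline=pipeline,
        validation_splitter=ValidationSplitter(0.20),
        callbacks=[MetricCallback('mse', metric_function=mean_squared_error,
                                  higher_score_is_better=False)],
        n_trials=10,
        epochs=1,
        refit_best_trial=True,
    )

    # Toy data, for illustration only.
    data_inputs = np.random.rand(100, 3)
    expected_outputs = data_inputs @ np.array([1.0, 2.0, 3.0])

    auto_ml = auto_ml.fit(data_inputs, expected_outputs)
    predictions = auto_ml.transform(data_inputs)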

__init__(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]

Note

Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied, and so are the AutoML loop and the DACTs.

Parameters
  • pipeline (BaseStep) – The pipeline, or BaseStep, which will be used by the AutoML loop

  • loop – The loop, or BaseControllerLoop, which will be used by the AutoML loop

  • flow – The flow, or Flow, which will be used by the AutoML loop

  • refit_best_trial (bool) – A boolean indicating whether to perform, after a fit call, a refit on the best trial.
