neuraxle.metaopt.auto_ml¶

Module-level documentation for neuraxle.metaopt.auto_ml. Here is an inheritance diagram, including dependencies to other base modules of Neuraxle:

Neuraxle’s AutoML Classes¶

Classes used to build Automatic Machine Learning (AutoML) pipelines. Hyperparameter selection strategies are used to optimize the hyperparameters of the given pipelines.

Classes

`AutoML`(pipeline, validation_splitter, …[, …])	This class provides a nice interface to easily use the ControlledAutoML class and the metaopt module in general.
`BaseControllerLoop`(trainer, n_trials, …)
`ControlledAutoML`(pipeline, loop, …)	A step to execute Automated Machine Learning (AutoML) algorithms.
`DefaultLoop`(trainer, n_trials, hp_optimizer, …)
`Trainer`(validation_splitter, callbacks, n_epochs)	Class used to train a pipeline using various data splits and callbacks for evaluation purposes.


class neuraxle.metaopt.auto_ml.Trainer(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]¶

Bases: neuraxle.base.BaseService

Class used to train a pipeline using various data splits and callbacks for evaluation purposes.

__init__(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]¶

train(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial, return_trained_pipelines: bool = False) → Optional[List[neuraxle.base.BaseStep]][source]¶: Train the pipeline using the validation splitter, tracking training and validation metrics for each epoch. Note: this method is a shortcut for the execute_trial method that needs less boilerplate code. Refer to execute_trial for full flexibility.

train_split(pipeline: neuraxle.base.BaseStep, train_dact: neuraxle.data_container.DataContainer, val_dact: Optional[neuraxle.data_container.DataContainer], trial_split_scope: neuraxle.metaopt.data.aggregates.TrialSplit) → neuraxle.base.BaseStep[source]¶: Train a pipeline split. You probably want to use self.train instead, to use the validation splitter. If validation DACT is None, the evaluation metrics will not save validation results.

refit(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial) → neuraxle.base.BaseStep[source]¶

Refit the pipeline on the whole dataset (without any validation technique).

Returns: fitted pipeline
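
The Trainer methods above (train over validation splits for a number of epochs, then refit on everything) can be pictured with a small plain-Python sketch. It does not use Neuraxle: MeanModel, k_fold, train and refit below are hypothetical stand-ins chosen for illustration, and the real Trainer works on BaseStep pipelines, DataContainers and Trial scopes instead.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[float], List[float]]  # (train values, validation values)


class MeanModel:
    """Toy stand-in for a pipeline: predicts the mean of the data it was fit on."""

    def __init__(self) -> None:
        self.value = 0.0

    def fit(self, ys: Sequence[float]) -> "MeanModel":
        self.value = mean(ys)
        return self

    def mse(self, ys: Sequence[float]) -> float:
        return mean((y - self.value) ** 2 for y in ys)


def k_fold(ys: List[float], k: int) -> List[Split]:
    """Hypothetical validation splitter: k interleaved folds."""
    splits = []
    for i in range(k):
        val = ys[i::k]
        train_part = [y for j, y in enumerate(ys) if j % k != i]
        splits.append((train_part, val))
    return splits


def train(ys: List[float], n_splits: int = 2, n_epochs: int = 1):
    """Fit one fresh model per split and record metrics after every epoch."""
    models: List[MeanModel] = []
    history: List[Dict[str, float]] = []
    for split_id, (train_ys, val_ys) in enumerate(k_fold(ys, n_splits)):
        model = MeanModel()
        for epoch in range(n_epochs):
            model.fit(train_ys)
            history.append({
                "split": split_id,
                "epoch": epoch,
                "train_mse": model.mse(train_ys),
                "val_mse": model.mse(val_ys),
            })
        models.append(model)
    return models, history


def refit(ys: List[float]) -> MeanModel:
    """Fit on the whole dataset, without any validation split."""
    return MeanModel().fit(ys)


data = [1.0, 2.0, 3.0, 4.0]
models, history = train(data, n_splits=2, n_epochs=2)
final_model = refit(data)
```

In this sketch every split gets its own model and its own metric records, while refit ignores the splitter entirely, which mirrors the distinction the method descriptions draw between train, train_split and refit.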


class neuraxle.metaopt.auto_ml.BaseControllerLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]¶

Bases: neuraxle.base.TruncableService

__init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]¶

trainer¶

run(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round)[source]¶

Run the controller loop.

Parameters

pipeline – the pipeline to run the trials on
dact – data container that is not yet split
round_scope (Round) – round scope

Returns: the ID of the round that was executed (either created or continued from previous optimization).

loop(round_scope: neuraxle.metaopt.data.aggregates.Round) → Iterator[neuraxle.metaopt.data.aggregates.Trial][source]¶

Loop over all trials.

Parameters: round_scope (Round) – round scope
Returns: an iterator over the trials of the round.

next_trial(round_scope: neuraxle.metaopt.data.aggregates.Round) → AbstractContextManager[neuraxle.metaopt.data.aggregates.Trial][source]¶

Get the next trial to be executed.

Parameters: round_scope (Round) – round scope
Returns: the next trial to be executed.

refit_best_trial(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.base.BaseStep[source]¶: Refit the pipeline on the whole dataset (without any validation technique).

for_refit_only() → neuraxle.metaopt.auto_ml.BaseControllerLoop[source]¶: Create a controller loop configured with zero iterations so as to only make the “refit_best_trial” possible.
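
The loop described above (a fixed number of trials, a context manager per trial, and an option to keep going after a failed trial) can be sketched in plain Python. ToyTrial and ToyLoop are hypothetical names for this illustration only; the sketch assumes that continue_loop_on_error means "record the failure and move on", which the page implies but does not spell out.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class ToyTrial:
    number: int
    status: str = "running"
    score: Optional[float] = None


@dataclass
class ToyLoop:
    n_trials: int
    continue_loop_on_error: bool = True
    trials: List[ToyTrial] = field(default_factory=list)

    @contextmanager
    def next_trial(self) -> Iterator[ToyTrial]:
        """Open a new trial and record how it ended when the block exits."""
        trial = ToyTrial(number=len(self.trials))
        self.trials.append(trial)
        try:
            yield trial
            trial.status = "success"
        except Exception:
            trial.status = "failed"
            if not self.continue_loop_on_error:
                raise  # stop the whole loop on the first failure

    def run(self, evaluate: Callable[[int], float]) -> None:
        for _ in range(self.n_trials):
            with self.next_trial() as trial:
                trial.score = evaluate(trial.number)

    def for_refit_only(self) -> "ToyLoop":
        """Same configuration, but zero trials: nothing left to do but refit."""
        return ToyLoop(n_trials=0, continue_loop_on_error=self.continue_loop_on_error)


def flaky(trial_number: int) -> float:
    # Assumed failure mode for the demo: the second trial raises.
    if trial_number == 1:
        raise ValueError("bad hyperparameters")
    return float(trial_number)


loop = ToyLoop(n_trials=3)
loop.run(flaky)
statuses = [t.status for t in loop.trials]
```

Using a context manager keeps the bookkeeping (success or failure) next to the trial itself, so the loop body only has to evaluate the trial.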


class neuraxle.metaopt.auto_ml.DefaultLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]¶

Bases: neuraxle.metaopt.auto_ml.BaseControllerLoop

__init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]¶
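
DefaultLoop adds an n_jobs parameter, but this page does not describe how it is used. As a rough illustration of what running independent trials with several workers can look like, here is a plain-Python sketch built on the standard library's ThreadPoolExecutor. The evaluate and run_trials functions are hypothetical, and nothing here should be read as Neuraxle's actual parallelization strategy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def evaluate(shrink: float) -> float:
    """Toy validation loss for one hyperparameter value (lower is better)."""
    return (shrink - 0.3) ** 2


def run_trials(candidates: List[float], n_jobs: int = 1) -> Tuple[float, float]:
    """Score every candidate, sequentially or in a thread pool, and return (best loss, best candidate)."""
    if n_jobs == 1:
        scores = [evaluate(c) for c in candidates]
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            # pool.map returns results in the same order as the inputs.
            scores = list(pool.map(evaluate, candidates))
    return min(zip(scores, candidates))


sequential_best = run_trials([0.1, 0.3, 0.9], n_jobs=1)
parallel_best = run_trials([0.1, 0.3, 0.9], n_jobs=2)
```

Because each trial in the sketch is independent, the sequential and parallel runs select the same candidate; only the scheduling differs.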


class neuraxle.metaopt.auto_ml.ControlledAutoML(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]¶

Bases: neuraxle.base.ForceHandleMixin, neuraxle.base._HasChildrenMixin, neuraxle.base.BaseStep

A step to execute Automated Machine Learning (AutoML) algorithms. This step will automatically split the data into train and validation splits, and execute a hyperparameter optimization on the splits to find the best hyperparameters.

The Controller Loop makes it possible to split the execution across multiple threads, or even multiple machines.

The Trainer is responsible for training the pipeline on the train and validation splits produced by the validation splitter.

The step with the best hyperparameters found will be refit on the full, unsplit data if desired.
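
The overall flow described above (split, try several hyperparameter values, keep the best one by a validation metric, optionally refit on all the data) is sketched below in plain Python. All names (split, fit, mse, toy_automl) are hypothetical, and a fixed candidate list stands in for a hyperparameter optimizer so that the result is deterministic.

```python
from statistics import mean
from typing import Dict, List, Tuple


def split(ys: List[float], validation_size: float = 0.25) -> Tuple[List[float], List[float]]:
    """Hold out the last `validation_size` fraction of the data for validation."""
    cut = int(len(ys) * (1 - validation_size))
    return ys[:cut], ys[cut:]


def fit(train_ys: List[float], shrink: float) -> float:
    """Toy 'pipeline': predicts a scaled mean; `shrink` is its only hyperparameter."""
    return shrink * mean(train_ys)


def mse(prediction: float, ys: List[float]) -> float:
    return mean((y - prediction) ** 2 for y in ys)


def toy_automl(ys: List[float], candidates: List[float], refit_best_trial: bool = True):
    train_ys, val_ys = split(ys)
    trials: List[Dict[str, float]] = []
    for shrink in candidates:  # one trial per candidate hyperparameter value
        prediction = fit(train_ys, shrink)
        trials.append({"shrink": shrink, "val_mse": mse(prediction, val_ys)})
    best = min(trials, key=lambda t: t["val_mse"])  # main metric: validation MSE
    # Refit on the full data with the winning hyperparameter, if requested.
    final_prediction = fit(ys if refit_best_trial else train_ys, best["shrink"])
    return best, final_prediction


best, refit_prediction = toy_automl([2.0, 4.0, 6.0, 8.0], candidates=[0.5, 1.0, 2.0])
_, no_refit_prediction = toy_automl([2.0, 4.0, 6.0, 8.0], [0.5, 1.0, 2.0], refit_best_trial=False)
```

The refit step changes the final model (it sees the held-out values too) but not the choice of hyperparameters, which is made on the validation split alone.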

__init__(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]¶

Note

Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied. So are the AutoML loop and the DACTs.
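
Why copying matters can be shown with a short plain-Python sketch: when each trial works on its own deep copy, changes made during one trial cannot leak into another trial or into the original object. The dictionary-based "pipeline" and the run_trial helper are hypothetical and only illustrate the copying idea, not how Neuraxle copies its contexts.

```python
import copy

# Hypothetical stand-in for shared state handed to every trial.
base_pipeline = {"hyperparams": {"learning_rate": 0.1}, "fitted": False}


def run_trial(pipeline: dict, learning_rate: float) -> dict:
    """Work on a private deep copy so that the caller's object is never mutated."""
    trial_pipeline = copy.deepcopy(pipeline)
    trial_pipeline["hyperparams"]["learning_rate"] = learning_rate
    trial_pipeline["fitted"] = True
    return trial_pipeline


results = [run_trial(base_pipeline, lr) for lr in (0.01, 0.5)]
```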

Parameters

pipeline – The pipeline, or BaseStep, which will be used by the AutoML loop
loop (BaseControllerLoop) – The loop, or BaseControllerLoop, which will be used by the AutoML loop
flow – The flow, or Flow, which will be used by the AutoML loop
refit_best_trial (bool) – A boolean indicating whether to perform, after a fit call, a refit on the best trial.

get_children() → List[neuraxle.base.BaseStep][source]¶

Get the list of all the children of that step or service.

Returns: every child step

wrapped¶

_fit_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → Tuple[neuraxle.base.BaseStep, neuraxle.data_container.DataContainer][source]¶

Fit transform data container.

Parameters

data_container (DataContainer) – data container
context (ExecutionContext) – execution context

Returns

(fitted self, data container)

to_force_refit_best_trial() → neuraxle.metaopt.auto_ml.ControlledAutoML[source]¶

_fit_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]¶

Run the AutoML loop. Find the best hyperparams using the hyperparameter optimizer. Evaluate the pipeline on each trial using a validation technique.

Parameters

data_container (DataContainer) – data container to fit
context (ExecutionContext) – execution context

Returns

self

get_automl_context(context: neuraxle.base.ExecutionContext, with_loc=True) → neuraxle.metaopt.context.AutoMLContext[source]¶

round_number¶

report¶

_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]¶

Transform data container.

Return type

DataContainer

Parameters

data_container (DataContainer) – data container
context (ExecutionContext) – execution context

Returns

data container


class neuraxle.metaopt.auto_ml.AutoML(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]¶

Bases: neuraxle.metaopt.auto_ml.ControlledAutoML

This class provides a nice interface to easily use the ControlledAutoML class and the metaopt module in general.

Parameters

pipeline – pipeline to copy and use for training
validation_splitter – validation splitter to use
refit_best_trial – whether to refit the best model on the whole dataset after the optimization
scoring_callback – main callback to use for scoring (deprecated)
hyperparams_optimizer – hyperparams optimizer to use
hyperparams_repository – hyperparams repository to use
n_trials – number of trials to run
epochs – number of epochs to train the model for on each validation split
callbacks – callbacks to use for training; additional metrics can be added here
n_jobs – number of jobs to use for parallelization; the default in the signature above is 1
continue_loop_on_error – whether to continue the main optimization loop on error or not

Returns

AutoML object ready to use with fit and transform.

__init__(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]¶

Note

Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied. So are the AutoML loop and the DACTs.

Parameters

pipeline (BaseStep) – The pipeline, or BaseStep, which will be used by the AutoML loop
loop – The loop, or BaseControllerLoop, which will be used by the AutoML loop
flow – The flow, or Flow, which will be used by the AutoML loop
refit_best_trial (bool) – A boolean indicating whether to perform, after a fit call, a refit on the best trial.
