neuraxle.metaopt.auto_ml¶
Module-level documentation for neuraxle.metaopt.auto_ml. Here is an inheritance diagram, including dependencies to other base modules of Neuraxle:
Neuraxle’s AutoML Classes¶
Classes used to build any Automatic Machine Learning pipelines. Hyperparameter selection strategies are used to optimize the hyperparameters of given pipelines.
Classes
- AutoML – This class provides a nice interface to easily use the ControlledAutoML class and the metaopt module in general.
- ControlledAutoML – A step to execute Automated Machine Learning (AutoML) algorithms.
- Trainer – Class used to train a pipeline using various data splits and callbacks for evaluation purposes.
Examples using neuraxle.metaopt.auto_ml.AutoML¶
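The linked examples were not captured in this extract. As a stand-in, the overall AutoML control flow that this module implements (sample hyperparameters, train a candidate, score it on held-out data, keep the best) can be sketched in plain Python. `random_search` and its toy model below are hypothetical illustrations of the pattern, not Neuraxle API:

```python
import random

def random_search(train_fn, score_fn, sample_hps, dataset, n_trials=10):
    """Minimal AutoML-style loop: sample hyperparameters, train, score, keep the best."""
    best_score, best_hps = float("inf"), None
    for _ in range(n_trials):
        hps = sample_hps()
        model = train_fn(dataset, hps)
        score = score_fn(model, dataset)
        if score < best_score:  # lower score is better, as with a loss metric
            best_score, best_hps = score, hps
    return best_hps, best_score

# Toy usage: search for a scalar close to the data's mean (2.5).
random.seed(0)
data = [1.0, 2.0, 3.0, 4.0]
best_hps, best_score = random_search(
    train_fn=lambda d, h: h["guess"],                # the "model" is just the guessed scalar
    score_fn=lambda m, d: abs(m - sum(d) / len(d)),  # distance to the mean of the data
    sample_hps=lambda: {"guess": random.uniform(0.0, 5.0)},
    dataset=data,
    n_trials=200,
)
```

In Neuraxle, the sampling is delegated to a BaseHyperparameterOptimizer, the training and scoring to a Trainer with its callbacks, and the loop itself to a BaseControllerLoop.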
class neuraxle.metaopt.auto_ml.Trainer(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]¶
Bases: neuraxle.base.BaseService
Class used to train a pipeline using various data splits and callbacks for evaluation purposes.
- __init__(validation_splitter: neuraxle.metaopt.validation.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, n_epochs: int = 1)[source]¶ Initialize self. See help(type(self)) for accurate signature.
- train(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial, return_trained_pipelines: bool = False) → Optional[List[neuraxle.base.BaseStep]][source]¶ Train the pipeline using the validation splitter, tracking training and validation metrics for each epoch. Note: the present method is just a shortcut to using the execute_trial method with less boilerplate code needed. Refer to execute_trial for full flexibility.
- train_split(pipeline: neuraxle.base.BaseStep, train_dact: neuraxle.data_container.DataContainer, val_dact: Optional[neuraxle.data_container.DataContainer], trial_split_scope: neuraxle.metaopt.data.aggregates.TrialSplit) → neuraxle.base.BaseStep[source]¶ Train a pipeline split. You probably want to use self.train instead, to use the validation splitter. If the validation DACT is None, the evaluation metrics will not save validation results.
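The epoch loop that train_split implements can be pictured as follows. This is a hypothetical plain-Python sketch, where `fit_epoch` and `evaluate` stand in for the pipeline's fitting and scoring calls, and the None-validation behaviour described above is mirrored:

```python
def train_split(fit_epoch, evaluate, train_data, val_data, n_epochs=3):
    """Run n_epochs of fitting, recording a train and (optional) validation metric per epoch."""
    history = {"train": [], "validation": []}
    model = None
    for _ in range(n_epochs):
        model = fit_epoch(model, train_data)
        history["train"].append(evaluate(model, train_data))
        if val_data is not None:  # a None validation split skips validation metrics
            history["validation"].append(evaluate(model, val_data))
    return model, history

# Toy usage: "training" moves the model toward the train data's mean each epoch.
mean = lambda xs: sum(xs) / len(xs)
model, history = train_split(
    fit_epoch=lambda m, d: mean(d) if m is None else (m + mean(d)) / 2,
    evaluate=lambda m, d: abs(m - mean(d)),  # absolute error against a split's mean
    train_data=[1.0, 2.0, 3.0],  # mean 2.0
    val_data=[2.0, 3.0, 4.0],    # mean 3.0
    n_epochs=3,
)
```

The per-epoch metric lists play the role that callbacks play in the real Trainer.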
- refit(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, trial_scope: neuraxle.metaopt.data.aggregates.Trial) → neuraxle.base.BaseStep[source]¶ Refit the pipeline on the whole dataset (without any validation technique).
- Returns
fitted pipeline
class neuraxle.metaopt.auto_ml.BaseControllerLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]¶
Bases: neuraxle.base.TruncableService
- __init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = True)[source]¶ Initialize self. See help(type(self)) for accurate signature.
- trainer¶
- run(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round)[source]¶ Run the controller loop.
- Parameters
pipeline – the pipeline to tune
dact – the data container holding the data to split and fit on
round_scope – the round scope in which the trials are executed
- Returns
the ID of the round that was executed (either created or continued from a previous optimization).
- loop(round_scope: neuraxle.metaopt.data.aggregates.Round) → Iterator[neuraxle.metaopt.data.aggregates.Trial][source]¶ Loop over all trials.
- Parameters
round_scope – the round scope whose trials are iterated over
- Returns
an iterator over the executed trials
- next_trial(round_scope: neuraxle.metaopt.data.aggregates.Round) → AbstractContextManager[neuraxle.metaopt.data.aggregates.Trial][source]¶ Get the next trial to be executed.
- Parameters
round_scope (Round) – round scope
- Returns
the next trial to be executed.
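Returning a context manager lets the loop mark each trial as succeeded or failed based on how its block exits. A hypothetical plain-Python analogue of this idea, independent of Neuraxle's Trial aggregate:

```python
from contextlib import contextmanager

trials = []

@contextmanager
def next_trial(hps):
    """Yield a new trial record; mark it succeeded or failed when its block exits."""
    trial = {"hps": hps, "status": "running"}
    trials.append(trial)
    try:
        yield trial
        trial["status"] = "success"
    except Exception as error:
        # Swallowing the error here mirrors a loop running with continue_loop_on_error=True.
        trial["status"] = "failed"
        trial["error"] = str(error)

with next_trial({"lr": 0.1}) as trial:
    trial["score"] = 0.42  # training and scoring would happen here

with next_trial({"lr": 99.0}):
    raise ValueError("diverged")  # recorded as a failed trial; the loop continues
```

Because cleanup lives in the context manager, a crashing trial is still recorded before the loop moves on to the next one.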
- refit_best_trial(pipeline: neuraxle.base.BaseStep, dact: neuraxle.data_container.DataContainer, round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.base.BaseStep[source]¶ Refit the pipeline on the whole dataset (without any validation technique).
- for_refit_only() → neuraxle.metaopt.auto_ml.BaseControllerLoop[source]¶ Create a controller loop configured with zero iterations, so that only refit_best_trial can be used.
class neuraxle.metaopt.auto_ml.DefaultLoop(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]¶
Bases: neuraxle.metaopt.auto_ml.BaseControllerLoop
- __init__(trainer: neuraxle.metaopt.auto_ml.Trainer, n_trials: int, hp_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, continue_loop_on_error: bool = False, n_jobs: int = 1)[source]¶ Initialize self. See help(type(self)) for accurate signature.
class neuraxle.metaopt.auto_ml.ControlledAutoML(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]¶
Bases: neuraxle.base.ForceHandleMixin, neuraxle.base._HasChildrenMixin, neuraxle.base.BaseStep
A step to execute Automated Machine Learning (AutoML) algorithms. This step will automatically split the data into train and validation splits, and execute a hyperparameter optimization on the splits to find the best hyperparameters.
The Controller Loop makes it possible to split the execution into multiple threads, or even multiple machines.
The Trainer is responsible for training the pipeline on the resulting train and validation splits.
The step with the best chosen hyperparameters will be refitted to the full, unsplit data if desired.
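The split-then-optimize-then-refit sequence described above can be sketched in plain Python. `fit_with_refit`, its `train_fn`, and `score_fn` are hypothetical stand-ins, not Neuraxle API:

```python
def fit_with_refit(train_fn, score_fn, candidates, train_data, val_data):
    """Score each hyperparameter candidate on the validation split, then refit the best on all data."""
    scored = [(score_fn(train_fn(train_data, hps), val_data), hps) for hps in candidates]
    best_score, best_hps = min(scored, key=lambda pair: pair[0])
    full_data = train_data + val_data  # refit uses the whole, unsplit dataset
    return train_fn(full_data, best_hps), best_hps

# Toy usage: the "model" is the training data's mean scaled by a hyperparameter.
mean = lambda xs: sum(xs) / len(xs)
model, best = fit_with_refit(
    train_fn=lambda d, h: mean(d) * h["scale"],
    score_fn=lambda m, d: abs(m - mean(d)),  # validation error: distance to val mean
    candidates=[{"scale": 0.5}, {"scale": 1.5}, {"scale": 3.0}],
    train_data=[1.0, 2.0, 3.0],  # mean 2.0
    val_data=[2.0, 3.0, 4.0],    # mean 3.0
)
```

In ControlledAutoML, candidate selection is driven by the optimizer and controller loop rather than a fixed list, and the final refit happens only when refit_best_trial is enabled.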
- __init__(pipeline: BaseStepT, loop: neuraxle.metaopt.auto_ml.BaseControllerLoop, main_metric_name: str, repo: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, start_new_round: bool = True, refit_best_trial: bool = True, project_name: str = 'default_project', client_name: str = 'default_client')[source]¶ Note
Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied, and so are the AutoML loop and the DACTs.
- Parameters
pipeline – the pipeline, or BaseStep, which will be used by the AutoML loop
loop (BaseControllerLoop) – the loop, or BaseControllerLoop, which will be used by the AutoML loop
refit_best_trial (bool) – a boolean indicating whether to perform, after a fit call, a refit on the best trial.
- get_children() → List[neuraxle.base.BaseStep][source]¶ Get the list of all the children of that step or service.
- Returns
every child step
- wrapped¶
- _fit_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → Tuple[neuraxle.base.BaseStep, neuraxle.data_container.DataContainer][source]¶ Fit transform data container.
- Parameters
data_container (DataContainer) – data container
context (ExecutionContext) – execution context
- Returns
(fitted self, data container)
- _fit_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]¶ Run the AutoML loop. Find the best hyperparams using the hyperparameter optimizer. Evaluate the pipeline on each trial using a validation technique.
- Parameters
data_container (DataContainer) – data container to fit
context (ExecutionContext) – execution context
- Returns
self
- get_automl_context(context: neuraxle.base.ExecutionContext, with_loc=True) → neuraxle.metaopt.context.AutoMLContext[source]¶
- round_number¶
- report¶
- _transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]¶ Transform data container.
- Parameters
data_container (DataContainer) – data container
context (ExecutionContext) – execution context
- Returns
data container
class neuraxle.metaopt.auto_ml.AutoML(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]¶
Bases: neuraxle.metaopt.auto_ml.ControlledAutoML
This class provides a nice interface to easily use the ControlledAutoML class and the metaopt module in general.
- Parameters
pipeline – pipeline to copy and use for training
validation_splitter – validation splitter to use
refit_best_trial – whether to refit the best model on the whole dataset after the optimization
scoring_callback – main callback to use for scoring (deprecated)
hyperparams_optimizer – hyperparams optimizer to use
hyperparams_repository – hyperparams repository to use
n_trials – number of trials to run
epochs – number of epochs to train the model for each validation split
callbacks – callbacks to use for training; additional metrics can be computed there
n_jobs – number of jobs to use for parallelization; defaults to 1 for no parallelization
continue_loop_on_error – whether or not to continue the main optimization loop on error
- Returns
AutoML object ready to use with fit and transform.
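The effect of n_jobs is to run several trials at once. A hypothetical sketch of that idea with concurrent.futures follows; `run_trials` is an illustration of the pattern, not the DefaultLoop implementation Neuraxle actually uses:

```python
from concurrent.futures import ThreadPoolExecutor

def run_trials(run_one, hp_candidates, n_jobs=1):
    """Execute one trial per hyperparameter candidate, n_jobs at a time, preserving order."""
    if n_jobs == 1:
        # No parallelization: run the trials sequentially.
        return [run_one(hps) for hps in hp_candidates]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map keeps results in the same order as the candidates.
        return list(pool.map(run_one, hp_candidates))

# Toy usage: the "trial" just squares a hyperparameter.
scores = run_trials(lambda h: h["x"] ** 2, [{"x": 1}, {"x": 2}, {"x": 3}], n_jobs=2)
```

This is also why a multiprocess-safe hyperparams repository is recommended when n_jobs is greater than 1: concurrent trials may report their results at the same time.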
- __init__(pipeline: neuraxle.base.BaseStep, validation_splitter: Optional[neuraxle.metaopt.validation.BaseValidationSplitter] = None, hyperparams_optimizer: neuraxle.metaopt.optimizer.BaseHyperparameterOptimizer = None, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback = None, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.repositories.repo.HyperparamsRepository = None, n_trials: int = None, refit_best_trial: bool = True, start_new_round=True, epochs: int = 1, n_jobs=1, continue_loop_on_error=True)[source]¶ Note
Usage of a multiprocess-safe hyperparams repository is recommended, although it is, most of the time, not necessary. Context instances are not shared between trials but copied, and so are the AutoML loop and the DACTs.
- Parameters
pipeline (BaseStep) – the pipeline, or BaseStep, which will be used by the AutoML loop
loop – the loop, or BaseControllerLoop, which will be used by the AutoML loop
refit_best_trial (bool) – a boolean indicating whether to perform, after a fit call, a refit on the best trial.