neuraxle.metaopt.auto_ml

Module-level documentation for neuraxle.metaopt.auto_ml. Here is an inheritance diagram, including dependencies to other base modules of Neuraxle:

[Figure: inheritance diagram of neuraxle.metaopt.auto_ml]

Neuraxle’s AutoML Classes

Classes used to build any Automatic Machine Learning strategy.

Functions

kfold_cross_validation_split(data_inputs, k_fold)

validation_split(test_size, data_inputs[, …])

Split data inputs and expected outputs into a training set and a validation set.

Classes

AutoML(pipeline, validation_splitter, …[, …])

A step that executes any Automatic Machine Learning algorithm.

AutoMLContainer(trials, …)

Data object for AutoML.

BaseHyperparameterSelectionStrategy

BaseValidationSplitter

HyperparamsJSONRepository(…[, …])

Hyperparams repository that saves JSON files for every AutoML trial.

HyperparamsRepository(…)

Hyperparams repository that saves hyperparams and scores for every AutoML trial.

InMemoryHyperparamsRepository([…])

In memory hyperparams repository that can print information about trials.

KFoldCrossValidationSplitter(k_fold)

Create a function that splits data with K-Fold Cross-Validation resampling.

RandomSearchHyperparameterSelectionStrategy()

AutoML Hyperparameter Optimizer that randomly samples the space of random variables.

Trainer(epochs, scoring_callback, …)

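Trainer that fits a pipeline using a validation splitter and tracks training and validation metrics for each epoch.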

ValidationSplitter(test_size)

Create a function that splits data into a training set and a validation set.

Examples using neuraxle.metaopt.auto_ml.AutoML

Examples using neuraxle.metaopt.auto_ml.HyperparamsJSONRepository

Examples using neuraxle.metaopt.auto_ml.InMemoryHyperparamsRepository

Examples using neuraxle.metaopt.auto_ml.RandomSearchHyperparameterSelectionStrategy

Examples using neuraxle.metaopt.auto_ml.ValidationSplitter


class neuraxle.metaopt.auto_ml.HyperparamsRepository(hyperparameter_selection_strategy: Optional[neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy] = None, cache_folder: str = None, best_retrained_model_folder: str = None)

Bases: neuraxle.metaopt.observable._Observable, abc.ABC

Hyperparams repository that saves hyperparams and scores for every AutoML trial. The cache folder can be changed to run different rounds.

__init__(hyperparameter_selection_strategy: Optional[neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy] = None, cache_folder: str = None, best_retrained_model_folder: str = None)

Initialize self. See help(type(self)) for accurate signature.

set_strategy(hyperparameter_selection_strategy: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy)

Set hyperparameter selection strategy.

Parameters

hyperparameter_selection_strategy – hyperparameter selection strategy.

load_all_trials(status: neuraxle.metaopt.trial.TRIAL_STATUS) → neuraxle.metaopt.trial.Trials

Load all hyperparameter trials with their corresponding scores, sorted by creation date.

Parameters

status – status of the trials to load.

Returns

Trials (hyperparams, scores)

save_trial(trial: neuraxle.metaopt.trial.Trial)

Save the trial and notify the trial observers.

Parameters

trial – trial to save.
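The save-then-notify behaviour of save_trial can be pictured with a minimal, framework-free sketch. ObservableTrialStore, subscribe and the dict-based trials are hypothetical stand-ins, not Neuraxle's _Observable API; the sketch only assumes that the public save method delegates storage to a _save_trial step and then notifies each subscribed observer.

```python
from typing import Callable, Dict, List

Trial = Dict[str, object]  # hypothetical: a trial represented as a plain dict


class ObservableTrialStore:
    """Hypothetical in-memory store that notifies observers on every save."""

    def __init__(self) -> None:
        self.trials: List[Trial] = []
        self._observers: List[Callable[[Trial], None]] = []

    def subscribe(self, observer: Callable[[Trial], None]) -> None:
        self._observers.append(observer)

    def save_trial(self, trial: Trial) -> None:
        # Public method: delegate storage, then notify every subscribed observer.
        self._save_trial(trial)
        for observer in self._observers:
            observer(trial)

    def _save_trial(self, trial: Trial) -> None:
        # Storage step; a JSON-backed store would write a file here instead.
        self.trials.append(trial)


store = ObservableTrialStore()
store.subscribe(lambda trial: print("saved:", trial))
store.save_trial({"hyperparams": {"learning_rate": 0.01}, "score": 0.5})
```

Splitting the public method from the storage step lets subclasses change where trials are stored without re-implementing the notification logic.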

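_save_trial(trial: neuraxle.metaopt.trial.Trial)

Save the trial. This storage step is overridden by subclasses such as InMemoryHyperparamsRepository and HyperparamsJSONRepository.

Parameters

trial – trial to save.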

get_best_hyperparams() → neuraxle.hyperparams.space.HyperparameterSamples

Get best hyperparams from all of the saved trials.

Returns

best hyperparams.
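Choosing the best hyperparams among saved trials amounts to taking the trial with the best score. A minimal sketch, assuming trials are plain dicts with hypothetical "hyperparams" and "score" keys, and that the direction of optimization is given by a higher_score_is_better flag as in ScoringCallback; get_best_hyperparams here is not Neuraxle's implementation.

```python
from typing import Any, Dict, List


def get_best_hyperparams(trials: List[Dict[str, Any]], higher_score_is_better: bool) -> Dict[str, Any]:
    """Hypothetical helper: return the hyperparams of the best-scoring trial."""
    if not trials:
        raise ValueError("no trials to choose from")
    # Pick the max score when higher is better (e.g. accuracy), else the min (e.g. MSE).
    pick = max if higher_score_is_better else min
    best_trial = pick(trials, key=lambda trial: trial["score"])
    return best_trial["hyperparams"]


trials = [
    {"hyperparams": {"n_layers": 1}, "score": 0.30},
    {"hyperparams": {"n_layers": 2}, "score": 0.12},
    {"hyperparams": {"n_layers": 3}, "score": 0.25},
]
print(get_best_hyperparams(trials, higher_score_is_better=False))  # {'n_layers': 2}
```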

_save_best_model(step: neuraxle.base.BaseStep, trial_hash: str)
_load_best_model(trial_hash: str) → neuraxle.base.BaseStep
get_best_model()

Load the best model saved inside the best retrained model folder.

Returns

best model

save_best_model(step: neuraxle.base.BaseStep)

Save the best model inside the best retrained model folder.

Parameters

step (BaseStep) – step to save

Returns

saved step

new_trial(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.metaopt.trial.Trial

Create a new trial with the next best hyperparams.

Return type

Trial

Parameters

auto_ml_container – auto ml data container

Returns

trial

_get_trial_hash(hp_dict)

Hash hyperparams with md5 to create a trial hash.

Parameters

hp_dict – hyperparams dictionary

Returns

trial hash
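Hashing hyperparams into a stable trial identifier can be sketched with the standard library. get_trial_hash is a hypothetical stand-in for _get_trial_hash; serializing with json.dumps(sort_keys=True) is an assumption made so that key order does not change the hash, and Neuraxle may serialize the dict differently.

```python
import hashlib
import json


def get_trial_hash(hp_dict: dict) -> str:
    """Hypothetical helper: md5 hash of a flat hyperparams dict."""
    # Sort keys so that two dicts with the same content give the same hash,
    # whatever their insertion order.
    serialized = json.dumps(hp_dict, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


h1 = get_trial_hash({"learning_rate": 0.01, "n_layers": 3})
h2 = get_trial_hash({"n_layers": 3, "learning_rate": 0.01})
print(h1 == h2)  # True
```

md5 is used here only as a content fingerprint for naming trials, not for any security purpose.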

class neuraxle.metaopt.auto_ml.InMemoryHyperparamsRepository(hyperparameter_selection_strategy=None, cache_folder: str = None, best_retrained_model_folder=None)

Bases: neuraxle.metaopt.auto_ml.HyperparamsRepository

In memory hyperparams repository that can print information about trials. Useful for debugging.

Example usage:

from neuraxle.metaopt.auto_ml import (
    InMemoryHyperparamsRepository,
    RandomSearchHyperparameterSelectionStrategy,
)

InMemoryHyperparamsRepository(
    hyperparameter_selection_strategy=RandomSearchHyperparameterSelectionStrategy(),
    cache_folder='cache',
    best_retrained_model_folder='best'
)

__init__(hyperparameter_selection_strategy=None, cache_folder: str = None, best_retrained_model_folder=None)

Initialize self. See help(type(self)) for accurate signature.

load_all_trials(status: Optional[neuraxle.metaopt.trial.TRIAL_STATUS] = None) → neuraxle.metaopt.trial.Trials

Load all trials with the given status.

Parameters

status – trial status

Returns

list of trials
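Loading trials with an optional status filter can be sketched as below; the sort by creation date follows the description of HyperparamsRepository.load_all_trials. TrialRecord and the "SUCCESS"/"FAILED" strings are hypothetical stand-ins, not Neuraxle's Trial class or TRIAL_STATUS values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TrialRecord:
    """Hypothetical stand-in for a Trial: hyperparams, status and creation date."""
    hyperparams: dict
    status: str
    created_at: datetime


def load_all_trials(trials: List[TrialRecord], status: Optional[str] = None) -> List[TrialRecord]:
    """Keep the trials matching `status` (all of them if None), oldest first."""
    selected = [t for t in trials if status is None or t.status == status]
    return sorted(selected, key=lambda t: t.created_at)


trials = [
    TrialRecord({"lr": 0.1}, "SUCCESS", datetime(2020, 1, 3)),
    TrialRecord({"lr": 0.2}, "FAILED", datetime(2020, 1, 1)),
    TrialRecord({"lr": 0.3}, "SUCCESS", datetime(2020, 1, 2)),
]
print([t.hyperparams["lr"] for t in load_all_trials(trials, status="SUCCESS")])  # [0.3, 0.1]
```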

_save_trial(trial: neuraxle.metaopt.trial.Trial)

Save trial.

Parameters

trial – trial to save

class neuraxle.metaopt.auto_ml.HyperparamsJSONRepository(hyperparameter_selection_strategy: Optional[neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy] = None, cache_folder=None, best_retrained_model_folder=None)

Bases: neuraxle.metaopt.auto_ml.HyperparamsRepository

Hyperparams repository that saves JSON files for every AutoML trial.

Example usage:

from neuraxle.metaopt.auto_ml import (
    HyperparamsJSONRepository,
    RandomSearchHyperparameterSelectionStrategy,
)

HyperparamsJSONRepository(
    hyperparameter_selection_strategy=RandomSearchHyperparameterSelectionStrategy(),
    cache_folder='cache',
    best_retrained_model_folder='best'
)

__init__(hyperparameter_selection_strategy: Optional[neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy] = None, cache_folder=None, best_retrained_model_folder=None)

Initialize self. See help(type(self)) for accurate signature.

_save_trial(trial: neuraxle.metaopt.trial.Trial)

Save the trial as a JSON file.

Parameters

trial – trial to save

new_trial(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.metaopt.trial.Trial

Create a new hyperparams trial JSON file.

Return type

Trial

Parameters

auto_ml_container – auto ml container

Returns

the new trial

load_all_trials(status: Optional[neuraxle.metaopt.trial.TRIAL_STATUS] = None) → neuraxle.metaopt.trial.Trials

Load all hyperparameter trials with their corresponding scores. Reads all the saved trial JSON files, sorted by creation date.

Parameters

status – (optional) filter to select only trials with this status.

Returns

(hyperparams, scores)

_get_successful_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial) → str

Get the json path for the given successful trial.

Return type

str

Parameters

trial – trial

Returns

str

_get_failed_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial)

Get the json path for the given failed trial.

Parameters

trial – trial

Returns

str

_get_ongoing_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial)

Get the ongoing trial JSON path.

_get_new_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial)

Get the new trial JSON path.

_remove_previous_trial_state_json()
subscribe_to_cache_folder_changes(refresh_interval_in_seconds: int, observer: neuraxle.metaopt.observable._Observer[Tuple[neuraxle.metaopt.auto_ml.HyperparamsRepository, neuraxle.metaopt.trial.Trial]])

Subscribe an observer to changes in the cache folder: every refresh_interval_in_seconds seconds, updates are sent to the observers.

Parameters
  • refresh_interval_in_seconds (int) – number of seconds to wait before sending updates to the observers

  • observer – observer that receives (HyperparamsRepository, Trial) updates

class neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy

Bases: abc.ABC

find_next_best_hyperparams(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.hyperparams.space.HyperparameterSamples

Find the next best hyperparams using previous trials.

Return type

HyperparameterSamples

Parameters

auto_ml_container – trials data container

Returns

next best hyperparams

class neuraxle.metaopt.auto_ml.Trainer(epochs: int, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None)

Bases: object

Example usage:

from sklearn.metrics import mean_squared_error
from neuraxle.metaopt.auto_ml import Trainer, ValidationSplitter
from neuraxle.metaopt.callbacks import EarlyStoppingCallback, ScoringCallback

# pipeline, data_inputs and expected_outputs are assumed to be defined beforehand
trainer = Trainer(
    epochs=10,
    callbacks=[EarlyStoppingCallback()],
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    validation_splitter=ValidationSplitter(test_size=0.15)
)

repo_trial = trainer.train(
    pipeline=pipeline,
    data_inputs=data_inputs,
    expected_outputs=expected_outputs
)

__init__(epochs: int, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None)

Initialize self. See help(type(self)) for accurate signature.

train(pipeline: neuraxle.base.BaseStep, data_inputs, expected_outputs=None, context: neuraxle.base.ExecutionContext = None, trial_number=0) → neuraxle.metaopt.trial.Trial

Train the pipeline using the validation splitter, and track training and validation metrics for each epoch. Note: this method is only a shortcut for execute_trial that needs less boilerplate code; refer to execute_trial for full flexibility.

Return type

Trial

Parameters
  • pipeline (BaseStep) – pipeline to train on

  • data_inputs – data inputs

  • expected_outputs – expected outputs to fit on

  • context (ExecutionContext) – execution context (optional)

  • trial_number – trial number

Returns

executed trial

execute_trial(pipeline: neuraxle.base.BaseStep, repo_trial: neuraxle.metaopt.trial.Trial, context: neuraxle.base.ExecutionContext, validation_splits: List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]], n_trial: int, delete_pipeline_on_completion: bool = True)

Train the pipeline using the validation splitter, and track training and validation metrics for each epoch.

Parameters
  • pipeline (BaseStep) – pipeline to train on

  • repo_trial (Trial) – repo trial

  • validation_splits – validation splits

  • context (ExecutionContext) – execution context

  • n_trial (int) – total number of trials that will be executed

  • delete_pipeline_on_completion (bool) – bool to delete pipeline on completion or not

Returns

executed trial split

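fit_trial_split(trial_split: neuraxle.metaopt.trial.TrialSplit, train_data_container: neuraxle.data_container.DataContainer, validation_data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.metaopt.trial.TrialSplit

Train the pipeline using the training data container, and track training and validation metrics for each epoch.

Return type

TrialSplit

Parameters
  • trial_split (TrialSplit) – trial split to fit

  • train_data_container (DataContainer) – training data container

  • validation_data_container (DataContainer) – validation data container

  • context (ExecutionContext) – execution context

Returns

executed trial split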

refit(p: neuraxle.base.BaseStep, data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep

Refit the pipeline on the whole dataset (without any validation technique).

Return type

BaseStep

Parameters
  • p (BaseStep) – pipeline to refit

  • data_container (DataContainer) – data container holding the whole dataset

  • context (ExecutionContext) – execution context

Returns

fitted pipeline

get_main_metric_name() → str

Get main metric name.

Returns

main metric name

class neuraxle.metaopt.auto_ml.AutoML(pipeline: neuraxle.base.BaseStep, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, refit_trial: bool, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, hyperparams_optimizer: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None, n_trials: int = 10, epochs: int = 1, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, refit_scoring_function: Callable = None, cache_folder_when_no_handle=None, n_jobs=None, continue_loop_on_error=True)

Bases: neuraxle.base.ForceHandleMixin, neuraxle.base._HasChildrenMixin, neuraxle.base.BaseStep

A step that executes any Automatic Machine Learning algorithm.

Example usage:

from sklearn.metrics import mean_squared_error
from neuraxle.metaopt.auto_ml import AutoML, RandomSearchHyperparameterSelectionStrategy, ValidationSplitter
from neuraxle.metaopt.callbacks import MetricCallback, ScoringCallback

# pipeline, n_iter, tmpdir, data_inputs and expected_outputs are assumed to be defined beforehand
auto_ml = AutoML(
    pipeline,
    n_trials=n_iter,
    validation_splitter=ValidationSplitter(0.2),
    hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    callbacks=[
        MetricCallback('mse', metric_function=mean_squared_error, higher_score_is_better=False)
    ],
    refit_trial=True,
    cache_folder_when_no_handle=str(tmpdir)
)

auto_ml = auto_ml.fit(data_inputs, expected_outputs)

__init__(pipeline: neuraxle.base.BaseStep, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, refit_trial: bool, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, hyperparams_optimizer: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None, n_trials: int = 10, epochs: int = 1, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, refit_scoring_function: Callable = None, cache_folder_when_no_handle=None, n_jobs=None, continue_loop_on_error=True)

Notes on multiprocessing:

Using a multiprocess-safe hyperparams repository is recommended, although it is usually not necessary. Beware of the behaviour of the HyperparamsRepository’s observers/subscribers. Context instances are not shared between trials but copied, as are the AutoML loop and the DataContainers.

Parameters
  • pipeline (BaseStep) – The pipeline, or BaseStep, which will be used by the AutoML loop.

  • validation_splitter – A BaseValidationSplitter instance to split data between training and validation set.

  • refit_trial (bool) – A boolean indicating whether to perform, after all the trials, a fit call on the whole dataset with the best hyperparameters found.

  • scoring_callback (ScoringCallback) – The scoring callback to use during training

  • hyperparams_optimizer (BaseHyperparameterSelectionStrategy) – a BaseHyperparameterSelectionStrategy instance that can be queried for new sets of hyperparameters.

  • hyperparams_repository (HyperparamsRepository) – a HyperparamsRepository instance to store experiment status and results.

  • n_trials (int) – The number of different hyperparameters to try.

  • epochs (int) – The number of epochs to perform for each trial.

  • callbacks – A list of callbacks to perform after each epoch.

  • refit_scoring_function – A scoring function to use on a refit call

  • cache_folder_when_no_handle – default cache folder used if auto_ml_loop isn’t called through handler functions.

  • n_jobs – If n_jobs is None or 1, AutoML is executed in a single process, which may spawn multiple threads. If n_jobs > 1, n_jobs processes are launched. If n_jobs <= -1, (n_cpus + 1 + n_jobs) processes are launched. Each process executes one trial at a time.

  • continue_loop_on_error (bool) – whether to continue the AutoML loop with the next trial when a trial raises an error.
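The n_jobs rule above can be restated as a small, framework-free function. n_processes_for is a hypothetical helper, not part of Neuraxle; it only encodes the documented arithmetic, and it treats n_jobs=0, which the description does not cover, as an error (an assumption).

```python
import os
from typing import Optional


def n_processes_for(n_jobs: Optional[int], n_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: number of worker processes implied by n_jobs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None or n_jobs == 1:
        return 1  # single process, which may still spawn several threads
    if n_jobs > 1:
        return n_jobs
    if n_jobs <= -1:
        return n_cpus + 1 + n_jobs  # -1 -> all CPUs, -2 -> all CPUs but one
    # n_jobs == 0 is not covered by the documented rule (assumption: reject it)
    raise ValueError("n_jobs=0 is not supported")


for n_jobs in (None, 1, 4, -1, -2):
    print(n_jobs, "->", n_processes_for(n_jobs, n_cpus=8))
```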

_fit_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep

Run the AutoML loop: find the best hyperparams using the hyperparameter optimizer, and evaluate the pipeline on each trial using a validation technique.

Parameters
  • data_container (DataContainer) – data container to fit on

  • context (ExecutionContext) – execution context

Returns

self
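The loop described above (query the optimizer, evaluate each trial, keep the best) can be sketched without Neuraxle. automl_loop, sample and evaluate are hypothetical names: sample stands in for the hyperparameter selection strategy, evaluate for training and scoring the pipeline on the validation splits, and continue_on_error is an assumed equivalent of continue_loop_on_error. Refitting the best trial and multiprocessing are left out.

```python
import random
from typing import Callable, Dict, List, Tuple

Hyperparams = Dict[str, float]


def automl_loop(
    sample: Callable[[random.Random], Hyperparams],
    evaluate: Callable[[Hyperparams], float],
    n_trials: int,
    higher_score_is_better: bool = False,
    continue_on_error: bool = True,  # assumed behaviour of continue_loop_on_error
    seed: int = 0,
) -> Tuple[Hyperparams, float, List[Tuple[Hyperparams, float]], List[Hyperparams]]:
    """Hypothetical AutoML loop: sample, evaluate, keep the best trial."""
    rng = random.Random(seed)
    succeeded: List[Tuple[Hyperparams, float]] = []
    failed: List[Hyperparams] = []
    for _ in range(n_trials):
        hyperparams = sample(rng)  # next hyperparams from the selection strategy
        try:
            score = evaluate(hyperparams)  # e.g. mean validation score over the splits
        except Exception:
            if not continue_on_error:
                raise
            failed.append(hyperparams)  # record the failed trial and move on
            continue
        succeeded.append((hyperparams, score))
    if not succeeded:
        raise RuntimeError("every trial failed")
    pick = max if higher_score_is_better else min
    best_hyperparams, best_score = pick(succeeded, key=lambda item: item[1])
    return best_hyperparams, best_score, succeeded, failed


def sample_x(rng: random.Random) -> Hyperparams:
    return {"x": rng.uniform(-1.0, 1.0)}


def evaluate_x(hyperparams: Hyperparams) -> float:
    # Pretend that negative values make training crash.
    if hyperparams["x"] < 0:
        raise ValueError("training failed")
    return (hyperparams["x"] - 0.5) ** 2  # lower is better, like an MSE


best_hp, best_score, succeeded, failed = automl_loop(sample_x, evaluate_x, n_trials=20)
print(best_hp, best_score, len(succeeded), len(failed))
```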

_attempt_trial(trial_number, validation_splits, context: neuraxle.base.ExecutionContext)
_fit_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → ('BaseStep', neuraxle.data_container.DataContainer)

Fit transform data container.

Parameters
  • data_container (DataContainer) – data container to fit and transform

  • context (ExecutionContext) – execution context

Returns

(fitted self, data container)

_transform_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer

Transform data container.

Return type

DataContainer

Parameters
  • data_container (DataContainer) – data container to transform

  • context (ExecutionContext) – execution context

Returns

data container

get_best_model()

Get the best model using the hyperparams repository.

Returns

best model

_load_virgin_best_model() → neuraxle.base.BaseStep

Get the best model from all of the previous trials.

Returns

best model step

_load_virgin_model(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep

Load a virgin model with the given hyperparams.

Returns

model step with the given hyperparams

get_children() → List[neuraxle.base.BaseStep]

Get the list of all the children of this step.

Returns

list of child steps

neuraxle.metaopt.auto_ml._get_trial_split_description(repo_trial: neuraxle.metaopt.trial.Trial, repo_trial_split_number: int, validation_splits: List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]], trial_number: int, n_trial: int)
class neuraxle.metaopt.auto_ml.AutoMLContainer(trials: neuraxle.metaopt.trial.Trials, hyperparameter_space: neuraxle.hyperparams.space.HyperparameterSpace, trial_number: int, main_scoring_metric_name: str)

Bases: object

Data object for auto ml.

__init__(trials: neuraxle.metaopt.trial.Trials, hyperparameter_space: neuraxle.hyperparams.space.HyperparameterSpace, trial_number: int, main_scoring_metric_name: str)

Initialize self. See help(type(self)) for accurate signature.

class neuraxle.metaopt.auto_ml.RandomSearchHyperparameterSelectionStrategy

Bases: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy

AutoML Hyperparameter Optimizer that randomly samples the space of random variables. Please refer to AutoML for a usage example.

__init__()

Initialize self. See help(type(self)) for accurate signature.

find_next_best_hyperparams(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.hyperparams.space.HyperparameterSamples

Randomly sample the next hyperparams to try.

Return type

HyperparameterSamples

Parameters

auto_ml_container – trials data container

Returns

next best hyperparams
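Random search ignores previous trials and draws each hyperparameter independently. A minimal sketch, assuming a simplified, hypothetical space format (a list of choices, or a (low, high) range sampled uniformly) rather than Neuraxle's HyperparameterSpace and its distributions; sample_random_hyperparams is not Neuraxle's implementation.

```python
import random
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified space: a list means "choose one of these values",
# a (low, high) tuple of floats means "sample uniformly in [low, high]".
Space = Dict[str, Union[List, Tuple[float, float]]]


def sample_random_hyperparams(space: Space, rng: random.Random) -> dict:
    """Draw one set of hyperparams, independently of previous trials."""
    sample = {}
    for name, dist in space.items():
        if isinstance(dist, list):
            sample[name] = rng.choice(dist)
        else:
            low, high = dist
            sample[name] = rng.uniform(low, high)
    return sample


rng = random.Random(42)
space = {"n_layers": [1, 2, 3], "learning_rate": (0.001, 0.1)}
print(sample_random_hyperparams(space, rng))
```

Because every draw is independent, the auto_ml_container's previous trials are not needed here; a smarter strategy would use them to bias the next sample.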

class neuraxle.metaopt.auto_ml.BaseValidationSplitter

Bases: abc.ABC

split_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]]

Split a data container into pairs of training and validation data containers, one pair per validation split.

Parameters
  • data_container (DataContainer) – data container to split

  • context (ExecutionContext) – execution context

Returns

list of (training data container, validation data container) pairs, one per validation split.
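Pairing split outputs back into (training, validation) containers can be sketched as follows. SimpleDataContainer and holdout_split are hypothetical stand-ins for DataContainer and a splitter's split method; the sketch assumes that each of the six values returned by the split function is a list with one entry per validation split, and zips them back into pairs.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class SimpleDataContainer(NamedTuple):
    """Hypothetical minimal stand-in for a DataContainer."""
    current_ids: list
    data_inputs: list
    expected_outputs: list


SplitFn = Callable[[Sequence, Sequence, Sequence], Tuple[list, list, list, list, list, list]]


def split_data_container(dc: SimpleDataContainer, split: SplitFn) -> List[Tuple[SimpleDataContainer, SimpleDataContainer]]:
    """Turn the six per-split lists returned by `split` into container pairs."""
    train_di, train_eo, train_ids, val_di, val_eo, val_ids = split(
        dc.data_inputs, dc.current_ids, dc.expected_outputs
    )
    pairs = []
    # One iteration per validation split (1 for a hold-out split, k for K-fold).
    for t_di, t_eo, t_ids, v_di, v_eo, v_ids in zip(train_di, train_eo, train_ids, val_di, val_eo, val_ids):
        train = SimpleDataContainer(current_ids=t_ids, data_inputs=t_di, expected_outputs=t_eo)
        validation = SimpleDataContainer(current_ids=v_ids, data_inputs=v_di, expected_outputs=v_eo)
        pairs.append((train, validation))
    return pairs


def holdout_split(data_inputs, current_ids, expected_outputs, test_size=0.25):
    """Hypothetical single hold-out split: each output is a one-element list."""
    i = int(len(data_inputs) * (1 - test_size))
    return ([list(data_inputs[:i])], [list(expected_outputs[:i])], [list(current_ids[:i])],
            [list(data_inputs[i:])], [list(expected_outputs[i:])], [list(current_ids[i:])])


dc = SimpleDataContainer(current_ids=list(range(8)), data_inputs=[10 * i for i in range(8)],
                         expected_outputs=[i % 2 for i in range(8)])
pairs = split_data_container(dc, holdout_split)
print(pairs[0][1])  # validation container holding the last 2 items
```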

split(data_inputs, current_ids=None, expected_outputs=None, context: neuraxle.base.ExecutionContext = None) → Tuple[List[T], List[T], List[T], List[T], List[T], List[T]]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • current_ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context (ExecutionContext) – execution context (optional)

Returns

train_data_inputs, train_expected_outputs, train_current_ids, validation_data_inputs, validation_expected_outputs, validation_current_ids

class neuraxle.metaopt.auto_ml.KFoldCrossValidationSplitter(k_fold: int)

Bases: neuraxle.metaopt.auto_ml.BaseValidationSplitter

Create a function that splits data with K-Fold Cross-Validation resampling.

from neuraxle.metaopt.auto_ml import KFoldCrossValidationSplitter

# create a K-Fold cross-validation splitter with 2 folds
KFoldCrossValidationSplitter(k_fold=2)

Parameters

k_fold – number of folds.

__init__(k_fold: int)

Initialize self. See help(type(self)) for accurate signature.

split(data_inputs, current_ids=None, expected_outputs=None, context: neuraxle.base.ExecutionContext = None) → Tuple[List[T], List[T], List[T], List[T], List[T], List[T]]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • current_ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context (ExecutionContext) – execution context (optional)

Returns

train_data_inputs, train_expected_outputs, train_current_ids, validation_data_inputs, validation_expected_outputs, validation_current_ids

neuraxle.metaopt.auto_ml.kfold_cross_validation_split(data_inputs, k_fold)
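K-fold cross-validation partitions the data into k_fold folds and uses each fold once as the validation set while the remaining folds form the training set. The sketch below is a plain-Python illustration with contiguous, unshuffled folds; kfold_splits is a hypothetical helper, and Neuraxle's own fold boundaries and return format may differ.

```python
from typing import List, Sequence, Tuple


def kfold_splits(data: Sequence, k_fold: int) -> List[Tuple[list, list]]:
    """Hypothetical helper: contiguous K-fold split into (train, validation) pairs."""
    n = len(data)
    if not 1 < k_fold <= n:
        raise ValueError("k_fold must be between 2 and len(data)")
    # The first n % k_fold folds get one extra element so that all items are used.
    fold_sizes = [n // k_fold + (1 if i < n % k_fold else 0) for i in range(k_fold)]
    splits = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(data[start:stop])
        train = list(data[:start]) + list(data[stop:])
        splits.append((train, validation))
        start = stop
    return splits


for train, validation in kfold_splits(list(range(10)), k_fold=5):
    print(validation, len(train))
```

Each item appears in exactly one validation fold, so averaging the k validation scores uses every sample for evaluation once.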
class neuraxle.metaopt.auto_ml.ValidationSplitter(test_size: float)

Bases: neuraxle.metaopt.auto_ml.BaseValidationSplitter

Create a function that splits data into a training set and a validation set.

from neuraxle.metaopt.auto_ml import ValidationSplitter

# create a validation splitter with 80% training data and 20% validation data
ValidationSplitter(test_size=0.20)

Parameters

test_size – fraction of the data to put in the validation set, as a float (e.g. 0.20 for 20%)

__init__(test_size: float)

Initialize self. See help(type(self)) for accurate signature.

split(data_inputs, current_ids=None, expected_outputs=None, context: neuraxle.base.ExecutionContext = None) → Tuple[List[T], List[T], List[T], List[T]]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • current_ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context (ExecutionContext) – execution context (optional)

Returns

train_data_inputs, train_expected_outputs, train_current_ids, validation_data_inputs, validation_expected_outputs, validation_current_ids

_abc_impl = <_abc_data object>
neuraxle.metaopt.auto_ml.validation_split(test_size: float, data_inputs, current_ids=None, expected_outputs=None) → Tuple[List[T], List[T], List[T], List[T], List[T], List[T]][source]

Split data inputs, and expected outputs into a training set, and a validation set.

Parameters
  • test_size (float) – fraction of the data to put in the validation set, as a float (e.g. 0.20 for 20%)

  • data_inputs – data inputs to split

  • current_ids – ids associated with each data entry

  • expected_outputs – expected outputs to split

Returns

train_data_inputs, train_expected_outputs, current_ids_train, validation_data_inputs, validation_expected_outputs, current_ids_val
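A sequential split consistent with the description above can be sketched as follows, assuming no shuffling and a split index of floor(len(data_inputs) * (1 - test_size)). sequential_validation_split is a hypothetical helper that leaves out current_ids for brevity; Neuraxle's exact rounding and ordering may differ.

```python
import math
from typing import List, Sequence, Tuple


def sequential_validation_split(
    data_inputs: Sequence, expected_outputs: Sequence, test_size: float
) -> Tuple[List, List, List, List]:
    """Hypothetical helper: first part for training, last part for validation."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    # Assumed split point: the first floor(n * (1 - test_size)) items go to training.
    split_index = math.floor(len(data_inputs) * (1 - test_size))
    train_inputs = list(data_inputs[:split_index])
    train_outputs = list(expected_outputs[:split_index])
    validation_inputs = list(data_inputs[split_index:])
    validation_outputs = list(expected_outputs[split_index:])
    return train_inputs, train_outputs, validation_inputs, validation_outputs


tr_x, tr_y, va_x, va_y = sequential_validation_split(
    list(range(10)), [2 * i for i in range(10)], test_size=0.20
)
print(len(tr_x), len(va_x))  # 8 2
```

A sequential split keeps the original order, which matters for time series; for i.i.d. data, shuffling before splitting is usually preferable.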

neuraxle.metaopt.auto_ml._train_split(data_inputs, test_size) → List[T]

Split off the training set.

Parameters
  • data_inputs – data inputs to split

  • test_size – test size in float

Returns

train_data_inputs

neuraxle.metaopt.auto_ml._validation_split(data_inputs, test_size) → List[T]

Split off the validation set.

Parameters
  • data_inputs – data inputs to split

  • test_size – test size in float

Returns

validation_data_inputs

neuraxle.metaopt.auto_ml._get_index_split(data_inputs, test_size)