neuraxle.metaopt.auto_ml

Neuraxle’s AutoML Classes

Classes used to build Automatic Machine Learning strategies.

Functions

kfold_cross_validation_split(data_inputs, k_fold)

Split data inputs into k_fold folds for cross-validation.

validation_split(test_size, data_inputs[, …])

Split data inputs and expected outputs into a training set and a validation set.

Classes

AutoML(pipeline, validation_splitter, …[, …])

A step to execute Automatic Machine Learning algorithms.

AutoMLContainer(trials, …)

Data object for AutoML.

BaseHyperparameterSelectionStrategy

Base class for hyperparameter selection strategies.

BaseValidationSplitter

Base class for validation splitters.

HyperparamsJSONRepository(…[, …])

Hyperparams repository that saves json files for every AutoML trial.

HyperparamsRepository([…])

Hyperparams repository that saves hyperparams and scores for every AutoML trial.

InMemoryHyperparamsRepository([…])

In memory hyperparams repository that can print information about trials.

KFoldCrossValidationSplitter(k_fold)

Create a function that splits data with K-Fold Cross-Validation resampling.

RandomSearchHyperparameterSelectionStrategy()

AutoML Hyperparameter Optimizer that randomly samples the space of random variables.

Trainer(epochs, scoring_callback, …)

Train a pipeline using a validation splitter, tracking training and validation metrics for each epoch.

ValidationSplitter(test_size)

Create a function that splits data into a training and a validation set.

class neuraxle.metaopt.auto_ml.AutoML(pipeline: neuraxle.base.BaseStep, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, refit_trial: bool, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, hyperparams_optimizer: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None, n_trials: int = 10, epochs: int = 1, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, refit_scoring_function: Callable = None, print_func: Callable = None, cache_folder_when_no_handle=None)[source]

A step to execute Automatic Machine Learning algorithms.

Example usage :

auto_ml = AutoML(
    pipeline,
    n_trials=n_iter,
    validation_splitter=ValidationSplitter(test_size=0.20),
    hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    callbacks=[
        MetricCallback('mse', metric_function=mean_squared_error, higher_score_is_better=False)
    ],
    refit_trial=True,
    cache_folder_when_no_handle=str(tmpdir)
)

auto_ml = auto_ml.fit(data_inputs, expected_outputs)
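
After fitting, the best retrained model can be fetched back from the repository; a minimal follow-up sketch, assuming refit_trial=True was used above so that a best model was saved:

best_pipeline = auto_ml.get_best_model()
predictions = best_pipeline.transform(data_inputs)
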
_fit_data_container(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Run the AutoML loop: find the best hyperparams using the hyperparameter optimizer, and evaluate the pipeline on each trial using a validation technique.

Parameters
  • data_container – data container to fit

  • context – execution context

Returns

self

_load_virgin_best_model() → neuraxle.base.BaseStep[source]

Get the best model from all of the previous trials.

Returns

best model step

_load_virgin_model(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep[source]

Load a virgin (unfitted) model with the given hyperparams.

Returns

virgin model step

_save_trial(repo_trial, trial_number)[source]
get_best_model()[source]

Get best model using the hyperparams repository.

Returns

best model step

class neuraxle.metaopt.auto_ml.AutoMLContainer(trials: neuraxle.metaopt.trial.Trials, hyperparameter_space: neuraxle.hyperparams.space.HyperparameterSpace, trial_number: int, main_scoring_metric_name: str)[source]

Data object for AutoML.

class neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy[source]

Base class for hyperparameter selection strategies.

find_next_best_hyperparams(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Find the next best hyperparams using previous trials.

Parameters

auto_ml_container – trials data container

Returns

next best hyperparams

class neuraxle.metaopt.auto_ml.BaseValidationSplitter[source]

Base class for validation splitters.

split(data_inputs, expected_outputs=None) → Tuple[List[T], List[T], List[T], List[T]][source]
split_data_container(data_container: neuraxle.data_container.DataContainer) → List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]][source]

Wrap a validation split function with a split data container function. A validation split function takes two arguments: data inputs and expected outputs.

Parameters

data_container – data container to split

Returns

a list of (training, validation) data container pairs, one per validation split.
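
Custom splitters can subclass BaseValidationSplitter and implement split(). A hypothetical sketch, assuming each of the four returned elements is a list with one entry per validation split (FirstHalfSplitter is not part of the library):

from neuraxle.metaopt.auto_ml import BaseValidationSplitter

class FirstHalfSplitter(BaseValidationSplitter):
    # Hypothetical splitter: first half trains, second half validates.
    def split(self, data_inputs, expected_outputs=None):
        if expected_outputs is None:
            expected_outputs = [None] * len(data_inputs)
        half = len(data_inputs) // 2
        return ([data_inputs[:half]], [expected_outputs[:half]],
                [data_inputs[half:]], [expected_outputs[half:]])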

class neuraxle.metaopt.auto_ml.HyperparamsJSONRepository(hyperparameter_selection_strategy: Optional[neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy] = None, cache_folder=None, best_retrained_model_folder=None)[source]

Hyperparams repository that saves json files for every AutoML trial.

Example usage :

HyperparamsJSONRepository(
    hyperparameter_selection_strategy=RandomSearchHyperparameterSelectionStrategy(),
    cache_folder='cache',
    best_retrained_model_folder='best'
)
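
The repository is typically passed to AutoML through its hyperparams_repository argument so that trials persist across runs; a minimal sketch, reusing the names from the AutoML example above:

auto_ml = AutoML(
    pipeline,
    validation_splitter=ValidationSplitter(test_size=0.20),
    refit_trial=True,
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    hyperparams_repository=HyperparamsJSONRepository(cache_folder='cache')
)
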
_create_trial_json(trial: neuraxle.metaopt.trial.Trial)[source]

Save new trial json file.

Returns

(hyperparams, scores)

_get_failed_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial)[source]

Get the json path for the given failed trial.

Parameters

trial

Returns

_get_new_trial_json_path(current_hyperparameters_hash)[source]

Get new trial json path.

Parameters

current_hyperparameters_hash

Returns

_get_successful_trial_json_file_path(trial: neuraxle.metaopt.trial.Trial) → str[source]

Get the json path for the given successful trial.

Parameters

trial – trial

Returns

str

_remove_new_trial_json(current_hyperparameters_hash)[source]

Remove trial file associated with the given hyperparameters hash.

Parameters

current_hyperparameters_hash

Returns

_save_trial(trial: neuraxle.metaopt.trial.Trial)[source]

Save trial json.

Parameters

trial – trial to save

Returns

load_all_trials(status: Optional[neuraxle.metaopt.trial.TRIAL_STATUS] = None) → neuraxle.metaopt.trial.Trials[source]

Load all hyperparameter trials with their corresponding score. Reads all the saved trial json files, sorted by creation date.

Returns

(hyperparams, scores)

new_trial(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer)[source]

Create a new hyperparams trial json file.

Parameters

auto_ml_container – auto ml container

Returns

subscribe_to_cache_folder_changes(refresh_interval_in_seconds: int, observer: neuraxle.metaopt.observable._Observer[Tuple[neuraxle.metaopt.auto_ml.HyperparamsRepository, neuraxle.metaopt.trial.Trial]])[source]

Check the cache folder for changes every refresh_interval_in_seconds and send trial updates to the subscribed observer.

Parameters
  • refresh_interval_in_seconds – number of seconds to wait before sending updates to the observers

  • observer

Returns

class neuraxle.metaopt.auto_ml.HyperparamsRepository(hyperparameter_selection_strategy=None, cache_folder=None, best_retrained_model_folder=None)[source]

Hyperparams repository that saves hyperparams and scores for every AutoML trial.

_get_trial_hash(hp_dict)[source]

Hash hyperparams with md5 to create a trial hash.

Parameters

hp_dict

Returns

trial hash string
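
For illustration only, md5-hashing a flat hyperparams dict could look like the sketch below; the exact serialization Neuraxle applies before hashing is an assumption here:

import hashlib
import json

def trial_hash_sketch(hp_dict: dict) -> str:
    # Hypothetical helper: md5 over a stable JSON form of the flat dict.
    serialized = json.dumps(hp_dict, sort_keys=True).encode('utf-8')
    return hashlib.md5(serialized).hexdigest()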

_save_trial(trial: neuraxle.metaopt.trial.Trial)[source]

Save trial.

Parameters

trial – trial to save.

Returns

get_best_hyperparams() → neuraxle.hyperparams.space.HyperparameterSamples[source]

Get best hyperparams from all of the saved trials.

Returns

best hyperparams.

get_best_model()[source]

Load the best model saved inside the best retrained model folder.

Returns

best model step

load_all_trials(status: neuraxle.metaopt.trial.TRIAL_STATUS) → neuraxle.metaopt.trial.Trials[source]

Load all hyperparameter trials with their corresponding score. Sorted by creation date.

Returns

Trials (hyperparams, scores)

new_trial(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer)[source]

Create a new trial for the given AutoML container.

Returns

new trial

save_best_model(step: neuraxle.base.BaseStep)[source]

Save the best model inside the best retrained model folder.

Parameters

step – step to save

Returns

saved step

save_trial(trial: neuraxle.metaopt.trial.Trial)[source]

Save trial and notify trial observers.

Parameters

trial – trial to save.

Returns

set_strategy(hyperparameter_selection_strategy: neuraxle.metaopt.auto_ml.BaseHyperparameterSelectionStrategy)[source]

Set hyperparameter selection strategy.

Parameters

hyperparameter_selection_strategy – hyperparameter selection strategy.

Returns

class neuraxle.metaopt.auto_ml.InMemoryHyperparamsRepository(hyperparameter_selection_strategy=None, print_func: Callable = None, cache_folder: str = None, best_retrained_model_folder=None)[source]

In memory hyperparams repository that can print information about trials. Useful for debugging.

Example usage :

InMemoryHyperparamsRepository(
    hyperparameter_selection_strategy=RandomSearchHyperparameterSelectionStrategy(),
    print_func=print,
    cache_folder='cache',
    best_retrained_model_folder='best'
)
_save_trial(trial: neuraxle.metaopt.trial.Trial)[source]

Save trial.

Parameters

trial – trial to save

Returns

load_all_trials(status: Optional[neuraxle.metaopt.trial.TRIAL_STATUS] = None) → neuraxle.metaopt.trial.Trials[source]

Load all trials with the given status.

Parameters

status – trial status

Returns

list of trials

new_trial(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.metaopt.trial.Trial[source]

Create a new trial with the best next hyperparams.

Parameters

auto_ml_container – auto ml data container

Returns

trial

class neuraxle.metaopt.auto_ml.KFoldCrossValidationSplitter(k_fold: int)[source]

Create a function that splits data with K-Fold Cross-Validation resampling.

# create a k-fold cross-validation splitter with 2 folds
KFoldCrossValidationSplitter(k_fold=2)
Parameters

k_fold – number of folds.

Returns

split(data_inputs, expected_outputs=None) → Tuple[List[T], List[T], List[T], List[T]][source]
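
A minimal usage sketch of split(), assuming each of the four returned lists holds one entry per fold:

splitter = KFoldCrossValidationSplitter(k_fold=2)
train_di, train_eo, valid_di, valid_eo = splitter.split(
    list(range(10)), list(range(10))
)
# Under this assumption, each of the four lists has 2 entries (one per fold).
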
class neuraxle.metaopt.auto_ml.RandomSearchHyperparameterSelectionStrategy[source]

AutoML Hyperparameter Optimizer that randomly samples the space of random variables. Please refer to AutoML for a usage example.

find_next_best_hyperparams(auto_ml_container: neuraxle.metaopt.auto_ml.AutoMLContainer) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Randomly sample the next hyperparams to try.

Parameters

auto_ml_container – trials data container

Returns

next best hyperparams
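
Conceptually, each trial amounts to drawing one random sample from the pipeline's hyperparameter space; a minimal sketch, assuming HyperparameterSpace.rvs() performs such a draw:

from neuraxle.hyperparams.distributions import RandInt, Uniform
from neuraxle.hyperparams.space import HyperparameterSpace

space = HyperparameterSpace({
    'n_estimators': RandInt(10, 100),
    'learning_rate': Uniform(0.01, 0.10),
})
samples = space.rvs()  # one random draw from every distribution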

class neuraxle.metaopt.auto_ml.Trainer(epochs: int, scoring_callback: neuraxle.metaopt.callbacks.ScoringCallback, validation_splitter: neuraxle.metaopt.auto_ml.BaseValidationSplitter, callbacks: List[neuraxle.metaopt.callbacks.BaseCallback] = None, print_func: Callable = None, hyperparams_repository: neuraxle.metaopt.auto_ml.HyperparamsRepository = None)[source]

Example usage :

trainer = Trainer(
    epochs=10,
    callbacks=[EarlyStoppingCallback()],
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    validation_splitter=ValidationSplitter(test_size=0.15),
    print_func=print
)

repo_trial = trainer.train(
    pipeline=pipeline,
    data_inputs=data_inputs,
    expected_outputs=expected_outputs
)
execute_trial(pipeline: neuraxle.base.BaseStep, trial_number: int, repo_trial: neuraxle.metaopt.trial.Trial, context: neuraxle.base.ExecutionContext, validation_splits: List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]], n_trial: int, delete_pipeline_on_completion: bool = True)[source]

Train pipeline using the validation splitter. Track training and validation metrics for each epoch.

Parameters
  • pipeline – pipeline to train on

  • trial_number – trial number

  • repo_trial – repo trial

  • validation_splits – validation splits

  • context – execution context

  • n_trial – total number of trials that will be executed

  • delete_pipeline_on_completion – whether to delete the pipeline upon completion

Returns

executed trial split

fit_trial_split(trial_split: neuraxle.metaopt.trial.TrialSplit, train_data_container: neuraxle.data_container.DataContainer, validation_data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.metaopt.trial.TrialSplit[source]

Train pipeline using the training data container. Track training and validation metrics for each epoch.

Parameters
  • train_data_container – train data container

  • validation_data_container – validation data container

  • trial_split – trial split to execute

  • context – execution context

Returns

executed trial split

get_main_metric_name() → str[source]

Get main metric name.

Returns

main metric name

refit(p: neuraxle.base.BaseStep, data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Refit the pipeline on the whole dataset (without any validation technique).

Parameters
  • p – pipeline to refit

  • data_container – data container

  • context – execution context

Returns

fitted pipeline

train(pipeline: neuraxle.base.BaseStep, data_inputs, expected_outputs=None) → neuraxle.metaopt.trial.Trial[source]

Train pipeline using the validation splitter. Track training and validation metrics for each epoch. Note: this method is a shortcut for execute_trial, with less boilerplate needed.

Refer to execute_trial for full flexibility.

Parameters
  • pipeline – pipeline to train on

  • data_inputs – data inputs

  • expected_outputs – expected outputs to fit on

Returns

executed trial

class neuraxle.metaopt.auto_ml.ValidationSplitter(test_size: float)[source]

Create a function that splits data into a training and a validation set.

# create a validation splitter with 80% train and 20% validation
ValidationSplitter(test_size=0.20)
Parameters

test_size – ratio of data to hold out for validation, as a float between 0 and 1

Returns

split(data_inputs, expected_outputs=None) → Tuple[List[T], List[T], List[T], List[T]][source]
neuraxle.metaopt.auto_ml._get_index_split(data_inputs, test_size)[source]
neuraxle.metaopt.auto_ml._get_trial_split_description(repo_trial: neuraxle.metaopt.trial.Trial, repo_trial_split: neuraxle.metaopt.trial.TrialSplit, validation_splits: List[Tuple[neuraxle.data_container.DataContainer, neuraxle.data_container.DataContainer]], trial_number: int, n_trial: int)[source]
neuraxle.metaopt.auto_ml._train_split(data_inputs, test_size) → List[T][source]

Split training set.

Parameters

data_inputs – data inputs to split

Returns

train_data_inputs

neuraxle.metaopt.auto_ml._validation_split(data_inputs, test_size) → List[T][source]

Split validation set.

Parameters

data_inputs – data inputs to split

Returns

validation_data_inputs

neuraxle.metaopt.auto_ml.kfold_cross_validation_split(data_inputs, k_fold)[source]
neuraxle.metaopt.auto_ml.validation_split(test_size: float, data_inputs, expected_outputs=None) → Tuple[List[T], List[T], List[T], List[T]][source]

Split data inputs and expected outputs into a training set and a validation set.

Parameters
  • test_size – ratio of data to hold out for validation, as a float between 0 and 1

  • data_inputs – data inputs to split

  • expected_outputs – expected outputs to split

Returns

train_data_inputs, train_expected_outputs, validation_data_inputs, validation_expected_outputs
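
A worked example, assuming the split is contiguous (the first 1 - test_size fraction trains, the remaining test_size fraction validates):

data_inputs = list(range(10))
expected_outputs = [2 * x for x in data_inputs]

train_di, train_eo, valid_di, valid_eo = validation_split(
    0.20, data_inputs, expected_outputs
)
# Under this assumption: train_di == [0, ..., 7] and valid_di == [8, 9].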