neuraxle.metaopt.validation

Module-level documentation for neuraxle.metaopt.validation. Here is an inheritance diagram, including dependencies on other base modules of Neuraxle:

[Inheritance diagram of neuraxle.metaopt.validation]

Validation

Classes for hyperparameter tuning, such as random search.

Classes

AnchoredWalkForwardTimeSeriesCrossValidationSplitter(…)

An anchored walk forward cross validation works by performing a forward rolling split.

BaseValidationSplitter

Base class for validation splitters that split data into training and validation sets.

GridExplorationSampler(expected_n_trials, seed_i)

This hyperparameter space optimizer is similar to a grid search; however, it greedily samples maximally different points in the space to explore it.

KFoldCrossValidationSplitter(k_fold)

Create a function that splits data with K-Fold Cross-Validation resampling.

RandomSearchSampler()

AutoML Hyperparameter Optimizer that randomly samples the space of random variables.

ValidationSplitter(validation_size)

Create a function that splits data into a training and a validation set.

WalkForwardTimeSeriesCrossValidationSplitter(…)

Perform a classic walk forward cross validation by performing a forward rolling split.


class neuraxle.metaopt.validation.RandomSearchSampler[source]

Bases: neuraxle.metaopt.data.vanilla.BaseHyperparameterOptimizer

AutoML Hyperparameter Optimizer that randomly samples the space of random variables. Please refer to AutoML for a usage example.

See also

Trainer, HyperparamsRepository
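Below is a hypothetical sketch of wiring this optimizer into an AutoML loop. The AutoML argument names shown (validation_splitter, hyperparams_optimizer, n_trials) are assumptions that vary between Neuraxle versions; check the AutoML documentation for the exact signature.

from neuraxle.hyperparams.distributions import RandInt
from neuraxle.hyperparams.space import HyperparameterSpace
from neuraxle.metaopt.auto_ml import AutoML
from neuraxle.metaopt.validation import RandomSearchSampler, ValidationSplitter
from neuraxle.pipeline import Pipeline
from neuraxle.steps.numpy import MultiplyByN

# A one-step pipeline whose 'multiply_by' hyperparameter gets sampled.
pipeline = Pipeline([
    MultiplyByN(2).set_hyperparams_space(HyperparameterSpace({
        'multiply_by': RandInt(1, 10),
    })),
])

auto_ml = AutoML(  # hypothetical argument names, see the note above
    pipeline,
    validation_splitter=ValidationSplitter(0.20),
    hyperparams_optimizer=RandomSearchSampler(),
    n_trials=10,
)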

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

find_next_best_hyperparams(round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Randomly sample the next hyperparams to try.

Return type

HyperparameterSamples

Parameters

round_scope (Round) – round scope

Returns

next random hyperparams

class neuraxle.metaopt.validation.GridExplorationSampler(expected_n_trials: int = 0, seed_i: int = 0)[source]

Bases: neuraxle.metaopt.data.vanilla.BaseHyperparameterOptimizer

This hyperparameter space optimizer is similar to a grid search; however, it greedily samples maximally different points in the space to explore it. The exploration method is a fixed pseudorandom one, which makes the sampling reproducible.

When more samples are requested than expected_n_trials, the sampler falls back to a non-seeded random search.

It can be useful for exploring the space before a TPE run, or for unit tests.

If expected_n_trials is not set or is set to 0, the sampler guesses its ideal sampling count and switches to random search after that count is reached.
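For instance (a minimal sketch using only the constructor arguments documented here):

from neuraxle.metaopt.validation import GridExplorationSampler

# Reproducible quasi-grid exploration for up to 20 trials; sampling past
# expected_n_trials falls back to a non-seeded random search.
sampler = GridExplorationSampler(expected_n_trials=20, seed_i=42)

# With expected_n_trials=0 (the default), the sampler instead guesses its
# ideal trial count (see estimate_ideal_n_trials below).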

__init__(expected_n_trials: int = 0, seed_i: int = 0)[source]

Initialize self. See help(type(self)) for accurate signature.

static estimate_ideal_n_trials(hp_space: neuraxle.hyperparams.space.HyperparameterSpace) → int[source]
_reinitialize_grid(hp_space: neuraxle.hyperparams.space.HyperparameterSpace, previous_trials_hp: List[neuraxle.hyperparams.space.HyperparameterSamples]) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Update the grid exploration sampler.

Return type

HyperparameterSamples

Parameters
  • hp_space (HyperparameterSpace) – hyperparameter space to sample from

  • previous_trials_hp – hyperparameter samples already tried in previous trials

Returns

next random hyperparams

find_next_best_hyperparams(round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Sample the next hyperparams to try.

Return type

HyperparameterSamples

Parameters

round_scope (Round) – round scope

Returns

next hyperparams

_generate_grid(hp_space: neuraxle.hyperparams.space.HyperparameterSpace)[source]

Generate the grid of hyperparameters to pick from.

Parameters

hp_space (HyperparameterSpace) – hyperparameter space

static _pseudo_shuffle_list(x: List[T], seed: int = 0) → list[source]

Shuffle a list into a deterministic pseudo-random order controlled by seed.

_gen_keys_for_grid() → List[int][source]

Generate the keys for the grid.


Returns

keys

_reshuffle_grid(new_sample: OrderedDict[str, Any] = None)[source]

Reshuffle the hyperparameters’ values with a pseudo-random seed.

class neuraxle.metaopt.validation.BaseValidationSplitter[source]

Bases: abc.ABC

split_dact(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → List[Tuple[neuraxle.data_container.DataContainer[~IDT, ~DIT, ~EOT][IDT, DIT, EOT], neuraxle.data_container.DataContainer[~IDT, ~DIT, ~EOT][IDT, DIT, EOT]]][source]

Split a data container into train and validation data containers by wrapping the validation split function, which takes two arguments: data inputs and expected outputs.

Parameters
  • data_container (DataContainer) – data container to split

  • context (ExecutionContext) – execution context

Returns

a list of tuples of train and validation data containers

split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

class neuraxle.metaopt.validation.ValidationSplitter(validation_size: float)[source]

Bases: neuraxle.metaopt.validation.BaseValidationSplitter

Create a function that splits data into a training and a validation set.

# create a validation splitter with 80% train, and 20% validation
ValidationSplitter(0.20)
Parameters

validation_size (float) – ratio of the data to use for the validation set, e.g. 0.20 for 20%.
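A minimal usage sketch, based on the split signature documented below and assuming the splitter can run without an explicit ExecutionContext:

import numpy as np
from neuraxle.metaopt.validation import ValidationSplitter

data_inputs = np.arange(10)
expected_outputs = np.arange(10) * 2

splitter = ValidationSplitter(validation_size=0.20)
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids = splitter.split(
    data_inputs, expected_outputs=expected_outputs)

# Each returned value is a list containing a single split: train_di[0]
# holds about 80% of the data and valid_di[0] the remaining 20%.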

__init__(validation_size: float)[source]

Initialize self. See help(type(self)) for accurate signature.

split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_full_validation_split(data_inputs: Optional[DIT] = None, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None) → Tuple[DIT, EOT, IDT, DIT, EOT, IDT][source]

Split data inputs, and expected outputs into a single training set, and a single validation set.

Parameters

  • data_inputs – data inputs to split

  • ids – ids associated with each data entry

  • expected_outputs – expected outputs to split

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_train_split(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]

Split training set.

Parameters

data_inputs – data inputs to split

Returns

train_data_inputs

_validation_split(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]

Split validation set.

Parameters

data_inputs – data inputs to split

Returns

validation_data_inputs

_get_index_split(data_inputs: Union[IDT, DIT, EOT]) → int[source]
class neuraxle.metaopt.validation.KFoldCrossValidationSplitter(k_fold: int)[source]

Bases: neuraxle.metaopt.validation.BaseValidationSplitter

Create a function that splits data with K-Fold Cross-Validation resampling.

# create a k-fold cross-validation splitter with 2 folds
KFoldCrossValidationSplitter(2)
Parameters

k_fold – number of folds.
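A minimal usage sketch, based on the split signature documented below and assuming the splitter can run without an explicit ExecutionContext:

import numpy as np
from neuraxle.metaopt.validation import KFoldCrossValidationSplitter

data_inputs = np.arange(12)

splitter = KFoldCrossValidationSplitter(k_fold=3)
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids = splitter.split(data_inputs)

# With k_fold=3, each returned list has 3 entries: fold i validates on
# valid_di[i] (one third of the data) and trains on train_di[i] (the rest).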

__init__(k_fold: int)[source]

Initialize self. See help(type(self)) for accurate signature.

_get_k_fold(dact_data: Union[IDT, DIT, EOT] = None) → int[source]
split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_kfold_cv_split(dact_data: Union[IDT, DIT, EOT]) → Tuple[List[Union[IDT, DIT, EOT]], List[Union[IDT, DIT, EOT]]][source]

Split data with K-Fold Cross-Validation splitting.

Parameters

dact_data – the data to split into k folds

Returns

a tuple of two lists of length k_fold: the training data folds and the validation data folds

_get_train_val_slices_at_fold_i(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]
_concat_fold_dact_data(arr1: Union[IDT, DIT, EOT], arr2: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]
class neuraxle.metaopt.validation.AnchoredWalkForwardTimeSeriesCrossValidationSplitter(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Bases: neuraxle.metaopt.validation.KFoldCrossValidationSplitter

An anchored walk forward cross validation works by performing a forward rolling split.

All training splits start at the beginning of the time series, and finish time varies.

Each validation split starts after a certain time delay (if padding is set) following its corresponding training split.

Data is expected to be an ndarray of shape [batch_size, total_time_steps, …]. It can have N dimensions, such as 3D or more, but the time series axis is currently limited to axis=1.

__init__(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Create an anchored walk forward time series cross validation object.

The size of the validation split is defined by validation_window_size. The difference in start position between two consecutive validation splits is also equal to validation_window_size. The resulting fold boundaries are illustrated in the sketch after the parameter list below.

Parameters
  • minimum_training_size – size of the smallest training split.

  • validation_window_size – size of each validation split and also the time step taken between each forward roll, by default None. If None, it takes the value of minimum_training_size.

  • padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.

  • drop_remainder (bool) – drop the last split if the last validation split does not coincide with a full validation_window_size, by default False.
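The fold boundaries produced by these parameters can be pictured with the following illustrative sketch (plain Python, not the class’s internal implementation):

# Illustrative only: anchored walk-forward fold boundaries over 10 time
# steps, with minimum_training_size=4, validation_window_size=2, no padding.
total_time_steps = 10
minimum_training_size, validation_window_size, padding = 4, 2, 0

n_folds = (total_time_steps - minimum_training_size - padding) // validation_window_size
for fold_i in range(n_folds):
    train_end = minimum_training_size + fold_i * validation_window_size  # anchored start at 0
    valid_start = train_end + padding
    print(f"fold {fold_i}: train=[0:{train_end}], valid=[{valid_start}:{valid_start + validation_window_size}]")

# fold 0: train=[0:4], valid=[4:6]
# fold 1: train=[0:6], valid=[6:8]
# fold 2: train=[0:8], valid=[8:10]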

_get_k_fold(dact_data: Union[IDT, DIT, EOT] = None) → int[source]
_get_train_val_slices_at_fold_i(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]
_get_beginning_at_fold_i(fold_i: int) → int[source]

Get the start time of the training split at the given fold index. Here in the anchored splitter, it is always zero. This method is overridden in the non-anchored version of the walk forward time series validation splitter.

class neuraxle.metaopt.validation.WalkForwardTimeSeriesCrossValidationSplitter(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Bases: neuraxle.metaopt.validation.AnchoredWalkForwardTimeSeriesCrossValidationSplitter

Perform a classic walk forward cross validation by performing a forward rolling split. As opposed to the AnchoredWalkForwardTimeSeriesCrossValidationSplitter, this class has a train split that is always of the same size.

All training splits have the same fixed training_window_size. The start and end times of each training split increase identically at each forward roll. The same principle applies to the validation splits, whose start and end times also move forward in the same manner. Each validation split starts after a certain time delay (if padding is set) following its corresponding training split.

Notes: The data supported by this cross validation is an ndarray of shape [batch_size, total_time_steps, n_features]. The array can have an arbitrary number of dimensions, but the time series axis is currently limited to axis=1.

__init__(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Create a classic walk forward time series cross validation object.

The difference in start position between two consecutive validation splits is equal to one validation_window_size. The resulting fold boundaries are illustrated in the sketch after the parameter list below.

Parameters
  • training_window_size – the window size of the training split.

  • validation_window_size – the window size of each validation split and also the time step taken between each forward roll, by default None. If None, it takes the value of training_window_size.

  • padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.

  • drop_remainder (bool) – drop the last split if the last validation split does not coincide with a full validation_window_size, by default False.
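In contrast with the anchored variant, the training window here keeps a fixed size and rolls forward, as in this illustrative sketch (not the class’s internal implementation):

# Illustrative only: classic walk-forward fold boundaries over 10 time
# steps, with training_window_size=4, validation_window_size=2, no padding.
total_time_steps = 10
training_window_size, validation_window_size, padding = 4, 2, 0

n_folds = (total_time_steps - training_window_size - padding) // validation_window_size
for fold_i in range(n_folds):
    train_start = fold_i * validation_window_size  # the window start rolls forward
    train_end = train_start + training_window_size
    valid_start = train_end + padding
    print(f"fold {fold_i}: train=[{train_start}:{train_end}], valid=[{valid_start}:{valid_start + validation_window_size}]")

# fold 0: train=[0:4], valid=[4:6]
# fold 1: train=[2:6], valid=[6:8]
# fold 2: train=[4:8], valid=[8:10]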

_get_beginning_at_fold_i(fold_i: int) → int[source]

Get the start time of the training split at the given fold index.
