neuraxle.metaopt.validation

Module-level documentation for neuraxle.metaopt.validation. Here is an inheritance diagram, including dependencies on other base modules of Neuraxle:

[Inheritance diagram of neuraxle.metaopt.validation]

Validation

Classes for hyperparameter tuning, such as random search.

Classes

AnchoredWalkForwardTimeSeriesCrossValidationSplitter(…)

An anchored walk forward cross validation works by performing a forward rolling split.

BaseValidationSplitter

Base class for validation splitters that split data into training and validation sets.

GridExplorationSampler(expected_n_trials, seed_i)

This hyperparameter space optimizer is similar to a grid search; however, it greedily samples maximally different points in the space to explore it.

KFoldCrossValidationSplitter(k_fold)

Create a function that splits data with K-Fold Cross-Validation resampling.

RandomSearchSampler()

AutoML Hyperparameter Optimizer that randomly samples the space of random variables.

ValidationSplitter(validation_size)

Create a function that splits data into a training and a validation set.

WalkForwardTimeSeriesCrossValidationSplitter(…)

Perform a classic walk forward cross validation by performing a forward rolling split.


class neuraxle.metaopt.validation.RandomSearchSampler[source]

Bases: neuraxle.metaopt.data.vanilla.BaseHyperparameterOptimizer

AutoML Hyperparameter Optimizer that randomly samples the space of random variables. Please refer to AutoML for a usage example.

See also

Trainer, HyperparamsRepository
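Below is a hypothetical sketch of wiring this optimizer into an AutoML loop. The AutoML argument names shown (validation_splitter, hyperparams_optimizer, n_trials) are assumptions that vary between Neuraxle versions; check the AutoML documentation for the exact signature.

from neuraxle.hyperparams.distributions import RandInt
from neuraxle.hyperparams.space import HyperparameterSpace
from neuraxle.metaopt.auto_ml import AutoML
from neuraxle.metaopt.validation import RandomSearchSampler, ValidationSplitter
from neuraxle.pipeline import Pipeline
from neuraxle.steps.numpy import MultiplyByN

# A one-step pipeline whose 'multiply_by' hyperparameter gets sampled.
pipeline = Pipeline([
    MultiplyByN(2).set_hyperparams_space(HyperparameterSpace({
        'multiply_by': RandInt(1, 10),
    })),
])

auto_ml = AutoML(  # hypothetical argument names, see the note above
    pipeline,
    validation_splitter=ValidationSplitter(0.20),
    hyperparams_optimizer=RandomSearchSampler(),
    n_trials=10,
)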

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

find_next_best_hyperparams(round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Randomly sample the next hyperparams to try.

Return type

HyperparameterSamples

Parameters

round_scope (Round) – round scope

Returns

next random hyperparams

class neuraxle.metaopt.validation.GridExplorationSampler(expected_n_trials: int = 0, seed_i: int = 0)[source]

Bases: neuraxle.metaopt.data.vanilla.BaseHyperparameterOptimizer

This hyperparameter space optimizer is similar to a grid search; however, it greedily samples maximally different points in the space to explore it. The exploration method is a fixed pseudorandom one, which makes the sampling reproducible.

When more samples are requested than expected_n_trials, the sampler falls back to a non-seeded random search.

It can be useful for exploring the space before a TPE run, or for unit tests.

If expected_n_trials is not set or is set to 0, the sampler guesses its ideal sampling count and switches to random search after that count is reached.
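For instance (a minimal sketch using only the constructor arguments documented here):

from neuraxle.metaopt.validation import GridExplorationSampler

# Reproducible quasi-grid exploration for up to 20 trials; sampling past
# expected_n_trials falls back to a non-seeded random search.
sampler = GridExplorationSampler(expected_n_trials=20, seed_i=42)

# With expected_n_trials=0 (the default), the sampler instead guesses its
# ideal trial count (see estimate_ideal_n_trials below).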

__init__(expected_n_trials: int = 0, seed_i: int = 0)[source]

Initialize self. See help(type(self)) for accurate signature.

static estimate_ideal_n_trials(hp_space: neuraxle.hyperparams.space.HyperparameterSpace) → int[source]
_reinitialize_grid(hp_space: neuraxle.hyperparams.space.HyperparameterSpace, previous_trials_hp: List[neuraxle.hyperparams.space.HyperparameterSamples]) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Update the grid exploration sampler.

Return type

HyperparameterSamples

Parameters
  • hp_space (HyperparameterSpace) – hyperparameter space to sample from

  • previous_trials_hp – hyperparameter samples already tried in previous trials

Returns

next random hyperparams

find_next_best_hyperparams(round_scope: neuraxle.metaopt.data.aggregates.Round) → neuraxle.hyperparams.space.HyperparameterSamples[source]

Sample the next hyperparams to try.

Return type

HyperparameterSamples

Parameters

round_scope (Round) – round scope

Returns

next hyperparams

_generate_grid(hp_space: neuraxle.hyperparams.space.HyperparameterSpace)[source]

Generate the grid of hyperparameters to pick from.

Parameters

hp_space (HyperparameterSpace) – hyperparameter space

static _pseudo_shuffle_list(x: List[T], seed: int = 0) → list[source]

Shuffle a list into a deterministic pseudo-random order controlled by seed.

_gen_keys_for_grid() → List[int][source]

Generate the keys for the grid.


Returns

keys

_reshuffle_grid(new_sample: OrderedDict[str, Any] = None)[source]

Reshuffle the hyperparameters’ values with a pseudo-random seed.

class neuraxle.metaopt.validation.BaseValidationSplitter[source]

Bases: abc.ABC

split_dact(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → List[Tuple[neuraxle.data_container.DataContainer[~IDT, ~DIT, ~EOT][IDT, DIT, EOT], neuraxle.data_container.DataContainer[~IDT, ~DIT, ~EOT][IDT, DIT, EOT]]][source]

Split a data container into train and validation data containers by wrapping the validation split function, which takes two arguments: data inputs and expected outputs.

Parameters
  • data_container (DataContainer) – data container to split

  • context (ExecutionContext) – execution context

Returns

a list of tuples of train and validation data containers

split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

class neuraxle.metaopt.validation.ValidationSplitter(validation_size: float)[source]

Bases: neuraxle.metaopt.validation.BaseValidationSplitter

Create a function that splits data into a training and a validation set.

# create a validation splitter with 80% train, and 20% validation
ValidationSplitter(0.20)
Parameters

validation_size (float) – ratio of the data to use for the validation set, e.g. 0.20 for 20%.
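A minimal usage sketch, based on the split signature documented below and assuming the splitter can run without an explicit ExecutionContext:

import numpy as np
from neuraxle.metaopt.validation import ValidationSplitter

data_inputs = np.arange(10)
expected_outputs = np.arange(10) * 2

splitter = ValidationSplitter(validation_size=0.20)
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids = splitter.split(
    data_inputs, expected_outputs=expected_outputs)

# Each returned value is a list containing a single split: train_di[0]
# holds about 80% of the data and valid_di[0] the remaining 20%.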

__init__(validation_size: float)[source]

Initialize self. See help(type(self)) for accurate signature.

split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_full_validation_split(data_inputs: Optional[DIT] = None, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None) → Tuple[DIT, EOT, IDT, DIT, EOT, IDT][source]

Split data inputs, and expected outputs into a single training set, and a single validation set.

Parameters

  • data_inputs – data inputs to split

  • ids – ids associated with each data entry

  • expected_outputs – expected outputs to split

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_train_split(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]

Split training set.

Parameters

data_inputs – data inputs to split

Returns

train_data_inputs

_validation_split(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]

Split validation set.

Parameters

data_inputs – data inputs to split

Returns

validation_data_inputs

_get_index_split(data_inputs: Union[IDT, DIT, EOT]) → int[source]
class neuraxle.metaopt.validation.KFoldCrossValidationSplitter(k_fold: int)[source]

Bases: neuraxle.metaopt.validation.BaseValidationSplitter

Create a function that splits data with K-Fold Cross-Validation resampling.

# create a k-fold cross-validation splitter with 2 folds
KFoldCrossValidationSplitter(2)
Parameters

k_fold – number of folds.
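A minimal usage sketch, based on the split signature documented below and assuming the splitter can run without an explicit ExecutionContext:

import numpy as np
from neuraxle.metaopt.validation import KFoldCrossValidationSplitter

data_inputs = np.arange(12)

splitter = KFoldCrossValidationSplitter(k_fold=3)
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids = splitter.split(data_inputs)

# With k_fold=3, each returned list has 3 entries: fold i validates on
# valid_di[i] (one third of the data) and trains on train_di[i] (the rest).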

__init__(k_fold: int)[source]

Initialize self. See help(type(self)) for accurate signature.

_get_k_fold(dact_data: Union[IDT, DIT, EOT] = None) → int[source]
split(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]

Train/Test split data inputs and expected outputs.

Parameters
  • data_inputs – data inputs

  • ids – id associated with each data entry (optional)

  • expected_outputs – expected outputs (optional)

  • context – execution context (optional)

Returns

train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids

_kfold_cv_split(dact_data: Union[IDT, DIT, EOT]) → Tuple[List[Union[IDT, DIT, EOT]], List[Union[IDT, DIT, EOT]]][source]

Split data with K-Fold Cross-Validation splitting.

Parameters

dact_data – the data to split into k folds

Returns

a tuple of two lists of length k_fold: the training data folds and the validation data folds

_get_train_val_slices_at_fold_i(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]
_concat_fold_dact_data(arr1: Union[IDT, DIT, EOT], arr2: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]
class neuraxle.metaopt.validation.AnchoredWalkForwardTimeSeriesCrossValidationSplitter(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Bases: neuraxle.metaopt.validation.KFoldCrossValidationSplitter

An anchored walk forward cross validation works by performing a forward rolling split.

All training splits start at the beginning of the time series, and finish time varies.

Each validation split starts after a certain time delay (if padding is set) following its corresponding training split.

Data is expected to be an ndarray of shape [batch_size, total_time_steps, …]. It can have N dimensions, such as 3D or more, but the time series axis is currently limited to axis=1.

__init__(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Create an anchored walk forward time series cross validation object.

The size of the validation split is defined by validation_window_size. The difference in start position between two consecutive validation splits is also equal to validation_window_size. The resulting fold boundaries are illustrated in the sketch after the parameter list below.

Parameters
  • minimum_training_size – size of the smallest training split.

  • validation_window_size – size of each validation split and also the time step taken between each forward roll, by default None. If None, it takes the value of minimum_training_size.

  • padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.

  • drop_remainder (bool) – drop the last split if the last validation split does not coincide with a full validation_window_size, by default False.
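The fold boundaries produced by these parameters can be pictured with the following illustrative sketch (plain Python, not the class’s internal implementation):

# Illustrative only: anchored walk-forward fold boundaries over 10 time
# steps, with minimum_training_size=4, validation_window_size=2, no padding.
total_time_steps = 10
minimum_training_size, validation_window_size, padding = 4, 2, 0

n_folds = (total_time_steps - minimum_training_size - padding) // validation_window_size
for fold_i in range(n_folds):
    train_end = minimum_training_size + fold_i * validation_window_size  # anchored start at 0
    valid_start = train_end + padding
    print(f"fold {fold_i}: train=[0:{train_end}], valid=[{valid_start}:{valid_start + validation_window_size}]")

# fold 0: train=[0:4], valid=[4:6]
# fold 1: train=[0:6], valid=[6:8]
# fold 2: train=[0:8], valid=[8:10]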

_get_k_fold(dact_data: Union[IDT, DIT, EOT] = None) → int[source]
_get_train_val_slices_at_fold_i(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]
_get_beginning_at_fold_i(fold_i: int) → int[source]

Get the start time of the training split at the given fold index. Here in the anchored splitter, it is always zero. This method is overridden in the non-anchored version of the walk forward time series validation splitter.

class neuraxle.metaopt.validation.WalkForwardTimeSeriesCrossValidationSplitter(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Bases: neuraxle.metaopt.validation.AnchoredWalkForwardTimeSeriesCrossValidationSplitter

Perform a classic walk forward cross validation by performing a forward rolling split. As opposed to the AnchoredWalkForwardTimeSeriesCrossValidationSplitter, this class has a train split that is always of the same size.

All training splits have the same fixed training_window_size. The start and end times of each training split increase identically at each forward roll. The same principle applies to the validation splits, whose start and end times also move forward in the same manner. Each validation split starts after a certain time delay (if padding is set) following its corresponding training split.

Notes: The data supported by this cross validation is an ndarray of shape [batch_size, total_time_steps, n_features]. The array can have an arbitrary number of dimensions, but the time series axis is currently limited to axis=1.

__init__(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]

Create a classic walk forward time series cross validation object.

The difference in start position between two consecutive validation splits is equal to one validation_window_size. The resulting fold boundaries are illustrated in the sketch after the parameter list below.

Parameters
  • training_window_size – the window size of the training split.

  • validation_window_size – the window size of each validation split and also the time step taken between each forward roll, by default None. If None, it takes the value of training_window_size.

  • padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.

  • drop_remainder (bool) – drop the last split if the last validation split does not coincide with a full validation_window_size, by default False.
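In contrast with the anchored variant, the training window here keeps a fixed size and rolls forward, as in this illustrative sketch (not the class’s internal implementation):

# Illustrative only: classic walk-forward fold boundaries over 10 time
# steps, with training_window_size=4, validation_window_size=2, no padding.
total_time_steps = 10
training_window_size, validation_window_size, padding = 4, 2, 0

n_folds = (total_time_steps - training_window_size - padding) // validation_window_size
for fold_i in range(n_folds):
    train_start = fold_i * validation_window_size  # the window start rolls forward
    train_end = train_start + training_window_size
    valid_start = train_end + padding
    print(f"fold {fold_i}: train=[{train_start}:{train_end}], valid=[{valid_start}:{valid_start + validation_window_size}]")

# fold 0: train=[0:4], valid=[4:6]
# fold 1: train=[2:6], valid=[6:8]
# fold 2: train=[4:8], valid=[8:10]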

_get_beginning_at_fold_i(fold_i: int) → int[source]

Get the start time of the training split at the given fold index.
