neuraxle.metaopt.validation¶
Module-level documentation for neuraxle.metaopt.validation. In the rendered documentation, an inheritance diagram is shown here, including dependencies to other base modules of Neuraxle.
Validation¶
Classes for hyperparameter tuning, such as random search.
Classes
AnchoredWalkForwardTimeSeriesCrossValidationSplitter — An anchored walk forward cross validation works by performing a forward rolling split.
BaseValidationSplitter — Abstract base class for validation splitters.
KFoldCrossValidationSplitter — Create a function that splits data with K-Fold Cross-Validation resampling.
ValidationSplitter — Create a function that splits data into a training, and a validation set.
WalkForwardTimeSeriesCrossValidationSplitter — Perform a classic walk forward cross validation by performing a forward rolling split.
-
class
neuraxle.metaopt.validation.
BaseValidationSplitter
[source]¶ Bases:
abc.ABC
-
split_dact
(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → List[Tuple[neuraxle.data_container.DataContainer[IDT, DIT, EOT], neuraxle.data_container.DataContainer[IDT, DIT, EOT]]][source]¶ Wrap a validation split function with a split data container function. A validation split function takes two arguments: data inputs, and expected outputs.
- Parameters
data_container (DataContainer) – data container to split
context (ExecutionContext) – execution context
- Returns
a list of (train, validation) data container tuples.
-
split
(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]¶ Train/Test split data inputs and expected outputs.
- Parameters
data_inputs – data inputs
ids – id associated with each data entry (optional)
expected_outputs – expected outputs (optional)
context – execution context (optional)
- Returns
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids
-
class
neuraxle.metaopt.validation.
ValidationSplitter
(validation_size: float)[source]¶ Bases:
neuraxle.metaopt.validation.BaseValidationSplitter
Create a function that splits data into a training, and a validation set.
# create a validation splitter with 80% train, and 20% validation
ValidationSplitter(validation_size=0.20)
- Parameters
validation_size – validation set size as a float ratio between 0 and 1.
- Returns
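The ratio split that this class performs can be sketched in plain Python. This is an illustrative sketch, not Neuraxle's actual implementation; the function name ratio_split is hypothetical, and the return value mirrors the six-tuple documented for split (in Neuraxle, each element would be wrapped in a one-element list of splits).

```python
def ratio_split(data_inputs, expected_outputs, ids, validation_size):
    """Split sequences into one training set and one validation set.

    validation_size is the fraction of the data reserved for validation,
    taken from the end of the sequence.
    Returns (train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids).
    """
    n_train = len(data_inputs) - int(len(data_inputs) * validation_size)
    return (
        data_inputs[:n_train], expected_outputs[:n_train], ids[:n_train],
        data_inputs[n_train:], expected_outputs[n_train:], ids[n_train:],
    )
```

For example, with 10 samples and validation_size=0.20, the first 8 samples land in the training set and the last 2 in the validation set.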
-
__init__
(validation_size: float)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
split
(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]¶ Train/Test split data inputs and expected outputs.
- Parameters
data_inputs – data inputs
ids – id associated with each data entry (optional)
expected_outputs – expected outputs (optional)
context – execution context (optional)
- Returns
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids
-
_full_validation_split
(data_inputs: Optional[DIT] = None, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None) → Tuple[DIT, EOT, IDT, DIT, EOT, IDT][source]¶ Split data inputs and expected outputs into a single training set and a single validation set.
- Parameters
data_inputs – data inputs to split
ids – ids associated with each data entry
expected_outputs – expected outputs to split
- Returns
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids
-
_train_split
(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]¶ Split training set.
- Parameters
data_inputs – data inputs to split
- Returns
train_data_inputs
-
_validation_split
(data_inputs: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]¶ Split validation set.
- Parameters
data_inputs – data inputs to split
- Returns
validation_data_inputs
-
class
neuraxle.metaopt.validation.
KFoldCrossValidationSplitter
(k_fold: int)[source]¶ Bases:
neuraxle.metaopt.validation.BaseValidationSplitter
Create a function that splits data with K-Fold Cross-Validation resampling.
# create a k-fold cross validation splitter with 2 folds
KFoldCrossValidationSplitter(k_fold=2)
- Parameters
k_fold – number of folds.
- Returns
-
split
(data_inputs: DIT, ids: Optional[IDT] = None, expected_outputs: Optional[EOT] = None, context: Optional[neuraxle.base.ExecutionContext] = None) → Tuple[List[DIT], List[EOT], List[IDT], List[DIT], List[EOT], List[IDT]][source]¶ Train/Test split data inputs and expected outputs.
- Parameters
data_inputs – data inputs
ids – id associated with each data entry (optional)
expected_outputs – expected outputs (optional)
context – execution context (optional)
- Returns
train_di, train_eo, train_ids, valid_di, valid_eo, valid_ids
-
_kfold_cv_split
(dact_data: Union[IDT, DIT, EOT]) → Tuple[List[Union[IDT, DIT, EOT]], List[Union[IDT, DIT, EOT]]][source]¶ Split data with K-Fold Cross-Validation splitting.
- Parameters
dact_data – data to split into k_fold folds
- Returns
a tuple of a list of train folds and a list of validation folds, each of length k_fold.
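The fold boundaries behind K-Fold resampling can be sketched as follows. This is a minimal illustration of the technique, not Neuraxle's implementation; the function name kfold_indices is hypothetical, and the convention of giving the last fold any remainder samples is an assumption.

```python
def kfold_indices(n_samples, k_fold):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Fold i's validation slice is the i-th contiguous chunk of the data;
    its training set is every remaining index. The last fold absorbs any
    remainder when n_samples is not divisible by k_fold.
    """
    fold_size = n_samples // k_fold
    for fold_i in range(k_fold):
        start = fold_i * fold_size
        stop = n_samples if fold_i == k_fold - 1 else start + fold_size
        valid_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, valid_idx
```

Every sample appears in exactly one validation fold, so the k validation folds together cover the whole dataset.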
-
_get_train_val_slices_at_fold_i
(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]¶
-
_concat_fold_dact_data
(arr1: Union[IDT, DIT, EOT], arr2: Union[IDT, DIT, EOT]) → Union[IDT, DIT, EOT][source]¶
-
class
neuraxle.metaopt.validation.
AnchoredWalkForwardTimeSeriesCrossValidationSplitter
(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]¶ Bases:
neuraxle.metaopt.validation.KFoldCrossValidationSplitter
An anchored walk forward cross validation works by performing a forward rolling split.
All training splits start at the beginning of the time series, and their end times vary.
Each validation split starts after a time delay (if padding is set) following the end of its corresponding training split.
Data is expected to be an nd.array of shape [batch_size, total_time_steps, …]. It can have three or more dimensions, but the time series axis is currently limited to axis=1.
-
__init__
(minimum_training_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]¶ Create an anchored walk forward time series cross validation object.
The size of the validation split is defined by validation_window_size. The difference in start position between two consecutive validation splits is also equal to validation_window_size.
- Parameters
minimum_training_size – size of the smallest training split.
validation_window_size – size of each validation split, and also the time step taken between each forward roll; by default None. If None, it takes the value of minimum_training_size.
padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.
drop_remainder (bool) – drop the last split if the last validation split does not span a full validation_window_size, by default False.
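The anchored rolling scheme described above can be sketched as index ranges along the time axis. This is an illustrative sketch under the stated parameters, not Neuraxle's implementation; the function name anchored_walk_forward_slices and the shortened padding parameter name are hypothetical.

```python
def anchored_walk_forward_slices(total_time_steps, minimum_training_size,
                                 validation_window_size=None, padding=0,
                                 drop_remainder=False):
    """Yield ((train_start, train_end), (valid_start, valid_end)) pairs.

    Every training slice is anchored at time step 0; its end, and the
    validation window after it, roll forward by validation_window_size
    at each fold.
    """
    if validation_window_size is None:
        validation_window_size = minimum_training_size
    train_end = minimum_training_size
    while train_end + padding < total_time_steps:
        valid_start = train_end + padding
        valid_end = min(valid_start + validation_window_size, total_time_steps)
        if drop_remainder and valid_end - valid_start < validation_window_size:
            break
        yield (0, train_end), (valid_start, valid_end)
        train_end += validation_window_size
```

With 10 time steps, minimum_training_size=4 and validation_window_size=2, this yields three folds whose training splits all start at 0 and grow: (0, 4)/(4, 6), (0, 6)/(6, 8), and (0, 8)/(8, 10).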
-
_get_train_val_slices_at_fold_i
(dact_data: Union[IDT, DIT, EOT], fold_i: int) → Tuple[Union[IDT, DIT, EOT], Union[IDT, DIT, EOT]][source]¶
-
_get_beginning_at_fold_i
(fold_i: int) → int[source]¶ Get the start time of the training split at the given fold index. Here in the anchored splitter, it is always zero. This method is overridden in the non-anchored version of the walk forward time series validation splitter.
-
class
neuraxle.metaopt.validation.
WalkForwardTimeSeriesCrossValidationSplitter
(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]¶ Bases:
neuraxle.metaopt.validation.AnchoredWalkForwardTimeSeriesCrossValidationSplitter
Perform a classic walk forward cross validation by performing a forward rolling split. As opposed to the AnchoredWalkForwardTimeSeriesCrossValidationSplitter, this class has a training split that is always the same size.
All training splits have the same training_window_size. The start and end times of each training split advance by the same amount at each forward roll. The same principle applies to the validation splits, whose start and end advance in the same manner. Each validation split starts after a time delay (if padding is set) following its corresponding training split.
Notes: The data supported by this cross validation is an nd.array of shape [batch_size, total_time_steps, n_features]. The array can have an arbitrary number of dimensions, but the time series axis is currently limited to axis=1.
-
__init__
(training_window_size, validation_window_size=None, padding_between_training_and_validation=0, drop_remainder=False)[source]¶ Create a classic walk forward time series cross validation object.
The difference in start position between two consecutive validation splits is equal to one validation_window_size.
- Parameters
training_window_size – the window size of training split.
validation_window_size – the window size of each validation split, and also the time step taken between each forward roll; by default None. If None, it takes the value of training_window_size.
padding_between_training_and_validation (int) – the size of the padding between the end of the training split and the start of the validation split, by default 0.
drop_remainder (bool) – drop the last split if the last validation split does not span a full validation_window_size, by default False.
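The non-anchored variant can be sketched the same way: unlike the anchored splitter, the training window keeps a fixed size, so its start advances with each roll. This is an illustrative sketch, not Neuraxle's implementation; the function name walk_forward_slices and the shortened padding parameter name are hypothetical.

```python
def walk_forward_slices(total_time_steps, training_window_size,
                        validation_window_size=None, padding=0,
                        drop_remainder=False):
    """Yield ((train_start, train_end), (valid_start, valid_end)) pairs.

    Both the start and end of the fixed-size training window advance by
    validation_window_size per fold, and the validation window follows
    right after it (plus any padding).
    """
    if validation_window_size is None:
        validation_window_size = training_window_size
    train_start = 0
    while train_start + training_window_size + padding < total_time_steps:
        train_end = train_start + training_window_size
        valid_start = train_end + padding
        valid_end = min(valid_start + validation_window_size, total_time_steps)
        if drop_remainder and valid_end - valid_start < validation_window_size:
            break
        yield (train_start, train_end), (valid_start, valid_end)
        train_start += validation_window_size
```

With 10 time steps, training_window_size=4 and validation_window_size=2, this yields (0, 4)/(4, 6), (2, 6)/(6, 8), and (4, 8)/(8, 10): every training window spans exactly 4 steps.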
-
_get_beginning_at_fold_i
(fold_i: int) → int[source]¶ Get the start time of the training split at the given fold index.