neuraxle.data_container

Neuraxle’s DataContainer classes

Classes for containing the data that flows through the pipeline steps.

Classes

DataContainer(data_inputs[, current_ids, …])

DataContainer class to store data inputs, expected outputs, and ids together.

ExpandedDataContainer(data_inputs, …)

Subclass of DataContainer to expand the data container's dimension.

ListDataContainer(data_inputs[, …])

Subclass of DataContainer to perform list operations.

ZipDataContainer(data_inputs[, current_ids, …])

Subclass of DataContainer to zip two data sources together.

class neuraxle.data_container.DataContainer(data_inputs: Any, current_ids=None, summary_id=None, expected_outputs: Any = None, sub_data_containers: List[NamedDataContainerTuple] = None)[source]

DataContainer class to store data inputs, expected outputs, and ids together. Each BaseStep rehashes the ids with its hyperparameters so that the Checkpoint step can create checkpoints for a given set of hyperparameters.

The DataContainer object is passed to all of the BaseStep handle methods.

Most of the time, you won’t need to care about the DataContainer because it is the pipeline that manages it.
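
For intuition, here is a minimal sketch of the container idea in plain Python. This is not the real class; the field names simply mirror the API above, and defaulting the ids to stringified indices is an assumption made for illustration:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class MiniDataContainer:
    # Simplified stand-in for neuraxle.data_container.DataContainer:
    # it keeps data inputs, their ids, and expected outputs aligned.
    data_inputs: List[Any]
    current_ids: Optional[List[str]] = None
    expected_outputs: Optional[List[Any]] = None

    def __post_init__(self):
        # For illustration, default ids to stringified indices and
        # expected outputs to None placeholders of matching length.
        if self.current_ids is None:
            self.current_ids = [str(i) for i in range(len(self.data_inputs))]
        if self.expected_outputs is None:
            self.expected_outputs = [None] * len(self.data_inputs)

    def __iter__(self):
        # Iterating yields (current_id, data_input, expected_output) triplets.
        return iter(zip(self.current_ids, self.data_inputs, self.expected_outputs))

dc = MiniDataContainer(data_inputs=[10, 20, 30], expected_outputs=[1, 0, 1])
print(list(dc))  # [('0', 10, 1), ('1', 20, 0), ('2', 30, 1)]
```

The point is only that the three sequences travel through the pipeline as one aligned unit.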

See also

BaseHasher

add_sub_data_container(name: str, data_container: neuraxle.data_container.DataContainer)[source]

Add a sub data container under the given name.

Returns

self

convolved_1d(stride, kernel_size) → Iterable[neuraxle.data_container.DataContainer][source]

Return an iterator that yields the DataContainer in batches.

Parameters
  • stride – step size of the convolution: how many items the window advances between batches

  • kernel_size – size of each batch (the width of the convolution window)

Returns

an iterator of DataContainer

Return type

Iterable[DataContainer]
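
As a rough sketch of the stride/kernel semantics, the following stand-alone function (not Neuraxle's implementation) shows how a stride smaller than the kernel size produces overlapping windows, while `stride == kernel_size` degenerates to plain non-overlapping batches:

```python
from typing import Any, Iterable, List

def convolved_1d_sketch(items: List[Any], stride: int, kernel_size: int) -> Iterable[List[Any]]:
    # Illustrative sliding-window batching: yield windows of up to
    # `kernel_size` items, advancing `stride` items each time.
    for start in range(0, len(items), stride):
        window = items[start:start + kernel_size]
        if window:
            yield window

print(list(convolved_1d_sketch([1, 2, 3, 4, 5], stride=2, kernel_size=3)))
# [[1, 2, 3], [3, 4, 5], [5]]
```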

copy()[source]
get_n_batches(batch_size: int) → int[source]
get_sub_data_container_names()[source]

Get sub data container names.

Returns

list of names

hash_summary()[source]

Hash DataContainer.current_ids, data inputs, and hyperparameters together into one id.

Returns

single hashed current id for all of the current ids

Return type

str
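
The gist of summary hashing can be sketched as folding every current id into one digest. This is only an illustration of the idea (here with SHA-256 from the standard library), not Neuraxle's actual hashing, which also involves hyperparameters via BaseHasher:

```python
import hashlib

def hash_summary_sketch(current_ids):
    # Illustrative only: fold each current id into a single digest so that
    # a whole batch of ids can be identified by one summary string.
    m = hashlib.sha256()
    for current_id in current_ids:
        m.update(str(current_id).encode('utf-8'))
    return m.hexdigest()

summary = hash_summary_sketch(['0', '1', '2'])
print(len(summary))  # 64 hex characters
```

The same ids always yield the same summary id, which is what makes checkpointing by hash possible.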

set_current_ids(current_ids: List[str])[source]

Set current ids.

Parameters

current_ids (List[str]) – current ids

set_data_inputs(data_inputs: Iterable[T_co])[source]

Set data inputs.

Parameters

data_inputs (Iterable) – data inputs

set_expected_outputs(expected_outputs: Iterable[T_co])[source]

Set expected outputs.

Parameters

expected_outputs (Iterable) – expected outputs

set_sub_data_containers(sub_data_containers: List[DataContainer])[source]

Set sub data containers.

set_summary_id(summary_id: str)[source]

Set summary id.

Parameters

summary_id (str) – summary id to set

to_numpy()[source]
tolist()[source]
tolistshallow()[source]

class neuraxle.data_container.ExpandedDataContainer(data_inputs, current_ids, expected_outputs, summary_id, old_current_ids)[source]

Subclass of DataContainer to expand the data container's dimension.

See also

DataContainer

static create_from(data_container: neuraxle.data_container.DataContainer) → neuraxle.data_container.ExpandedDataContainer[source]

Create an ExpandedDataContainer whose single current id is the given data container's summary hash.

Parameters

data_container (DataContainer) – data container to transform

Returns

expanded data container

Return type

ExpandedDataContainer

reduce_dim() → neuraxle.data_container.DataContainer[source]

Reduce DataContainer to its original shape with a list of multiple current_ids, data_inputs, and expected outputs.

Returns

reduced data container

Return type

DataContainer
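
The expand/reduce round trip can be sketched with plain lists (not the real classes): expanding wraps all data inputs into a single outer item so the container holds exactly one entry, and reducing unwraps it back:

```python
def expand_dim(data_inputs):
    # Wrap all data inputs into one outer item: the expanded container
    # then has a single entry (with a single summary-like current id).
    return [data_inputs]

def reduce_dim(expanded_data_inputs):
    # Undo the expansion: unwrap the single outer item back to the original list.
    assert len(expanded_data_inputs) == 1
    return expanded_data_inputs[0]

data_inputs = [1, 2, 3]
expanded = expand_dim(data_inputs)
print(expanded)                             # [[1, 2, 3]]
print(reduce_dim(expanded) == data_inputs)  # True
```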

class neuraxle.data_container.ListDataContainer(data_inputs: Any, current_ids=None, summary_id=None, expected_outputs: Any = None, sub_data_containers=None)[source]

Subclass of DataContainer to perform list operations. It supports append and concat operations on a DataContainer.

See also

DataContainer

append(current_id: str, data_input: Any, expected_output: Any)[source]

Append a new data input to the DataContainer.

Parameters
  • current_id (str) – current id for the data input

  • data_input – data input

  • expected_output – expected output

append_data_container(other: neuraxle.data_container.DataContainer) → neuraxle.data_container.ListDataContainer[source]

Append a data container to the DataContainer.

Parameters

other (DataContainer) – data container

append_data_container_in_data_inputs(other: neuraxle.data_container.DataContainer) → neuraxle.data_container.ListDataContainer[source]

Append a data container to the data inputs of this data container.

Parameters

other (DataContainer) – data container

concat(data_container: neuraxle.data_container.DataContainer)[source]

Concat the given data container to the current data container.

Parameters

data_container (DataContainer) – data container

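The append/concat behaviour amounts to growing three aligned lists together. A minimal stand-in (not the real class) makes this concrete:

```python
class MiniListDataContainer:
    # Illustrative stand-in for ListDataContainer: three aligned lists
    # that grow together through append() and concat().
    def __init__(self):
        self.current_ids, self.data_inputs, self.expected_outputs = [], [], []

    def append(self, current_id, data_input, expected_output):
        # Append one (id, data input, expected output) triplet.
        self.current_ids.append(current_id)
        self.data_inputs.append(data_input)
        self.expected_outputs.append(expected_output)

    def concat(self, other):
        # Extend this container with another container's triplets.
        self.current_ids += other.current_ids
        self.data_inputs += other.data_inputs
        self.expected_outputs += other.expected_outputs

a = MiniListDataContainer()
a.append('0', 10, 1)
b = MiniListDataContainer()
b.append('1', 20, 0)
a.concat(b)
print(a.data_inputs)  # [10, 20]
```
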
static empty(original_data_container: neuraxle.data_container.DataContainer = None) → neuraxle.data_container.ListDataContainer[source]

class neuraxle.data_container.ZipDataContainer(data_inputs: Any, current_ids=None, summary_id=None, expected_outputs: Any = None, sub_data_containers: List[NamedDataContainerTuple] = None)[source]

Subclass of DataContainer to zip two data sources together.

See also

DataContainer

concatenate_inner_features()[source]

Concatenate the inner features of the zipped data inputs. Broadcast a data input if its dimension is smaller.

static create_from(data_container: neuraxle.data_container.DataContainer, *other_data_containers) → neuraxle.data_container.ZipDataContainer[source]

Create a ZipDataContainer that merges multiple data sources together.

Parameters
  • data_container (DataContainer) – data container to transform

  • other_data_containers (List[DataContainer]) – other data containers to zip with data container

Returns

zipped data container

Return type

ZipDataContainer
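
The zipping itself is ordinary element-wise pairing of the data sources, which plain Python shows directly (an illustration of the idea, not the real implementation):

```python
# Two hypothetical data sources of equal length, standing in for the
# data inputs of two data containers to be zipped together.
primary = [1, 2, 3]
secondary = ['a', 'b', 'c']

# Each resulting data input is a tuple pairing the sources element-wise.
zipped_data_inputs = [tuple(items) for items in zip(primary, secondary)]
print(zipped_data_inputs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```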