neuraxle.checkpoints

Neuraxle’s Checkpoint Classes

The checkpoint classes used by the checkpoint pipeline runner

Classes

BaseCheckpointer(execution_mode)

Base class to implement a step checkpoint or data container checkpoint.

BaseMiniDataCheckpointer

Mini Data Checkpoint that uses pickle to create a checkpoint for a current id, and a data input or an expected output.

BaseSummaryCheckpointer

Summary Checkpointer to create a summary file that contains the list of all of the checkpoint current ids.

Checkpoint(all_checkpointers)

Resumable Checkpoint step that loads and saves both data checkpoints and step checkpoints.

DataCheckpointType

An enumeration.

DefaultCheckpoint()

Checkpoint with pickle mini data checkpointers wrapped in a MiniDataCheckpointerWrapper, and the default step saving checkpointer.

MiniDataCheckpointerWrapper(…)

A BaseCheckpointer to checkpoint data inputs, and expected outputs with mini data checkpointers.

NullMiniDataCheckpointer

PickleMiniDataCheckpointer

Mini Data Checkpoint that uses pickle to create a pickle checkpoint file for a current id, and a data input or expected output.

StepSavingCheckpointer(execution_mode)

StepSavingCheckpointer is used by the Checkpoint step to save the fitted steps contained in the context of type ExecutionContext.

TextFileSummaryCheckpointer

Summary Checkpointer that uses a txt file to create a summary file that contains the list of all of the checkpoint current ids.

class neuraxle.checkpoints.BaseCheckpointer(execution_mode: neuraxle.base.ExecutionMode)[source]

Base class to implement a step checkpoint or data container checkpoint.

A Checkpoint step uses a list of BaseCheckpointer instances to save both data container checkpoints and step checkpoints.

Each BaseCheckpointer has an execution mode, so that different checkpoints can be used for each execution mode (fit, fit_transform, or transform).

is_for_execution_mode(execution_mode: neuraxle.base.ExecutionMode) → bool[source]

Returns true if the checkpointer should be used with the given execution mode.

Parameters

execution_mode (ExecutionMode) – execution mode (fit, fit_transform, or transform)

Returns

if the checkpointer should be used

Return type

bool
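
The mode check described above can be illustrated with a small stand-alone sketch. It does not import Neuraxle: the ExecutionMode enum below is a stand-in whose member values (apart from 'fit_or_fit_transform', which appears in this page) are assumptions, and SketchCheckpointer is a hypothetical class, not the library's implementation.

```python
from enum import Enum


class ExecutionMode(Enum):
    """Stand-in for neuraxle.base.ExecutionMode (member values are assumed)."""
    FIT = "fit"
    FIT_TRANSFORM = "fit_transform"
    TRANSFORM = "transform"
    FIT_OR_FIT_TRANSFORM = "fit_or_fit_transform"


class SketchCheckpointer:
    """Hypothetical checkpointer that only remembers its configured mode."""

    def __init__(self, execution_mode: ExecutionMode):
        self.execution_mode = execution_mode

    def is_for_execution_mode(self, execution_mode: ExecutionMode) -> bool:
        # A combined mode accepts either of the modes it names.
        if self.execution_mode == ExecutionMode.FIT_OR_FIT_TRANSFORM:
            return execution_mode in (
                ExecutionMode.FIT,
                ExecutionMode.FIT_TRANSFORM,
                ExecutionMode.FIT_OR_FIT_TRANSFORM,
            )
        # Otherwise the configured mode must match exactly.
        return self.execution_mode == execution_mode


fit_checkpointer = SketchCheckpointer(ExecutionMode.FIT_OR_FIT_TRANSFORM)
used_during_fit = fit_checkpointer.is_for_execution_mode(ExecutionMode.FIT)
used_during_transform = fit_checkpointer.is_for_execution_mode(ExecutionMode.TRANSFORM)
```

With this matching rule, a checkpointer configured for fit or fit_transform is skipped during a plain transform.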

read_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Read the data container checkpoint with self.data_checkpointer. Returns a new data container loaded with all the data inputs, and expected outputs for each current id in the given data container.

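Parameters
  • data_container (DataContainer) – data container to read the checkpoint for

  • context (ExecutionContext) – execution context to read the checkpoint from

Returns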

the data container checkpoint

Return type

neuraxle.data_container.DataContainer

save_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Save the data container or fitted step checkpoint with the given data container, and context. Returns the data container checkpoint, or latest data container.

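Parameters
  • data_container (DataContainer) – data container to checkpoint

  • context (ExecutionContext) – execution context to checkpoint from

Returns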

saved data container

Return type

neuraxle.data_container.DataContainer

should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → bool[source]
class neuraxle.checkpoints.BaseMiniDataCheckpointer[source]

Mini Data Checkpoint that uses pickle to create a checkpoint for a current id, and a data input or an expected output.

A mini data checkpointer must be wrapped in a MiniDataCheckpointerWrapper before it can be added to the all_checkpointers list of a Checkpoint:

from neuraxle.checkpoints import (
    Checkpoint,
    MiniDataCheckpointerWrapper,
    PickleMiniDataCheckpointer,
    StepSavingCheckpointer,
    TextFileSummaryCheckpointer,
)

Checkpoint(
    all_checkpointers=[
        StepSavingCheckpointer(),
        MiniDataCheckpointerWrapper(
            summary_checkpointer=TextFileSummaryCheckpointer(),
            data_input_checkpointer=PickleMiniDataCheckpointer(),
            expected_output_checkpointer=PickleMiniDataCheckpointer()
        )
    ]
)
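
To make the three-method interface listed below more concrete, here is a minimal sketch of a checkpointer that stores each item as a JSON file. JsonMiniDataCheckpointer and its file naming are hypothetical. The class deliberately does not subclass the real BaseMiniDataCheckpointer so that the sketch runs without Neuraxle installed, and it assumes read_checkpoint takes the checkpoint_path and current_id documented in its parameter list.

```python
import json
import os
import tempfile
from typing import Any


class JsonMiniDataCheckpointer:
    """Hypothetical checkpointer with the same three method names as the base class."""

    def _path(self, checkpoint_path: str, current_id: str) -> str:
        # One file per current id; this naming scheme is an assumption.
        return os.path.join(checkpoint_path, "{}.json".format(current_id))

    def checkpoint_exists(self, checkpoint_path: str, current_id: str) -> bool:
        return os.path.exists(self._path(checkpoint_path, current_id))

    def save_checkpoint(self, checkpoint_path: str, current_id: str, data: Any) -> None:
        os.makedirs(checkpoint_path, exist_ok=True)
        with open(self._path(checkpoint_path, current_id), "w") as f:
            json.dump(data, f)

    def read_checkpoint(self, checkpoint_path: str, current_id: str) -> Any:
        with open(self._path(checkpoint_path, current_id)) as f:
            return json.load(f)


# Usage: save one item, then read it back from a temporary folder.
checkpoint_dir = tempfile.mkdtemp()
checkpointer = JsonMiniDataCheckpointer()
checkpointer.save_checkpoint(checkpoint_dir, "0", [1, 2, 3])
restored = checkpointer.read_checkpoint(checkpoint_dir, "0")
```

JSON is used here only to contrast with the pickle-based checkpointer documented further down; it restricts the data to JSON-serializable values.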
checkpoint_exists(checkpoint_path, current_id) → bool[source]

Returns whether a checkpoint exists for the given path and current id.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • current_id (str) – current id to read checkpoint for

Returns

read_checkpoint()[source]

Read the data checkpoint for the given checkpoint path and current id.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • current_id (str) – current id to read checkpoint for

Returns

save_checkpoint(checkpoint_path, current_id, data)[source]

Save data checkpoint with the given current_id, and data.

Parameters
  • checkpoint_path (str) – checkpoint path for saving

  • current_id (str) – current id to checkpoint

  • data (Any) – data to checkpoint

Returns

class neuraxle.checkpoints.BaseSummaryCheckpointer[source]

Summary Checkpointer to create a summary file that contains the list of all of the checkpoint current ids.

A summary checkpointer must be wrapped in a MiniDataCheckpointerWrapper before it can be added to the all_checkpointers list of a Checkpoint:

from neuraxle.checkpoints import (
    Checkpoint,
    MiniDataCheckpointerWrapper,
    PickleMiniDataCheckpointer,
    StepSavingCheckpointer,
    TextFileSummaryCheckpointer,
)

Checkpoint(
    all_checkpointers=[
        StepSavingCheckpointer(),
        MiniDataCheckpointerWrapper(
            summary_checkpointer=TextFileSummaryCheckpointer(),
            data_input_checkpointer=PickleMiniDataCheckpointer(),
            expected_output_checkpointer=PickleMiniDataCheckpointer()
        )
    ]
)
checkpoint_exists(checkpoint_path, data_container: neuraxle.data_container.DataContainer) → bool[source]

Returns whether a summary checkpoint exists for the given path and data container.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • data_container (DataContainer) – checkpoint data container

Returns

read_summary(checkpoint_path, data_container: neuraxle.data_container.DataContainer) → List[str][source]

Read the summary (the list of checkpoint current ids) for the given path and data container.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • data_container (DataContainer) – checkpoint data container

Returns

save_summary(checkpoint_path, data_container: neuraxle.data_container.DataContainer)[source]

Save the summary for the given path and data container.

Parameters
  • checkpoint_path (str) – checkpoint path for saving

  • data_container (DataContainer) – checkpoint data container

Returns

class neuraxle.checkpoints.Checkpoint(all_checkpointers: List[neuraxle.checkpoints.BaseCheckpointer] = None)[source]

Resumable Checkpoint step that loads and saves both data checkpoints and step checkpoints. Checkpoint takes a single list of checkpointers (List[BaseCheckpointer]) that can contain both step checkpointers and data checkpointers.

Data Checkpoints save the state of the data container (transformed data inputs, and expected outputs) for the current execution mode (fit, fit_transform, or transform).

Step Checkpoints save the state of the fitted steps before the checkpoint for the current execution mode (fit or fit_transform).

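By default (no arguments specified), the Checkpoint step saves the step checkpoints for any fit or fit_transform, and saves the data checkpoints with pickle mini data checkpointers:

from neuraxle.checkpoints import (
    Checkpoint,
    MiniDataCheckpointerWrapper,
    PickleMiniDataCheckpointer,
    StepSavingCheckpointer,
    TextFileSummaryCheckpointer,
)

Checkpoint(
    all_checkpointers=[
        StepSavingCheckpointer(),
        MiniDataCheckpointerWrapper(
            # summary_checkpointer has no default in the wrapper signature
            summary_checkpointer=TextFileSummaryCheckpointer(),
            data_input_checkpointer=PickleMiniDataCheckpointer(),
            expected_output_checkpointer=PickleMiniDataCheckpointer()
        )
    ]
)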

See also

  • BaseStep

  • ResumablePipeline._load_checkpoint()

  • ResumableStepMixin

  • NonFittableMixin

  • NonTransformableMixin

read_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Read data checkpoint for the current execution mode using self.data_checkpointers.

Parameters
  • data_container – data container to load checkpoint from

  • context – execution context to load the checkpoint from

Returns

loaded data container checkpoint

Return type

neuraxle.data_container.DataContainer

save_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext)[source]

Saves the step and data checkpoints for the current execution mode.

Parameters
  • data_container – data container for creating the data checkpoint

  • context – context for creating the step checkpoint

Returns

saved data container

Return type

neuraxle.data_container.DataContainer

should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → bool[source]

Returns True if all of the execution mode data checkpointers can be resumed.

Parameters
  • context – execution context

  • data_container – data to resume

Returns

if we can resume the checkpoint

Return type

bool
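
The resume rule stated above (every checkpointer that applies to the current execution mode must be resumable) can be sketched without Neuraxle as follows. FakeCheckpointer and can_resume_all are hypothetical names, execution modes are plain strings here, and the real should_resume also receives a data container and a context, which this sketch omits.

```python
from typing import List


class FakeCheckpointer:
    """Hypothetical stand-in exposing the two methods the check relies on."""

    def __init__(self, modes: List[str], resumable: bool):
        self.modes = modes
        self.resumable = resumable

    def is_for_execution_mode(self, execution_mode: str) -> bool:
        return execution_mode in self.modes

    def should_resume(self) -> bool:
        return self.resumable


def can_resume_all(checkpointers: List[FakeCheckpointer], execution_mode: str) -> bool:
    # Only checkpointers registered for the current mode take part in the decision.
    relevant = [c for c in checkpointers if c.is_for_execution_mode(execution_mode)]
    # all() over an empty list is True; whether the library treats "no relevant
    # checkpointer" the same way is an assumption of this sketch.
    return all(c.should_resume() for c in relevant)


checkpointers = [
    FakeCheckpointer(["fit", "fit_transform"], resumable=True),
    FakeCheckpointer(["transform"], resumable=False),
]
resume_fit = can_resume_all(checkpointers, "fit")
resume_transform = can_resume_all(checkpointers, "transform")
```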

class neuraxle.checkpoints.DataCheckpointType[source]

An enumeration.

DATA_INPUT = 'di'[source]
EXPECTED_OUTPUT = 'eo'[source]
class neuraxle.checkpoints.DefaultCheckpoint[source]

Checkpoint with pickle mini data checkpointers wrapped in a MiniDataCheckpointerWrapper, and the default step saving checkpointer.

class neuraxle.checkpoints.MiniDataCheckpointerWrapper(summary_checkpointer: neuraxle.checkpoints.BaseSummaryCheckpointer, data_input_checkpointer: neuraxle.checkpoints.BaseMiniDataCheckpointer, expected_output_checkpointer: neuraxle.checkpoints.BaseMiniDataCheckpointer = None)[source]

A BaseCheckpointer to checkpoint data inputs, and expected outputs with mini data checkpointers.

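from neuraxle.checkpoints import (
    MiniDataCheckpointerWrapper,
    PickleMiniDataCheckpointer,
    TextFileSummaryCheckpointer,
)

MiniDataCheckpointerWrapper(
    # summary_checkpointer has no default in the signature above
    summary_checkpointer=TextFileSummaryCheckpointer(),
    data_input_checkpointer=PickleMiniDataCheckpointer(),
    expected_output_checkpointer=PickleMiniDataCheckpointer()
)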
read_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Read data container data inputs checkpoint with data_input_checkpointer. Read data container expected outputs checkpoint with expected_output_checkpointer.

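Parameters
  • data_container (DataContainer) – data container to read the checkpoint for

  • context (ExecutionContext) – execution context to read the checkpoint from

Returns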

data container checkpoint

Return type

neuraxle.data_container.DataContainer

save_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Save data container data inputs with data_input_checkpointer. Save data container expected outputs with expected_output_checkpointer.

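Parameters
  • data_container (DataContainer) – data container to checkpoint

  • context (ExecutionContext) – execution context to checkpoint from

Returns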

should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → bool[source]

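Returns whether the whole data container has been checkpointed.

Parameters
  • data_container (DataContainer) – data container to resume

  • context (ExecutionContext) – execution context

Returns

True if the whole data container has been checkpointed

Return type

bool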
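
One way to read "the whole data container has been checkpointed" is that every current id has both a saved data input and a saved expected output. The sketch below checks that condition against in-memory sets. The function name and the use of sets are assumptions made for illustration; the library performs this check through its summary and mini data checkpointers.

```python
from typing import Iterable, Set


def is_fully_checkpointed(
    current_ids: Iterable[str],
    saved_data_inputs: Set[str],
    saved_expected_outputs: Set[str],
) -> bool:
    # Every id needs both halves of its checkpoint to be present.
    return all(
        current_id in saved_data_inputs and current_id in saved_expected_outputs
        for current_id in current_ids
    )


complete = is_fully_checkpointed(["0", "1"], {"0", "1"}, {"0", "1"})
# Id "1" has no expected output saved, so the container cannot be resumed.
partial = is_fully_checkpointed(["0", "1"], {"0", "1"}, {"0"})
```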

class neuraxle.checkpoints.NullMiniDataCheckpointer[source]
checkpoint_exists(checkpoint_path, current_id) → bool[source]

Returns whether a checkpoint exists for the given path and current id.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • current_id (str) – current id to read checkpoint for

Returns

read_checkpoint()[source]

Read the data checkpoint for the given checkpoint path and current id.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • current_id (str) – current id to read checkpoint for

Returns

save_checkpoint(checkpoint_path, current_id, data)[source]

Save data checkpoint with the given current_id, and data.

Parameters
  • checkpoint_path (str) – checkpoint path for saving

  • current_id (str) – current id to checkpoint

  • data (Any) – data to checkpoint

Returns

set_checkpoint_type(checkpoint_type: neuraxle.checkpoints.DataCheckpointType)[source]
class neuraxle.checkpoints.PickleMiniDataCheckpointer[source]

Mini Data Checkpoint that uses pickle to create a pickle checkpoint file for a current id, and a data input or expected output.

A mini data checkpointer must be wrapped in a MiniDataCheckpointerWrapper before it can be added to the all_checkpointers list of a Checkpoint:

from neuraxle.checkpoints import (
    Checkpoint,
    MiniDataCheckpointerWrapper,
    PickleMiniDataCheckpointer,
    StepSavingCheckpointer,
    TextFileSummaryCheckpointer,
)

Checkpoint(
    all_checkpointers=[
        StepSavingCheckpointer(),
        MiniDataCheckpointerWrapper(
            summary_checkpointer=TextFileSummaryCheckpointer(),
            data_input_checkpointer=PickleMiniDataCheckpointer(),
            expected_output_checkpointer=PickleMiniDataCheckpointer()
        )
    ]
)
checkpoint_exists(checkpoint_path: str, current_id: str) → bool[source]

Returns whether the checkpoint file exists for the given current id.

Parameters
  • checkpoint_path (str) – checkpoint folder path

  • current_id (str) – checkpoint current id

Returns

True if the checkpoint file exists

Return type

bool

get_checkpoint_filename_path_for_current_id(checkpoint_path: str, current_id: str) → str[source]

Get the checkpoint file path for a data input id.

Parameters
  • checkpoint_path (str) – checkpoint folder path

  • current_id (str) – checkpoint current id

Returns

path

Return type

str
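
As an illustration of how a per-id file path could combine the current id with the DataCheckpointType suffix, consider the sketch below. The enum values 'di' and 'eo' come from this page. The file name pattern and the helper name checkpoint_file_path are assumptions and may differ from what the library writes to disk.

```python
import os
from enum import Enum


class DataCheckpointType(Enum):
    """Stand-in enum with the two values listed on this page."""
    DATA_INPUT = "di"
    EXPECTED_OUTPUT = "eo"


def checkpoint_file_path(
    checkpoint_path: str, current_id: str, checkpoint_type: DataCheckpointType
) -> str:
    # Assumed pattern: <current_id>_<suffix>.pickle inside the checkpoint folder.
    file_name = "{}_{}.pickle".format(current_id, checkpoint_type.value)
    return os.path.join(checkpoint_path, file_name)


input_path = checkpoint_file_path("cache", "42", DataCheckpointType.DATA_INPUT)
output_path = checkpoint_file_path("cache", "42", DataCheckpointType.EXPECTED_OUTPUT)
```

Using a distinct suffix per checkpoint type lets the data input and the expected output of the same current id live side by side in one folder.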

read_checkpoint(checkpoint_path: str, current_id) → Any[source]

Read the data inputs, and expected outputs for the given current id using pickle.load.

Parameters
  • checkpoint_path (str) – checkpoint folder path

  • current_id (str) – checkpoint current id

Returns

tuple(current_id, checkpoint_data_input, checkpoint_expected_output)

Return type

Any
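
The pickle round trip behind save_checkpoint and read_checkpoint can be sketched with the standard library alone. The file name and the sample payload below are hypothetical.

```python
import os
import pickle
import tempfile

checkpoint_dir = tempfile.mkdtemp()
file_path = os.path.join(checkpoint_dir, "0.pickle")  # hypothetical file name

payload = {"data_input": [1.0, 2.0, 3.0]}

# Saving: pickle.dump writes the object to a file opened in binary mode.
with open(file_path, "wb") as f:
    pickle.dump(payload, f)

# Reading: pickle.load rebuilds the object from the same file.
with open(file_path, "rb") as f:
    restored = pickle.load(f)
```

Because unpickling can execute arbitrary code, checkpoint files should only be loaded from locations you trust.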

save_checkpoint(checkpoint_path: str, current_id, data)[source]

Save the given current id, data input, and expected output using pickle.dump.

Parameters
  • checkpoint_path (str) – checkpoint path

  • current_id (str) – checkpoint current id

  • data (Any) – data to checkpoint

Returns

set_checkpoint_type(checkpoint_type: neuraxle.checkpoints.DataCheckpointType)[source]

Set file name suffix for checkpoint.

Parameters

checkpoint_type (DataCheckpointType) – checkpoint type, used as the file name suffix

Returns

class neuraxle.checkpoints.StepSavingCheckpointer(execution_mode: neuraxle.base.ExecutionMode = <ExecutionMode.FIT_OR_FIT_TRANSFORM: 'fit_or_fit_transform'>)[source]

StepSavingCheckpointer is used by the Checkpoint step to save the fitted steps contained in the context of type ExecutionContext.

By default, StepSavingCheckpointer saves the fitted steps when the execution mode is either FIT or FIT_TRANSFORM:

from neuraxle.base import ExecutionMode
from neuraxle.checkpoints import StepSavingCheckpointer

StepSavingCheckpointer(ExecutionMode.FIT_OR_FIT_TRANSFORM)
# is equivalent to:
StepSavingCheckpointer()
read_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Read the data container checkpoint with self.data_checkpointer. Returns a new data container loaded with all the data inputs, and expected outputs for each current id in the given data container.

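Parameters
  • data_container (DataContainer) – data container to read the checkpoint for

  • context (ExecutionContext) – execution context to read the checkpoint from

Returns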

the data container checkpoint

Return type

neuraxle.data_container.DataContainer

save_checkpoint(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Save the data container or fitted step checkpoint with the given data container, and context. Returns the data container checkpoint, or latest data container.

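Parameters
  • data_container (DataContainer) – data container to checkpoint

  • context (ExecutionContext) – execution context to checkpoint from

Returns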

saved data container

Return type

neuraxle.data_container.DataContainer

should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → bool[source]
class neuraxle.checkpoints.TextFileSummaryCheckpointer[source]

Summary Checkpointer that writes a txt summary file containing the list of all of the checkpoint current ids.
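
A text summary of this kind can be as simple as one current id per line. The sketch below shows that idea with plain functions. The file name summary.txt, the one-id-per-line layout, and the function signatures (which take a list of ids rather than a data container) are assumptions, not the library's actual format.

```python
import os
import tempfile
from typing import List

SUMMARY_FILE_NAME = "summary.txt"  # hypothetical file name


def save_summary(checkpoint_path: str, current_ids: List[str]) -> None:
    # Write one current id per line.
    os.makedirs(checkpoint_path, exist_ok=True)
    with open(os.path.join(checkpoint_path, SUMMARY_FILE_NAME), "w") as f:
        f.write("\n".join(current_ids))


def summary_exists(checkpoint_path: str) -> bool:
    return os.path.exists(os.path.join(checkpoint_path, SUMMARY_FILE_NAME))


def read_summary(checkpoint_path: str) -> List[str]:
    # Skip blank lines so a trailing newline does not yield an empty id.
    with open(os.path.join(checkpoint_path, SUMMARY_FILE_NAME)) as f:
        return [line.strip() for line in f if line.strip()]


summary_dir = tempfile.mkdtemp()
save_summary(summary_dir, ["0", "1", "2"])
saved_ids = read_summary(summary_dir)
```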

checkpoint_exists(checkpoint_path, data_container: neuraxle.data_container.DataContainer) → bool[source]

Returns true if the checkpoint summary file exists.

Parameters
  • checkpoint_path (str) – checkpoint path for saving

  • data_container (DataContainer) – checkpoint data container

Returns

read_summary(checkpoint_path, data_container: neuraxle.data_container.DataContainer) → List[str][source]

Read current ids inside a summary checkpoint file.

Parameters
  • checkpoint_path (str) – checkpoint path to read

  • data_container (DataContainer) – checkpoint data container

Returns

checkpoint current ids

Return type

List[str]

save_summary(checkpoint_path, data_container: neuraxle.data_container.DataContainer)[source]

Save summary checkpoint file.

Parameters
  • checkpoint_path (str) – checkpoint path for saving

  • data_container (DataContainer) – checkpoint data container

Returns