neuraxle.base

Neuraxle’s Base Classes

This is the core of Neuraxle. Most pipeline steps derive (inherit) from these classes, so they are worth knowing.

Classes

BaseHasher

Base class to hash hyperparameters, and data input ids together.

BaseSaver

Any saver must inherit from this one.

BaseStep(hyperparams, hyperparams_space, …)

Base class for a pipeline step.

EvaluableStepMixin

A step that can be evaluated with the scoring functions.

ExecutionContext(root, execution_mode, …)

Execution context object containing all of the pipeline hierarchy steps.

ExecutionMode

An enumeration.

ForceHandleMixin([cache_folder])

A step that automatically calls handle methods in the transform, fit, and fit_transform methods.

ForceHandleOnlyMixin([cache_folder])

A step that automatically calls handle methods in the transform, fit, and fit_transform methods.

FullDumpLoader(name[, stripped_saver])

Identity step that can load the full dump of a pipeline step.

HandleOnlyMixin

A pipeline step that only requires the implementation of handler methods.

HashlibMd5Hasher

Class to hash hyperparameters, and data input ids together using the md5 algorithm from hashlib: https://docs.python.org/3/library/hashlib.html

Identity([savers, name])

A pipeline step that has no effect at all but to return the same data without changes.

JoblibStepSaver

Saver that can save, or load a step with joblib.load, and joblib.dump.

MetaStepJoblibStepSaver()

Custom saver for meta step mixin.

MetaStepMixin(wrapped)

A class to represent a step that wraps another step.

NonFittableMixin

A pipeline step that requires no fitting: fitting just returns self and performs no action.

NonTransformableMixin

A pipeline step that has no effect at all but to return the same data without changes.

ResumableStepMixin

Mixin to add resumable function to a step, or a class that can be resumed, for example a checkpoint on disk.

TransformHandlerOnlyMixin

A pipeline step that only requires the implementation of _transform_data_container.

TruncableJoblibStepSaver()

Step saver for a TruncableSteps.

TruncableSteps(steps_as_tuple, …)

Step that contains multiple steps.

class neuraxle.base.BaseHasher[source]

Base class to hash hyperparameters, and data input ids together. The DataContainer class uses the hashed values for its current ids. BaseStep uses many BaseHasher objects to hash hyperparameters, and data input ids together after each transform.

See also

DataContainer

hash(current_ids: List[str], hyperparameters: neuraxle.hyperparams.space.HyperparameterSamples, data_inputs: Iterable[T_co]) → List[str][source]

Hash DataContainer.current_ids, data inputs, and hyperparameters together.

Parameters
  • current_ids – current hashed ids (can be None if this function has not been called yet)

  • hyperparameters – step hyperparameters to hash with current ids

  • data_inputs – data inputs to hash current ids for

Returns

the new hashed current ids

single_hash(current_id: str, hyperparameters: neuraxle.hyperparams.space.HyperparameterSamples) → List[str][source]

Hash summary id, and hyperparameters together.

Parameters
  • current_id – current hashed id

  • hyperparameters (HyperparameterSamples) – step hyperparameters to hash with current ids

Returns

the new hashed current id
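
Example (a minimal sketch of a custom hasher; it assumes HyperparameterSamples behaves like a dict so that sorting its items gives a stable text representation, and the class name Sha256Hasher is hypothetical) :

import hashlib
from typing import Iterable, List

from neuraxle.base import BaseHasher
from neuraxle.hyperparams.space import HyperparameterSamples


class Sha256Hasher(BaseHasher):
    def _sha(self, text: str) -> str:
        return hashlib.sha256(text.encode('utf-8')).hexdigest()

    def single_hash(self, current_id: str, hyperparameters: HyperparameterSamples):
        # Hash the summary id together with the hyperparameters.
        return self._sha(current_id + repr(sorted(hyperparameters.items())))

    def hash(self, current_ids: List[str], hyperparameters: HyperparameterSamples, data_inputs: Iterable = None) -> List[str]:
        # If there are no current ids yet, derive initial ids from the data input indices.
        if current_ids is None:
            current_ids = [str(i) for i in range(len(list(data_inputs)))]
        hp_repr = repr(sorted(hyperparameters.items()))
        return [self._sha(current_id + hp_repr) for current_id in current_ids]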

class neuraxle.base.BaseSaver[source]

Any saver must inherit from this one. Some savers just save parts of objects, some save it all or what remains. Each BaseStep can potentially have multiple savers to make serialization possible.

See also

save(), load()

can_load(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext)[source]

Returns true if we can load the given step with the given execution context.

Parameters
  • step – step to load

  • context – execution context to load from

Returns

load_step(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Load step with execution context.

Parameters
  • step – step to load

  • context – execution context to load from

Returns

loaded base step

save_step(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Save step with execution context.

Parameters
  • step – step to save

  • context – execution context

  • save_savers

Returns
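
Example (a hedged sketch of a custom saver; the attribute self.connection and the helper open_connection() are hypothetical, the point being only to strip an unserializable part before the stripped saver pickles the rest, and to restore it on load) :

import os

from neuraxle.base import BaseSaver, BaseStep, ExecutionContext


class DropConnectionSaver(BaseSaver):
    def save_step(self, step: BaseStep, context: ExecutionContext) -> BaseStep:
        # Strip what cannot be serialized; the stripped saver will run last on what remains.
        step.connection = None  # hypothetical unpicklable attribute
        return step

    def can_load(self, step: BaseStep, context: ExecutionContext) -> bool:
        # Assume the step can be loaded if its saving directory exists.
        return os.path.exists(context.get_path())

    def load_step(self, step: BaseStep, context: ExecutionContext) -> BaseStep:
        # Recreate the stripped attribute once the rest of the step has been loaded.
        step.connection = open_connection()  # hypothetical helper
        return step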

class neuraxle.base.BaseStep(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples = None, hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace = None, name: str = None, savers: List[neuraxle.base.BaseSaver] = None, hashers: List[neuraxle.base.BaseHasher] = None)[source]

Base class for a pipeline step.

Every step must implement fit(), fit_transform() and transform().

If a step is not fittable, you can inherit from NonFittableMixin. If a step is not transformable, you can inherit from NonTransformableMixin. A step should only change its state inside fit() or fit_transform().

Example usage :

class MultiplyByN(NonFittableMixin, BaseStep):
    def __init__(self, multiply_by):
        NonFittableMixin.__init__(self)
        BaseStep.__init__(
            self,
            hyperparams=HyperparameterSamples({
                'multiply_by': multiply_by
            })
        )

    def transform(self, data_inputs):
        return data_inputs * self.hyperparams['multiply_by']

Every step can be saved using its savers of type BaseSaver. Some savers just save parts of objects, some save it all or what remains. Most steps hash data inputs with hyperparams after every transformation to update the current ids inside the DataContainer.

Every step has handle methods (handle_fit(), handle_transform(), and handle_fit_transform()) that can be overridden to add side effects or change the execution flow based on the execution context and the data container.
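
Example (a hedged sketch of overriding handle_transform() to add a logging side effect around the default behavior; it assumes the data container exposes current_ids as documented for BaseHasher above) :

from neuraxle.base import BaseStep, NonFittableMixin


class LoggedMultiplyByTwo(NonFittableMixin, BaseStep):
    def __init__(self):
        NonFittableMixin.__init__(self)
        BaseStep.__init__(self)

    def transform(self, data_inputs):
        return [di * 2 for di in data_inputs]

    def handle_transform(self, data_container, context):
        print("current ids before:", data_container.current_ids)  # side effect before
        data_container = BaseStep.handle_transform(self, data_container, context)
        print("current ids after:", data_container.current_ids)  # side effect after
        return data_container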

Every step has hyperparameters, and hyperparameter spaces, that can be set before the learning process begins. Hyperparameters can not only be passed in the constructor, but also be set by the pipeline that contains all of the steps :

pipeline = Pipeline([
    SomeStep()
])

pipeline.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.1,
    'SomeStep__learning_rate': 0.05
}))

Note

All heavy initialization logic should be done inside the setup method (e.g.: things inside GPU), and NOT in the constructor.

apply(method_name: str, step_name=None, *kargs, **kwargs) → Dict[KT, VT][source]

Apply a method to a step and its children.

Parameters
  • method_name – method name that need to be called on all steps

  • step_name – current pipeline step name

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results
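
Example (a hedged sketch of broadcasting a zero-argument method through a small pipeline of Identity steps) :

from neuraxle.base import Identity
from neuraxle.pipeline import Pipeline

p = Pipeline([Identity(name='a'), Identity(name='b')])

# Accumulate the hyperparams of every step in the hierarchy, keyed by step name.
results = p.apply('get_hyperparams')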

apply_method(method: Callable, step_name=None, *kargs, **kwargs) → Dict[KT, VT][source]

Apply a method to a step and its children.

Parameters
  • method – method to call with self

  • step_name – current pipeline step name

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results

fit(data_inputs, expected_outputs=None) → neuraxle.base.BaseStep[source]

Fit step with the given data inputs, and expected outputs.

Parameters
  • data_inputs – data inputs

  • expected_outputs – expected outputs to fit on

Returns

fitted self

fit_transform(data_inputs, expected_outputs=None) -> ('BaseStep', typing.Any)[source]

Fit, and transform step with the given data inputs, and expected outputs.

Parameters
  • data_inputs – data inputs

  • expected_outputs – expected outputs to fit on

Returns

(fitted self, transformed data inputs)

get_hyperparams() → neuraxle.hyperparams.space.HyperparameterSamples[source]

Get step hyperparameters as HyperparameterSamples.

Returns

step hyperparameters

get_hyperparams_space() → neuraxle.hyperparams.space.HyperparameterSpace[source]

Get step hyperparameters space.

Example :

step.get_hyperparams_space()
Returns

step hyperparams space

get_name() → str[source]

Get the name of the pipeline step.

Returns

the name, a string.

Note

A step name is the same value as the one in the keys of Pipeline.steps_as_tuple

get_params() → dict[source]

Get step hyperparameters as a flat primitive dict.

Example :

s.set_params(learning_rate=0.1)
hyperparams = s.get_params()
assert hyperparams == {"learning_rate": 0.1}
Returns

hyperparameters

get_savers() → List[neuraxle.base.BaseSaver][source]

Get the step savers of a pipeline step.

Returns

step savers

See also

BaseSaver

get_step_by_name(name)[source]
handle_fit(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Override this to add side effects or change the execution flow before (or after) calling fit(). The default behavior is to rehash current ids with the step hyperparameters.

Parameters
  • data_container – the data container to transform

  • context – execution context

Returns

tuple(fitted pipeline, data_container)

handle_fit_transform(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) -> ('BaseStep', <class 'neuraxle.data_container.DataContainer'>)[source]

Override this to add side effects or change the execution flow before (or after) calling fit_transform(). The default behavior is to rehash current ids with the step hyperparameters.

Parameters
  • data_container – the data container to transform

  • context – execution context

Returns

tuple(fitted pipeline, data_container)

handle_inverse_transform(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Override this to add side effects or change the execution flow before (or after) calling inverse_transform(). The default behavior is to rehash current ids with the step hyperparameters.

Parameters
  • data_container – the data container to inverse transform

  • context – execution context

Returns

data_container

handle_predict(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Calls handle_transform() in test mode.

Parameters
  • data_container – the data container to transform

  • context – execution context

Returns

transformed data container

handle_transform(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → neuraxle.data_container.DataContainer[source]

Override this to add side effects or change the execution flow before (or after) calling transform(). The default behavior is to rehash current ids with the step hyperparameters.

Parameters
  • data_container – the data container to transform

  • context – execution context

Returns

transformed data container

hash(data_container: neuraxle.data_container.DataContainer) → List[str][source]

Hash data inputs, current ids, and hyperparameters together using self.hashers. This is used to create unique ids for the data checkpoints.

Parameters

data_container – data container

Returns

hashed current ids

See also

Checkpoint

hash_data_container(data_container)[source]

Hash data container using self.hashers.

  1. Hash current ids with hyperparams.

  2. Hash summary id with hyperparams.

Parameters

data_container – the data container to transform

Returns

transformed data container

invalidate() → neuraxle.base.BaseStep[source]

Invalidate step.

Returns

self

inverse_transform(processed_outputs)[source]

Inverse Transform the given transformed data inputs.

mutate() or reverse() can be called to change the default transform behavior :

p = Pipeline([MultiplyBy()])

_in = np.array([1, 2])

_out = p.transform(_in)

_regenerated_in = reversed(p).transform(_out)

assert np.array_equal(_regenerated_in, _in)
Parameters

processed_outputs – processed data inputs

Returns

inverse transformed processed outputs

load(context: neuraxle.base.ExecutionContext, full_dump=False) → neuraxle.base.BaseStep[source]

Load step using the execution context to create the directory of the saved step.

Parameters
  • context – execution context to load step from

  • full_dump – save full dump bool

Returns

loaded step

Warning

Please do not override this method: on loading, an identity step is used to load whatever step you coded.

meta_fit(X_train, y_train, metastep: neuraxle.base.MetaStepMixin)[source]

Uses a meta optimization technique (AutoML) to find the best hyperparameters in the given hyperparameter space.

Usage: p = p.meta_fit(X_train, y_train, metastep=RandomSearch(n_iter=10, scoring_function=r2_score, higher_score_is_better=True))

Call .mutate(new_method="inverse_transform", method_to_assign_to="transform"), and the current estimator will become

Parameters
  • X_train – data_inputs.

  • y_train – expected_outputs.

  • metastep – a metastep, that is, a step that can sift through the hyperparameter space of another estimator.

Returns

your best self.

mutate(new_method='inverse_transform', method_to_assign_to='transform', warn=True) → neuraxle.base.BaseStep[source]

Replace the “method_to_assign_to” method by the “new_method” method, IF the present object has no pending calls to .will_mutate_to() waiting to be applied. If there is a pending call, the pending call will override the methods specified in the present call. If the change fails (such as if the new_method doesn’t exist), then a warning is printed (optional). By default, there is no pending will_mutate_to call.

This could, for example, be useful within a pipeline to apply inverse_transform to every pipeline step, to assign predict_probas to predict, or to assign “inverse_transform” to “transform” in a reversed pipeline.

Parameters
  • new_method – the method to replace transform with, if there is no pending will_mutate_to call.

  • method_to_assign_to – the method to which the new method will be assigned to, if there is no pending will_mutate_to call.

  • warn – (verbose) whether or not to warn when the method does not exist.

Returns

self, a copy of self, or even perhaps a new or different BaseStep object.

predict(data_input)[source]

Predict the expected output in test mode using transform(), by setting self to test mode first and then reverting the mode.

Parameters

data_input – data input to predict

Returns

prediction

reverse() → neuraxle.base.BaseStep[source]

The object will mutate itself such that the .transform method (and that of all its underlying objects, if applicable) is replaced by the .inverse_transform method.

Note: the reverse may fail if there is a pending mutate that was set earlier with .will_mutate_to.

Returns

a copy of self, reversed. Each contained object will also have been reversed if self is a pipeline.

save(context: neuraxle.base.ExecutionContext, full_dump=False) → neuraxle.base.BaseStep[source]

Save step using the execution context to create the directory to save the step into. The saving happens by looping through all of the step savers in the reversed order.

Some savers just save parts of objects, some save it all or what remains. The ExecutionContext.stripped_saver has to be called last because it needs a stripped version of the step.

Parameters
  • context – context to save from

  • full_dump – save full pipeline dump to be able to load everything without source code (false by default).

Returns

self
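
Example (a hedged sketch of a full dump save followed by a reload; it assumes a local 'cache' folder and that the step name can be given as the path to ExecutionContext.load()) :

from neuraxle.base import ExecutionContext, Identity
from neuraxle.pipeline import Pipeline

p = Pipeline([Identity()])
p.set_name('my_pipeline')

# Save a full dump so that it can later be reloaded without the original source code.
p.save(ExecutionContext(root='cache'), full_dump=True)

# Reload the full dump from the same cache folder, using the step name as the path.
loaded = ExecutionContext(root='cache').load('my_pipeline')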

set_hyperparams(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep[source]

Set the step hyperparameters.

Example :

step.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.10
}))
Parameters

hyperparams – hyperparameters

Returns

self

set_hyperparams_space(hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace) → neuraxle.base.BaseStep[source]

Set step hyperparameters space.

Example :

step.set_hyperparams_space(HyperparameterSpace({
    'hp': RandInt(0, 10)
}))
Parameters

hyperparams_space – hyperparameters space

Returns

self

set_name(name: str)[source]

Set the name of the pipeline step.

Parameters

name – a string.

Returns

self

Note

A step name is the same value as the one in the keys of steps_as_tuple

set_params(**params) → neuraxle.base.BaseStep[source]

Set step hyperparameters with a dictionary.

Example :

s.set_params(learning_rate=0.1)
hyperparams = s.get_params()
assert hyperparams == {"learning_rate": 0.1}
Parameters

**params

arbitrary number of arguments for hyperparameters

set_savers(savers: List[neuraxle.base.BaseSaver]) → neuraxle.base.BaseStep[source]

Set the step savers of a pipeline step.

Returns

self

See also

BaseSaver

set_train(is_train: bool = True)[source]

Set pipeline step mode to train or test.

Parameters

is_train – is training mode or not

Returns

setup() → neuraxle.base.BaseStep[source]

Initialize the step before it runs. Heavy things (e.g.: things inside GPU) should only be created here, and NOT in the constructor.

The setup method is called for each step before any fit, or fit_transform.

Returns

self
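
Example (a hedged sketch of deferring heavy initialization to setup(); build_model() is a hypothetical helper standing in for an expensive allocation such as a GPU model) :

from neuraxle.base import BaseStep, NonFittableMixin


class LazyModelStep(NonFittableMixin, BaseStep):
    def __init__(self):
        NonFittableMixin.__init__(self)
        BaseStep.__init__(self)
        self.model = None  # nothing heavy in the constructor

    def setup(self) -> BaseStep:
        if self.model is None:
            self.model = build_model()  # hypothetical heavy allocation (e.g. a GPU model)
        self.is_initialized = True
        return self

    def transform(self, data_inputs):
        return data_inputs  # a real step would use self.model here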

should_save() → bool[source]

Returns true if the step should be saved. If the step has been initialized and invalidated, then it must be saved.

A step is invalidated when any of the following things happen :
  • a mutation has been performed on the step : mutate()

  • a hyperparameter has changed : set_hyperparams()

  • a hyperparameter space has changed : set_hyperparams_space()

  • a call to the fit method : handle_fit()

  • a call to the fit_transform method : handle_fit_transform()

  • the step name has changed : set_name()

Returns

if the step should be saved

summary_hash(data_container: neuraxle.data_container.DataContainer) → str[source]

Hash data inputs, current ids, and hyperparameters together using self.hashers. This is used to create unique ids for the data checkpoints.

Parameters

data_container – data container

Returns

hashed current ids

See also

Checkpoint

teardown() → neuraxle.base.BaseStep[source]

Teardown step after program execution. This is the inverse of setup() and should clear memory. Override this method if you need to clear memory.

Returns

self

tosklearn()[source]
transform(data_inputs)[source]

Transform given data inputs.

Parameters

data_inputs – data inputs

Returns

transformed data inputs

update_hyperparams(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep[source]

Update the step hyperparameters without removing the already-set hyperparameters. This can be useful to add more hyperparameters to the existing ones without flushing the ones that were already set.

Example :

step.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.10,
    'weight_decay': 0.001
}))

step.update_hyperparams(HyperparameterSamples({
    'learning_rate': 0.01
}))

assert step.get_hyperparams()['learning_rate'] == 0.01
assert step.get_hyperparams()['weight_decay'] == 0.001
Parameters

hyperparams – hyperparameters

Returns

self

update_hyperparams_space(hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace) → neuraxle.base.BaseStep[source]

Update the step hyperparameter spaces without removing the already-set hyperparameters. This can be useful to add more hyperparameter spaces to the existing ones without flushing the ones that were already set.

Example :

step.set_hyperparams_space(HyperparameterSpace({
    'learning_rate': LogNormal(0.5, 0.5),
    'weight_decay': LogNormal(0.001, 0.0005)
}))

step.update_hyperparams_space(HyperparameterSpace({
    'learning_rate': LogNormal(0.5, 0.1)
}))

assert step.get_hyperparams_space()['learning_rate'] == LogNormal(0.5, 0.1)
assert step.get_hyperparams_space()['weight_decay'] == LogNormal(0.001, 0.0005)
Parameters

hyperparams_space – hyperparameters space

Returns

self

will_mutate_to(new_base_step: Optional[neuraxle.base.BaseStep] = None, new_method: str = None, method_to_assign_to: str = None) → neuraxle.base.BaseStep[source]

This changes the behavior of self.mutate(<...>) such that, when mutating, it will return the provided new_base_step (which can be left as None to keep self). The .mutate method will also apply the new_method and the method_to_assign_to, if they are not None, after changing the object to new_base_step.

This can be useful if your pipeline requires unsupervised pretraining. For example:

X_pretrain = ...
X_train = ...
y_train = ...

p = Pipeline([
    SomePreprocessing(),
    SomePretrainingStep().will_mutate_to(new_base_step=SomeStepThatWillUseThePretrainingStep),
    Identity().will_mutate_to(new_base_step=ClassifierThatWillBeUsedOnlyAfterThePretraining)
])
# Pre-train the pipeline
p = p.fit(X_pretrain, expected_outputs=None)

# This will leave `SomePreprocessing()` untouched and will affect the two other steps.
p = p.mutate(new_method="transform", method_to_assign_to="transform")

# Train the pipeline: this fits the classifier and the other mutated steps
p = p.fit(X_train, y_train)
Parameters
  • new_base_step – if it is not None, upon calling mutate, the object it will mutate to will be this provided new_base_step.

  • method_to_assign_to – if it is not None, upon calling mutate, this will be the method to which the new method is assigned on the provided new_base_step.

  • new_method – if it is not None, upon calling mutate, the new_method will be the one that is used on the provided new_base_step.

Returns

self

class neuraxle.base.EvaluableStepMixin[source]

A step that can be evaluated with the scoring functions.

See also

BaseStep

get_score()[source]
class neuraxle.base.ExecutionContext(root: str = DEFAULT_CACHE_FOLDER, execution_mode: neuraxle.base.ExecutionMode = None, stripped_saver: neuraxle.base.BaseSaver = None, parents=None)[source]

Execution context object containing all of the pipeline hierarchy steps. First item in execution context parents is root, second is nested, and so on. This is like a stack.

The execution context is used for fitted step saving, and for caching.
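
Example (a hedged sketch of this stack-like behavior, using the push(), peek(), get_names() and get_path() methods documented below, with a local 'cache' folder) :

from neuraxle.base import ExecutionContext, Identity

context = ExecutionContext(root='cache')
context = context.push(Identity(name='step_a'))  # push the step onto the context's parents

print(context.get_names())    # parent step names, e.g. ['step_a']
print(context.get_path())     # directory path built from the root and the parents
last_parent = context.peek()  # the last pushed parent step
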
copy()[source]
empty()[source]

Return True if the context has no parent steps.

Returns

if parents len is 0

get_execution_mode() → neuraxle.base.ExecutionMode[source]
get_names()[source]

Returns a list of the parent names.

Returns

list of parents step names

get_path(is_absolute: bool = True)[source]

Creates the directory path for the current execution context.

Parameters

is_absolute – bool to say if we want to add root to the path or not

Returns

current context path

load(path: str) → neuraxle.base.BaseStep[source]

Load full dump at the given path.

Parameters

path – pipeline step path

Returns

loaded step

mkdir()[source]

Creates the directory to save the last parent step.

Returns

peek() → neuraxle.base.BaseStep[source]

Get last parent.

Returns

the last parent base step

pop() → bool[source]

Pop the context. Returns True if it successfully popped an item from the parents list.

Returns

if an item has been popped

pop_item() → neuraxle.base.BaseStep[source]

Change the execution context to be the same as the latest parent context.

Returns

push(step: neuraxle.base.BaseStep) → neuraxle.base.ExecutionContext[source]

Pushes a step in the parents of the execution context.

Parameters

step – step to add to the execution context

Returns

self

save(full_dump=False)[source]

Save all unsaved steps in the parents of the execution context using save(). This method is called from a step checkpointer inside a Checkpoint.

Parameters

full_dump – save full pipeline dump to be able to load everything without source code (false by default).

Returns

See also

BaseStep, save()

save_last()[source]

Save only the last step in the execution context.

See also

save()

should_save_last_step() → bool[source]

Returns True if the last step should be saved.

Returns

if the last step should be saved

to_identity() → neuraxle.base.ExecutionContext[source]

Create a fake execution context containing only identity steps. Create the parents by using the path of the current execution context.

Returns

fake identity execution context

class neuraxle.base.ExecutionMode[source]

An enumeration.

FIT = 'fit'[source]
FIT_OR_FIT_TRANSFORM = 'fit_or_fit_transform'[source]
FIT_OR_FIT_TRANSFORM_OR_TRANSFORM = 'fit_or_fit_transform_or_transform'[source]
FIT_TRANSFORM = 'fit_transform'[source]
INVERSE_TRANSFORM = 'inverse_transform'[source]
TRANSFORM = 'transform'[source]
class neuraxle.base.ForceHandleMixin(cache_folder=None)[source]

A step that automatically calls handle methods in the transform, fit, and fit_transform methods.

fit(data_inputs, expected_outputs=None) → neuraxle.base.HandleOnlyMixin[source]

Using handle_fit(), fit step with the given data inputs, and expected outputs.

Parameters

data_inputs – data inputs

Returns

fitted self

fit_transform(data_inputs, expected_outputs=None) → Tuple[neuraxle.base.HandleOnlyMixin, Iterable[T_co]][source]

Using handle_fit_transform(), fit and transform step with the given data inputs, and expected outputs.

Parameters

data_inputs – data inputs

Returns

fitted self, outputs

transform(data_inputs) → Iterable[T_co][source]

Using handle_transform(), transform data inputs.

Parameters

data_inputs – data inputs

Returns

outputs

class neuraxle.base.ForceHandleOnlyMixin(cache_folder=None)[source]

A step that automatically calls handle methods in the transform, fit, and fit_transform methods. It also requires the implementation of handler methods :

  • _transform_data_container

  • _fit_transform_data_container

  • _fit_data_container

class neuraxle.base.FullDumpLoader(name, stripped_saver=None)[source]

Identity step that can load the full dump of a pipeline step. Used by load().

Usage example:

saved_step = FullDumpLoader(
    name=path,
    stripped_saver=self.stripped_saver
).load(context_for_loading, True)
load(context: neuraxle.base.ExecutionContext, full_dump=True) → neuraxle.base.BaseStep[source]

Load the full dump of a pipeline step.

Parameters
  • context – execution context

  • full_dump – load full dump or not (always True; inherited from BaseStep)

Returns

loaded step

class neuraxle.base.HandleOnlyMixin[source]
A pipeline step that only requires the implementation of handler methods :
  • _transform_data_container

  • _fit_transform_data_container

  • _fit_data_container

It forbids implementing only fit, transform, or fit_transform without the handler methods: it forces the use of the handler methods.
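
Example (a hedged sketch of a handler-only step; it assumes DataContainer exposes data_inputs and a set_data_inputs() setter) :

from neuraxle.base import BaseStep, ExecutionContext, HandleOnlyMixin
from neuraxle.data_container import DataContainer


class HandledMultiplyByTwo(HandleOnlyMixin, BaseStep):
    def __init__(self):
        HandleOnlyMixin.__init__(self)
        BaseStep.__init__(self)

    def _fit_data_container(self, data_container: DataContainer, context: ExecutionContext):
        return self  # nothing to fit

    def _transform_data_container(self, data_container: DataContainer, context: ExecutionContext):
        data_container.set_data_inputs([di * 2 for di in data_container.data_inputs])
        return data_container

    def _fit_transform_data_container(self, data_container: DataContainer, context: ExecutionContext):
        return self, self._transform_data_container(data_container, context)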

fit(data_inputs, expected_outputs=None) → neuraxle.base.HandleOnlyMixin[source]
fit_transform(data_inputs, expected_outputs=None) → neuraxle.base.HandleOnlyMixin[source]
transform(data_inputs) → neuraxle.base.HandleOnlyMixin[source]
class neuraxle.base.HashlibMd5Hasher[source]

Class to hash hyperparameters, and data input ids together using the md5 algorithm from hashlib: https://docs.python.org/3/library/hashlib.html

The DataContainer class uses the hashed values for its current ids. BaseStep uses many BaseHasher objects to hash hyperparameters, and data input ids together after each transform.

hash(current_ids, hyperparameters, data_inputs: Any = None) → List[str][source]

Hash DataContainer.current_ids, data inputs, and hyperparameters together using hashlib.md5

Parameters
  • current_ids – current hashed ids (can be None if this function has not been called yet)

  • hyperparameters – step hyperparameters to hash with current ids

  • data_inputs – data inputs to hash current ids for

Returns

the new hashed current ids

single_hash(current_id: str, hyperparameters: neuraxle.hyperparams.space.HyperparameterSamples) → List[str][source]

Hash summary id, and hyperparameters together.

Parameters
  • current_id – current hashed id

  • hyperparameters – step hyperparameters to hash with current ids

Returns

the new hashed current id

class neuraxle.base.Identity(savers=None, name=None)[source]

A pipeline step that has no effect at all but to return the same data without changes.

This can be useful to concatenate new features to existing features, such as what AddFeatures do.

Identity inherits from NonTransformableMixin and from NonFittableMixin which makes it a class that has no effect in the pipeline: it doesn’t require fitting, and at transform-time, it returns the same data it received.

class neuraxle.base.JoblibStepSaver[source]

Saver that can save, or load a step with joblib.load, and joblib.dump.

This saver is a good default saver when the object is already stripped out of things that would make it unserializable.

It is the default stripped_saver for the ExecutionContext. The stripped saver is the first to load the step, and the last to save the step. The saver receives a stripped version of the step so that it can be saved by joblib.
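
Example (a hedged sketch of setting a step's savers explicitly so that the stripped step is serialized with joblib) :

from neuraxle.base import Identity, JoblibStepSaver

step = Identity(name='my_step')
step.set_savers([JoblibStepSaver()])  # the step will be pickled by joblib when saved

assert isinstance(step.get_savers()[0], JoblibStepSaver)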

can_load(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext) → bool[source]

Returns true if the given step has been saved with the given execution context.

Parameters
  • step – step that might have been saved

  • context – execution context

Returns

if we can load the step with the given context

load_step(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Load stripped step.

Parameters
  • step – stripped step to load

  • context – execution context to load from

Returns

save_step(step: neuraxle.base.BaseStep, context: neuraxle.base.ExecutionContext) → neuraxle.base.BaseStep[source]

Save a step stripped of the things that would make it unserializable.

Parameters
  • step – stripped step to save

  • context – execution context to save from

Returns

class neuraxle.base.MetaStepJoblibStepSaver[source]

Custom saver for meta step mixin.

load_step(step: neuraxle.base.MetaStepMixin, context: neuraxle.base.ExecutionContext) → neuraxle.base.MetaStepMixin[source]

Load MetaStepMixin.

  1. Loop through all of the sub steps savers, and only load the sub steps that have been saved.

  2. Refresh steps

Parameters
  • step – step to load

  • context – execution context

Returns

loaded truncable steps

save_step(step: neuraxle.base.MetaStepMixin, context: neuraxle.base.ExecutionContext) → neuraxle.base.MetaStepMixin[source]

Save MetaStepMixin.

  1. Save wrapped step.

  2. Strip the wrapped step from the meta step mixin.

  3. Save meta step with wrapped step savers.

Parameters
  • step – meta step to save

  • context – execution context

Returns

class neuraxle.base.MetaStepMixin(wrapped: neuraxle.base.BaseStep = None)[source]

A class to represent a step that wraps another step. It can be used for many things.

For example, ForEachDataInput adds a loop before any calls to the wrapped step :

class ForEachDataInput(MetaStepMixin, BaseStep):
    def __init__(
        self,
        wrapped: BaseStep
    ):
        BaseStep.__init__(self)
        MetaStepMixin.__init__(self, wrapped)

    def fit(self, data_inputs, expected_outputs=None):
        if expected_outputs is None:
            expected_outputs = [None] * len(data_inputs)

        for di, eo in zip(data_inputs, expected_outputs):
            self.wrapped = self.wrapped.fit(di, eo)

        return self

    def transform(self, data_inputs):
        outputs = []
        for di in data_inputs:
            output = self.wrapped.transform(di)
            outputs.append(output)

        return outputs

    def fit_transform(self, data_inputs, expected_outputs=None):
        if expected_outputs is None:
            expected_outputs = [None] * len(data_inputs)

        outputs = []
        for di, eo in zip(data_inputs, expected_outputs):
            self.wrapped, output = self.wrapped.fit_transform(di, eo)
            outputs.append(output)

        return self, outputs
apply(method_name: str, step_name=None, *kargs, **kwargs) → Dict[KT, VT][source]

Apply the method name to the meta step and its wrapped step.

Parameters
  • method_name – method name that need to be called on all steps

  • step_name – step name to apply the method to

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results

apply_method(method: Callable, step_name=None, *kargs, **kwargs) → Union[Dict[KT, VT], Iterable[T_co]][source]

Apply method to the meta step and its wrapped step.

Parameters
  • method – method to call with self

  • step_name – step name to apply the method to

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results

fit(data_inputs, expected_outputs=None)[source]
fit_transform(data_inputs, expected_outputs=None)[source]
get_best_model() → neuraxle.base.BaseStep[source]
get_hyperparams() → neuraxle.hyperparams.space.HyperparameterSamples[source]

Get step hyperparameters as HyperparameterSamples with flattened hyperparams.

Returns

step hyperparameters

get_hyperparams_space() → neuraxle.hyperparams.space.HyperparameterSpace[source]

Get meta step and wrapped step hyperparams as a flat hyperparameter space

Returns

hyperparameters_space

get_step() → neuraxle.base.BaseStep[source]

Get wrapped step

Returns

self.wrapped

get_step_by_name(name)[source]
handle_fit_transform(data_container, context)[source]
handle_transform(data_container, context)[source]
inverse_transform(data_inputs)[source]
mutate(new_method='inverse_transform', method_to_assign_to='transform', warn=True) → neuraxle.base.BaseStep[source]

Mutate self, and self.wrapped. Please refer to mutate() for more information.

Parameters
  • new_method – the method to replace transform with, if there is no pending will_mutate_to call.

  • method_to_assign_to – the method to which the new method will be assigned to, if there is no pending will_mutate_to call.

  • warn – (verbose) whether or not to warn when the method does not exist.

Returns

self, a copy of self, or even perhaps a new or different BaseStep object.

resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext)[source]
set_hyperparams(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep[source]

Set step hyperparameters, and wrapped step hyperparams with the given hyperparams.

Example :

step.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.10,
    'wrapped__learning_rate': 0.10  # this will set the wrapped step 'learning_rate' hyperparam
}))
Parameters

hyperparams – hyperparameters

Returns

self

set_hyperparams_space(hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace) → neuraxle.base.BaseStep[source]

Set meta step and wrapped step hyperparams space using the given hyperparams space.

Parameters

hyperparams_space – ordered dict containing all hyperparameter spaces

Returns

self

set_step(step: neuraxle.base.BaseStep) → neuraxle.base.BaseStep[source]

Set wrapped step to the given step.

Parameters

step – new wrapped step

Returns

self

set_train(is_train: bool = True)[source]

Set pipeline step mode to train or test. Also set wrapped step mode to train or test.

For instance, you can add a simple if statement to direct to the right implementation:

def transform(self, data_inputs):
    if self.is_train:
        self.transform_train_(data_inputs)
    else:
        self.transform_test_(data_inputs)

def fit_transform(self, data_inputs, expected_outputs=None):
    if self.is_train:
        self.fit_transform_train_(data_inputs, expected_outputs)
    else:
        self.fit_transform_test_(data_inputs, expected_outputs)
Parameters

is_train – bool

Returns

setup() → neuraxle.base.BaseStep[source]

Initialize step before it runs. Also initialize the wrapped step.

Returns

self

should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext)[source]
teardown() → neuraxle.base.BaseStep[source]

Teardown step. Also teardown the wrapped step.

Returns

self

transform(data_inputs)[source]
update_hyperparams(hyperparams: neuraxle.hyperparams.space.HyperparameterSamples) → neuraxle.base.BaseStep[source]

Update the step's hyperparams, and the wrapped step's hyperparams, without removing the already-set hyperparameters. Please refer to update_hyperparams().

Parameters

hyperparams – hyperparameters

Returns

self

update_hyperparams_space(hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace) → neuraxle.base.BaseStep[source]

Update the step's hyperparameter space, and the wrapped step's hyperparameter space, without removing the already-set hyperparameter spaces. Please refer to update_hyperparams_space().

Parameters

hyperparams_space – hyperparameters space

Returns

self

will_mutate_to(new_base_step: Optional[neuraxle.base.BaseStep] = None, new_method: str = None, method_to_assign_to: str = None) → neuraxle.base.BaseStep[source]

Add a pending mutate to self and to self.wrapped. Please refer to will_mutate_to() for more information.

Parameters
  • new_base_step – if it is not None, upon calling mutate, the object it will mutate to will be this provided new_base_step.

  • method_to_assign_to – if it is not None, upon calling mutate, this will be the method to which the new method is assigned on the provided new_base_step.

  • new_method – if it is not None, upon calling mutate, the new_method will be the one that is used on the provided new_base_step.

Returns

self

class neuraxle.base.NonFittableMixin[source]

A pipeline step that requires no fitting: fitting just returns self and performs no action. Note: the fit methods are implemented as no-ops.

fit(data_inputs, expected_outputs=None) → neuraxle.base.NonFittableMixin[source]

Don’t fit.

Parameters
  • data_inputs – the data that would normally be fitted on.

  • expected_outputs – the data that would normally be fitted on.

Returns

self

handle_fit_transform(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext)[source]
class neuraxle.base.NonTransformableMixin[source]

A pipeline step that has no effect at all but to return the same data without changes. The transform method is automatically implemented as changing nothing.

Example :

class PrintOnFit(NonTransformableMixin, BaseStep):
    def __init__(self):
        BaseStep.__init__(self)

    def fit(self, data_inputs, expected_outputs=None) -> 'FitCallbackStep':
        print((data_inputs, expected_outputs))
        return self

Note

fit methods are not implemented

inverse_transform(processed_outputs)[source]

Do nothing - return the same data.

Parameters

processed_outputs – the data to process

Returns

the processed_outputs, unchanged.

transform(data_inputs)[source]

Do nothing - return the same data.

Parameters

data_inputs – the data to process

Returns

the data_inputs, unchanged.

class neuraxle.base.ResumableStepMixin[source]

Mixin to add resumable function to a step, or a class that can be resumed, for example a checkpoint on disk.

resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext)[source]
should_resume(data_container: neuraxle.data_container.DataContainer, context: neuraxle.base.ExecutionContext) → bool[source]

Returns True if a step can be resumed with the given the data container, and execution context. See Checkpoint class documentation for more details on how a resumable checkpoint works.

Parameters
  • data_container – data container to resume from

  • context – execution context to resume from

Returns

if we can resume

class neuraxle.base.TransformHandlerOnlyMixin[source]

A pipeline step that only requires the implementation of _transform_data_container.

transform(data_inputs) → neuraxle.base.HandleOnlyMixin[source]
class neuraxle.base.TruncableJoblibStepSaver[source]

Step saver for a TruncableSteps. TruncableJoblibStepSaver saves, and loads all of the sub steps using their savers.

load_step(step: neuraxle.base.TruncableSteps, context: neuraxle.base.ExecutionContext) → neuraxle.base.TruncableSteps[source]
  1. Loop through all of the sub steps savers, and only load the sub steps that have been saved.

  2. Refresh steps

Parameters
  • step – step to load

  • context – execution context

Returns

loaded truncable steps

save_step(step: neuraxle.base.TruncableSteps, context: neuraxle.base.ExecutionContext)[source]
  1. Loop through all the steps, and save the ones that need to be saved.

  2. Add a new property called sub step savers inside truncable steps to be able to load sub steps when loading.

  3. Strip steps from truncable steps at the end.

Parameters
  • step – step to save

  • context – execution context

Returns

class neuraxle.base.TruncableSteps(steps_as_tuple: List[Union[Tuple[str, BaseStep], BaseStep]], hyperparams: neuraxle.hyperparams.space.HyperparameterSamples = {}, hyperparams_space: neuraxle.hyperparams.space.HyperparameterSpace = {})[source]

Step that contains multiple steps. Pipeline inherits from this class. It is possible to truncate this step with __getitem__() (see the sketch after the list below).

  • self.steps contains the actual steps

  • self.steps_as_tuple contains a list of tuple of step name, and step
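
Example (a hedged sketch of truncating with __getitem__; it assumes Pipeline, a TruncableSteps, supports slicing by index and lookup by step name) :

from neuraxle.base import Identity
from neuraxle.pipeline import Pipeline

p = Pipeline([
    ('a', Identity()),
    ('b', Identity()),
    ('c', Identity())
])

first_two = p[:2]       # a truncated pipeline containing steps 'a' and 'b'
step_b = p['b']         # a single step, fetched by its name
names = list(p.keys())  # ['a', 'b', 'c']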

append(item: Tuple[str, BaseStep]) → neuraxle.base.TruncableSteps[source]

Add an item to steps as tuple.

Parameters

item – item tuple (step name, step)

Returns

self

apply(method_name: str, step_name=None, *kargs, **kwargs) → Dict[KT, VT][source]

Apply the method name to the pipeline step and all of its children.

Parameters
  • method_name – method name that need to be called on all steps

  • step_name – current pipeline step name

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results

apply_method(method: Callable, step_name=None, *kargs, **kwargs) → Dict[KT, VT][source]

Apply a method to the pipeline step and all of its children.

Parameters
  • method – method to call with self

  • step_name – current pipeline step name

  • kargs – any additional positional arguments to be passed to the method

  • kwargs – any additional keyword arguments to be passed to the method

Returns

accumulated results

are_steps_before_index_the_same(other: neuraxle.base.TruncableSteps, index: int) → bool[source]

Returns true if self.steps before index are the same as other.steps before index.

Parameters
  • other – other truncable steps to compare

  • index – max step index to compare

Returns

bool

ends_with(step_type: type)[source]

Returns true if truncable steps end with a step of the given type.

Parameters

step_type – step type

Returns

if truncable steps ends with the given step type

get_hyperparams() → neuraxle.hyperparams.space.HyperparameterSamples[source]

Get step hyperparameters as HyperparameterSamples.

Example :

p = Pipeline([SomeStep()])
p.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.1,
    'some_step__learning_rate': 0.2 # will set SomeStep() hyperparam 'learning_rate' to 0.2
}))

hp = p.get_hyperparams()
# hp ==>  { 'learning_rate': 0.1, 'some_step__learning_rate': 0.2 }
Returns

step hyperparameters

get_hyperparams_space()[source]

Get step hyperparameters space as HyperparameterSpace.

Example :

p = Pipeline([SomeStep()])
p.set_hyperparams_space(HyperparameterSpace({
    'learning_rate': RandInt(0,5),
    'some_step__learning_rate': RandInt(0, 10) # will set SomeStep() 'learning_rate' hyperparam space to RandInt(0, 10)
}))

hp = p.get_hyperparams_space()
# hp ==>  { 'learning_rate': RandInt(0,5), 'some_step__learning_rate': RandInt(0,10) }
Returns

step hyperparameters space

get_step_by_name(name)[source]
items() → ItemsView[KT, VT_co][source]

Returns all of the steps as tuple items (step_name, step).

Returns

step items tuple : (step name, step)

keys() → KeysView[KT][source]

Returns the step names.

Returns

list of step names

mutate(new_method='inverse_transform', method_to_assign_to='transform', warn=True) → neuraxle.base.BaseStep[source]

Call mutate on every step that the present truncable step contains.

Parameters
  • new_method – the method to replace transform with.

  • method_to_assign_to – the method to which the new method will be assigned to.

  • warn – (verbose) whether or not to warn when the method does not exist.

Returns

self, a copy of self, or even perhaps a new or different BaseStep object.

pop() → neuraxle.base.BaseStep[source]

Pop the last step.

Returns

last step

popfront() → neuraxle.base.BaseStep[source]

Pop the first step.

Returns

first step

popfrontitem() → Tuple[str, neuraxle.base.BaseStep][source]

Pop the first step.

Returns

first step item

popitem(key=None) → Tuple[str, neuraxle.base.BaseStep][source]

Pop the last step, or the step with the given key.

Parameters

key – step name to pop, or None

Returns

last step item

set_hyperparams(hyperparams: Union[neuraxle.hyperparams.space.HyperparameterSamples, collections.OrderedDict, dict]) → neuraxle.base.BaseStep[source]

Set step hyperparameters to the given HyperparameterSamples.

Example :

p = Pipeline([SomeStep()])
p.set_hyperparams(HyperparameterSamples({
    'learning_rate': 0.1,
    'some_step__learning_rate': 0.2 # will set SomeStep() hyperparam 'learning_rate' to 0.2
}))
Returns

step hyperparameters

set_hyperparams_space(hyperparams_space: Union[neuraxle.hyperparams.space.HyperparameterSpace, collections.OrderedDict, dict]) → neuraxle.base.BaseStep[source]

Set step hyperparameters space as HyperparameterSpace.

Example :

p = Pipeline([SomeStep()])
p.set_hyperparams_space(HyperparameterSpace({
    'learning_rate': RandInt(0,5),
    'some_step__learning_rate': RandInt(0, 10) # will set SomeStep() 'learning_rate' hyperparam space to RandInt(0, 10)
}))
Parameters

hyperparams_space – hyperparameters space

Returns

self

set_steps(steps_as_tuple: List[Union[Tuple[str, BaseStep], BaseStep]])[source]

Set steps as tuple.

Parameters

steps_as_tuple – list of tuple containing step name and step

Returns

set_train(is_train: bool = True) → neuraxle.base.BaseStep[source]

Set pipeline step mode to train or test.

In the pipeline steps functions, you can add a simple if statement to direct to the right implementation:

def transform(self, data_inputs):
    if self.is_train:
        self.transform_train_(data_inputs)
    else:
        self.transform_test_(data_inputs)

def fit_transform(self, data_inputs, expected_outputs):
    if self.is_train:
        self.fit_transform_train_(data_inputs, expected_outputs)
    else:
        self.fit_transform_test_(data_inputs, expected_outputs)
Parameters

is_train – if the step is in train mode (True) or test mode (False)

Returns

self

setup() → neuraxle.base.BaseStep[source]

Initialize step before it runs.

Returns

self

should_save()[source]

Returns whether the step needs to be saved. Returns True if self, or any of its sub steps, should be saved.

Returns

split(step_type: type) → List[neuraxle.base.TruncableSteps][source]

Split truncable steps by a step class name.

Parameters

step_type – step class type to split from.

Returns

list of truncable steps containing the split steps

teardown() → neuraxle.base.BaseStep[source]

Teardown step after program execution. Teardowns all of the sub steps as well.

Returns

self

update_hyperparams(hyperparams: Union[neuraxle.hyperparams.space.HyperparameterSamples, collections.OrderedDict, dict]) → neuraxle.base.BaseStep[source]

Update the steps hyperparameters without removing the already-set hyperparameters. Please refer to update_hyperparams().

Parameters

hyperparams – hyperparams to update

Returns

step

update_hyperparams_space(hyperparams_space: Union[neuraxle.hyperparams.space.HyperparameterSpace, collections.OrderedDict, dict]) → neuraxle.base.BaseStep[source]

Update the steps' hyperparameter spaces without removing the already-set hyperparameter spaces. Please refer to update_hyperparams_space().

Parameters

hyperparams_space – hyperparams_space to update

Returns

step

See also

update_hyperparams(), HyperparameterSamples

values() → ValuesView[VT_co][source]

Get step values.

Returns

all of the steps