merlion.transform package

This package provides a number of useful data pre-processing transforms. Each transform is a callable object that inherits either from TransformBase or InvertibleTransformBase.

We will introduce the key features of transform objects using the Rescale class. You may initialize a transform in three ways:

from merlion.transform.factory import TransformFactory
from merlion.transform.normalize import Rescale

# Use the initializer
transform = Rescale(bias=5.0, scale=3.2)

# Use the class's from_dict() method with the arguments you would normally
# give to the initializer
kwargs = dict(bias=5.0, scale=3.2)
transform = Rescale.from_dict(kwargs)

# Use the TransformFactory with the class's name, and the keyword arguments
# you would normally give to the initializer
transform = TransformFactory.create("Rescale", **kwargs)

After initializing a transform, one may use it as follows:

transform.train(time_series)              # set any trainable params
transformed = transform(time_series)      # apply the transform to the time series
inverted = transform.invert(transformed)  # invert the transform
state_dict = transform.to_dict()          # serialize to a JSON-compatible dict

Note that transform.invert() is supported even if the transform doesn’t inherit from InvertibleTransformBase! In this case, transform.invert() implements a pseudo-inverse that may not recover the original time_series exactly. Additionally, the dict returned by transform.to_dict() is exactly the same as the dict expected by the class method TransformCls.from_dict().

`factory`	Contains the `TransformFactory` for instantiating transforms.
`base`	Transform base classes and the `Identity` transform.
`bound`	Transforms that clip the input.
`moving_average`	Transforms that compute moving averages and k-step differences.
`normalize`	Transforms that rescale the input or otherwise normalize it.
`resample`	Transforms that resample the input in time, or stack adjacent observations into vectors.
`sequence`	Classes to compose (`TransformSequence`) or stack (`TransformStack`) multiple transforms.
`anomalize`	Transforms that inject synthetic anomalies into time series.

Submodules

merlion.transform.base module

Transform base classes and the Identity transform.

class merlion.transform.base.TransformBase

Bases: object

Abstract class for a callable data pre-processing transform.

Subclasses must override the train method (pass if no training is required) and __call__ method (to implement the actual transform).

Subclasses may also support a pseudo inverse transform (possibly using the implementation-specific self.inversion_state, which should be set in __call__). If an inversion state is not required, override the property requires_inversion_state to return False.

Due to possible information loss in the forward pass, the inverse transform may not be perfect/proper, and calling TransformBase.invert will result in a warning. By default, the inverse transform (implemented in TransformBase._invert) is just the identity.

Variables: inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.

_invert(time_series)

Helper method which actually performs the inverse transform (when possible).

Parameters: time_series (TimeSeries) – Time series to apply the inverse transform to
Return type: TimeSeries
Returns: The (inverse) transformed time series.

property proper_inversion: TransformBase objects do not support a proper inversion.

property requires_inversion_state: Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.

property identity_inversion: Indicates whether the inverse applied by this transform is just the identity.

to_dict()

classmethod from_dict(state)

abstract train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

invert(time_series, retain_inversion_state=False)

Applies the inverse of this transform on the time series.

Parameters

time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.

Return type

TimeSeries

Returns

The (inverse) transformed time series.
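
The inversion-state lifecycle described above can be illustrated with a small stand-alone sketch. The class below is hypothetical and does not use Merlion: the name ToyCenter, its mean-subtraction behavior, and the error it raises are assumptions made purely for illustration. Only the inversion_state attribute and the retain_inversion_state option mirror the names documented here.

```python
class ToyCenter:
    """Hypothetical transform (not part of Merlion): subtracts the series mean."""

    def __init__(self):
        self.inversion_state = None

    def __call__(self, values):
        mean = sum(values) / len(values)
        # Remember what is needed to undo the transform for *this* series.
        self.inversion_state = mean
        return [v - mean for v in values]

    def invert(self, values, retain_inversion_state=False):
        if self.inversion_state is None:
            raise RuntimeError("No inversion state; apply the transform first.")
        restored = [v + self.inversion_state for v in values]
        if not retain_inversion_state:
            # Drop the state so it cannot be reused on a different series.
            self.inversion_state = None
        return restored


toy = ToyCenter()
centered = toy([1.0, 2.0, 3.0])
restored = toy.invert(centered, retain_inversion_state=True)
state_kept = toy.inversion_state is not None
toy.invert(centered)  # default behavior: the state is cleared afterwards
state_cleared = toy.inversion_state is None
```

Clearing the state by default means a second invert() call without a fresh forward pass fails loudly instead of silently reusing a stale state.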

class merlion.transform.base.InvertibleTransformBase

Bases: TransformBase

Abstract class for a callable data pre-processing transform with a proper inverse.

In addition to overriding the train and __call__ methods, subclasses must also override the InvertibleTransformBase._invert method to implement the actual inverse transform.

Variables: inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.

abstract _invert(time_series)

Helper method which actually performs the inverse transform (when possible).

Parameters: time_series (TimeSeries) – Time series to apply the inverse transform to
Return type: TimeSeries
Returns: The (inverse) transformed time series.

property proper_inversion: InvertibleTransformBase always supports a proper inversion.

property identity_inversion: Indicates whether the inverse applied by this transform is just the identity.

class merlion.transform.base.Identity

Bases: InvertibleTransformBase

The identity transformation. Does nothing.

property requires_inversion_state: False because the identity operation is stateless to invert.

property identity_inversion: Indicates whether the inverse applied by this transform is just the identity.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

merlion.transform.bound module

Transforms that clip the input.

class merlion.transform.bound.LowerUpperClip(lower=None, upper=None)

Bases: TransformBase

Clips the values of a time series to lie between lower and upper.

property requires_inversion_state: False because “inverting” value clipping is stateless.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

merlion.transform.factory module

Contains the TransformFactory for instantiating transforms.

class merlion.transform.factory.TransformFactory

Bases: object

classmethod get_transform_class(name)

Return type: Type[TransformBase]

classmethod create(name, **kwargs)

Return type: TransformBase
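
One common way to build such a factory is a name-to-class registry. The sketch below is a stand-alone illustration under that assumption; ToyRescale, ToyFactory, the _registry attribute, and the "name" key in the serialized dict are all hypothetical and are not taken from Merlion's implementation.

```python
class ToyRescale:
    """Hypothetical stand-in for a transform class (not Merlion's Rescale)."""

    def __init__(self, bias=0.0, scale=1.0):
        self.bias = bias
        self.scale = scale

    def to_dict(self):
        # Assumed layout: the class name plus the initializer arguments.
        return {"name": type(self).__name__, "bias": self.bias, "scale": self.scale}

    @classmethod
    def from_dict(cls, state):
        kwargs = {k: v for k, v in state.items() if k != "name"}
        return cls(**kwargs)


class ToyFactory:
    """Hypothetical registry-based factory."""

    _registry = {"ToyRescale": ToyRescale}

    @classmethod
    def get_transform_class(cls, name):
        return cls._registry[name]

    @classmethod
    def create(cls, name, **kwargs):
        return cls.get_transform_class(name).from_dict(kwargs)


transform = ToyFactory.create("ToyRescale", bias=5.0, scale=3.2)
state = transform.to_dict()
clone = ToyFactory.create(**state)  # round trip through the serialized dict
```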

merlion.transform.moving_average module

Transforms that compute moving averages and k-step differences.

class merlion.transform.moving_average.MovingAverage(n_steps=None, weights=None, pad=False)

Bases: TransformBase

Computes the n_steps-step moving average of the time series, with the given relative weights assigned to each time in the moving average (default is to take the non-weighted average). Zero-pads the input time series to the left before taking the moving average.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.moving_average.MovingPercentile(n_steps, q)

Bases: TransformBase

Computes the n-step moving percentile of the time series. For datapoints at the start of the time series which are preceded by fewer than n_steps datapoints, the percentile is computed using only the available datapoints.

Parameters

q (float) – The percentile to use. Between 0 and 100 inclusive.
n_steps (int) – The number of steps to use.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.moving_average.ExponentialMovingAverage(alpha, normalize=True, p=0.95, ci=False)

Bases: InvertibleTransformBase

Computes the exponential moving average (normalized or un-normalized) of the time series, with smoothing factor alpha (lower alpha = more smoothing). alpha must be between 0 and 1.

The unnormalized moving average y of x is computed as

\[\begin{split}\begin{align*} y_0 & = x_0 \\ y_i & = (1 - \alpha) \cdot y_{i-1} + \alpha \cdot x_i \end{align*}\end{split}\]

The normalized moving average y of x is computed as

\[y_i = \frac{x_i + (1 - \alpha) x_{i-1} + \ldots + (1 - \alpha)^i x_0} {1 + (1 - \alpha) + \ldots + (1 - \alpha)^i}\]

Upper and lower confidence bounds, l and u, of the exponential moving average are computed using the exponential moving standard deviation, s, and y as

\[\begin{split}l_i &= y_i + z_{\frac{1}{2} (1-p)} \times s_i \\ u_i &= y_i + z_{\frac{1}{2} (1+p)} \times s_i\end{split}\]

If confidence bounds are included, the returned time series will contain the upper and lower bounds as additional univariates. For example if the transform is applied to a time series with two univariates “x” and “y”, the resulting time series will contain univariates with the following names: “x”, “x_lb”, “x_ub”, “y”, “y_lb”, “y_ub”.
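
The bounds can be sketched as follows, given moving-average values y and moving standard deviations s that have already been computed. This is a stand-alone illustration, not Merlion code: the helper name is hypothetical, and interpreting z as a quantile of the standard normal distribution is an assumption, since the text above does not define it.

```python
from statistics import NormalDist


def confidence_bounds(y, s, p=0.95):
    """Return (lower, upper) lists given EMA values y and moving std s."""
    z_lo = NormalDist().inv_cdf(0.5 * (1 - p))  # negative quantile
    z_hi = NormalDist().inv_cdf(0.5 * (1 + p))  # positive quantile
    lower = [y_i + z_lo * s_i for y_i, s_i in zip(y, s)]
    upper = [y_i + z_hi * s_i for y_i, s_i in zip(y, s)]
    return lower, upper


lower, upper = confidence_bounds([10.0, 12.0], [1.0, 2.0], p=0.95)
```

With p = 0.95 the two quantiles are roughly -1.96 and +1.96, so the interval is symmetric around each y_i.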

Parameters

alpha (float) – smoothing factor to use for exponential weighting.
normalize (bool) – If True, divide by the decaying adjustment in beginning periods.
p (float) – confidence level to use if returning the upper and lower bounds of the confidence interval.
ci (bool) – If True, return the upper and lower confidence bounds of the exponential moving average as well.

property requires_inversion_state: False because the exponential moving average is stateless to invert.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.moving_average.DifferenceTransform

Bases: InvertibleTransformBase

Applies a difference transform to the input time series. We include it as a moving average because we can consider the difference transform to be a 2-step moving “average” with weights w = [-1, 1].

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.moving_average.LagTransform(k, pad=False)

Bases: InvertibleTransformBase

Applies a lag transform to the input time series. Each x(i) gets mapped to x(i) - x(i-k). We include it as a moving average because we can consider the lag transform to be a k+1-step moving “average” with weights w = [-1, 0,…, 0, 1]. One may optionally left-pad the sequence with the first value in the time series.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

compute_lag(var)

Return type: UnivariateTimeSeries

merlion.transform.normalize module

Transforms that rescale the input or otherwise normalize it.

class merlion.transform.normalize.AbsVal

Bases: TransformBase

Takes the absolute value of the input time series.

property requires_inversion_state: False because the “pseudo-inverse” is just the identity (i.e. we lose sign information).

property identity_inversion: Indicates whether the inverse applied by this transform is just the identity.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.normalize.Rescale(bias=0.0, scale=1.0, normalize_bias=True, normalize_scale=True)

Bases: InvertibleTransformBase

Rescales the bias & scale of input vectors or scalars by pre-specified amounts.

property requires_inversion_state: False because rescaling operations are stateless to invert.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

property is_trained

class merlion.transform.normalize.MeanVarNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)

Bases: Rescale

A learnable transform that rescales the values of a time series to have zero mean and unit variance.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.normalize.MinMaxNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)

Bases: Rescale

A learnable transform that rescales the values of a time series to be between zero and one.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.normalize.BoxCoxTransform(lmbda=None, offset=0.0)

Bases: InvertibleTransformBase

Applies the Box-Cox power transform to the time series, with power lmbda. When lmbda is None, it is learned from the training data in train(). When lmbda > 0, the transform is ((x + offset) ** lmbda - 1) / lmbda. When lmbda == 0, it is ln(x + offset).

property requires_inversion_state: False because the Box-Cox transform is stateless to invert.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

merlion.transform.resample module

Transforms that resample the input in time, or stack adjacent observations into vectors.

class merlion.transform.resample.TemporalResample(granularity=None, origin=None, trainable_granularity=None, remove_non_overlapping=True, aggregation_policy='Mean', missing_value_policy='Interpolate')

Bases: TransformBase

Defines a policy to temporally resample a time series at a specified granularity. Note that while this transform does support inversion, the recovered time series may differ from the input due to information loss when resampling.

Parameters

granularity (Union[str, int, float, None]) – The granularity at which we want to resample.
origin (Optional[int]) – The time stamp defining the offset to start at.
trainable_granularity (Optional[bool]) – Whether the granularity is trainable, i.e. train() will set it to the GCD timedelta of a time series. If None (default), it will be trainable only if no granularity is explicitly given.
remove_non_overlapping – If True, we will only keep the portions of the univariates that overlap with each other. For example, if we have 3 univariates which span timestamps [0, 3600], [60, 3660], and [30, 3540], we will only keep timestamps in the range [60, 3540]. If False, we will keep all timestamps produced by the resampling.
aggregation_policy (Union[str, AggregationPolicy]) – The policy we will use to aggregate multiple values in a window (downsampling).
missing_value_policy (Union[str, MissingValuePolicy]) – The policy we will use to impute missing values (upsampling).
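
The remove_non_overlapping rule amounts to intersecting the time spans of the univariates. A stand-alone sketch follows; the helper name and the (first, last) tuple representation are hypothetical, and returning None for disjoint spans is an assumption.

```python
def overlapping_range(spans):
    """spans: list of (first_timestamp, last_timestamp), one per univariate."""
    start = max(first for first, _ in spans)
    end = min(last for _, last in spans)
    if start > end:
        return None  # the univariates share no common time range
    return start, end


common = overlapping_range([(0, 3600), (60, 3660), (30, 3540)])
disjoint = overlapping_range([(0, 10), (20, 30)])
```

The first call reproduces the example above: the latest start is 60 and the earliest end is 3540.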

property requires_inversion_state: Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.

property proper_inversion: We treat resampling as a proper inversion to avoid emitting warnings.

property granularity

property aggregation_policy: AggregationPolicy

Return type: AggregationPolicy

property missing_value_policy: MissingValuePolicy

Return type: MissingValuePolicy

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

class merlion.transform.resample.Shingle(size=1, stride=1, multivar_skip=True)

Bases: InvertibleTransformBase

Stacks adjacent observations into a single vector. Downsamples by the specified stride (less than or equal to the shingle size) if desired.

More concretely, consider an input time series,

TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), ..., (t1[m], x1[m])),
    UnivariateTimeSeries((t2[0], x2[0]), ..., (t2[m], x2[m])),
)

Applying a shingle of size 3 and stride 2 will yield

TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), (t1[2], x1[2]), ..., (t1[m-2], x1[m-2])),
    UnivariateTimeSeries((t1[1], x1[1]), (t1[3], x1[3]), ..., (t1[m-1], x1[m-1])),
    UnivariateTimeSeries((t1[2], x1[2]), (t1[4], x1[4]), ..., (t1[m],   x1[m])),

    UnivariateTimeSeries((t2[0], x2[0]), (t2[2], x2[2]), ..., (t2[m-2], x2[m-2])),
    UnivariateTimeSeries((t2[1], x2[1]), (t2[3], x2[3]), ..., (t2[m-1], x2[m-1])),
    UnivariateTimeSeries((t2[2], x2[2]), (t2[4], x2[4]), ..., (t2[m],   x2[m])),
)

If the length of any univariate is not perfectly divisible by the stride, we will pad it on the left side with the first value in the univariate.

Converts the time series into shingle vectors of the appropriate size. This converts each univariate into a multivariate time series with size variables.

Parameters

size (int) – let x(t) = value_t be the value of the time series at time index t. Then, the output vector for time index t will be [x(t - size + 1), ..., x(t - 1), x(t)].
stride (int) – The stride at which the output vectors are downsampled.
multivar_skip – Whether to skip this transform if the input time series is already multivariate.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

merlion.transform.sequence module

Classes to compose (TransformSequence) or stack (TransformStack) multiple transforms.

class merlion.transform.sequence.TransformSequence(transforms)

Bases: InvertibleTransformBase

Applies a series of data transformations sequentially.

property proper_inversion: A transform sequence is invertible if and only if all the transforms comprising it are invertible.

property identity_inversion: Indicates whether the inverse applied by this transform is just the identity.

property requires_inversion_state: False because inversion state is held by individual transforms.

to_dict()

append(transform)

classmethod from_dict(state)

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

invert(time_series, retain_inversion_state=False)

Applies the inverse of this transform on the time series.

Parameters

time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.

Return type

TimeSeries

Returns

The (inverse) transformed time series.
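
Composition and inversion of a sequence can be sketched with toy transforms on plain lists. The classes below are hypothetical and unrelated to Merlion's implementation. The sketch relies on the general fact that a composition is undone by applying the individual inverses in reverse order, which is assumed (not confirmed above) to be how TransformSequence behaves.

```python
class ToyShift:
    """Hypothetical invertible transform: adds a constant."""

    def __init__(self, amount):
        self.amount = amount

    def __call__(self, values):
        return [v + self.amount for v in values]

    def invert(self, values):
        return [v - self.amount for v in values]


class ToyScale:
    """Hypothetical invertible transform: multiplies by a nonzero factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        return [v * self.factor for v in values]

    def invert(self, values):
        return [v / self.factor for v in values]


class ToySequence:
    """Applies transforms in order; inverts them in reverse order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def append(self, transform):
        self.transforms.append(transform)

    def __call__(self, values):
        for t in self.transforms:
            values = t(values)
        return values

    def invert(self, values):
        for t in reversed(self.transforms):
            values = t.invert(values)
        return values


seq = ToySequence([ToyShift(1.0)])
seq.append(ToyScale(2.0))
out = seq([1.0, 2.0])    # (x + 1) * 2
back = seq.invert(out)   # undo the scale first, then the shift
```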

class merlion.transform.sequence.TransformStack(transforms, *, check_aligned=True)

Bases: InvertibleTransformBase

Applies a set of data transformations individually to an input time series. Stacks all of the results into a multivariate time series.

property proper_inversion: A stacked transform is invertible if and only if at least one of the transforms comprising it are invertible.

property requires_inversion_state: True because the inversion state tells us which stacked transform to invert, and which part of the output time series to apply that inverse to.

train(time_series): Sets all trainable parameters of the transform (if any), using the input time series as training data.

invert(time_series, retain_inversion_state=False)

Applies the inverse of this transform on the time series.

Parameters

time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.

Return type

TimeSeries

Returns

The (inverse) transformed time series.

merlion.transform.anomalize module

Transforms that inject synthetic anomalies into time series.

class merlion.transform.anomalize.Anomalize(anom_prob=0.01, natural_bounds=(None, None), **kwargs)

Bases: TransformBase

Injects anomalies into a time series with controlled randomness and returns both the anomalized time series along with associated anomaly labels.

Parameters

anom_prob (float) – The probability of anomalizing a particular data point.
natural_bounds (Tuple[float, float]) – Upper and lower natural boundaries within which a particular time series, including any injected anomalies, must stay.

property natural_bounds

property is_trained: bool

Return type: bool

random_is_anom()

class merlion.transform.anomalize.Shock(alpha=0.2, pos_prob=1.0, sd_range=(3, 6), anom_width_range=(1, 5), persist_shock=False, **kwargs)

Bases: Anomalize

Injects random spikes or dips into a time series.

Letting y_t be a time series, if an anomaly is injected into the time series at time t, the anomalous value that gets injected is as follows:

\[\begin{split}\tilde{y}_t &= y_t + \text{shock}, \\ \text{where } \text{shock} &= \mathrm{Sign} \times Z \times \text{RWSD}_{\alpha}(y_t), \\ Z &\sim \mathrm{Unif}(a, b), \\ \mathrm{Sign} &\text{ is a random sign.}\end{split}\]

Additionally, the shock that is added to y_t is also applied to y_{t+1}, …, y_{t+w-1}, where w, known as the “anomaly width”, is determined by a random draw from a uniform distribution.

Parameters

alpha (float) – The recency weight to use when calculating recency-weighted standard deviation.
pos_prob (float) – The probability with which a shock’s sign is positive.
sd_range (Tuple[float, float]) – The range of standard units that is used to create a shock
anom_width_range (Tuple[int, int]) – The range of anomaly widths.
persist_shock (bool) – whether to apply the shock to all successive datapoints.

property anom_width_range

property sd_range

random_sd_units()

random_anom_width()

random_is_anom()

train(time_series): The Shock transform doesn’t require training.

class merlion.transform.anomalize.LevelShift(**kwargs)

Bases: Shock

Injects random level shift anomalies into a time series.

A level shift is a sudden change of level in a time series. It is equivalent to a shock that, when applied to y_t, is also applied to every datapoint after t.

Parameters

alpha – The recency weight to use when calculating recency-weighted standard deviation.
pos_prob – The probability with which a shock’s sign is positive.
sd_range – The range of standard units that is used to create a shock
anom_width_range – The range of anomaly widths.
persist_shock – whether to apply the shock to all successive datapoints.

class merlion.transform.anomalize.TrendChange(alpha=0.5, beta=0.95, pos_prob=0.5, scale_range=(0.5, 3.0), **kwargs)

Bases: Anomalize

Injects random trend changes into a time series.

At a high level, the transform tracks the velocity (trend) of a time series and then, when injecting a trend change at a particular time, it scales the current velocity by a random factor. The disturbance to the velocity is persisted to values in the near future, thus emulating a sudden change of trend.

Let (a, b) be the scale range. If the first trend change happens at time t*, it is injected as follows:

\[\begin{split}\tilde{y}_{t^*} &= y_{t^*-1} + v_{t^*} + \Delta v_{t^*}, \\ \text{where } \Delta v_{t^*} &= \mathrm{Sign} \times Z \times v_{t^*}, \\ v_{t^*} &= y_{t^*} - y_{t^*-1}, \\ Z &\sim \mathrm{Unif}(a, b), \\ \mathrm{Sign} &\text{ is a random sign.}\end{split}\]

Afterward, the trend change is persisted and y_t (for t > t*) is changed as follows:

\[\tilde{y}_{t} = \tilde{y}_{t-1} + v_t + \beta \times \Delta v_{t^*}\]
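
The two update rules can be sketched for a single trend change on a plain list. This is a stand-alone illustration, not Merlion's implementation: the function name is hypothetical, and the random sign and the draw Z are passed in explicitly (as sign and z) so that the result is deterministic.

```python
def inject_trend_change(y, t_star, sign, z, beta):
    """Apply one trend change at index t_star (requires t_star >= 1)."""
    # v[t] = y[t] - y[t-1]; v[0] is a placeholder that is never used.
    v = [0.0] + [y[t] - y[t - 1] for t in range(1, len(y))]
    delta_v = sign * z * v[t_star]
    out = list(y[:t_star])                           # values before t* are untouched
    out.append(y[t_star - 1] + v[t_star] + delta_v)  # value at t*
    for t in range(t_star + 1, len(y)):
        out.append(out[-1] + v[t] + beta * delta_v)  # persisted change for t > t*
    return out


y = [0.0, 1.0, 2.0, 3.0, 4.0]
changed = inject_trend_change(y, t_star=2, sign=1, z=1.0, beta=0.5)
unchanged = inject_trend_change(y, t_star=2, sign=1, z=0.0, beta=0.5)
```

Here the original velocity is 1 per step; the change doubles it at t* = 2 and then adds beta times the disturbance at every later step, so the series rises faster than before from t* onward.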

Parameters

anom_prob – The probability of anomalizing a particular data point.
natural_bounds – Upper and lower natural boundaries within which a particular time series, including any injected anomalies, must stay.

property scale_range

random_scale()

train(time_series): The TrendChange transform doesn’t require training.