merlion.transform package
This package provides a number of useful data pre-processing transforms. Each transform is a callable object that inherits either from TransformBase or InvertibleTransformBase.
We will introduce the key features of transform objects using the Rescale class. You may initialize a transform in three ways:
from merlion.transform.factory import TransformFactory
from merlion.transform.normalize import Rescale
# Use the initializer
transform = Rescale(bias=5.0, scale=3.2)
# Use the class's from_dict() method with the arguments you would normally
# give to the initializer
kwargs = dict(bias=5.0, scale=3.2)
transform = Rescale.from_dict(kwargs)
# Use the TransformFactory with the class's name, and the keyword arguments
# you would normally give to the initializer
transform = TransformFactory.create("Rescale", **kwargs)
After initializing a transform, one may use it as follows:
transform.train(time_series) # set any trainable params
transformed = transform(time_series) # apply the transform to the time series
inverted = transform.invert(transformed) # invert the transform
state_dict = transform.to_dict() # serialize to a JSON-compatible dict
Note that transform.invert() is supported even if the transform doesn’t inherit from InvertibleTransformBase! In this case, transform.invert() implements a pseudo-inverse that may not recover the original time_series exactly. Additionally, the dict returned by transform.to_dict() is exactly the same as the dict expected by the class method TransformCls.from_dict().
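Base primitives:
transform.factory | Contains the TransformFactory for instantiating transforms.
transform.base | Transform base classes and the Identity transform.
transform.sequence | Classes to compose (TransformSequence) or stack (TransformStack) multiple transforms.
Resampling:
transform.resample | Transforms that resample the input in time, or stack adjacent observations into vectors.
transform.moving_average | Transforms that compute moving averages and k-step differences.
Normalization:
transform.bound | Transforms that clip the input.
transform.normalize | Transforms that rescale the input or otherwise normalize it.
Miscellaneous:
transform.anomalize | Transforms that inject synthetic anomalies into time series.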
Base primitives
transform.factory
Contains the TransformFactory for instantiating transforms.
- class merlion.transform.factory.TransformFactory
Bases:
object
- classmethod get_transform_class(name)
- Return type
Type[TransformBase]
- classmethod create(name, **kwargs)
- Return type
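The factory resolves a transform class from its name and forwards the keyword arguments to that class's initializer. The following minimal sketch shows this name-to-class pattern with hypothetical stand-in classes (ToyRescale, ToyAbsVal) and a hypothetical module-level _REGISTRY dict; it is not Merlion's implementation, which may resolve names differently.

```python
# Hypothetical stand-in classes; they are not Merlion transforms.
class ToyRescale:
    def __init__(self, bias=0.0, scale=1.0):
        self.bias, self.scale = bias, scale


class ToyAbsVal:
    pass


# Hypothetical registry mapping class names to classes.
_REGISTRY = {cls.__name__: cls for cls in (ToyRescale, ToyAbsVal)}


def get_transform_class(name):
    """Look up a transform class by its registered name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown transform: {name!r}") from None


def create(name, **kwargs):
    """Instantiate the named class with the keyword arguments its initializer expects."""
    return get_transform_class(name)(**kwargs)


t = create("ToyRescale", bias=5.0, scale=3.2)
```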
transform.base
Transform base classes and the Identity transform.
- class merlion.transform.base.TransformBase
Bases:
object
Abstract class for a callable data pre-processing transform.
Subclasses must override the train method (pass if no training is required) and the __call__ method (to implement the actual transform).
Subclasses may also support a pseudo-inverse transform (possibly using the implementation-specific self.inversion_state, which should be set in __call__). If an inversion state is not required, override the property requires_inversion_state to return False.
Due to possible information loss in the forward pass, the inverse transform may not be perfect/proper, and calling TransformBase.invert will result in a warning. By default, the inverse transform (implemented in TransformBase._invert) is just the identity.
- Variables
inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.
- _invert(time_series)
Helper method which actually performs the inverse transform (when possible).
- Parameters
time_series (TimeSeries) – Time series to apply the inverse transform to.
- Return type
- Returns
The (inverse) transformed time series.
- property proper_inversion
TransformBase objects do not support a proper inversion.
- property requires_inversion_state
Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.
- property identity_inversion
Indicates whether the inverse applied by this transform is just the identity.
- to_dict()
- classmethod from_dict(state)
- abstract train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- invert(time_series, retain_inversion_state=False)
Applies the inverse of this transform on the time series.
- Parameters
time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
- Return type
- Returns
The (inverse) transformed time series.
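The inversion-state lifecycle described above can be illustrated with a small, hypothetical ToyDemean class on plain Python lists: __call__ records the mean as its inversion state, and invert() consumes that state unless retain_inversion_state=True. This is a sketch of the documented behavior, not TransformBase itself; raising RuntimeError when the state is missing is an assumption made for illustration.

```python
class ToyDemean:
    """Hypothetical transform: subtracts the mean and keeps it as the inversion state."""

    def __init__(self):
        self.inversion_state = None

    def train(self, xs):
        pass  # nothing to train

    def __call__(self, xs):
        mean = sum(xs) / len(xs)
        self.inversion_state = mean  # set in __call__, consumed by invert()
        return [x - mean for x in xs]

    def invert(self, ys, retain_inversion_state=False):
        if self.inversion_state is None:
            # Assumption: a missing state is treated as an error in this sketch.
            raise RuntimeError("No inversion state; apply the transform first.")
        mean = self.inversion_state
        if not retain_inversion_state:
            self.inversion_state = None  # guard against reusing a stale state
        return [y + mean for y in ys]


t = ToyDemean()
z = t([1.0, 2.0, 6.0])                         # [-2.0, -1.0, 3.0]
x1 = t.invert(z, retain_inversion_state=True)  # state is kept
x2 = t.invert(z)                               # state is consumed here
```

A third call to t.invert(z) would now fail, because the state was destroyed by the second call.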
- class merlion.transform.base.InvertibleTransformBase
Bases:
TransformBase
Abstract class for a callable data pre-processing transform with a proper inverse.
In addition to overriding the train and __call__ methods, subclasses must also override the InvertibleTransformBase._invert method to implement the actual inverse transform.
- Variables
inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.
- abstract _invert(time_series)
Helper method which actually performs the inverse transform (when possible).
- Parameters
time_series (TimeSeries) – Time series to apply the inverse transform to.
- Return type
- Returns
The (inverse) transformed time series.
- property proper_inversion
InvertibleTransformBase always supports a proper inversion.
- property identity_inversion
Indicates whether the inverse applied by this transform is just the identity.
- class merlion.transform.base.Identity
Bases:
InvertibleTransformBase
The identity transformation. Does nothing.
- property requires_inversion_state
False because the identity operation is stateless to invert.
- property identity_inversion
Indicates whether the inverse applied by this transform is just the identity.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
transform.sequence
Classes to compose (TransformSequence) or stack (TransformStack) multiple transforms.
- class merlion.transform.sequence.TransformSequence(transforms)
Bases:
InvertibleTransformBase
Applies a series of data transformations sequentially.
- property proper_inversion
A transform sequence is invertible if and only if all the transforms comprising it are invertible.
- property identity_inversion
Indicates whether the inverse applied by this transform is just the identity.
- property requires_inversion_state
False because inversion state is held by individual transforms.
- to_dict()
- append(transform)
- classmethod from_dict(state)
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- invert(time_series, retain_inversion_state=False)
Applies the inverse of this transform on the time series.
- Parameters
time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
- Return type
- Returns
The (inverse) transformed time series.
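Composing transforms means applying them in order and inverting them in reverse order. The sketch below assumes each step is a hypothetical (forward, inverse) pair of functions on lists rather than a Merlion transform object.

```python
def scale_by(c):
    """Hypothetical step: multiply by c; its inverse divides by c."""
    return (lambda xs: [x * c for x in xs], lambda ys: [y / c for y in ys])


def shift_by(c):
    """Hypothetical step: add c; its inverse subtracts c."""
    return (lambda xs: [x + c for x in xs], lambda ys: [y - c for y in ys])


def apply_sequence(steps, xs):
    for forward, _ in steps:  # the first step is applied first
        xs = forward(xs)
    return xs


def invert_sequence(steps, ys):
    for _, inverse in reversed(steps):  # undo the last step first
        ys = inverse(ys)
    return ys


steps = [shift_by(1.0), scale_by(2.0)]
y = apply_sequence(steps, [0.0, 1.0, 2.0])  # (x + 1) * 2 -> [2.0, 4.0, 6.0]
x = invert_sequence(steps, y)               # y / 2 - 1 -> [0.0, 1.0, 2.0]
```

Undoing the steps in forward order would instead compute (y - 1) / 2 and would not recover the input.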
- class merlion.transform.sequence.TransformStack(transforms, *, check_aligned=True)
Bases:
InvertibleTransformBase
Applies a set of data transformations individually to an input time series. Stacks all of the results into a multivariate time series.
- property proper_inversion
A stacked transform is invertible if and only if at least one of the transforms comprising it is invertible.
- property requires_inversion_state
True because the inversion state tells us which stacked transform to invert, and which part of the output time series to apply that inverse to.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- invert(time_series, retain_inversion_state=False)
Applies the inverse of this transform on the time series.
- Parameters
time_series (TimeSeries) – The time series on which to apply the inverse transform.
retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
- Return type
- Returns
The (inverse) transformed time series.
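Stacking applies every transform to the same input and keeps all of the outputs side by side; one invertible member is enough to map the stacked output back. In this sketch a dict of named lists stands in for the multivariate result, each member is a hypothetical (forward, inverse) pair, and inverse=None marking a non-invertible member is an assumption.

```python
def stack(transforms, xs):
    """Apply each named transform to the same input; return {name: output}."""
    return {name: forward(xs) for name, (forward, _) in transforms.items()}


def invert_stack(transforms, stacked):
    """Invert using the first member that has an inverse (None means not invertible)."""
    for name, (_, inverse) in transforms.items():
        if inverse is not None:
            return inverse(stacked[name])
    raise ValueError("No invertible transform in the stack.")


transforms = {
    "abs": (lambda xs: [abs(x) for x in xs], None),  # loses sign, so no proper inverse
    "double": (lambda xs: [2 * x for x in xs], lambda ys: [y / 2 for y in ys]),
}
out = stack(transforms, [-1.0, 2.0])  # {"abs": [1.0, 2.0], "double": [-2.0, 4.0]}
rec = invert_stack(transforms, out)   # [-1.0, 2.0]
```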
Resampling
transform.resample
Transforms that resample the input in time, or stack adjacent observations into vectors.
- class merlion.transform.resample.TemporalResample(granularity=None, origin=None, trainable_granularity=None, remove_non_overlapping=True, aggregation_policy='Mean', missing_value_policy='Interpolate')
Bases:
TransformBase
Defines a policy to temporally resample a time series at a specified granularity. Note that while this transform does support inversion, the recovered time series may differ from the input due to information loss when resampling.
- Parameters
granularity (Union[str, int, float, None]) – The granularity at which we want to resample.
origin (Optional[int]) – The time stamp defining the offset to start at.
trainable_granularity (Optional[bool]) – Whether we will automatically infer the granularity of the time series. If None (default), it will be trainable only if no granularity is explicitly given.
remove_non_overlapping – If True, we will only keep the portions of the univariates that overlap with each other. For example, if we have 3 univariates which span timestamps [0, 3600], [60, 3660], and [30, 3540], we will only keep timestamps in the range [60, 3540]. If False, we will keep all timestamps produced by the resampling.
aggregation_policy (Union[str, AggregationPolicy]) – The policy we will use to aggregate multiple values in a window (downsampling).
missing_value_policy (Union[str, MissingValuePolicy]) – The policy we will use to impute missing values (upsampling).
- property requires_inversion_state
Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.
- property proper_inversion
We treat resampling as a proper inversion to avoid emitting warnings.
- property granularity
- property aggregation_policy: AggregationPolicy
- property missing_value_policy: MissingValuePolicy
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
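The remove_non_overlapping example reduces to intersecting the spans of the univariates: the shared range starts at the latest start and ends at the earliest end. The helper names below are hypothetical and operate on plain (start, end) pairs and timestamp lists, not on TimeSeries objects.

```python
def overlapping_range(spans):
    """Return the (start, end) range shared by all spans, or None if there is none."""
    start = max(s for s, _ in spans)
    end = min(e for _, e in spans)
    return (start, end) if start <= end else None


def keep_overlap(timestamps, spans):
    """Keep only the timestamps that fall inside the shared range."""
    rng = overlapping_range(spans)
    if rng is None:
        return []
    return [t for t in timestamps if rng[0] <= t <= rng[1]]


spans = [(0, 3600), (60, 3660), (30, 3540)]
print(overlapping_range(spans))  # (60, 3540)
```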
- class merlion.transform.resample.Shingle(size=1, stride=1, multivar_skip=True)
Bases:
InvertibleTransformBase
Stacks adjacent observations into a single vector. Downsamples by the specified stride (less than or equal to the shingle size) if desired.
More concretely, consider an input time series,
TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), ..., (t1[m], x1[m])),
    UnivariateTimeSeries((t2[0], x2[0]), ..., (t2[m], x2[m])),
)
Applying a shingle of size 3 and stride 2 will yield
TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), (t1[2], x1[2]), ..., (t1[m-2], x1[m-2])),
    UnivariateTimeSeries((t1[1], x1[1]), (t1[3], x1[3]), ..., (t1[m-1], x1[m-1])),
    UnivariateTimeSeries((t1[2], x1[2]), (t1[4], x1[4]), ..., (t1[m], x1[m])),
    UnivariateTimeSeries((t2[0], x2[0]), (t2[2], x2[2]), ..., (t2[m-2], x2[m-2])),
    UnivariateTimeSeries((t2[1], x2[1]), (t2[3], x2[3]), ..., (t2[m-1], x2[m-1])),
    UnivariateTimeSeries((t2[2], x2[2]), (t2[4], x2[4]), ..., (t2[m], x2[m])),
)
If the length of any univariate is not perfectly divisible by the stride, we will pad it on the left side with the first value in the univariate.
Converts the time series into shingle vectors of the appropriate size. This converts each univariate into a multivariate time series with size variables.
- Parameters
size (int) – Let x(t) = value_t be the value of the time series at time index t. Then, the output vector for time index t will be [x(t - size + 1), ..., x(t - 1), x(t)].
stride (int) – The stride at which the output vectors are downsampled.
multivar_skip – Whether to skip this transform if the input time series is already multivariate.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
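For a single univariate list, shingling can be sketched as taking windows that end at indices size - 1, size - 1 + stride, and so on, and splitting each window position into its own output list. This sketch omits the left padding described above, so it assumes len(x) - size is a multiple of stride, which holds for the size-3, stride-2 example; the function name is hypothetical.

```python
def shingle(x, size, stride=1):
    """Return `size` lists; out[k][j] is element k of the j-th window (oldest first)."""
    ends = range(size - 1, len(x), stride)  # index of the newest value in each window
    return [[x[e - size + 1 + k] for e in ends] for k in range(size)]


x = [10, 11, 12, 13, 14, 15, 16]  # indices 0..m with m = 6
out = shingle(x, size=3, stride=2)
# out[0] = [10, 12, 14], out[1] = [11, 13, 15], out[2] = [12, 14, 16]
vectors = list(zip(*out))  # [(10, 11, 12), (12, 13, 14), (14, 15, 16)]
```

Each tuple in vectors is [x(t - 2), x(t - 1), x(t)] for one retained time index t.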
transform.moving_average
Transforms that compute moving averages and k-step differences.
- class merlion.transform.moving_average.MovingAverage(n_steps=None, weights=None)
Bases:
InvertibleTransformBase
Computes the n_steps-step moving average of the time series, with the given relative weights assigned to each time in the moving average (default is to take the non-weighted average). Zero-pads the input time series to the left before taking the moving average.
- property requires_inversion_state
Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
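The weighted moving average with zero left-padding can be sketched on a plain list as follows. The ordering of the weights is an assumption: weights[0] applies to the oldest point in the window and weights[-1] to the current point, which is consistent with the difference transform's w = [-1, 1] meaning x(t) - x(t-1).

```python
def moving_average(xs, n_steps=None, weights=None):
    """Weighted moving average of a list, zero-padded on the left (sketch)."""
    if weights is None:
        weights = [1.0 / n_steps] * n_steps  # default: plain (unweighted) average
    n = len(weights)
    padded = [0.0] * (n - 1) + list(xs)  # zero-pad so every window is full
    # Window i covers padded[i : i + n]; its last element is xs[i].
    return [sum(w * v for w, v in zip(weights, padded[i:i + n])) for i in range(len(xs))]


print(moving_average([1, 2, 3, 4], n_steps=2))     # [0.5, 1.5, 2.5, 3.5]
print(moving_average([1, 4, 9], weights=[-1, 1]))  # [1.0, 3.0, 5.0]
```

The second call shows that the weights [-1, 1] reproduce a one-step difference, with the first output differenced against the zero padding.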
- class merlion.transform.moving_average.MovingPercentile(n_steps, q)
Bases:
TransformBase
Computes the n-step moving percentile of the time series. For datapoints at the start of the time series which are preceded by fewer than n_steps datapoints, the percentile is computed using only the available datapoints.
- Parameters
q (float) – The percentile to use. Between 0 and 100 inclusive.
n_steps (int) – The number of steps to use.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
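A moving percentile over a plain list can be sketched by sorting each trailing window and interpolating. The interpolation rule (linear interpolation between the closest ranks, as in NumPy's default percentile method) is an assumption; the documentation above does not state which rule is used.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def moving_percentile(xs, n_steps, q):
    """Percentile of the last n_steps points; early points use whatever is available."""
    return [percentile(xs[max(0, i - n_steps + 1):i + 1], q) for i in range(len(xs))]


print(moving_percentile([5, 1, 3, 2], n_steps=3, q=50))  # [5.0, 3.0, 3.0, 2.0]
```

The first output uses only one point and the second only two, matching the behavior described for the start of the series.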
- class merlion.transform.moving_average.ExponentialMovingAverage(alpha, normalize=True, p=0.95, ci=False)
Bases:
InvertibleTransformBase
Computes the exponential moving average (normalized or un-normalized) of the time series, with smoothing factor alpha (lower alpha = more smoothing). alpha must be between 0 and 1.
The unnormalized moving average y of x is computed as
\[\begin{aligned} y_0 & = x_0 \\ y_i & = (1 - \alpha) \cdot y_{i-1} + \alpha \cdot x_i \end{aligned}\]
The normalized moving average y of x is computed as
\[y_i = \frac{x_i + (1 - \alpha) x_{i-1} + \ldots + (1 - \alpha)^i x_0}{1 + (1 - \alpha) + \ldots + (1 - \alpha)^i}\]
Upper and lower confidence bounds, l and u, of the exponential moving average are computed using the exponential moving standard deviation, s, and y as
\[\begin{aligned} l_i &= y_i + z_{\frac{1}{2} (1-p)} \times s_i \\ u_i &= y_i + z_{\frac{1}{2} (1+p)} \times s_i \end{aligned}\]
where z_q denotes the q-quantile of the standard normal distribution. If confidence bounds are included, the returned time series will contain the upper and lower bounds as additional univariates. For example, if the transform is applied to a time series with two univariates “x” and “y”, the resulting time series will contain univariates with the following names: “x”, “x_lb”, “x_ub”, “y”, “y_lb”, “y_ub”.
- Parameters
alpha (float) – Smoothing factor to use for exponential weighting.
normalize (bool) – If True, divide by the decaying adjustment in beginning periods.
p (float) – Confidence level to use if returning the upper and lower bounds of the confidence interval.
ci (bool) – If True, return the upper and lower confidence bounds of the exponential moving average as well.
- property requires_inversion_state
False because the exponential moving average is stateless to invert.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
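Both averages above can be computed in a single pass over a plain list. For the normalized form, the numerator and denominator each satisfy the recursion a_i = (1 - alpha) * a_{i-1} + (new term), so the sketch keeps them as running sums. It omits the confidence bounds, and the function name is hypothetical.

```python
def ema(xs, alpha, normalize=True):
    """Exponential moving average of a list, following the formulas above."""
    out = []
    if not normalize:
        for i, x in enumerate(xs):
            # y_0 = x_0; y_i = (1 - alpha) * y_{i-1} + alpha * x_i
            out.append(x if i == 0 else (1 - alpha) * out[-1] + alpha * x)
        return out
    num = den = 0.0
    for x in xs:
        # num = x_i + (1 - alpha) x_{i-1} + ...; den = 1 + (1 - alpha) + ...
        num = (1 - alpha) * num + x
        den = (1 - alpha) * den + 1
        out.append(num / den)
    return out


print(ema([0, 1, 1], alpha=0.5, normalize=False))  # [0, 0.5, 0.75]
print(ema([0, 1, 1], alpha=0.5))                   # [0.0, 0.666..., 0.857...]
```

The normalized version weights early observations more evenly, so it reacts faster at the start of the series than the unnormalized recursion, which is anchored at y_0 = x_0.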
- class merlion.transform.moving_average.DifferenceTransform
Bases:
InvertibleTransformBase
Applies a difference transform to the input time series. We include it as a moving average because we can consider the difference transform to be a 2-step moving “average” with weights w = [-1, 1].
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- class merlion.transform.moving_average.LagTransform(k, pad=False)
Bases:
InvertibleTransformBase
Applies a lag transform to the input time series. Each x(i) gets mapped to x(i) - x(i-k). We include it as a moving average because we can consider the lag transform to be a k+1-step moving “average” with weights w = [-1, 0,…, 0, 1]. One may optionally left-pad the sequence with the first value in the time series.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- compute_lag(var)
- Return type
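The lag transform maps x(i) to x(i) - x(i-k), and k = 1 gives the difference transform. Given the first k original values, the input can be rebuilt exactly by a running sum. In this sketch the head argument plays the role of that stored information; how Merlion stores it is not documented here, and the function names are hypothetical.

```python
def lag(xs, k, pad=False):
    """x(i) - x(i - k); with pad=True, left-pad with the first value to keep the length."""
    if pad:
        xs = [xs[0]] * k + list(xs)
    return [xs[i] - xs[i - k] for i in range(k, len(xs))]


def invert_lag(diffs, head):
    """Rebuild x from lagged differences and the first k original values (head)."""
    out = list(head)
    for i, d in enumerate(diffs):
        out.append(out[i] + d)  # x(i + k) = x(i) + [x(i + k) - x(i)]
    return out


x = [1, 4, 9, 16, 25]
d = lag(x, k=2)              # [8, 12, 16]
restored = invert_lag(d, head=x[:2])
```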
Normalization
transform.normalize
Transforms that rescale the input or otherwise normalize it.
- class merlion.transform.normalize.AbsVal
Bases:
TransformBase
Takes the absolute value of the input time series.
- property requires_inversion_state
False because the “pseudo-inverse” is just the identity (i.e. we lose sign information).
- property identity_inversion
Indicates whether the inverse applied by this transform is just the identity.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- class merlion.transform.normalize.Rescale(bias=0.0, scale=1.0, normalize_bias=True, normalize_scale=True)
Bases:
InvertibleTransformBase
Rescales the bias & scale of input vectors or scalars by pre-specified amounts.
- property requires_inversion_state
False because rescaling operations are stateless to invert.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- property is_trained
- class merlion.transform.normalize.MeanVarNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)
Bases:
Rescale
A learnable transform that rescales the values of a time series to have zero mean and unit variance.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
- class merlion.transform.normalize.MinMaxNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)
Bases:
Rescale
A learnable transform that rescales the values of a time series to be between zero and one.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
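The three rescaling transforms can be read as one affine map with different ways of choosing its parameters. The sketch below assumes the forward map is (x - bias) / scale with inverse y * scale + bias, that MeanVarNormalize learns the mean and the population standard deviation, and that MinMaxNormalize learns the minimum and the range. These are assumptions consistent with the stated outcomes (zero mean and unit variance, or values in [0, 1]), not Merlion's code, and the normalize_bias/normalize_scale flags are not modeled.

```python
import statistics


def rescale(xs, bias, scale):
    """Assumed forward map: (x - bias) / scale."""
    return [(x - bias) / scale for x in xs]


def unrescale(ys, bias, scale):
    """Inverse map; only bias and scale are needed, so no inversion state."""
    return [y * scale + bias for y in ys]


def train_mean_var(xs):
    """Learn (bias, scale) giving zero mean and unit population variance."""
    return statistics.mean(xs), statistics.pstdev(xs)


def train_min_max(xs):
    """Learn (bias, scale) mapping the values into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return lo, hi - lo


xs = [2.0, 4.0, 6.0]
b, s = train_min_max(xs)
print(rescale(xs, b, s))  # [0.0, 0.5, 1.0]
```

A constant series would give a zero scale; a real implementation needs a guard for that case, which this sketch omits.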
- class merlion.transform.normalize.BoxCoxTransform(lmbda=None, offset=0.0)
Bases:
InvertibleTransformBase
Applies the Box-Cox power transform to the time series, with power lmbda. When lmbda is None, we …. When lmbda > 0, it is ((x + offset) ** lmbda - 1) / lmbda. When lmbda == 0, it is ln(x + offset).
- property requires_inversion_state
False because the Box-Cox transform is stateless to invert.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
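Because the Box-Cox formulas have a closed-form inverse, no inversion state is needed once lmbda and offset are fixed. The sketch below implements both directions for a given lmbda on a plain list; it requires x + offset > 0, and it does not show how lmbda is chosen when it is None.

```python
import math


def box_cox(xs, lmbda, offset=0.0):
    """Box-Cox power transform as written above (lmbda == 0 uses the natural log)."""
    if lmbda == 0:
        return [math.log(x + offset) for x in xs]
    return [((x + offset) ** lmbda - 1) / lmbda for x in xs]


def inv_box_cox(ys, lmbda, offset=0.0):
    """Closed-form inverse of box_cox for the same lmbda and offset."""
    if lmbda == 0:
        return [math.exp(y) - offset for y in ys]
    return [(lmbda * y + 1) ** (1 / lmbda) - offset for y in ys]


ys = box_cox([1.0, 4.0, 9.0], lmbda=0.5)  # [0.0, 2.0, 4.0]
back = inv_box_cox(ys, lmbda=0.5)         # [1.0, 4.0, 9.0]
```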
transform.bound
Transforms that clip the input.
- class merlion.transform.bound.LowerUpperClip(lower=None, upper=None)
Bases:
TransformBase
Clips the values of a time series to lie between lower and upper.
- property requires_inversion_state
False because “inverting” value clipping is stateless.
- train(time_series)
Sets all trainable parameters of the transform (if any), using the input time series as training data.
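Clipping on a plain list can be sketched as follows. Two points are assumptions here: a bound left as None is treated as unbounded, and the pseudo-inverse is taken to be the identity (as for AbsVal), since the clipped-away values cannot be recovered.

```python
def clip(xs, lower=None, upper=None):
    """Clip each value to [lower, upper]; a None bound is treated as unbounded."""
    out = []
    for x in xs:
        if lower is not None and x < lower:
            x = lower
        if upper is not None and x > upper:
            x = upper
        out.append(x)
    return out


def pseudo_invert_clip(ys):
    """Assumed pseudo-inverse: the identity, because clipped values are lost."""
    return list(ys)


print(clip([-5, 0, 5, 10], lower=0, upper=6))  # [0, 0, 5, 6]
```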
Miscellaneous
transform.anomalize
Transforms that inject synthetic anomalies into time series.
- class merlion.transform.anomalize.Anomalize(anom_prob=0.01, natural_bounds=(None, None), **kwargs)
Bases:
TransformBase
Injects anomalies into a time series with controlled randomness and returns both the anomalized time series along with associated anomaly labels.
- Parameters
anom_prob (float) – The probability of anomalizing a particular data point.
natural_bounds (Tuple[float, float]) – Upper and lower natural boundaries within which injected anomalies must stay.
- property natural_bounds
- property is_trained: bool
- random_is_anom()
- class merlion.transform.anomalize.Shock(alpha=0.2, pos_prob=1.0, sd_range=(3, 6), anom_width_range=(1, 5), persist_shock=False, **kwargs)
Bases:
Anomalize
Injects random spikes or dips into a time series.
Letting y_t be a time series, if an anomaly is injected into the time series at time t, the anomalous value that gets injected is as follows:
\[\begin{aligned} \tilde{y}_t &= y_t + \text{shock} \\ \text{where } & \text{shock} = Sign \times Z \times \text{RWSD}_{\alpha}(y_t), \\ & Z \sim \mathrm{Unif}(a,b), \\ & Sign \text{ is a random sign} \end{aligned}\]
Here (a, b) is sd_range. Additionally, the shock that is added to y_t is also applied to y_{t+1}, …, y_{t+w-1}, where w, known as the “anomaly width”, is determined by a random draw from a uniform distribution.
- Parameters
alpha (float) – The recency weight to use when calculating recency-weighted standard deviation.
pos_prob (float) – The probability with which a shock’s sign is positive.
sd_range (Tuple[float, float]) – The range of standard units that is used to create a shock.
anom_width_range (Tuple[int, int]) – The range of anomaly widths.
persist_shock (bool) – Whether to apply the shock to all successive datapoints.
- property anom_width_range
- property sd_range
- random_sd_units()
- random_anom_width()
- random_is_anom()
- train(time_series)
The Shock transform doesn’t require training.
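A minimal sketch of shock injection on a plain list follows, with two loudly flagged assumptions: RWSD is approximated by a hypothetical exponentially weighted running standard deviation (ew_std) evaluated at time t, and the anomaly width is drawn uniformly from anom_width_range, inclusive. persist_shock and natural_bounds are not modeled, and inject_shocks is a hypothetical name rather than Merlion's API.

```python
import random


def ew_std(xs, alpha):
    """Hypothetical recency-weighted std: exponentially weighted running variance."""
    mean, var, out = xs[0], 0.0, []
    for x in xs:
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
        out.append(var ** 0.5)
    return out


def inject_shocks(xs, anom_prob=0.01, alpha=0.2, pos_prob=1.0,
                  sd_range=(3, 6), anom_width_range=(1, 5), seed=0):
    """Return (anomalized list, 0/1 labels) following the shock formula above."""
    rng = random.Random(seed)
    sd = ew_std(xs, alpha)
    ys, labels = list(xs), [0] * len(xs)
    t = 0
    while t < len(xs):
        if rng.random() < anom_prob:
            sign = 1 if rng.random() < pos_prob else -1
            shock = sign * rng.uniform(*sd_range) * sd[t]  # Sign * Z * RWSD
            width = rng.randint(*anom_width_range)         # inclusive on both ends
            for i in range(t, min(t + width, len(xs))):
                ys[i] += shock  # the same shock is applied over the whole width
                labels[i] = 1
            t += width
        else:
            t += 1
    return ys, labels


series = [1.0, 2.0, 1.5, 3.0, 2.5, 2.0, 4.0, 3.5]
anomalized, labels = inject_shocks(series, anom_prob=0.2, seed=42)
```

Points whose label is 0 are returned unchanged, so the labels mark exactly the injected anomalies.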
- class merlion.transform.anomalize.LevelShift(**kwargs)
Bases:
Shock
Injects random level shift anomalies into a time series.
A level shift is a sudden change of level in a time series. It is equivalent to a shock that, when applied to y_t, is also applied to every datapoint after t.
- Parameters
alpha – The recency weight to use when calculating recency-weighted standard deviation.
pos_prob – The probability with which a shock’s sign is positive.
sd_range – The range of standard units that is used to create a shock.
anom_width_range – The range of anomaly widths.
persist_shock – Whether to apply the shock to all successive datapoints.
- class merlion.transform.anomalize.TrendChange(alpha=0.5, beta=0.95, pos_prob=0.5, scale_range=(0.5, 3.0), **kwargs)
Bases:
Anomalize
Injects random trend changes into a time series.
At a high level, the transform tracks the velocity (trend) of a time series and then, when injecting a trend change at a particular time, it scales the current velocity by a random factor. The disturbance to the velocity is persisted to values in the near future, thus emulating a sudden change of trend.
Let (a, b) be the scale range. If the first trend change happens at time t*, it is injected as follows:
\[\begin{aligned} \tilde{y}_{t^*} &= y_{t^*-1} + v_{t^*} + \Delta v_{t^*} \\ \text{where } & \Delta v_{t^*} = Sign \times Z \times v_{t^*}, \\ & v_{t^*} = y_{t^*} - y_{t^*-1}, \\ & Z \sim \mathrm{Unif}(a,b), \\ & Sign \text{ is a random sign} \end{aligned}\]
Afterward, the trend change is persisted and y_t (for t > t*) is changed as follows:
\[\tilde{y}_{t} = \tilde{y}_{t-1} + v_t + \beta \times \Delta v_{t^*}\]
- Parameters
anom_prob – The probability of anomalizing a particular data point.
natural_bounds – Upper and lower natural boundaries within which injected anomalies must stay.
- property scale_range
- random_scale()
- train(time_series)
The TrendChange transform doesn’t require training.
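The update rule for a single trend change can be traced on a plain list. This sketch follows the formulas exactly as written, with the velocity disturbance beta * Δv_{t*} held fixed for every later step. To keep it deterministic, the random factor Sign × Z is passed in as a hypothetical delta_scale argument instead of being drawn from pos_prob and the scale range.

```python
def inject_trend_change(ys, t_star, delta_scale, beta):
    """Apply one trend change at index t_star (t_star >= 1) per the update rule above."""
    out = list(ys[:t_star])
    v_star = ys[t_star] - ys[t_star - 1]  # v_{t*}
    delta_v = delta_scale * v_star         # Delta v_{t*} = Sign * Z * v_{t*}
    out.append(ys[t_star - 1] + v_star + delta_v)
    for t in range(t_star + 1, len(ys)):
        v_t = ys[t] - ys[t - 1]
        out.append(out[-1] + v_t + beta * delta_v)
    return out


print(inject_trend_change([0, 1, 2, 3, 4], t_star=2, delta_scale=1.0, beta=0.5))
# [0, 1, 3, 4.5, 6.0]
```

Since y_{t*-1} + v_{t*} = y_{t*}, the first injected value is simply y_{t*} + Δv_{t*}; every later point then inherits the accumulated offset plus an extra beta * Δv_{t*} of velocity.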