merlion.transform package
This package provides a number of useful data pre-processing transforms. Each
transform is a callable object that inherits either from TransformBase or
InvertibleTransformBase.
We will introduce the key features of transform objects using the Rescale
class. You may initialize a transform in three ways:
from merlion.transform.factory import TransformFactory
from merlion.transform.normalize import Rescale
# Use the initializer
transform = Rescale(bias=5.0, scale=3.2)
# Use the class's from_dict() method with the arguments you would normally
# give to the initializer
kwargs = dict(bias=5.0, scale=3.2)
transform = Rescale.from_dict(kwargs)
# Use the TransformFactory with the class's name, and the keyword arguments
# you would normally give to the initializer
transform = TransformFactory.create("Rescale", **kwargs)
After initializing a transform, one may use it as follows:
transform.train(time_series)              # set any trainable params
transformed = transform(time_series)      # apply the transform to the time series
inverted = transform.invert(transformed)  # invert the transform
state_dict = transform.to_dict()          # serialize to a JSON-compatible dict
Note that transform.invert() is supported even if the transform doesn’t
inherit from InvertibleTransformBase! In this case, transform.invert()
implements a pseudo-inverse that may not recover the original time_series
exactly. Additionally, the dict returned by transform.to_dict() is exactly
the same as the dict expected by the class method TransformCls.from_dict().
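The round trip between to_dict() and from_dict() can be illustrated without Merlion installed. The following is a minimal sketch, not Merlion's implementation: ToyRescale is a hypothetical stand-in that works on plain lists, and the convention "subtract the bias, then divide by the scale" is an assumption made for illustration.

```python
class ToyRescale:
    """Hypothetical stand-in for a rescaling transform (not Merlion's class)."""

    def __init__(self, bias=0.0, scale=1.0):
        self.bias, self.scale = bias, scale

    def train(self, values):
        pass  # nothing to learn: bias and scale are given up front

    def __call__(self, values):
        # Assumed convention: subtract the bias, then divide by the scale.
        return [(v - self.bias) / self.scale for v in values]

    def invert(self, values):
        return [v * self.scale + self.bias for v in values]

    def to_dict(self):
        # The returned dict holds exactly the initializer's keyword arguments.
        return {"bias": self.bias, "scale": self.scale}

    @classmethod
    def from_dict(cls, state):
        return cls(**state)


transform = ToyRescale(bias=5.0, scale=3.2)
restored = ToyRescale.from_dict(transform.to_dict())

values = [5.0, 8.2, 11.4]
transformed = transform(values)
recovered = transform.invert(transformed)
```

Because to_dict() emits only JSON-compatible values, the same dict could be written to disk and later passed to from_dict() to rebuild an equivalent object.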
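| merlion.transform.factory | Contains the TransformFactory for instantiating transforms. |
| merlion.transform.base | Transform base classes and the Identity transform. |
| merlion.transform.bound | Transforms that clip the input. |
| merlion.transform.moving_average | Transforms that compute moving averages and k-step differences. |
| merlion.transform.normalize | Transforms that rescale the input or otherwise normalize it. |
| merlion.transform.resample | Transforms that resample the input in time, or stack adjacent observations into vectors. |
| merlion.transform.sequence | Classes to compose (TransformSequence) or stack (TransformStack) multiple transforms. |
| merlion.transform.anomalize | Transforms that inject synthetic anomalies into time series. |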
Submodules
merlion.transform.base module
Transform base classes and the Identity transform.
- class merlion.transform.base.TransformBase
- Bases: object
- Abstract class for a callable data pre-processing transform.
- Subclasses must override the train method (pass if no training is required) and the __call__ method (to implement the actual transform).
- Subclasses may also support a pseudo-inverse transform (possibly using the implementation-specific self.inversion_state, which should be set in __call__). If an inversion state is not required, override the property requires_inversion_state to return False.
- Due to possible information loss in the forward pass, the inverse transform may not be perfect/proper, and calling TransformBase.invert will result in a warning. By default, the inverse transform (implemented in TransformBase._invert) is just the identity.
- Variables
- inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.
 - _invert(time_series)
- Helper method which actually performs the inverse transform (when possible).
- Parameters
- time_series (TimeSeries) – Time series to apply the inverse transform to
- Return type
- TimeSeries
- Returns
- The (inverse) transformed time series.
 
 - property proper_inversion
- TransformBase objects do not support a proper inversion.
 - property requires_inversion_state
- Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.
 - to_dict()
 - classmethod from_dict(state)
 - abstract train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 - invert(time_series, retain_inversion_state=False)
- Applies the inverse of this transform on the time series.
- Parameters
- time_series (TimeSeries) – The time series on which to apply the inverse transform.
- retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
 
- Return type
- TimeSeries
- Returns
- The (inverse) transformed time series.
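The life cycle of the inversion state described above can be sketched as follows. ToyFirstDifference is a hypothetical class operating on plain lists, not part of Merlion; it only mirrors the documented behaviour that the state is set in __call__ and cleared by invert() unless retain_inversion_state=True is passed.

```python
class ToyFirstDifference:
    """Hypothetical transform (not Merlion's) that needs state to invert."""

    requires_inversion_state = True

    def __init__(self):
        self.inversion_state = None

    def __call__(self, values):
        # Differencing discards the first value, so remember it for inversion.
        self.inversion_state = values[0]
        return [b - a for a, b in zip(values, values[1:])]

    def invert(self, diffs, retain_inversion_state=False):
        if self.inversion_state is None:
            raise RuntimeError("call the transform before inverting it")
        out = [self.inversion_state]
        for d in diffs:
            out.append(out[-1] + d)
        if not retain_inversion_state:
            self.inversion_state = None  # guard against reusing a stale state
        return out


t = ToyFirstDifference()
diffs = t([3, 5, 4, 8])

kept = t.invert(diffs, retain_inversion_state=True)
state_after_retain = t.inversion_state   # still available

dropped = t.invert(diffs)
state_after_default = t.inversion_state  # cleared by the default call
```

Clearing the state by default means a second invert() on a different series fails loudly instead of silently reusing the first series' state.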
 
 
- class merlion.transform.base.InvertibleTransformBase
- Bases: TransformBase
- Abstract class for a callable data pre-processing transform with a proper inverse.
- In addition to overriding the train and __call__ methods, subclasses must also override the InvertibleTransformBase._invert method to implement the actual inverse transform.
- Variables
- inversion_state – Implementation-specific intermediate state that is used to compute the inverse transform for a particular time series. Only used if TransformBase.requires_inversion_state is True. The inversion state is destroyed upon calling TransformBase.invert, unless the option retain_inversion_state=True is specified. This is to prevent potential user error.
 - abstract _invert(time_series)
- Helper method which actually performs the inverse transform (when possible).
- Parameters
- time_series (TimeSeries) – Time series to apply the inverse transform to
- Return type
- TimeSeries
- Returns
- The (inverse) transformed time series.
 
 - property proper_inversion
- InvertibleTransformBase always supports a proper inversion.
 
- class merlion.transform.base.Identity
- Bases: InvertibleTransformBase
- The identity transformation. Does nothing.
 - property requires_inversion_state
- False because the identity operation is stateless to invert.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 
merlion.transform.bound module
Transforms that clip the input.
- class merlion.transform.bound.LowerUpperClip(lower=None, upper=None)
- Bases: TransformBase
- Clips the values of a time series to lie between lower and upper.
 - property requires_inversion_state
- False because “inverting” value clipping is stateless.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
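As a rough illustration of why clipping has only a pseudo-inverse, the sketch below clips a plain list and then "inverts" it with the identity. The helper names clip and pseudo_invert are hypothetical, and treating None as "no bound on that side" is an assumption based on the default arguments.

```python
def clip(values, lower=None, upper=None):
    """Clip each value into [lower, upper]; None means no bound on that side."""
    out = []
    for v in values:
        if lower is not None:
            v = max(v, lower)
        if upper is not None:
            v = min(v, upper)
        out.append(v)
    return out


def pseudo_invert(values):
    # Identity: the magnitudes removed by clipping cannot be recovered.
    return list(values)


original = [-3.0, 0.5, 7.0]
clipped = clip(original, lower=-1.0, upper=2.0)
recovered = pseudo_invert(clipped)
```

Here recovered equals clipped, not original, which is the information loss that a pseudo-inverse cannot undo.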
 
merlion.transform.factory module
Contains the TransformFactory for instantiating transforms.
- class merlion.transform.factory.TransformFactory
- Bases: object
 - classmethod get_transform_class(name)
- Return type
- Type[TransformBase]
 
 - classmethod create(name, **kwargs)
- Return type
 
 
merlion.transform.moving_average module
Transforms that compute moving averages and k-step differences.
- class merlion.transform.moving_average.MovingAverage(n_steps=None, weights=None, pad=False)
- Bases: TransformBase
- Computes the n_steps-step moving average of the time series, with the given relative weights assigned to each time in the moving average (default is to take the non-weighted average). Zero-pads the input time series to the left before taking the moving average.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
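A weighted moving average with left zero-padding can be sketched on a plain list as below. This is an illustration under stated assumptions, not Merlion's code: the function name is hypothetical, the relative weights are assumed to be normalized to sum to one, and the output is assumed to have the same length as the input.

```python
def moving_average(values, n_steps, weights=None):
    """n_steps-step moving average with zeros padded on the left."""
    if weights is None:
        weights = [1.0] * n_steps          # non-weighted average by default
    total = sum(weights)
    weights = [w / total for w in weights]  # relative weights sum to one
    padded = [0.0] * (n_steps - 1) + list(values)
    return [
        sum(w * x for w, x in zip(weights, padded[i:i + n_steps]))
        for i in range(len(values))
    ]


plain = moving_average([3.0, 6.0, 9.0], n_steps=3)
recent_heavy = moving_average([3.0, 6.0, 9.0], n_steps=2, weights=[1, 3])
```

The first outputs are pulled toward zero because the padded zeros take part in the average.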
 
- class merlion.transform.moving_average.MovingPercentile(n_steps, q)
- Bases: TransformBase
- Computes the n-step moving percentile of the time series. For datapoints at the start of the time series which are preceded by fewer than n_steps datapoints, the percentile is computed using only the available datapoints.
- Parameters
- q (float) – The percentile to use. Between 0 and 100 inclusive.
- n_steps (int) – The number of steps to use.
 
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
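The handling of the first few datapoints can be sketched as follows. The helper names are hypothetical, the window is assumed to include the current datapoint, and linear interpolation between order statistics is an assumption (the documentation above does not say which interpolation rule is used).

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def moving_percentile(values, n_steps, q):
    out = []
    for i in range(len(values)):
        # Early windows are shorter: use only the datapoints that exist.
        window = sorted(values[max(0, i - n_steps + 1): i + 1])
        out.append(percentile(window, q))
    return out


medians = moving_percentile([4, 1, 3, 2, 5], n_steps=3, q=50)
```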
 
- class merlion.transform.moving_average.ExponentialMovingAverage(alpha, normalize=True, p=0.95, ci=False)
- Bases: InvertibleTransformBase
- Computes the exponential moving average (normalized or un-normalized) of the time series, with smoothing factor alpha (lower alpha = more smoothing). alpha must be between 0 and 1.
- The unnormalized moving average y of x is computed as
\[\begin{aligned} y_0 & = x_0 \\ y_i & = (1 - \alpha) \cdot y_{i-1} + \alpha \cdot x_i \end{aligned}\]
- The normalized moving average y of x is computed as
\[y_i = \frac{x_i + (1 - \alpha) x_{i-1} + \ldots + (1 - \alpha)^i x_0} {1 + (1 - \alpha) + \ldots + (1 - \alpha)^i}\]
- Lower and upper confidence bounds, l and u, of the exponential moving average are computed using the exponential moving standard deviation, s, and y as
\[\begin{aligned} l_i & = y_i + z_{\frac{1}{2} (1-p)} \times s_i \\ u_i & = y_i + z_{\frac{1}{2} (1+p)} \times s_i \end{aligned}\]
- If confidence bounds are included, the returned time series will contain the upper and lower bounds as additional univariates. For example, if the transform is applied to a time series with two univariates “x” and “y”, the resulting time series will contain univariates with the following names: “x”, “x_lb”, “x_ub”, “y”, “y_lb”, “y_ub”.
- Parameters
- alpha (float) – Smoothing factor to use for exponential weighting.
- normalize (bool) – If True, divide by the decaying adjustment in beginning periods.
- p (float) – Confidence level to use if returning the upper and lower bounds of the confidence interval.
- ci (bool) – If True, return the upper and lower confidence bounds of the exponential moving average as well.
 
 - property requires_inversion_state
- False because the exponential moving average is stateless to invert.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
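The two moving-average formulas above translate directly into short recursions. The sketch below works on plain lists and uses hypothetical function names; it also shows the standard normal quantiles z that multiply s in the confidence bounds, taking z to be the standard normal quantile function (an assumption, since the page does not define z).

```python
from statistics import NormalDist


def ema_unnormalized(x, alpha):
    """y_0 = x_0, y_i = (1 - alpha) * y_{i-1} + alpha * x_i."""
    y = [x[0]]
    for xi in x[1:]:
        y.append((1 - alpha) * y[-1] + alpha * xi)
    return y


def ema_normalized(x, alpha):
    """Weighted mean with weights (1 - alpha)^k on the k-th most recent value."""
    y, num, den = [], 0.0, 0.0
    for xi in x:
        num = xi + (1 - alpha) * num    # running numerator
        den = 1.0 + (1 - alpha) * den   # running denominator
        y.append(num / den)
    return y


def z_bounds(p):
    """Standard normal quantiles at (1 - p) / 2 and (1 + p) / 2."""
    z = NormalDist()
    return z.inv_cdf(0.5 * (1 - p)), z.inv_cdf(0.5 * (1 + p))


x = [1.0, 2.0, 4.0]
unnorm = ema_unnormalized(x, alpha=0.5)
norm = ema_normalized(x, alpha=0.5)
z_lo, z_hi = z_bounds(0.95)
```

Since z_lo is negative and z_hi is positive, adding z times s to y gives a lower bound below y and an upper bound above it.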
 
- class merlion.transform.moving_average.DifferenceTransform
- Bases: InvertibleTransformBase
- Applies a difference transform to the input time series. We include it as a moving average because we can consider the difference transform to be a 2-step moving “average” with weights w = [-1, 1].
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 
- class merlion.transform.moving_average.LagTransform(k, pad=False)
- Bases: InvertibleTransformBase
- Applies a lag transform to the input time series. Each x(i) gets mapped to x(i) - x(i-k). We include it as a moving average because we can consider the lag transform to be a k+1-step moving “average” with weights w = [-1, 0,…, 0, 1]. One may optionally left-pad the sequence with the first value in the time series.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 - compute_lag(var)
- Return type
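The mapping x(i) to x(i) - x(i-k), and the reason it is invertible once the first k values are known, can be sketched on plain lists. The function names are hypothetical and the sketch covers only the unpadded case; the difference transform corresponds to k = 1.

```python
def lag(values, k):
    """Map x(i) to x(i) - x(i - k); the first k points have no predecessor."""
    return [values[i] - values[i - k] for i in range(k, len(values))]


def invert_lag(lagged, first_k):
    """Rebuild the series from its k-step differences and its first k values."""
    k = len(first_k)
    out = list(first_k)
    for d in lagged:
        out.append(d + out[-k])  # x(i) = d(i) + x(i - k)
    return out


x = [1, 4, 9, 16, 25]
lag2 = lag(x, k=2)
diff = lag(x, k=1)  # the difference transform is the special case k = 1
recovered = invert_lag(lag2, first_k=x[:2])
```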
 
 
merlion.transform.normalize module
Transforms that rescale the input or otherwise normalize it.
- class merlion.transform.normalize.AbsVal
- Bases: TransformBase
- Takes the absolute value of the input time series.
 - property requires_inversion_state
- False because the “pseudo-inverse” is just the identity (i.e. we lose sign information).
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 
- class merlion.transform.normalize.Rescale(bias=0.0, scale=1.0, normalize_bias=True, normalize_scale=True)
- Bases: InvertibleTransformBase
- Rescales the bias & scale of input vectors or scalars by pre-specified amounts.
 - property requires_inversion_state
- False because rescaling operations are stateless to invert.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 - property is_trained
 
- class merlion.transform.normalize.MeanVarNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)
- Bases: Rescale
- A learnable transform that rescales the values of a time series to have zero mean and unit variance.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 
- class merlion.transform.normalize.MinMaxNormalize(bias=None, scale=None, normalize_bias=True, normalize_scale=True)
- Bases: Rescale
- A learnable transform that rescales the values of a time series to be between zero and one.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
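One way to picture what the two learnable normalizers compute during training is as a choice of bias and scale that is then applied as (x - bias) / scale. The sketch below is an assumption-laden illustration on plain lists: the function names are hypothetical, and using the population standard deviation is a guess, since the page does not specify the estimator.

```python
from statistics import fmean, pstdev


def mean_var_params(values):
    """bias = mean, scale = standard deviation (population form assumed)."""
    return fmean(values), pstdev(values)


def min_max_params(values):
    """bias = minimum, scale = range, so outputs fall in [0, 1]."""
    lo, hi = min(values), max(values)
    return lo, hi - lo


def rescale(values, bias, scale):
    return [(v - bias) / scale for v in values]


data = [2.0, 4.0, 6.0]
standardized = rescale(data, *mean_var_params(data))
unit_range = rescale(data, *min_max_params(data))
```

A constant series has zero standard deviation and zero range, so a real implementation needs a guard against division by zero that this sketch omits.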
 
- class merlion.transform.normalize.PowerTransform(lmbda=0.0, offset=0.0)
- Bases: InvertibleTransformBase
- Applies the Box-Cox power transform to the time series, with power lmbda. When lmbda > 0, it is ((x + offset) ** lmbda - 1) / lmbda. When lmbda == 0, it is ln(x + offset).
 - property requires_inversion_state
- False because the Box-Cox transform is stateless to invert.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
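The two branches of the Box-Cox formula and their closed-form inverses can be checked with a short scalar sketch. The function names box_cox and inv_box_cox are hypothetical, and inputs are assumed to satisfy x + offset > 0.

```python
import math


def box_cox(x, lmbda=0.0, offset=0.0):
    """Box-Cox transform of a scalar; requires x + offset > 0."""
    if lmbda == 0:
        return math.log(x + offset)
    return ((x + offset) ** lmbda - 1) / lmbda


def inv_box_cox(y, lmbda=0.0, offset=0.0):
    """Closed-form inverse of box_cox for the same lmbda and offset."""
    if lmbda == 0:
        return math.exp(y) - offset
    return (lmbda * y + 1) ** (1 / lmbda) - offset


round_trips = [
    inv_box_cox(box_cox(3.0, lmbda, offset=1.0), lmbda, offset=1.0)
    for lmbda in (0.0, 0.5, 2.0)
]
```

Because the inverse depends only on lmbda and offset, no per-series state is needed to undo the transform.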
 
merlion.transform.resample module
Transforms that resample the input in time, or stack adjacent observations into vectors.
- class merlion.transform.resample.TemporalResample(granularity=None, origin=None, trainable_granularity=None, remove_non_overlapping=True, aggregation_policy='Mean', missing_value_policy='Interpolate')
- Bases: TransformBase
- Defines a policy to temporally resample a time series at a specified granularity. Note that while this transform does support inversion, the recovered time series may differ from the input due to information loss when downsampling.
- Parameters
- granularity (Union[str, int, float, None]) – The granularity at which we want to resample.
- origin (Optional[int]) – The time stamp defining the offset to start at.
- trainable_granularity (Optional[bool]) – Whether the granularity is trainable, i.e. train() will set it to the GCD timedelta of a time series. If None (default), it will be trainable only if no granularity is explicitly given.
- remove_non_overlapping – If True, we will only keep the portions of the univariates that overlap with each other. For example, if we have 3 univariates which span timestamps [0, 3600], [60, 3660], and [30, 3540], we will only keep timestamps in the range [60, 3540]. If False, we will keep all timestamps produced by the resampling.
- aggregation_policy (Union[str, AggregationPolicy]) – The policy we will use to aggregate multiple values in a window (downsampling).
- missing_value_policy (Union[str, MissingValuePolicy]) – The policy we will use to impute missing values (upsampling).
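Two of the parameters above reduce to small computations: the overlapping range kept by remove_non_overlapping, and the GCD timedelta used as a trainable granularity. The sketch below reproduces the worked example from the parameter description using integer timestamps; the function names are hypothetical and this is not Merlion's code.

```python
from functools import reduce
from math import gcd


def overlapping_range(spans):
    """Latest start and earliest end across (start, end) spans."""
    return max(start for start, _ in spans), min(end for _, end in spans)


def gcd_granularity(timestamps):
    """Greatest common divisor of the gaps between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return reduce(gcd, gaps)


window = overlapping_range([(0, 3600), (60, 3660), (30, 3540)])
step = gcd_granularity([0, 120, 180, 420])
```

Resampling at the GCD of the observed gaps is the coarsest regular grid on which every original timestamp still falls.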
 
 - property requires_inversion_state
- Indicates whether any state self.inversion_state is required to invert the transform. Specific to each transform. True by default.
 - property proper_inversion
- We treat resampling as a proper inversion to avoid emitting warnings. 
 - property granularity
 - property aggregation_policy: AggregationPolicy
- Return type
- AggregationPolicy
 
 - property missing_value_policy: MissingValuePolicy
- Return type
- MissingValuePolicy
 
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 
- class merlion.transform.resample.Shingle(size=1, stride=1, multivar_skip=True)
- Bases: InvertibleTransformBase
- Stacks adjacent observations into a single vector. Downsamples by the specified stride (less than or equal to the shingle size) if desired.
- More concretely, consider an input time series,
TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), ..., (t1[m], x1[m])),
    UnivariateTimeSeries((t2[0], x2[0]), ..., (t2[m], x2[m])),
)
- Applying a shingle of size 3 and stride 2 will yield
TimeSeries(
    UnivariateTimeSeries((t1[0], x1[0]), (t1[2], x1[2]), ..., (t1[m-2], x1[m-2])),
    UnivariateTimeSeries((t1[1], x1[1]), (t1[3], x1[3]), ..., (t1[m-1], x1[m-1])),
    UnivariateTimeSeries((t1[2], x1[2]), (t1[4], x1[4]), ..., (t1[m], x1[m])),
    UnivariateTimeSeries((t2[0], x2[0]), (t2[2], x2[2]), ..., (t2[m-2], x2[m-2])),
    UnivariateTimeSeries((t2[1], x2[1]), (t2[3], x2[3]), ..., (t2[m-1], x2[m-1])),
    UnivariateTimeSeries((t2[2], x2[2]), (t2[4], x2[4]), ..., (t2[m], x2[m])),
)
- If the length of any univariate is not perfectly divisible by the stride, we will pad it on the left side with the first value in the univariate.
- Converts the time series into shingle vectors of the appropriate size. This converts each univariate into a multivariate time series with size variables.
- Parameters
- size (int) – Let x(t) = value_t be the value of the time series at time index t. Then, the output vector for time index t will be [x(t - size + 1), ..., x(t - 1), x(t)].
- stride (int) – The stride at which the output vectors are downsampled.
- multivar_skip – Whether to skip this transform if the time series is already multivariate.
 
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
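The size 3, stride 2 example can be reproduced on a single list of values. This is a simplified sketch with a hypothetical function name: it ignores timestamps and skips the left-padding step, so it assumes the windows line up with the end of the series.

```python
def shingle(values, size, stride=1):
    """Return `size` output variables; variable j holds x(t - size + 1 + j)."""
    # t runs over the index of the last element of each window.
    ends = range(size - 1, len(values), stride)
    return [[values[t - size + 1 + j] for t in ends] for j in range(size)]


x = list(range(7))  # x[0], ..., x[6], so m = 6
stacked = shingle(x, size=3, stride=2)
```

Reading the three output variables column by column gives the windows [x0, x1, x2], [x2, x3, x4] and [x4, x5, x6], matching the layout of the example above.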
 
merlion.transform.sequence module
Classes to compose (TransformSequence) or stack (TransformStack) multiple transforms.
- class merlion.transform.sequence.TransformSequence(transforms)
- Bases: InvertibleTransformBase
- Applies a series of data transformations sequentially.
 - property proper_inversion
- A transform sequence is invertible if and only if all the transforms comprising it are invertible. 
 - property requires_inversion_state
- False because inversion state is held by individual transforms.
 - to_dict()
 - append(transform)
 - classmethod from_dict(state)
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 - invert(time_series, retain_inversion_state=False)
- Applies the inverse of this transform on the time series.
- Parameters
- time_series (TimeSeries) – The time series on which to apply the inverse transform.
- retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
 
- Return type
- TimeSeries
- Returns
- The (inverse) transformed time series.
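Sequential composition and its inverse can be sketched with two toy steps. All class names here are hypothetical, and applying the inverses in reverse order is the mathematically required order for undoing a composition rather than something stated on this page.

```python
class Shift:
    """Hypothetical invertible step: add a constant."""

    def __init__(self, by):
        self.by = by

    def __call__(self, values):
        return [v + self.by for v in values]

    def invert(self, values):
        return [v - self.by for v in values]


class Scale:
    """Hypothetical invertible step: multiply by a non-zero constant."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        return [v * self.factor for v in values]

    def invert(self, values):
        return [v / self.factor for v in values]


class ToySequence:
    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, values):
        for t in self.transforms:
            values = t(values)
        return values

    def invert(self, values):
        for t in reversed(self.transforms):  # undo the last step first
            values = t.invert(values)
        return values


seq = ToySequence([Shift(1.0), Scale(4.0)])
forward = seq([0.0, 1.0, 2.0])
back = seq.invert(forward)
```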
 
 
- class merlion.transform.sequence.TransformStack(transforms, *, check_aligned=True)
- Bases: TransformSequence
- Applies a set of data transformations individually to an input time series. Stacks all of the results into a multivariate time series.
 - property proper_inversion
- A stacked transform is invertible if and only if at least one of the transforms comprising it is invertible.
 - property requires_inversion_state
- True because the inversion state tells us which stacked transform to invert, and which part of the output time series to apply that inverse to.
 - train(time_series)
- Sets all trainable parameters of the transform (if any), using the input time series as training data. 
 - invert(time_series, retain_inversion_state=False)
- Applies the inverse of this transform on the time series.
- Parameters
- time_series (TimeSeries) – The time series on which to apply the inverse transform.
- retain_inversion_state – If an inversion state is required, supply retain_inversion_state=True to retain the inversion state even after calling this method. Otherwise, the inversion state will be set to None after the inversion is applied, to prevent a user error of accidentally using a stale state.
 
- Return type
- TimeSeries
- Returns
- The (inverse) transformed time series.
 
 
merlion.transform.anomalize module
Transforms that inject synthetic anomalies into time series.
- class merlion.transform.anomalize.Anomalize(anom_prob=0.01, natural_bounds=(None, None), **kwargs)
- Bases: TransformBase
- Injects anomalies into a time series with controlled randomness and returns the anomalized time series along with the associated anomaly labels.
- Parameters
- anom_prob (float) – The probability of anomalizing a particular data point.
- natural_bounds (Tuple[float, float]) – Upper and lower natural boundaries that a time series must stay within after anomalies are injected.
 
 - property natural_bounds
 - property is_trained: bool
- Return type
- bool
 
 - random_is_anom()
 
- class merlion.transform.anomalize.Shock(alpha=0.2, pos_prob=1.0, sd_range=(3, 6), anom_width_range=(1, 5), persist_shock=False, **kwargs)
- Bases: Anomalize
- Injects random spikes or dips into a time series.
- Letting y_t be a time series, if an anomaly is injected into the time series at time t, the anomalous value that gets injected is as follows:
\[\begin{aligned} \tilde{y}_t &= y_t + \text{shock}, \\ \text{where } \text{shock} &= \text{Sign} \times Z \times \text{RWSD}_{\alpha}(y_t), \\ Z &\sim \mathrm{Unif}(a, b), \\ \text{Sign} &\text{ is a random sign} \end{aligned}\]
- Additionally, the shock that is added to y_t is also applied to y_t+1, …, y_w-1, where w, known as the “anomaly width”, is determined by a random draw from a uniform distribution.
- Parameters
- alpha (float) – The recency weight to use when calculating recency-weighted standard deviation.
- pos_prob (float) – The probability with which a shock’s sign is positive.
- sd_range (Tuple[float, float]) – The range of standard units that is used to create a shock.
- anom_width_range (Tuple[int, int]) – The range of anomaly widths.
- persist_shock (bool) – Whether to apply the shock to all successive datapoints.
 
 - property anom_width_range
 - property sd_range
 - random_sd_units()
 - random_anom_width()
 - random_is_anom()
 
- class merlion.transform.anomalize.LevelShift(**kwargs)
- Bases: Shock
- Injects random level shift anomalies into a time series.
- A level shift is a sudden change of level in a time series. It is equivalent to a shock that, when applied to y_t, is also applied to every datapoint after t.
- Parameters
- alpha – The recency weight to use when calculating recency-weighted standard deviation.
- pos_prob – The probability with which a shock’s sign is positive.
- sd_range – The range of standard units that is used to create a shock 
- anom_width_range – The range of anomaly widths. 
- persist_shock – whether to apply the shock to all successive datapoints. 
 
 
- class merlion.transform.anomalize.TrendChange(alpha=0.5, beta=0.95, pos_prob=0.5, scale_range=(0.5, 3.0), **kwargs)
- Bases: Anomalize
- Injects random trend changes into a time series.
- At a high level, the transform tracks the velocity (trend) of a time series and then, when injecting a trend change at a particular time, it scales the current velocity by a random factor. The disturbance to the velocity is persisted to values in the near future, thus emulating a sudden change of trend.
- Let (a, b) be the scale range. If the first trend change happens at time t*, it is injected as follows:
\[\begin{aligned} \tilde{y}_{t^*} &= y_{t^*-1} + v_{t^*} + \Delta v_{t^*}, \\ \text{where } \Delta v_{t^*} &= \text{Sign} \times Z \times v_{t^*}, \\ v_{t^*} &= y_{t^*} - y_{t^*-1}, \\ Z &\sim \mathrm{Unif}(a, b), \\ \text{Sign} &\text{ is a random sign} \end{aligned}\]
- Afterward, the trend change is persisted and y_t (for t > t*) is changed as follows:
\[\tilde{y}_{t} = \tilde{y}_{t-1} + v_t + \beta \times \Delta v_{t^*}\]
- Parameters
- anom_prob – The probability of anomalizing a particular data point.
- natural_bounds – Upper and lower natural boundaries that a time series must stay within after anomalies are injected.
 
 - property scale_range
 - random_scale()
 - train(time_series)
- The TrendChange transform doesn’t require training.
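The two trend-change equations can be traced by hand with a deterministic sketch in which the random sign and the draw Z are passed in as arguments. The function name is hypothetical, the velocity is taken to be the raw one-step difference exactly as in the formula, and a single trend change at t_star >= 1 is assumed; Merlion's actual velocity tracking may differ.

```python
def inject_trend_change(y, t_star, sign, z, beta):
    """Apply one trend change at index t_star (t_star >= 1) to a list y."""
    # v[t] = y[t] - y[t-1]; v[0] is a placeholder that is never used.
    v = [0.0] + [b - a for a, b in zip(y, y[1:])]
    delta = sign * z * v[t_star]                    # disturbance to the velocity
    out = list(y[:t_star])                          # values before t* are untouched
    out.append(y[t_star - 1] + v[t_star] + delta)   # value at t*
    for t in range(t_star + 1, len(y)):
        out.append(out[-1] + v[t] + beta * delta)   # persisted change for t > t*
    return out


y = [0.0, 1.0, 2.0, 3.0, 4.0]
changed = inject_trend_change(y, t_star=2, sign=1, z=2.0, beta=0.5)
```

With a unit slope, the velocity disturbance is 2 at t*, and half of it (beta = 0.5) is added at every later step, so the slope after the change is 2 instead of 1.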