Merlion Architecture
This document is intended for Merlion developers. It outlines the architecture of Merlion's key components and how they interact with each other. In general, everything in this document describes the base.py files of the modules being discussed.
Transforms
Transforms in Merlion apply various useful pre-processing operations to time series data.
Training
Many transforms are trainable. For example, if we want to normalize the data to have zero mean and unit variance, we use training data to learn the mean and variance of each variable in the time series. If we wish to resample the data to a fixed granularity, we use the most commonly observed timedelta in the training data.
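For concreteness, here is a minimal sketch of training and applying a trainable transform. MeanVarNormalize is Merlion's mean-variance normalization; the toy data below is purely illustrative:

```python
import numpy as np
import pandas as pd

from merlion.transform.normalize import MeanVarNormalize
from merlion.utils import TimeSeries

# Toy hourly series (illustrative data only)
index = pd.date_range("2024-01-01", periods=100, freq="h")
train_data = TimeSeries.from_pd(pd.DataFrame({"x": np.random.randn(100)}, index=index))

# Learn the mean & variance of each univariate from the training data,
# then apply the learned normalization
normalize = MeanVarNormalize()
normalize.train(train_data)
normalized = normalize(train_data)
```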
Inversion
Many transforms are invertible.
For example, one may invert the normalization y = (x - mu) / sigma via x = sigma * y + mu.
However, other transforms are lossy, and the input cannot be recovered without additional state. For example, consider the difference transform y[i+1] = x[i+1] - x[i]. We need to record x[0] as the transform.inversion_state in order to invert the difference transform and recover x from y.
For invertible transforms which require an inversion state, we handle the inversion state as follows:
- When the transform is called, the inversion state is set. For example, if diff = DifferenceTransform(), then y = diff(x) will record the first observation of each univariate in x as its inversion state.
- When transform.invert(y) is called, the inversion state is reset to None, unless the user explicitly invokes transform.invert(y, retain_inversion_state=True). This ensures that the user doesn't inadvertently apply a stale inversion state to a new time series.
Some transforms are not invertible at all (e.g. resampling). In these cases, transform.invert(y) simply returns y, and a warning is emitted.
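The inversion-state workflow above can be sketched as follows, reusing the toy train_data from the earlier snippet:

```python
from merlion.transform.moving_average import DifferenceTransform

diff = DifferenceTransform()
y = diff(train_data)    # records the first observation of each univariate as the inversion state
x_hat = diff.invert(y)  # recovers the original series & resets the inversion state to None

# The state is now None, so calling diff.invert(y) again would not recover the
# original series; to keep the state alive across calls, pass:
# x_hat = diff.invert(y, retain_inversion_state=True)
```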
Multivariate Time Series
For the time being, all transforms are applied identically to all univariates in a time series. We generally track the variables required for each univariate via a dictionary that maps the name of the univariate to the variables relevant for it. We explicitly use the names of each univariate to ensure that everything behaves as expected even if the individual variables are reordered.
A notable limitation of the current implementation is that we cannot apply different transforms to different univariates. For example, we cannot mean-variance normalize univariate 0 and apply a difference transform to univariate 1. If there is demand for this sort of behavior in the future, we may consider adding a parameter to each transform which indicates the names of the univariates it should be applied to. This may be combined with a TransformStack to apply different transforms to different univariates. A new tutorial should be written if this feature is added.
Models
Models are the central object in Merlion.
Pre-Processing
Each model has a model.transform which pre-processes the data. Automatically applying this transform at both training and inference time (and inverting the transform for forecasting) is a key feature of Merlion models. Note that model.transform is generally a reference to model.config.transform.
If your data is already pre-processed, then you can set model.transform to be the Identity.
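For example, one might construct an ARIMA model on pre-processed data like so (a sketch; model configs accept a transform argument):

```python
from merlion.models.forecast.arima import Arima, ArimaConfig
from merlion.transform.base import Identity

# The Identity transform leaves the data untouched
model = Arima(ArimaConfig(max_forecast_steps=24, transform=Identity()))
```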
When model.train() is called, the first step is to call model.train_pre_process(). This method:
- Records the dimension of the training data as model.dim.
- Trains model.transform and applies it to the training data.
- Records the sampling frequency of the transformed training data as model.timedelta (as well as the offset model.timedelta_offset).
- For forecasters, additionally trains and applies model.exog_transform on the exogenous data if any is given, and records the dimension of the exogenous data as model.exog_dim.
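After training, the quantities recorded by model.train_pre_process() are visible as attributes on the model. A sketch, continuing with the Arima model and toy train_data from above (we assume forecaster training returns the model's prediction on the training data along with its standard error):

```python
train_pred, train_stderr = model.train(train_data)

print(model.dim)        # dimension of the training data
print(model.timedelta)  # sampling frequency of the transformed training data
```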
For anomaly detection, model.get_anomaly_score(time_series, time_series_prev) includes the following pre-processing steps:
- Apply model.transform to the concatenation of time_series_prev and time_series.
- Ensure that the data's dimension matches the dimension of the training data.
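A sketch of this call path, using IsolationForest as a representative detector (train_data and test_data are assumed toy series):

```python
from merlion.models.anomaly.isolation_forest import IsolationForest, IsolationForestConfig

detector = IsolationForest(IsolationForestConfig())
detector.train(train_data)

# model.transform is applied to the concatenation of time_series_prev and time_series
scores = detector.get_anomaly_score(test_data, time_series_prev=train_data)
```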
For forecasting, model.forecast(time_stamps, time_series_prev, exog_data) includes the following pre-processing steps:
- If the model expects time series to be sampled at a fixed frequency, resample time_stamps to the frequency specified by model.timedelta and model.timedelta_offset.
- Save the current inversion state of model.transform, and then apply model.transform to time_series_prev.
- If exog_data is given, apply model.exog_transform to exog_data, and resample exog_data to the same time stamps as time_series_prev (after the transform) and time_stamps.
- Ensure that the dimensions of time_series_prev and exog_data match the training data. See Forecasting With Exogenous Regressors for more details on exogenous regressors.
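A sketch of the forecasting call, continuing with the Arima model from above (we assume time_stamps may be given as a horizon or as explicit time stamps):

```python
forecast, stderr = model.forecast(
    time_stamps=24,               # a horizon, or an explicit list of time stamps
    time_series_prev=train_data,  # model.transform is applied to this window
)
```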
User-Defined Implementations
After pre-processing the input data, we pass it to the user-defined implementations model._train(), model._train_with_exog(), model._get_anomaly_score(), or model._forecast(). These methods do the real work of training or inference for the underlying model, and they are the methods that must be manually defined for each new model.
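To illustrate, here is a skeleton of a hypothetical new forecaster. The method signatures below are assumptions for the sketch; the authoritative abstract signatures live in merlion/models/forecast/base.py. Note that the data reaching these methods has already been pre-processed by model.transform:

```python
import pandas as pd

from merlion.models.forecast.base import ForecasterBase, ForecasterConfig


class NaiveForecasterConfig(ForecasterConfig):
    pass


class NaiveForecaster(ForecasterBase):
    """Hypothetical model that repeats the last observed value."""
    config_class = NaiveForecasterConfig

    def _train(self, train_data: pd.DataFrame, train_config=None):
        self.last_val = train_data.iloc[-1, 0]
        # Return (train forecast, train stderr); stderr may be None
        return train_data.iloc[:, :1], None

    def _forecast(self, time_stamps, time_series_prev=None, return_prev=False):
        # We assume time_stamps arrive as Unix timestamps (in seconds)
        index = pd.to_datetime(time_stamps, unit="s")
        forecast = pd.DataFrame(self.last_val, index=index, columns=["forecast"])
        return forecast, None
```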
Post-Processing
After training, both anomaly detectors and forecasters apply model.train_post_process() on the output of model._train(). For anomaly detectors, this involves training their post-rule (calibrator and threshold) and then returning the anomaly scores returned by model._train(). For forecasters, this involves applying the inverse of model.transform on the forecast returned by model._train().
For anomaly detectors, the final step of calling model.get_anomaly_label() is to apply the post-rule on the unprocessed anomaly scores. For forecasters, we apply the inverse transform on the forecast and then set the inversion state of model.transform to be what it was before model.forecast() was called.
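A sketch contrasting the two entry points, continuing the IsolationForest example from above:

```python
raw_scores = detector.get_anomaly_score(test_data)  # no post-rule applied
labels = detector.get_anomaly_label(test_data)      # calibration + threshold applied

print(detector.post_rule)  # the trained calibrator & threshold
```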
Multiple Time Series
If we extend Merlion to accommodate training models on multiple time series, we must make some changes to the way that models handle transforms. In particular, model.transform should be re-trained for each time series individually.
- At training time, we will probably need to write a new method model.train_pre_process_multiple() which uses a different copy of model.transform for each time series. The other functionality should be similar to model.train_pre_process().
- At inference time, time_series_prev must be a required parameter, and a copy of model.transform should be trained on time_series_prev.
- To make training code easier to write, model.train_multiple() probably doesn't need to return anything when trained on multiple time series. This also removes the need to invert the transform on the training data.
- For anomaly detection, the post-processing transforms should be updated to accommodate multiple time series. This is especially important for calibration. For example, if we receive 10 time series of anomaly scores, we should use all 10 to learn a single calibrator, rather than learning one calibrator per time series. The underlying assumption is that the anomaly score distributions should be similar across all time series.
- For forecasting, model.transform can be trained and applied on time_series_prev, and then inverted on the concatenation of time_series_prev and forecast as it is done now, via a call to model._process_forecast(). model.exog_transform should also be handled similarly (minus the inversion). See Forecasting With Exogenous Regressors for more details on exogenous regressors.
In general, the code changes to model.forecast() and model.get_anomaly_score() are relatively minor. If the flag model.multi_series is set, then ensure that time_series_prev is given, and train model.transform and model.exog_transform on time_series_prev and exog_data respectively. After this point, the functions should be unchanged.
Model Variants
There are a number of model variants which either build upon the above model classes or modify them slightly.
Simple Variants
Below are some simpler model variants that are useful to understand:
- In order to support forecasting with exogenous regressors, we implement the ForecasterExogBase base class. Most of the functionality to support exogenous regressors is actually implemented in ForecasterBase, which this class inherits from. The only real difference is that a few internal fields have been changed to indicate that exogenous regressors are supported.
- We support using basic forecasters as the basis for anomaly detection models. The key piece is the mixin class ForecastingDetectorBase.
- Some models don't work unless the input is pre-normalized. To support these models, we implement the NormalizingConfig. This config class applies a MeanVarNormalize after any other pre-processing (specified by the user in transform) has been applied. The full transform is accessed via config.full_transform. Models automatically understand how this works because the property model.transform tries to get model.config.full_transform if possible and defaults to model.config.transform otherwise. When using this class to implement models, simply add NormalizingConfig as a base class for your model's config, as in the sketch below.
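A sketch of such a config, using hypothetical names (we assume NormalizingConfig is importable from merlion.models.base):

```python
from merlion.models.base import NormalizingConfig


class MyDeepModelConfig(NormalizingConfig):
    """Config for a hypothetical model that requires normalized inputs.
    The model sees data transformed by the user-specified `transform`,
    followed by a MeanVarNormalize (i.e. config.full_transform)."""

    def __init__(self, hidden_dim: int = 32, **kwargs):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
```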
Ensembles
Merlion supports ensembles of both anomaly detectors and forecasters. The ensemble config has two key components
which make this possible: ensemble.config.models
contains all the models present in the ensemble, while
ensemble.config.combiner
contains a combiner
object which defines
a way of combining the outputs of multiple models. This includes Mean, Median, and ModelSelector based on an evaluation
metric. When doing model selection, the ensemble.train()
method automatically splits the train data into training
and validation splits, and it evaluates the performance of each model on the validation split.
It then re-trains each model on the full training data afterwards.
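For example, following the pattern in Merlion's own examples, one can build a forecaster ensemble with a ModelSelector combiner (model1 and model2 are assumed to be compatible forecasters):

```python
from merlion.evaluate.forecast import ForecastMetric
from merlion.models.ensemble.combine import ModelSelector
from merlion.models.ensemble.forecast import ForecasterEnsemble, ForecasterEnsembleConfig

ensemble = ForecasterEnsemble(
    config=ForecasterEnsembleConfig(combiner=ModelSelector(metric=ForecastMetric.sMAPE)),
    models=[model1, model2],
)

# Splits the training data, evaluates each model on the validation split,
# then re-trains every model on the full training data
ensemble.train(train_data)
```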
One possible improvement is to parallelize the training of each model in the ensemble. We can probably just use Python's native multiprocessing library.
Layered Models
Layered models are a useful abstraction for models that act as a wrapper around another model. This feature is
especially useful for AutoML. Like ensembles, we store the wrapped model in layered_model.config.model
,
and layered_model.model
is a reference to layered_model.config.model
. The base model is the model at the
lowest level of the hierarchy.
There are a number of dirty tricks used to (1) ensure that layered anomaly detectors and forecasters inherit from the right base classes, (2) ensure that config parameters are not duplicated between different levels of the hierarchy, and (3) let users access a parameter like layered_model.config.max_forecast_steps (which should only be defined for the base model) and receive layered_model.base_model.config.max_forecast_steps directly.
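Trick (3) amounts to attribute delegation. A generic illustration (not Merlion's actual code) of how such forwarding can work:

```python
class LayeredConfig:
    """Toy config that forwards unknown attributes to the base model's config."""

    def __init__(self, model):
        self.model = model  # the wrapped model

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails; descend to the
        # lowest level of the hierarchy and look the attribute up there
        base = self.model
        while getattr(base, "model", None) is not None:
            base = base.model
        return getattr(base.config, name)
```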
The documentation for merlion.models.layers has some more details.
Post-Processing
Distinct post-rules are only relevant for anomaly detection.
There are two types of post-rules: calibration and thresholding. Similar to transforms, post-rules may be trained by calling post_rule.train(train_anom_scores) and applied by calling post_rule(anom_scores). Extending post-rules so that they can be trained on multiple time series simultaneously is a worthwhile direction to investigate.
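A sketch of training and applying a threshold rule directly. AggregateAlarms is Merlion's default threshold class; scores is assumed to be a TimeSeries of anomaly scores (e.g. from detector.get_anomaly_score above):

```python
from merlion.post_process.threshold import AggregateAlarms

threshold = AggregateAlarms(alm_threshold=3.0)
threshold.train(scores)     # analogous to post_rule.train(train_anom_scores)
labels = threshold(scores)  # applies the rule to the anomaly scores
```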
Other Modules
Most other modules are stand-alone pieces that don’t directly interact with each other, except in longer pipelines. We defer to the main documentation in merlion: Time Series Intelligence.