Merlion Architecture
This document is intended for Merlion developers. It outlines the architecture of Merlion’s key components,
and how they interact with each other. In general, everything in this document describes the base.py files
of the modules being discussed.
Transforms
Transforms in Merlion apply various useful pre-processing operations to time series data.
Training
Many transforms are trainable. For example, if we want to normalize the data to have zero mean and unit variance, we use training data to learn the mean and variance of each variable in the time series. If we wish to resample the data to a fixed granularity, we use the most commonly observed timedelta in the training data.
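For example, the sketch below trains a mean-variance normalization on training data and then applies it. It assumes the import paths merlion.transform.normalize.MeanVarNormalize and merlion.utils.TimeSeries; consult the API docs for exact signatures.

    import numpy as np
    import pandas as pd
    from merlion.transform.normalize import MeanVarNormalize
    from merlion.utils import TimeSeries

    # Create a toy univariate time series with nonzero mean and variance
    index = pd.date_range("2021-01-01", periods=100, freq="1h")
    train = TimeSeries.from_pd(
        pd.DataFrame({"x": 3 + 5 * np.random.randn(100)}, index=index))

    norm = MeanVarNormalize()
    norm.train(train)         # learns the mean and variance of each univariate
    normalized = norm(train)  # roughly zero mean and unit variance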
Inversion
Many transforms are invertible.
For example, one may invert the normalization y = (x - mu) / sigma via x = sigma * y + mu.
However, other transforms are lossy, and the input cannot be recovered without additional state. For example, consider the
difference transform y[i+1] = x[i+1] - x[i]. We need to record x[0] as the transform.inversion_state
in order to invert the difference transform and recover x from y.
For invertible transforms which require an inversion state, we handle the inversion state as follows:
- When the transform is called, the inversion state is set. For example, if diff = DifferenceTransform(), then y = diff(x) will record the first observation of each univariate in x as its inversion state.
- When transform.invert(y) is called, the inversion state is reset to None, unless the user explicitly invokes transform.invert(y, retain_inversion_state=True). This ensures that the user doesn't inadvertently apply a stale inversion state to a new time series.
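The sketch below walks through this lifecycle with a difference transform (assuming the import path merlion.transform.moving_average.DifferenceTransform):

    import numpy as np
    import pandas as pd
    from merlion.transform.moving_average import DifferenceTransform
    from merlion.utils import TimeSeries

    x = TimeSeries.from_pd(pd.Series(
        np.cumsum(np.random.randn(50)), name="x",
        index=pd.date_range("2021-01-01", periods=50, freq="1h")))

    diff = DifferenceTransform()
    diff.train(x)
    y = diff(x)             # records x's first observation as the inversion state
    x_hat = diff.invert(y)  # recovers x; the inversion state is reset to None
    # To invert again later, explicitly retain the state:
    # x_hat = diff.invert(y, retain_inversion_state=True)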
Some transforms are not invertible at all (e.g. resampling). In these cases, transform.invert(y) simply returns
y, and a warning is emitted.
Multivariate Time Series
For the time being, all transforms are applied identically to all univariates in a time series. We generally track the variables required for each univariate via a dictionary that maps the name of the univariate to the variables relevant for it. Keying on the name of each univariate ensures that everything behaves as expected even if the individual variables are reordered.
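As an illustration, a hypothetical per-univariate transform might store its learned statistics in a dictionary keyed by univariate name, so that application is independent of column order. This is a sketch, not Merlion's actual internals; ToyNormalize is made up.

    class ToyNormalize:
        """Hypothetical transform tracking statistics per univariate name."""

        def train(self, ts):
            df = ts.to_pd()
            # dict mapping univariate name -> (mean, std); robust to reordering
            self.stats = {name: (df[name].mean(), df[name].std())
                          for name in df.columns}

        def __call__(self, ts):
            df = ts.to_pd()
            for name in df.columns:
                mu, sigma = self.stats[name]  # look up by name, not position
                df[name] = (df[name] - mu) / sigma
            return type(ts).from_pd(df)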
A notable limitation of the current implementation is the fact that we cannot currently apply different transforms to
different univariates. For example, we cannot mean-variance normalize univariate 0 and apply a difference transform
to univariate 1. If there is demand for this sort of behavior in the future, we may consider adding a parameter to
each transform which indicates the names of the univariates it should be applied to. This may be combined with a
TransformStack to apply different transforms to different
univariates. A new tutorial should be written if this feature is added.
Models
Models are the central object in Merlion.
Pre-Processing
Each model has a model.transform which pre-processes the data. Automatically applying this transform at both
training and inference time (and inverting it for forecasting) is a key feature of Merlion models. Note
that model.transform is generally a reference to model.config.transform.
If your data is already pre-processed, then you can set model.transform to be the
Identity.
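For example (a sketch, assuming the import paths merlion.models.forecast.arima and merlion.transform.base):

    from merlion.models.forecast.arima import Arima, ArimaConfig
    from merlion.transform.base import Identity

    # The data is already pre-processed, so disable the model's transform
    model = Arima(ArimaConfig(transform=Identity()))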
When model.train() is called, the first step is to call model.train_pre_process(). This method:

- Records the dimension of the training data as model.dim.
- Trains model.transform and applies it to the training data.
- Records the sampling frequency of the transformed training data as model.timedelta (as well as the offset model.timedelta_offset).
- For forecasters, additionally trains and applies model.exog_transform on the exogenous data if any are given, and records the dimension of the exogenous data as model.exog_dim.
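Continuing the sketch above, one can train the model and inspect the attributes this method records (train_data is an assumed TimeSeries):

    model.train(train_data)
    print(model.dim)               # dimension of the training data
    print(model.timedelta)         # sampling frequency of the transformed data
    print(model.timedelta_offset)  # offset of the sampling grid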
For anomaly detection, model.get_anomaly_score(time_series, time_series_prev)
includes the following pre-processing steps:
- Apply model.transform to the concatenation of time_series_prev and time_series.
- Ensure that the data's dimension matches the dimension of the training data.
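For example (a sketch, assuming the import path merlion.models.anomaly.isolation_forest, with train_data and test_data as TimeSeries objects):

    from merlion.models.anomaly.isolation_forest import IsolationForest, IsolationForestConfig

    detector = IsolationForest(IsolationForestConfig())
    detector.train(train_data)
    # time_series_prev supplies the context immediately preceding test_data
    scores = detector.get_anomaly_score(test_data, time_series_prev=train_data)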
For forecasting, model.forecast(time_stamps, time_series_prev, exog_data)
includes the following pre-processing steps:
- If the model expects time series to be sampled at a fixed frequency, resample time_stamps to the frequency specified by model.timedelta and model.timedelta_offset.
- Save the current inversion state of model.transform, and then apply model.transform to time_series_prev.
- If exog_data is given, apply model.exog_transform to exog_data, and resample exog_data to the same time stamps as time_series_prev (after the transform) and time_stamps.
- Ensure that the dimensions of time_series_prev and exog_data match the training data.

See Forecasting With Exogenous Regressors for more details on exogenous regressors.
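A typical call might look like the following sketch, where model is a trained forecaster, recent is a TimeSeries of recent context, and future_time_stamps is an assumed list of Unix timestamps:

    # Forecast at explicit future time stamps, conditioned on recent data
    forecast, stderr = model.forecast(
        time_stamps=future_time_stamps, time_series_prev=recent)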
User-Defined Implementations
After pre-processing the input data, we pass it to the user-defined implementations model._train(),
model._train_with_exog(), model._get_anomaly_score(), or model._forecast(). These methods do the real work
of training or inference for the underlying model, and they are the methods that must be implemented for each new model.
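For instance, a new forecaster only needs to subclass ForecasterBase and implement _train() and _forecast(). The skeleton below is a schematic sketch: the signatures, return types, and required class attributes are simplified here, so consult an existing model (e.g. merlion.models.forecast.arima) for the precise contract.

    import pandas as pd
    from merlion.models.forecast.base import ForecasterBase, ForecasterConfig

    class MeanForecasterConfig(ForecasterConfig):
        pass

    class MeanForecaster(ForecasterBase):
        """Toy model that always forecasts the mean of the training data."""
        config_class = MeanForecasterConfig

        def _train(self, train_data: pd.DataFrame, train_config=None):
            self.mean = train_data.mean()
            # return the model's fit on the training data, plus an
            # optional standard error (None here)
            pred = pd.DataFrame({c: self.mean[c] for c in train_data.columns},
                                index=train_data.index)
            return pred, None

        def _forecast(self, time_stamps, time_series_prev=None, return_prev=False):
            # forecast the target univariate at the requested time stamps
            idx = pd.to_datetime(time_stamps, unit="s")
            name = self.target_name
            return pd.DataFrame({name: self.mean[name]}, index=idx), None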
Post-Processing
After training, both anomaly detectors and forecasters apply model.train_post_process() on the output of
model._train(). For anomaly detectors, this involves training their post-rule (calibrator and threshold) and then
returning the anomaly scores returned by model._train(). For forecasters, this involves applying the inverse of
model.transform on the forecast returned by model._train().
For anomaly detectors, the final step of calling model.get_anomaly_label() is to apply the post-rule on the
unprocessed anomaly scores. For forecasters, we apply the inverse transform on the forecast and then set the inversion
state of model.transform to be what it was before model.forecast() was called.
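In other words, get_anomaly_label() behaves roughly like the following sketch (conceptual, not the literal implementation):

    # Conceptually: raw scores, then calibration + thresholding via the post-rule
    scores = detector.get_anomaly_score(time_series, time_series_prev)
    labels = detector.post_rule(scores)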
Multiple Time Series
If we extend Merlion to accommodate training models on multiple time series, we must make some changes to the way that models handle transforms. In particular, model.transform should be re-trained for each time series individually.

- At training time, we will probably need to write a new method model.train_pre_process_multiple() which uses a different copy of model.transform for each time series (see the sketch after this list). The other functionality should be similar to model.train_pre_process().
- At inference time, time_series_prev must be a required parameter, and a copy of model.transform should be trained on time_series_prev.
- To make training code easier to write, model.train_multiple() probably doesn't need to return anything when trained on multiple time series. This also removes the need to invert the transform on the training data.
- For anomaly detection, the post-processing transforms should be updated to accommodate multiple time series. This is especially important for calibration. For example, if we receive 10 time series of anomaly scores, we should use all 10 to learn a single calibrator, rather than learning one calibrator per time series. The underlying assumption is that the anomaly score distributions should be similar across all time series.
- For forecasting, model.transform can be trained and applied on time_series_prev, and then inverted on the concatenation of time_series_prev and forecast as it is done now, via a call to model._process_forecast(). model.exog_transform should also be handled similarly (minus the inversion). See Forecasting With Exogenous Regressors for more details on exogenous regressors.
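A hypothetical sketch of the proposed flow follows. None of these methods exist yet; train_pre_process_multiple and _train_multiple are invented names following the proposal above.

    import copy

    def train_pre_process_multiple(model, all_train_data):
        """Hypothetical: pre-process a list of time series for training."""
        processed = []
        for ts in all_train_data:
            # use a fresh copy of the transform, re-trained per time series
            transform = copy.deepcopy(model.transform)
            transform.train(ts)
            processed.append(transform(ts))
        return processed

    # model.train_multiple(all_train_data) would call the above, pass the
    # result to a user-defined model._train_multiple(), and return nothing.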
In general, the code changes to model.forecast() and model.get_anomaly_score() are relatively minor.
If the flag model.multi_series is set, then make sure that time_series_prev is given, and then train
model.transform and model.exog_transform on time_series_prev and exog_data respectively. After this
point, the functions should be unchanged.
Model Variants
There are a number of model variants which either build upon the above model classes or modify them slightly.
Simple Variants
Below are some simpler model variants that are useful to understand:
- In order to support forecasting with exogenous regressors, we implement the ForecasterExogBase base class. Most of the functionality to support exogenous regressors is actually implemented in ForecasterBase, which this class inherits from. The only real difference is that a few internal fields have been changed to indicate that exogenous regressors are supported.
- We support using basic forecasters as the basis for anomaly detection models. The key piece is the mixin class ForecastingDetectorBase.
- Some models don't work unless the input is pre-normalized. To support these models, we implement the NormalizingConfig. This config class applies a MeanVarNormalize after any other pre-processing (specified by the user in transform) has been applied. The full transform is accessed via config.full_transform. Models automatically understand how this works because the property model.transform tries to get model.config.full_transform if possible and defaults to model.config.transform otherwise. When using this class to implement models, simply add the NormalizingConfig as a base class for your model's config.
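For example, a detector config might be declared as follows (a sketch; it assumes NormalizingConfig lives in merlion.models.base and DetectorConfig in merlion.models.anomaly.base):

    from merlion.models.anomaly.base import DetectorConfig
    from merlion.models.base import NormalizingConfig

    class MyDetectorConfig(DetectorConfig, NormalizingConfig):
        """config.full_transform applies a MeanVarNormalize after the
        user-specified transform, so the model always sees normalized data."""
        pass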
Ensembles
Merlion supports ensembles of both anomaly detectors and forecasters. The ensemble config has two key components
which make this possible: ensemble.config.models contains all the models present in the ensemble, while
ensemble.config.combiner contains a combiner object which defines
a way of combining the outputs of multiple models. Supported combiners include Mean, Median, and ModelSelector, which
selects the best model based on an evaluation metric. When doing model selection, the ensemble.train() method
automatically splits the training data into training and validation splits, evaluates the performance of each model
on the validation split, and then re-trains each model on the full training data.
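For example, a forecaster ensemble with model selection might be constructed as follows (a sketch, assuming the import paths merlion.models.ensemble.forecast, merlion.models.ensemble.combine, and merlion.evaluate.forecast):

    from merlion.evaluate.forecast import ForecastMetric
    from merlion.models.ensemble.combine import ModelSelector
    from merlion.models.ensemble.forecast import ForecasterEnsemble, ForecasterEnsembleConfig
    from merlion.models.forecast.arima import Arima, ArimaConfig
    from merlion.models.forecast.prophet import Prophet, ProphetConfig

    ensemble = ForecasterEnsemble(config=ForecasterEnsembleConfig(
        models=[Arima(ArimaConfig()), Prophet(ProphetConfig())],
        combiner=ModelSelector(metric=ForecastMetric.sMAPE),
    ))
    # Splits train/validation, evaluates each model, then re-trains on all data
    ensemble.train(train_data)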
One possible improvement is to parallelize the training of the individual models in the ensemble. We can probably just
use Python's native multiprocessing library.
Layered Models
Layered models are a useful abstraction for models that act as a wrapper around another model. This feature is
especially useful for AutoML. As with ensembles, we store the wrapped model in layered_model.config.model,
and layered_model.model is a reference to layered_model.config.model. The base model is the model at the
lowest level of the hierarchy.
There are a number of dirty tricks used to (1) ensure that layered anomaly detectors and forecasters inherit from the
right base classes, (2) ensure that config parameters are not duplicated between different levels of the hierarchy, and
(3) allow users to access a parameter like layered_model.config.max_forecast_steps (which should only be defined for
the base model) and receive layered_model.base_model.config.max_forecast_steps directly.
The documentation for merlion.models.layers has some more details.
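Conceptually, the relationships look like this (a sketch; layered_model stands for any layered model, e.g. one of the AutoML wrappers):

    # `layered_model` wraps another model
    inner = layered_model.model       # reference to layered_model.config.model
    base = layered_model.base_model   # the model at the lowest level
    # Parameters defined only on the base model are forwarded transparently:
    assert layered_model.config.max_forecast_steps == base.config.max_forecast_steps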
Post-Processing
Post-rules are only relevant for anomaly detection.
There are two types of post-rules: calibration and thresholding. Similar to transforms, post-rules may be trained by
calling post_rule.train(train_anom_scores) and applied by calling post_rule(anom_scores). Extending post-rules
so that they can be trained on multiple time series simultaneously is a worthwhile direction to investigate.
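As a brief illustration, the sketch below trains and applies a threshold rule. It assumes the import path merlion.post_process.threshold.AggregateAlarms, with train_anom_scores and anom_scores as TimeSeries of anomaly scores.

    from merlion.post_process.threshold import AggregateAlarms

    # Fire alarms when the calibrated anomaly score exceeds 4 "standard deviations"
    post_rule = AggregateAlarms(alm_threshold=4.0)
    post_rule.train(train_anom_scores)  # train the threshold on training scores
    labels = post_rule(anom_scores)     # apply to new anomaly scores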
Other Modules
Most other modules are stand-alone pieces that don’t directly interact with each other, except in longer pipelines. We defer to the main documentation in merlion: Time Series Intelligence.