merlion.models.anomaly package

Contains all anomaly detection models. Forecaster-based anomaly detection models may be found in merlion.models.anomaly.forecast_based.

For anomaly detection, we define an abstract DetectorBase class which inherits from ModelBase and supports the following interface, in addition to model.save and DetectorClass.load defined for ModelBase:

  1. model = DetectorClass(config)

    • initialization with a model-specific config

    • configs contain:

      • a (potentially trainable) data pre-processing transform from merlion.transform; note that model.transform is a property which refers to model.config.transform

      • a (potentially trainable) post-processing rule from merlion.post_process; note that model.post_rule is a property which refers to model.config.post_rule. In general, this post-rule will have two stages: calibration and thresholding.

      • booleans enable_calibrator and enable_threshold (both defaulting to True) indicating whether to enable calibration and thresholding in the post-rule.

      • model-specific hyperparameters

  2. model.get_anomaly_score(time_series, time_series_prev=None)

    • returns a time series of anomaly scores for each timestamp in time_series

    • time_series_prev (optional): the most recent context, only used for some models. If not provided, the training data is used as the context instead.

  3. model.get_anomaly_label(time_series, time_series_prev=None)

    • returns a time series of post-processed anomaly scores for each timestamp in time_series. These scores are calibrated to correspond to z-scores if enable_calibrator is True, and they are also filtered by a thresholding rule (model.threshold) if enable_threshold is True. The threshold is specified manually in the config, though it may be modified by DetectorBase.train.

    • time_series_prev (optional): the most recent context, only used for some models. If not provided, the training data is used as the context instead.

  4. model.train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

    • trains the model on the time series train_data

    • anomaly_labels (optional): a time series aligned with train_data, which indicates whether each time stamp is anomalous

    • train_config (optional): extra configuration describing how the model should be trained (e.g. learning rate for the LSTMDetector). Not used for all models. Class-level default provided for models which do use it.

    • post_rule_train_config: extra configuration describing how to train the model’s post-rule. Class-level default is provided for all models.

    • returns a time series of anomaly scores produced by the model on train_data.
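The four-step interface above can be illustrated with a toy detector. This is a minimal, self-contained sketch and not Merlion code: the class ToyDetector, the plain-list time series, the threshold value of 3.0, and the choice to set sub-threshold scores to zero are all assumptions made for illustration.

```python
from statistics import mean, pstdev


class ToyDetector:
    """Illustrative stand-in for a detector (hypothetical, not a Merlion class)."""

    def __init__(self, threshold=3.0, enable_calibrator=True, enable_threshold=True):
        self.threshold = threshold
        self.enable_calibrator = enable_calibrator
        self.enable_threshold = enable_threshold
        self.center = 0.0       # learned by train()
        self.score_mean = 0.0   # calibrator parameters, learned by train()
        self.score_std = 1.0

    def get_anomaly_score(self, values):
        # Raw score: signed distance from the training mean.
        return [v - self.center for v in values]

    def train(self, train_values):
        self.center = mean(train_values)
        scores = self.get_anomaly_score(train_values)
        # "Train" the post-rule: fit the calibrator on the training scores.
        self.score_mean = mean(scores)
        self.score_std = pstdev(scores) or 1.0
        return scores

    def get_anomaly_label(self, values):
        scores = self.get_anomaly_score(values)
        if self.enable_calibrator:
            # Stage 1: rescale raw scores so they read as z-scores.
            scores = [(s - self.score_mean) / self.score_std for s in scores]
        if self.enable_threshold:
            # Stage 2: assumed rule, zero out scores below the threshold.
            scores = [s if abs(s) >= self.threshold else 0.0 for s in scores]
        return scores


model = ToyDetector(threshold=3.0)
train_scores = model.train([10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0])
labels = model.get_anomaly_label([10.0, 10.5, 20.0])
```

In this sketch only the last test point survives thresholding; the first two are mapped to zero because their calibrated scores fall below 3.0.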

base

Base class for anomaly detectors.

dbl

Dynamic Baseline anomaly detection model for time series with daily, weekly or monthly trends.

windstats

Window Statistics anomaly detection model for data with weekly seasonality.

isolation_forest

The classic isolation forest model for anomaly detection.

random_cut_forest

Wrapper around AWS's Random Cut Forest anomaly detection model.

spectral_residual

Spectral Residual algorithm for anomaly detection

stat_threshold

Simple static thresholding model for anomaly detection.

zms

Multiple z-score model (static thresholding at multiple time scales).

autoencoder

The autoencoder-based anomaly detector for multivariate time series

dagmm

Deep autoencoding Gaussian mixture model for anomaly detection (DAGMM)

lstm_ed

The LSTM-encoder-decoder-based anomaly detector for multivariate time series

vae

The VAE-based anomaly detector for multivariate time series

deep_point_anomaly_detector

Deep Point Anomaly Detector algorithm.

Subpackages

Submodules

merlion.models.anomaly.base module

Base class for anomaly detectors.

class merlion.models.anomaly.base.DetectorConfig(max_score=1000, threshold=None, enable_calibrator=True, enable_threshold=True, **kwargs)

Bases: Config

Config object used to define an anomaly detection model.

Base class of the object used to configure an anomaly detection model.

Parameters
  • max_score (float) – maximum possible uncalibrated anomaly score

  • threshold – the rule to use for thresholding anomaly scores

  • enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores

  • enable_calibrator – whether to enable a calibrator which automatically transforms all raw anomaly scores to be z-scores (i.e. distributed as N(0, 1)).

property post_rule
Returns

The full post-processing rule. Includes calibration if enable_calibrator is True, followed by thresholding if enable_threshold is True.

classmethod from_dict(config_dict, return_unused_kwargs=False, **kwargs)

Constructs a Config from a Python dictionary of parameters.

Parameters
  • config_dict (Dict[str, Any]) – dict that will be used to instantiate this object.

  • return_unused_kwargs – whether to return any unused keyword args.

  • kwargs – any additional parameters to set (overriding config_dict).

Returns

Config object initialized from the dict.
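The precedence described above (keyword arguments override config_dict, and unused keys can be returned) can be sketched as follows. ToyConfig is a hypothetical class written for this example, and silently dropping unused keys when return_unused_kwargs is False is an assumption of the sketch, not a statement about Merlion's behavior.

```python
import inspect


class ToyConfig:
    """Illustrative config (hypothetical, not Merlion's Config class)."""

    def __init__(self, max_score=1000, threshold=None):
        self.max_score = max_score
        self.threshold = threshold

    @classmethod
    def from_dict(cls, config_dict, return_unused_kwargs=False, **kwargs):
        # Keyword arguments take precedence over entries in config_dict.
        merged = {**config_dict, **kwargs}
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        used = {k: v for k, v in merged.items() if k in accepted}
        unused = {k: v for k, v in merged.items() if k not in accepted}
        config = cls(**used)
        return (config, unused) if return_unused_kwargs else config


config, unused = ToyConfig.from_dict(
    {"max_score": 500, "threshold": 1.0, "extra": 1},
    return_unused_kwargs=True,
    threshold=3.0,  # overrides the value in the dict
)
```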

class merlion.models.anomaly.base.NoCalibrationDetectorConfig(enable_calibrator=False, **kwargs)

Bases: DetectorConfig

Abstract config object for an anomaly detection model that should never perform anomaly score calibration.

Base class of the object used to configure an anomaly detection model.

Parameters
  • max_score – maximum possible uncalibrated anomaly score

  • threshold – the rule to use for thresholding anomaly scores

  • enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores

  • enable_calibrator – whether to enable a calibrator which automatically transforms all raw anomaly scores to be z-scores (i.e. distributed as N(0, 1)).

property enable_calibrator
Returns

False
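One way a config subclass can guarantee that enable_calibrator always reads as False is to shadow the attribute with a property. The sketch below uses hypothetical classes (ToyDetectorConfig, ToyNoCalibrationConfig) and only illustrates the pattern; it is not taken from the Merlion source.

```python
class ToyDetectorConfig:
    """Hypothetical base config with two boolean switches."""

    def __init__(self, enable_calibrator=True, enable_threshold=True):
        self.enable_calibrator = enable_calibrator
        self.enable_threshold = enable_threshold


class ToyNoCalibrationConfig(ToyDetectorConfig):
    """Hypothetical config that never calibrates."""

    def __init__(self, enable_calibrator=False, **kwargs):
        super().__init__(enable_calibrator=enable_calibrator, **kwargs)

    @property
    def enable_calibrator(self):
        # Always report False, whatever was passed in.
        return False

    @enable_calibrator.setter
    def enable_calibrator(self, value):
        # Ignore attempts to turn calibration on.
        pass


cfg = ToyNoCalibrationConfig(enable_calibrator=True)
```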

class merlion.models.anomaly.base.DetectorBase(config)

Bases: ModelBase

Base class for an anomaly detection model.

Parameters

config (DetectorConfig) – model configuration

config_class

alias of DetectorConfig

property threshold
property calibrator
property post_rule
abstract train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

train_post_rule(anomaly_scores, anomaly_labels=None, post_rule_train_config=None)
abstract get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

get_anomaly_label(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores, processed by any relevant post-rules (calibration and/or thresholding).

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores, filtered by the model’s post-rule

get_figure(time_series, time_series_prev=None, *, filter_scores=True, plot_time_series_prev=False, fig=None)
Parameters
  • time_series (TimeSeries) – The TimeSeries we wish to plot & predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series model. Otherwise, we assume that time_series immediately follows the training data.

  • filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

  • fig (Optional[Figure]) – a Figure we might want to add anomaly scores onto.

Return type

Figure

Returns

a Figure of the model’s anomaly score predictions.

plot_anomaly(time_series, time_series_prev=None, *, filter_scores=True, plot_time_series_prev=False, figsize=(1000, 600), ax=None)

Plots the time series in matplotlib as a line graph, with points in the series overlaid as points color-coded to indicate their severity as anomalies.

Parameters
  • time_series (TimeSeries) – The TimeSeries we wish to plot & predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. Plotted as context if given.

  • filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

  • figsize – figure size in pixels

  • ax – matplotlib axes to add this plot to

Returns

matplotlib figure & axes

plot_anomaly_plotly(time_series, time_series_prev=None, *, filter_scores=True, plot_time_series_prev=False, figsize=None)

Plots the time series in plotly as a line graph, with points in the series overlaid as points color-coded to indicate their severity as anomalies.

Parameters
  • time_series (TimeSeries) – The TimeSeries we wish to plot & predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. Plotted as context if given.

  • filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

  • figsize – figure size in pixels

Returns

plotly figure

merlion.models.anomaly.dbl module

Dynamic Baseline anomaly detection model for time series with daily, weekly or monthly trends.

class merlion.models.anomaly.dbl.DynamicBaselineConfig(fixed_period=None, train_window=None, wind_sz='1h', trends=None, **kwargs)

Bases: DetectorConfig

Configuration class for DynamicBaseline.

Parameters
  • fixed_period (Optional[Tuple[str, str]]) – (t0, tf); Train the model on all datapoints occurring between t0 and tf (inclusive).

  • train_window (Optional[str]) – A string representing a duration of time to serve as the scope for a rolling dynamic baseline model.

  • wind_sz (str) – The window size in minutes to bucket times of day. This parameter is only applied if a daily trend is one of the trends used.

  • trends (Optional[List[str]]) – The list of trends to use. Supported trends are “daily”, “weekly” and “monthly”.

property fixed_period
property trends
determine_train_window()
to_dict(_skipped_keys=None)
Returns

dict with keyword arguments used to initialize the config class.

class merlion.models.anomaly.dbl.DynamicBaseline(config)

Bases: DetectorBase

Dynamic baseline-based anomaly detector.

Detects anomalies by comparing data to historical data that has occurred in the same window of time, as defined by any combination of time of day, day of week, or day of month.

A DBL model can have a fixed period or a dynamic rolling period. A fixed period model trains its baselines exclusively on datapoints occurring in the fixed period, while a rolling period model trains continually on the most recent datapoints within its train-window.
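The idea of comparing a point to historical data from the same window of time can be sketched by grouping observations under a key built from the selected trends. The function segment_key, the 60-minute window, and the sample values below are assumptions made for illustration; this is not the DynamicBaseline implementation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def segment_key(ts, trends, window_minutes=60):
    """Build a key from the trend components of a timestamp (illustrative)."""
    key = []
    if "daily" in trends:
        key.append((ts.hour * 60 + ts.minute) // window_minutes)  # time-of-day bucket
    if "weekly" in trends:
        key.append(ts.weekday())  # 0 = Monday
    if "monthly" in trends:
        key.append(ts.day)  # day of month
    return tuple(key)


history = [
    (datetime(2021, 3, 1, 9, 15), 100.0),  # Monday, 09:00 bucket
    (datetime(2021, 3, 8, 9, 45), 110.0),  # Monday, 09:00 bucket
    (datetime(2021, 3, 2, 9, 30), 500.0),  # Tuesday, 09:00 bucket
]

# Group historical values by (time-of-day bucket, weekday).
segments = defaultdict(list)
for ts, value in history:
    segments[segment_key(ts, ["daily", "weekly"])].append(value)

query = datetime(2021, 3, 15, 9, 5)  # another Monday morning
baseline = mean(segments[segment_key(query, ["daily", "weekly"])])
```

Only the two Monday observations contribute to the baseline for the query; the Tuesday value lives in a different segment.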

Parameters

config (DynamicBaselineConfig) – model configuration

config_class

alias of DynamicBaselineConfig

property train_window
property fixed_period
property has_fixed_period
property data: UnivariateTimeSeries
Return type

UnivariateTimeSeries

get_relevant(data)

Returns the subset of the data that should be used for training or updating.

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)
Parameters
  • train_data (TimeSeries) – train_data[t] = (timestamp_t, value_t)

  • anomaly_labels (Optional[TimeSeries]) – anomaly_labels[i] = (timestamp_i, is_anom(timestamp_i))

  • train_config – unused

  • post_rule_train_config – config to train the post rule

Return type

TimeSeries

Returns

anomaly scores of training data

get_anomaly_score(time_series, time_series_prev=None)
Parameters
  • time_series (TimeSeries) – a list of (timestamps, score) pairs

  • time_series_prev (Optional[TimeSeries]) – ignored

Return type

TimeSeries

get_baseline(time_stamps)

Returns the dynamic baselines corresponding to the time stamps.

Parameters

time_stamps (List[float]) – a list of timestamps

Return type

Tuple[UnivariateTimeSeries, UnivariateTimeSeries]

check_dim(time_series)
update(new_data)
get_baseline_figure(time_series, time_series_prev=None, *, filter_scores=True, plot_time_series_prev=False, fig=None, jitter_time_stamps=True)
Return type

Figure

class merlion.models.anomaly.dbl.Trend(value)

Bases: Enum

Enumeration of the supported trends.

daily = 1
weekly = 2
monthly = 3
class merlion.models.anomaly.dbl.Segment(key)

Bases: object

Class representing a segment. The class maintains a mean (baseline) along with a variance so that a z-score can be computed.

add(x)
drop(x)
score(x)
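A segment that supports add, drop, and score can be kept with three running totals. The class RunningSegment below is a hypothetical sketch of that idea, assuming a population variance and a score of zero when fewer than two points are stored; Merlion's Segment may differ in these details.

```python
import math


class RunningSegment:
    """Illustrative running mean/variance with add and drop (not Merlion's Segment)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0     # sum of values
        self.total_sq = 0.0  # sum of squared values

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def drop(self, x):
        # Remove a value that has left the rolling train window.
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x

    def score(self, x):
        if self.n < 2:
            return 0.0  # not enough data for a variance estimate
        mean = self.total / self.n
        var = max(self.total_sq / self.n - mean * mean, 0.0)
        std = math.sqrt(var)
        return (x - mean) / std if std > 0 else 0.0


seg = RunningSegment()
for v in [8.0, 10.0, 12.0, 50.0]:
    seg.add(v)
seg.drop(50.0)       # this point leaves the window
z = seg.score(14.0)  # z-score against the remaining baseline
```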
class merlion.models.anomaly.dbl.Segmenter(trends, wind_sz)

Bases: object

Class for managing the segments that belong to a DynamicBaseline model.

Parameters
  • trends (List[Trend]) – A list of trend types to create segments based on.

  • wind_sz (str) – The window size in minutes to bucket times of day. Only used if a daily trend is one of the trends used.

day_delta = Timedelta('1 days 00:00:00')
hour_delta = Timedelta('0 days 01:00:00')
min_delta = Timedelta('0 days 00:01:00')
zero_delta = Timedelta('0 days 00:00:00')
reset()
property wind_delta
property trends
property trend
window_key(t)
weekday_key(t)
day_key(t)
segment_key(timestamp)
add(t, x)
drop(t, x)
score(t, x)
get_baseline(t)
Return type

Tuple[float, float]

merlion.models.anomaly.windstats module

Window Statistics anomaly detection model for data with weekly seasonality.

class merlion.models.anomaly.windstats.WindStatsConfig(wind_sz=30, max_day=4, **kwargs)

Bases: DetectorConfig

Config object used to define an anomaly detection model.

Parameters
  • wind_sz – the window size in minutes, default is 30 minute window

  • max_day – maximum number of week days stored in memory (only mean and std of each window are stored). Here, the days are first bucketed by weekday and then by window id.

class merlion.models.anomaly.windstats.WindStats(config=None)

Bases: DetectorBase

Sliding Window Statistics based Anomaly Detector. This detector assumes the time series has weekly seasonality. It divides the week into buckets of the specified size (in minutes). For a given (t, v), it computes an anomaly score by comparing the current value v against the historical statistics (mean and standard deviation) for that window of time. If multiple matches (up to the number given by the parameter max_day) are found in history with the same weekday and the same time window, the minimum of the scores is returned.

  • config.wind_sz: the window size in minutes; the default is a 30-minute window.

  • config.max_day: the maximum number of week days stored in memory (only the mean and std of each window are stored). The days are first bucketed by weekday and then by window id.
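The bucketing scheme described above can be sketched as follows. The helpers bucket, fit, and score are hypothetical, and two details are assumptions of the sketch: scores are z-scores, and "minimum of the scores" is read as the score with the smallest magnitude.

```python
from collections import defaultdict, deque
from datetime import datetime
from statistics import mean, pstdev


def bucket(ts, wind_sz=30):
    """(weekday, window id) for a timestamp; wind_sz is in minutes."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // wind_sz


def fit(history, wind_sz=30, max_day=4):
    """Store (mean, std) per calendar day in each bucket, keeping the last max_day days."""
    per_day = defaultdict(list)
    for ts, v in history:
        per_day[(bucket(ts, wind_sz), ts.date())].append(v)
    stats = defaultdict(lambda: deque(maxlen=max_day))
    # Visit days in chronological order so the deque keeps the most recent ones.
    for (key, _day), values in sorted(per_day.items(), key=lambda kv: kv[0][1]):
        stats[key].append((mean(values), pstdev(values)))
    return stats


def score(stats, ts, v, wind_sz=30):
    """Smallest-magnitude z-score across the stored days for this bucket."""
    candidates = [(v - m) / s for m, s in stats[bucket(ts, wind_sz)] if s > 0]
    return min(candidates, key=abs) if candidates else 0.0


history = [
    (datetime(2021, 3, 1, 9, 0), 10.0), (datetime(2021, 3, 1, 9, 10), 12.0),  # Monday
    (datetime(2021, 3, 8, 9, 0), 20.0), (datetime(2021, 3, 8, 9, 10), 24.0),  # next Monday
]
stats = fit(history)
result = score(stats, datetime(2021, 3, 15, 9, 5), 16.0)
```

Here the query value is 5 standard deviations from the first Monday's window and 3 from the second, so the sketch reports the less extreme of the two.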

config_class

alias of WindStatsConfig

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

merlion.models.anomaly.isolation_forest module

The classic isolation forest model for anomaly detection.

class merlion.models.anomaly.isolation_forest.IsolationForestConfig(max_n_samples=None, n_estimators=100, **kwargs)

Bases: DetectorConfig

Config object used to define an anomaly detection model.

Configuration class for isolation forest.

Parameters
  • max_n_samples (Optional[int]) – Maximum number of samples to allow the isolation forest to train on. Specify None to use all samples in the training data.

  • n_estimators (int) – number of trees in the isolation forest.

class merlion.models.anomaly.isolation_forest.IsolationForest(config)

Bases: DetectorBase

The classic isolation forest algorithm, proposed in Liu et al. 2008.

Parameters

config (IsolationForestConfig) – model configuration

config_class

alias of IsolationForestConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

merlion.models.anomaly.random_cut_forest module

Wrapper around AWS’s Random Cut Forest anomaly detection model.

class merlion.models.anomaly.random_cut_forest.JVMSingleton

Bases: object

class merlion.models.anomaly.random_cut_forest.RandomCutForestConfig(n_estimators=100, parallel=False, seed=None, max_n_samples=512, thread_pool_size=1, online_updates=False, **kwargs)

Bases: DetectorConfig

Config object used to define an anomaly detection model.

Configuration class for random cut forest. Refer to https://github.com/aws/random-cut-forest-by-aws/tree/main/Java for further documentation and defaults of the Java class.

Parameters
  • n_estimators (int) – The number of trees in this forest.

  • parallel (bool) – If true, then the forest will create an internal thread pool. Forest updates and traversals will be submitted to this thread pool, and individual trees will be updated or traversed in parallel. For larger shingle sizes, dimensions, and number of trees, parallelization may improve throughput. We recommend users benchmark against their target use case.

  • seed (Optional[int]) – the random seed

  • max_n_samples (int) – The number of samples retained by stream samplers in this forest.

  • thread_pool_size (int) – The number of threads to use in the internal thread pool.

  • online_updates (bool) – Whether to update the model while using it to evaluate new data.

property java_params
class merlion.models.anomaly.random_cut_forest.RandomCutForest(config)

Bases: DetectorBase

The random cut forest is a refinement of the classic isolation forest algorithm. It was proposed in Guha et al. 2016.

Parameters

config (RandomCutForestConfig) – model configuration

config_class

alias of RandomCutForestConfig

property online_updates: bool
Return type

bool

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

merlion.models.anomaly.spectral_residual module

Spectral Residual algorithm for anomaly detection

class merlion.models.anomaly.spectral_residual.SpectralResidualConfig(local_wind_sz=21, q=3, estimated_points=5, predicting_points=5, target_seq_index=None, **kwargs)

Bases: DetectorConfig

Config object used to define an anomaly detection model.

Parameters
  • local_wind_sz – Number of previous saliency points to consider when computing the anomaly score

  • q – Window size of local frequency average computations

  • estimated_points – Number of padding points to add to the timeseries for saliency map calculations.

  • predicting_points – Number of points to consider when computing gradient for padding points

  • target_seq_index – Index of the univariate whose anomalies we want to detect.

The Saliency Map is computed as follows:

\[\begin{split}R(f) &= \log(A(\mathscr{F}(\textbf{x}))) - \left(\frac{1}{q}\right)_{1 \times q} * A(\mathscr{F}(\textbf{x})) \\ S_m &= \mathscr{F}^{-1} (R(f))\end{split}\]

where \(*\) is the convolution operator, and \(\mathscr{F}\) is the Fourier Transform. The anomaly scores then are computed as:

\[S(x) = \frac{S(x) - \overline{S(\textbf{x})}}{\overline{S(\textbf{x})}}\]

where \(\textbf{x}\) are the last local_wind_sz points in the timeseries.

The estimated_points and predicting_points parameters are used to pad the end of the timeseries with reasonable values. This is done so that the later points in the timeseries are in the middle of averaging windows rather than in the end.
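The saliency map and score computation can be sketched in plain Python. This is an illustration under stated assumptions, not Merlion's code: it uses a naive DFT, averages the log-amplitude spectrum with a circular window of size q, keeps the original phase when transforming back (as the spectral residual method is usually formulated), takes the average over the preceding local_wind_sz saliency values, and omits the padding controlled by estimated_points and predicting_points.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for a short illustration)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def saliency_map(values, q=3):
    spectrum = dft(values)
    log_amp = [math.log(abs(c) + 1e-12) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    n, half = len(values), q // 2
    width = 2 * half + 1  # assumes odd q; an even q is widened by one
    # Local average of the log-amplitude spectrum (circular window).
    avg = [sum(log_amp[(f + d) % n] for d in range(-half, half + 1)) / width
           for f in range(n)]
    residual = [la - a for la, a in zip(log_amp, avg)]
    # Back to the time domain, keeping the original phase.
    restored = dft([cmath.exp(r + 1j * p) for r, p in zip(residual, phase)], inverse=True)
    return [abs(c) for c in restored]


def sr_scores(saliency, local_wind_sz=21):
    """Relative deviation of each saliency value from the mean of the preceding ones."""
    scores = []
    for i, s in enumerate(saliency):
        window = saliency[max(0, i - local_wind_sz):i] or [s]
        avg = sum(window) / len(window)
        scores.append((s - avg) / avg if avg > 0 else 0.0)
    return scores


series = [1.0] * 32
series[20] = 10.0  # a single spike
sal = saliency_map(series)
scores = sr_scores(sal)
spike_index = max(range(len(sal)), key=sal.__getitem__)
```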

class merlion.models.anomaly.spectral_residual.SpectralResidual(config=None)

Bases: DetectorBase

Spectral Residual Algorithm for Anomaly Detection.

Spectral Residual Anomaly Detection algorithm based on the algorithm described in this paper. After taking the frequency spectrum, compute the log deviation from the mean. Use the inverse Fourier transform to obtain the saliency map. Anomaly scores for a point in the time series are obtained by comparing the saliency score of the point to the average of the previous points.

Parameters

config (Optional[SpectralResidualConfig]) – model configuration

config_class

alias of SpectralResidualConfig

property target_seq_index: int
Return type

int

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

merlion.models.anomaly.stat_threshold module

Simple static thresholding model for anomaly detection.

class merlion.models.anomaly.stat_threshold.StatThresholdConfig(max_score=1000, threshold=None, enable_calibrator=True, enable_threshold=True, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Config object used to define an anomaly detection model.

Base class of the object used to configure an anomaly detection model.

Parameters
  • max_score (float) – maximum possible uncalibrated anomaly score

  • threshold – the rule to use for thresholding anomaly scores

  • enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores

  • enable_calibrator – whether to enable a calibrator which automatically transforms all raw anomaly scores to be z-scores (i.e. distributed as N(0, 1)).

class merlion.models.anomaly.stat_threshold.StatThreshold(config)

Bases: DetectorBase

Parameters

config (DetectorConfig) – model configuration

config_class

alias of StatThresholdConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

merlion.models.anomaly.zms module

Multiple z-score model (static thresholding at multiple time scales).

class merlion.models.anomaly.zms.ZMSConfig(base=2, n_lags=None, lag_inflation=1.0, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Configuration class for a ZMS anomaly detection model.

Configuration class for ZMS. The transform of this config is actually a pre-processing step, followed by the desired number of lag transforms and a final mean/variance normalization step. This full transform may be accessed as ZMSConfig.full_transform. Note that the normalization is inherited from NormalizingConfig.

Parameters
  • base (int) – The base to use for computing exponentially distant lags.

  • n_lags (Optional[int]) – The number of lags to be used. If None, n_lags will be chosen later as the maximum number of lags possible for the initial training set.

  • lag_inflation (float) – See math below for the precise mathematical role of the lag inflation. Consider the lag inflation a measure of distrust toward higher lags. If lag_inflation > 1, the higher the lag inflation, the less likely the model is to select a higher lag’s z-score as the anomaly score.

\[\begin{split}\begin{align*} \text{Let } \space z_k(x_t) \text{ be the z-score of the } & k\text{-lag at } t, \space \Delta_k(x_t) \text{ and } p \text{ be the lag inflation} \\ & \\ \text{the anomaly score } z(x_t) & = z_{k^*}(x_t) \\ \text{where } k^* & = \text{argmax}_k \space | z_k(x_t) | / k^p \end{align*}\end{split}\]
property full_transform

Returns the full transform, including the pre-processing step, lags, and final mean/variance normalization.

to_dict(_skipped_keys=None)
Returns

dict with keyword arguments used to initialize the config class.

property n_lags
class merlion.models.anomaly.zms.ZMS(config)

Bases: DetectorBase

Multiple Z-Score based Anomaly Detector.

ZMS is designed to detect spikes, dips, and sharp trend changes (up or down) relative to historical data. Anomaly scores capture not only magnitude but also direction. This lets one distinguish between positive (spike) and negative (dip) anomalies, for example.

The algorithm builds models of normalcy at multiple exponentially-growing time scales. The zeroth order model is just a model of the values seen recently. The kth order model is similar except that it models not values, but rather their k-lags, defined as x(t)-x(t-k), for k in 1, 2, 4, 8, 16, etc. The algorithm assigns the maximum absolute z-score of all the models of normalcy as the overall anomaly score.

\[\begin{split}\begin{align*} \text{Let } \space z_k(x_t) \text{ be the z-score of the } & k\text{-lag at } t, \space \Delta_k(x_t) \text{ and } p \text{ be the lag inflation} \\ & \\ \text{the anomaly score } z(x_t) & = z_{k^*}(x_t) \\ \text{where } k^* & = \text{argmax}_k \space | z_k(x_t) | / k^p \end{align*}\end{split}\]
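The selection rule in the formula above can be sketched directly. The helper names lag_scales, k_lag, and select_score and the sample z-scores are hypothetical; the sketch covers only lags k ≥ 1 and leaves out how each z-score is estimated.

```python
def lag_scales(base=2, n_lags=4):
    """Exponentially spaced lags: 1, base, base**2, ..."""
    return [base ** i for i in range(n_lags)]


def k_lag(x, k):
    """k-lag series x(t) - x(t-k), defined for t >= k."""
    return [x[t] - x[t - k] for t in range(k, len(x))]


def select_score(z_by_lag, lag_inflation=1.0):
    """Pick z_{k*} where k* = argmax_k |z_k| / k**p; the sign of z is preserved."""
    k_star = max(z_by_lag, key=lambda k: abs(z_by_lag[k]) / k ** lag_inflation)
    return k_star, z_by_lag[k_star]


z = {1: 3.0, 2: -5.0, 4: 4.0}  # hypothetical per-lag z-scores at one timestamp
no_inflation = select_score(z, lag_inflation=0.0)    # plain maximum of |z|
with_inflation = select_score(z, lag_inflation=1.0)  # higher lags are penalized
```

With p = 0 the dip at lag 2 wins on magnitude alone; with p = 1 the lag-1 score is selected because the higher lags are discounted by k.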
Parameters

config (DetectorConfig) – model configuration

config_class

alias of ZMSConfig

property n_lags
property lag_scales: List[int]
Return type

List[int]

property lag_inflation
property adjust_z_scores: bool
Return type

bool

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

merlion.models.anomaly.autoencoder module

The autoencoder-based anomaly detector for multivariate time series

class merlion.models.anomaly.autoencoder.AutoEncoderConfig(hidden_size=5, layer_sizes=(25, 10, 5), sequence_len=1, lr=0.001, batch_size=512, num_epochs=50, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Configuration class for AutoEncoder. The normalization is inherited from NormalizingConfig. The input data will be standardized automatically.

Parameters
  • hidden_size (int) – The latent size

  • layer_sizes (Sequence[int]) – The hidden layer sizes for the MLP encoder and decoder, e.g., (25, 10, 5) for encoder and (5, 10, 25) for decoder

  • sequence_len (int) – The input series length, e.g., input = [x(t-sequence_len+1), …, x(t-1), x(t)]

  • lr (float) – The learning rate during training

  • batch_size (int) – The batch size during training

  • num_epochs (int) – The number of training epochs
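
To make the role of sequence_len concrete, here is a minimal sketch of how a series could be cut into inputs of the form [x(t-sequence_len+1), …, x(t-1), x(t)]. The function sliding_windows is a hypothetical helper written for this example, and a univariate list is used for simplicity. How the library batches multivariate data internally is not shown.

```python
def sliding_windows(x, sequence_len):
    """Return [x(t-sequence_len+1), ..., x(t-1), x(t)] for every t with a full history."""
    if sequence_len < 1:
        raise ValueError("sequence_len must be at least 1")
    return [x[t - sequence_len + 1 : t + 1] for t in range(sequence_len - 1, len(x))]


series = [10, 11, 12, 13, 14]
windows = sliding_windows(series, sequence_len=3)
# windows == [[10, 11, 12], [11, 12, 13], [12, 13, 14]]

# With the default sequence_len=1, each input is a single observation.
single = sliding_windows(series, sequence_len=1)
```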

class merlion.models.anomaly.autoencoder.AutoEncoder(config)

Bases: DetectorBase

The autoencoder-based multivariate time series anomaly detector. This detector utilizes an autoencoder to infer the correlations between different time series and estimate the joint distribution of the variables for anomaly detection.

Parameters

config (AutoEncoderConfig) – model configuration

config_class

alias of AutoEncoderConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Train a multivariate time series anomaly detector.

Parameters
  • train_data (TimeSeries) – A TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – A TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

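get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – the most recent context preceding time_series. If not provided, the training data is used as the context instead.

Return type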

TimeSeries

Returns

A univariate TimeSeries of anomaly scores

merlion.models.anomaly.vae module

The VAE-based anomaly detector for multivariate time series

class merlion.models.anomaly.vae.VAEConfig(encoder_hidden_sizes=(25, 10, 5), decoder_hidden_sizes=(5, 10, 25), latent_size=5, sequence_len=1, kld_weight=1.0, dropout_rate=0.0, num_eval_samples=10, lr=0.001, batch_size=1024, num_epochs=10, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Configuration class for VAE. The normalization is inherited from NormalizingConfig. The input data will be standardized automatically.

Parameters
  • encoder_hidden_sizes (Sequence[int]) – The hidden layer sizes of the MLP encoder

  • decoder_hidden_sizes (Sequence[int]) – The hidden layer sizes of the MLP decoder

  • latent_size (int) – The latent size

  • sequence_len (int) – The input series length, e.g., input = [x(t-sequence_len+1), …, x(t-1), x(t)]

  • kld_weight (float) – The regularization weight for the KL divergence term

  • dropout_rate (float) – The dropout rate for the encoder and decoder

  • num_eval_samples (int) – The number of sampled latent variables during prediction

  • lr (float) – The learning rate during training

  • batch_size (int) – The batch size during training

  • num_epochs (int) – The number of training epochs
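
The effect of kld_weight can be sketched with the standard VAE objective: a reconstruction term plus a weighted KL divergence between the approximate posterior and a standard normal prior. The function names below are hypothetical, and the choice of a summed squared error for the reconstruction term is an assumption. The library's exact loss may differ.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var, kld_weight=1.0):
    """Summed squared reconstruction error plus the weighted KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kld_weight * gaussian_kl(mu, log_var)


x = [0.5, -1.0, 2.0]
x_recon = [0.4, -0.8, 1.5]
log_var = [0.0] * 5  # unit variance in each of the 5 latent dimensions

# Posterior equal to the prior: the KL term vanishes and only reconstruction remains.
loss_at_prior = vae_loss(x, x_recon, [0.0] * 5, log_var)

# Shifting the posterior mean away from zero is penalized in proportion to kld_weight.
loss_shifted = vae_loss(x, x_recon, [1.0] * 5, log_var, kld_weight=1.0)
```

A larger kld_weight pulls the latent distribution closer to the prior at the cost of reconstruction accuracy.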

class merlion.models.anomaly.vae.VAE(config)

Bases: DetectorBase

The VAE-based multivariate time series anomaly detector. This detector utilizes a variational autoencoder to infer the correlations between different time series and estimate the distribution of the reconstruction errors for anomaly detection.

Parameters

config (VAEConfig) – model configuration

config_class

alias of VAEConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Train a multivariate time series anomaly detector.

Parameters
  • train_data (TimeSeries) – A TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – A TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

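get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – the most recent context preceding time_series. If not provided, the training data is used as the context instead.

Return type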

TimeSeries

Returns

A univariate TimeSeries of anomaly scores

merlion.models.anomaly.dagmm module

Deep autoencoding Gaussian mixture model for anomaly detection (DAGMM)

class merlion.models.anomaly.dagmm.DAGMMConfig(gmm_k=3, hidden_size=5, sequence_len=1, lambda_energy=0.1, lambda_cov_diag=0.005, lr=0.001, batch_size=256, num_epochs=10, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Configuration class for DAGMM. The normalization is inherited from NormalizingConfig. The input data will be standardized automatically.

Parameters
  • gmm_k (int) – The number of Gaussian distributions

  • hidden_size (int) – The hidden size of the autoencoder module in DAGMM

  • sequence_len (int) – The input series length, e.g., input = [x(t-sequence_len+1), …, x(t-1), x(t)]

  • lambda_energy (float) – The regularization weight for the energy term

  • lambda_cov_diag (float) – The regularization weight for the covariance diagonal entries

  • lr (float) – The learning rate during training

  • batch_size (int) – The batch size during training

  • num_epochs (int) – The number of training epochs
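
The two regularization weights can be read against the objective described in the original DAGMM paper: mean reconstruction error, plus lambda_energy times the mean sample energy, plus lambda_cov_diag times a penalty on small diagonal covariance entries. The sketch below combines precomputed terms in that way. The function names are hypothetical, and it is an assumption that the library's training loss matches this form term for term.

```python
def cov_diag_penalty(covariances):
    """Sum of 1 / (diagonal entry) over the covariance matrices of all mixture components."""
    return sum(1.0 / cov[j][j] for cov in covariances for j in range(len(cov)))


def dagmm_objective(recon_errors, energies, covariances,
                    lambda_energy=0.1, lambda_cov_diag=0.005):
    """Mean reconstruction error + weighted mean energy + weighted diagonal penalty."""
    n = len(recon_errors)
    return (
        sum(recon_errors) / n
        + lambda_energy * sum(energies) / n
        + lambda_cov_diag * cov_diag_penalty(covariances)
    )


# Toy values: two samples and two mixture components with 2x2 covariances.
recon_errors = [0.2, 0.4]
energies = [1.0, 3.0]
covariances = [
    [[0.5, 0.0], [0.0, 2.0]],
    [[1.0, 0.1], [0.1, 4.0]],
]
penalty = cov_diag_penalty(covariances)
objective = dagmm_objective(recon_errors, energies, covariances)
```

The diagonal penalty grows as any variance approaches zero, which discourages degenerate mixture components.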

class merlion.models.anomaly.dagmm.DAGMM(config)

Bases: DetectorBase

Deep autoencoding Gaussian mixture model for anomaly detection (DAGMM). DAGMM combines an autoencoder with a Gaussian mixture model to model the distribution of the reconstruction errors. The parameters of the deep autoencoder and the mixture model are optimized jointly in an end-to-end fashion.

Parameters

config (DAGMMConfig) – model configuration

config_class

alias of DAGMMConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Train a multivariate time series anomaly detector.

Parameters
  • train_data (TimeSeries) – A TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – A TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

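get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – the most recent context preceding time_series. If not provided, the training data is used as the context instead.

Return type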

TimeSeries

Returns

A univariate TimeSeries of anomaly scores

merlion.models.anomaly.lstm_ed module

The LSTM-encoder-decoder-based anomaly detector for multivariate time series

class merlion.models.anomaly.lstm_ed.LSTMEDConfig(hidden_size=5, sequence_len=20, n_layers=(1, 1), dropout=(0, 0), lr=0.001, batch_size=256, num_epochs=10, **kwargs)

Bases: DetectorConfig, NormalizingConfig

Configuration class for LSTM-encoder-decoder. The normalization is inherited from NormalizingConfig. The input data will be standardized automatically.

Parameters
  • hidden_size (int) – The hidden state size of the LSTM modules

  • sequence_len (int) – The input series length, e.g., input = [x(t-sequence_len+1), …, x(t-1), x(t)]

  • n_layers (Sequence[int]) – The number of layers for the LSTM encoder and decoder. n_layers has two values, i.e., n_layers[0] is the number of encoder layers and n_layers[1] is the number of decoder layers.

  • dropout (Sequence[int]) – The dropout rate for the LSTM encoder and decoder. dropout has two values, i.e., dropout[0] is the dropout rate for the encoder and dropout[1] is the dropout rate for the decoder.

  • lr (float) – The learning rate during training

  • batch_size (int) – The batch size during training

  • num_epochs (int) – The number of training epochs

class merlion.models.anomaly.lstm_ed.LSTMED(config)

Bases: DetectorBase

The LSTM-encoder-decoder-based multivariate time series anomaly detector. The time series representation is modeled by an encoder-decoder network where both encoder and decoder are LSTMs. The distribution of the reconstruction error is estimated for anomaly detection.

Parameters

config (LSTMEDConfig) – model configuration

config_class

alias of LSTMEDConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Train a multivariate time series anomaly detector.

Parameters
  • train_data (TimeSeries) – A TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – A TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

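get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – the most recent context preceding time_series. If not provided, the training data is used as the context instead.

Return type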

TimeSeries

Returns

A univariate TimeSeries of anomaly scores

merlion.models.anomaly.deep_point_anomaly_detector module

Deep Point Anomaly Detector algorithm.

class merlion.models.anomaly.deep_point_anomaly_detector.DeepPointAnomalyDetectorConfig(max_score=1000, threshold=None, enable_calibrator=True, enable_threshold=True, **kwargs)

Bases: DetectorConfig

Config object used to define a DeepPointAnomalyDetector model.

Parameters
  • max_score (float) – maximum possible uncalibrated anomaly score

  • threshold – the rule to use for thresholding anomaly scores

  • enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores

  • enable_calibrator – whether to enable a calibrator which automatically transforms all raw anomaly scores to be z-scores (i.e. distributed as N(0, 1)).
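
The interaction of enable_calibrator and enable_threshold can be sketched as a two-stage post-processing step. In the example below, plain standardization against the training scores stands in for the library's calibrator, and scores whose magnitude falls below a cutoff are set to zero. The function names and the alm_threshold parameter with its value of 3.0 are assumptions for illustration, not the library's API.

```python
from statistics import mean, pstdev


def calibrate(raw_scores, train_scores):
    """Map raw scores to z-scores using the mean and std of the training scores."""
    mu, sigma = mean(train_scores), pstdev(train_scores)
    if sigma == 0:
        return [0.0 for _ in raw_scores]
    return [(s - mu) / sigma for s in raw_scores]


def apply_threshold(scores, alm_threshold=3.0):
    """Keep scores whose magnitude reaches the threshold; zero out the rest."""
    return [s if abs(s) >= alm_threshold else 0.0 for s in scores]


def post_process(raw_scores, train_scores, enable_calibrator=True,
                 enable_threshold=True, alm_threshold=3.0):
    """Apply calibration, then thresholding, each only if enabled."""
    scores = calibrate(raw_scores, train_scores) if enable_calibrator else list(raw_scores)
    return apply_threshold(scores, alm_threshold) if enable_threshold else scores


train_scores = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0]
labels = post_process([2.0, 9.0, 2.5], train_scores)
calibrated_only = post_process([2.0, 9.0, 2.5], train_scores, enable_threshold=False)
```

Here only the middle score survives thresholding. Disabling the threshold returns all three calibrated scores unchanged.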

class merlion.models.anomaly.deep_point_anomaly_detector.DeepPointAnomalyDetector(config)

Bases: DetectorBase

Given a time series tuple (time, signal), this algorithm trains an MLP with each element of time and the corresponding signal value as an input-target pair. Once the MLP has been trained for a few iterations, the loss value at each timestamp is regarded as the anomaly score for the corresponding signal value. The intuition is that DNNs learn global patterns before overfitting to local details, so any point anomalies in the signal will have a high MLP loss. These intuitions can be found in: Arpit, Devansh, et al. “A closer look at memorization in deep networks.” ICML 2017; Rahaman, Nasim, et al. “On the spectral bias of neural networks.” ICML 2019.
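
The idea can be sketched in pure Python with a tiny one-hidden-layer MLP trained by full-batch gradient descent for a small number of iterations. The network size, learning rate, iteration count, and the function name train_mlp_point_losses are all assumptions chosen for the example. This is not the library's architecture or training procedure.

```python
import math
import random


def train_mlp_point_losses(signal, hidden=8, iterations=50, lr=0.05, seed=0):
    """Briefly fit an MLP mapping time -> signal; return the per-point squared loss."""
    rng = random.Random(seed)
    n = len(signal)
    times = [i / (n - 1) for i in range(n)]  # timestamps rescaled to [0, 1]
    w1 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    b2 = 0.0

    def forward(t):
        h = [math.tanh(w1[j] * t + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(iterations):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for t, y in zip(times, signal):
            h, pred = forward(t)
            d = 2.0 * (pred - y) / n  # gradient of the mean squared loss w.r.t. pred
            gb2 += d
            for j in range(hidden):
                gw2[j] += d * h[j]
                dh = d * w2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                gw1[j] += dh * t
                gb1[j] += dh
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2

    # The loss at each timestamp serves as the anomaly score for that point.
    return [(forward(t)[1] - y) ** 2 for t, y in zip(times, signal)]


signal = [math.sin(2 * math.pi * i / 40) for i in range(40)]
signal[17] += 10.0  # inject a point anomaly
losses = train_mlp_point_losses(signal)
most_anomalous = max(range(len(losses)), key=lambda i: losses[i])
```

After so few iterations the network has only captured the coarse level of the signal, so the injected point stands out with by far the largest loss.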

Parameters

config (DeepPointAnomalyDetectorConfig) – model configuration

config_class

alias of DeepPointAnomalyDetectorConfig

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores