merlion.models package

Broadly, Merlion contains two types of models: anomaly detection (merlion.models.anomaly) and forecasting (merlion.models.forecast). Note that there is a distinct subset of anomaly detection models that use forecasting models at their core (merlion.models.anomaly.forecast_based).

We implement an abstract ModelBase class which provides the following functionality for all models:

  1. model = ModelClass(config)

    • initialization with a model-specific config (which inherits from Config)

    • configs contain:

      • a (potentially trainable) data pre-processing transform from merlion.transform; note that model.transform is a property which refers to model.config.transform

      • model-specific hyperparameters

  2. model.save(dirname, **save_config)

    • saves the model to the specified directory. The model’s configuration is saved to <dirname>/config.json, while the rest of the model’s state is (by default) saved in binary form to <dirname>/model.pkl. Note that if you edit the saved <dirname>/config.json on disk, those changes will take effect the next time you call ModelClass.load(dirname).

    • this method heavily exploits the fact that many objects in Merlion are JSON-serializable

  3. ModelClass.load(dirname, **kwargs)

    • this class method initializes an instance of ModelClass from the config saved in <dirname>/config.json (overriding any config parameters with the values in kwargs, where relevant), loads the remaining binary data into the model object, and returns the fully initialized model.
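To make the save/load contract concrete, here is a minimal, self-contained sketch of the same pattern using only the standard library. ToyConfig and ToyModel are hypothetical stand-ins, not Merlion classes; the sketch only mimics the documented layout (the config in <dirname>/config.json, the remaining state in <dirname>/model.pkl) and the rule that kwargs passed to load override the saved config.

```python
import json
import os
import pickle
import tempfile


class ToyConfig:
    """Hypothetical stand-in for a model-specific Config subclass."""

    filename = "config.json"

    def __init__(self, wind_sz=30):
        self.wind_sz = wind_sz

    def to_dict(self):
        return {"wind_sz": self.wind_sz}


class ToyModel:
    """Hypothetical stand-in for a ModelBase subclass."""

    filename = "model.pkl"
    config_class = ToyConfig

    def __init__(self, config):
        self.config = config
        self.state = None  # learned parameters, filled in by train()

    def train(self, values):
        # Toy "training": remember the mean of the last wind_sz values
        window = values[-self.config.wind_sz:]
        self.state = sum(window) / len(window)

    def save(self, dirname):
        os.makedirs(dirname, exist_ok=True)
        # The config is human-readable JSON...
        with open(os.path.join(dirname, self.config.filename), "w") as f:
            json.dump(self.config.to_dict(), f)
        # ...while everything else is pickled
        with open(os.path.join(dirname, self.filename), "wb") as f:
            pickle.dump(self.state, f)

    @classmethod
    def load(cls, dirname, **kwargs):
        with open(os.path.join(dirname, cls.config_class.filename)) as f:
            config_dict = json.load(f)
        config_dict.update(kwargs)  # kwargs override the saved config
        model = cls(cls.config_class(**config_dict))
        with open(os.path.join(dirname, cls.filename), "rb") as f:
            model.state = pickle.load(f)
        return model


with tempfile.TemporaryDirectory() as tmp:
    model = ToyModel(ToyConfig(wind_sz=3))
    model.train([1.0, 2.0, 3.0, 4.0, 5.0])
    model.save(tmp)
    restored = ToyModel.load(tmp)
    overridden = ToyModel.load(tmp, wind_sz=10)

print(restored.config.wind_sz, restored.state)  # 3 4.0
print(overridden.config.wind_sz)  # 10
```

Keeping the config as JSON is what makes hand-editing config.json between save and load possible, while arbitrary learned state goes through pickle.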

For users who aren’t familiar with the specific details of various models, we provide default models for anomaly detection and forecasting in merlion.models.defaults.

We also provide a ModelFactory which can be used to conveniently instantiate models from their name and a set of keyword arguments, or to load them directly from disk. For example, we may have the following workflow:

from merlion.models.factory import ModelFactory
from merlion.models.anomaly.windstats import WindStats, WindStatsConfig

# creates the same kind of model in 2 equivalent ways
model1a = WindStats(WindStatsConfig(wind_sz=60))
model1b = ModelFactory.create("WindStats", wind_sz=60)

# save the model & load it in 2 equivalent ways
model1a.save("tmp")
model2a = WindStats.load("tmp")
model2b = ModelFactory.load("WindStats", "tmp")

Finally, we support ensembles of models in merlion.models.ensemble.

  • base – Contains the base classes for all models.

  • factory – Contains the ModelFactory.

  • defaults – Default models for anomaly detection & forecasting that balance speed and performance.

  • anomaly – Contains all anomaly detection models.

  • anomaly.forecast_based – Contains all forecaster-based anomaly detectors.

  • forecast – Contains all forecasting models.

  • ensemble – Ensembles of models and automated model selection.

  • automl – Contains all AutoML layers.

Submodules

merlion.models.base module

Contains the base classes for all models.

class merlion.models.base.Config(transform=None, **kwargs)

Bases: object

Abstract class which defines a model config.

Parameters

transform (Optional[TransformBase]) – Transformation to pre-process input time series.

filename = 'config.json'
to_dict(_skipped_keys=None)
Returns

dict with keyword arguments used to initialize the config class.

classmethod from_dict(config_dict, return_unused_kwargs=False, **kwargs)

Constructs a Config from a Python dictionary of parameters.

Parameters
  • config_dict (Dict[str, Any]) – dict that will be used to instantiate this object.

  • return_unused_kwargs – whether to return any unused keyword args.

  • kwargs – any additional parameters to set (overriding config_dict).

Returns

Config object initialized from the dict.
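The from_dict contract (keyword arguments override entries in config_dict, and the caller can optionally get back whatever was not consumed) can be sketched as below. This is a hypothetical illustration, not Merlion's implementation; in particular, treating "unused" as "not accepted by the config's constructor" is our assumption.

```python
import inspect
from typing import Any, Dict


class SketchConfig:
    """Hypothetical config with a to_dict/from_dict round trip."""

    def __init__(self, transform=None, max_lag=10):
        self.transform = transform
        self.max_lag = max_lag

    def to_dict(self) -> Dict[str, Any]:
        return {"transform": self.transform, "max_lag": self.max_lag}

    @classmethod
    def from_dict(cls, config_dict, return_unused_kwargs=False, **kwargs):
        # kwargs take precedence over the values stored in config_dict
        merged = {**config_dict, **kwargs}
        accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
        used = {k: v for k, v in merged.items() if k in accepted}
        unused = {k: v for k, v in merged.items() if k not in accepted}
        config = cls(**used)
        if return_unused_kwargs:
            return config, unused
        return config


original = SketchConfig(max_lag=5)
restored = SketchConfig.from_dict(original.to_dict(), max_lag=20)
config, unused = SketchConfig.from_dict({"max_lag": 3}, return_unused_kwargs=True, n_epochs=7)
print(restored.max_lag)        # 20
print(config.max_lag, unused)  # 3 {'n_epochs': 7}
```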

class merlion.models.base.NormalizingConfig(normalize=None, **kwargs)

Bases: Config

Model config where the transform must return normalized values. Applies additional normalization after the initial data pre-processing transform.

Parameters

normalize (Optional[Rescale]) – Pre-trained normalization transformation (optional).

property full_transform

Returns the full transform, including the pre-processing step, lags, and final mean/variance normalization.
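As a rough illustration of "pre-processing followed by mean/variance normalization", the sketch below chains a pre-processing step with a z-score rescaling that is fitted on the pre-processed training data. MeanVarNormalize, make_full_transform, and clip_negative are hypothetical stand-ins for merlion.transform objects such as Rescale, and the lag step mentioned above is omitted for brevity.

```python
import statistics
from typing import Callable, List


class MeanVarNormalize:
    """Hypothetical stand-in for a trainable mean/variance normalization."""

    def __init__(self):
        self.mean = 0.0
        self.std = 1.0

    def train(self, values: List[float]) -> None:
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against zero variance

    def __call__(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]


def make_full_transform(pre: Callable[[List[float]], List[float]], norm: MeanVarNormalize):
    """Compose: apply the pre-processing transform first, then normalize."""
    return lambda values: norm(pre(values))


def clip_negative(values: List[float]) -> List[float]:
    # Example pre-processing step: replace negative readings with 0
    return [max(v, 0.0) for v in values]


train = [-1.0, 2.0, 4.0, 6.0]
norm = MeanVarNormalize()
norm.train(clip_negative(train))  # fit the normalization on pre-processed data
full_transform = make_full_transform(clip_negative, norm)
print(full_transform(train))  # zero mean, unit variance
```

Fitting the normalization on the output of the pre-processing step, rather than on the raw data, is what guarantees the final values are normalized.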

property transform

class merlion.models.base.ModelBase(config)

Bases: object

Abstract base class for models.

filename = 'model.pkl'
config_class

alias of Config

reset()

Resets the model’s internal state.

property transform
Returns

The data pre-processing transform to apply on any time series, before giving it to the model.

property timedelta
Returns

the gap (as a pandas.Timedelta or pandas.DateOffset) between data points in the training data

property last_train_time
Returns

the last timestamp (as a pandas.Timestamp) in the data that the model was trained on

train_pre_process(train_data, require_even_sampling, require_univariate)

Applies pre-processing steps common for training most models.

Parameters
  • train_data (TimeSeries) – the original time series of training data

  • require_even_sampling (bool) – whether the model assumes that training data is sampled at a fixed frequency

  • require_univariate (bool) – whether the model only works with univariate time series

Return type

TimeSeries

Returns

the training data, after any necessary pre-processing has been applied
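The two flags of train_pre_process correspond to simple validity checks. The hypothetical sketch below performs them on plain Python data; Merlion itself works on TimeSeries objects and its exact error handling may differ. Here the sampling gap is taken to be the most common difference between consecutive timestamps, which is one reasonable way to estimate the value exposed by the timedelta property.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def check_training_data(
    timestamps: List[datetime],
    columns: Dict[str, List[float]],
    require_even_sampling: bool,
    require_univariate: bool,
) -> timedelta:
    """Hypothetical checks mirroring train_pre_process's two flags.

    Returns the inferred sampling gap, i.e. the most common difference
    between consecutive timestamps.
    """
    if require_univariate and len(columns) != 1:
        raise ValueError(f"model requires univariate data, got {len(columns)} variables")
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps to infer a sampling gap")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    gap, _ = Counter(gaps).most_common(1)[0]
    if require_even_sampling and any(g != gap for g in gaps):
        raise ValueError("model requires data sampled at a fixed frequency")
    return gap


t0 = datetime(2021, 1, 1)
even = [t0 + timedelta(minutes=5 * i) for i in range(4)]
uneven = even[:3] + [even[3] + timedelta(minutes=1)]

print(check_training_data(even, {"cpu": [1, 2, 3, 4]}, True, True))     # 0:05:00
print(check_training_data(uneven, {"cpu": [1, 2, 3, 4]}, False, True))  # 0:05:00
```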

transform_time_series(time_series, time_series_prev=None)

Applies the model’s pre-processing transform to time_series and time_series_prev.

Parameters
  • time_series (TimeSeries) – The time series to transform.

  • time_series_prev (Optional[TimeSeries]) – A time series of context, immediately preceding time_series. Optional.

Return type

Tuple[TimeSeries, Optional[TimeSeries]]

Returns

The transformed time_series, and the transformed time_series_prev (None if time_series_prev was not given).
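Why does time_series_prev matter? Many transforms need context: a differencing transform cannot produce a value for the first point of time_series without the point before it. One plausible way to handle this, sketched below under the assumption that the transform is applied to the concatenation of both series and the result is split back at the first timestamp of time_series, is shown here. Lists of (timestamp, value) pairs stand in for TimeSeries, and difference is a hypothetical transform.

```python
from typing import Callable, List, Optional, Tuple

Series = List[Tuple[int, float]]  # (timestamp, value) pairs, sorted by time


def difference(series: Series) -> Series:
    """Hypothetical transform: x[t] - x[t-1]; the first point has no predecessor and is dropped."""
    return [(t, v - prev_v) for (_, prev_v), (t, v) in zip(series, series[1:])]


def transform_time_series(
    transform: Callable[[Series], Series],
    time_series: Series,
    time_series_prev: Optional[Series] = None,
) -> Tuple[Series, Optional[Series]]:
    if time_series_prev is None:
        return transform(time_series), None
    if not time_series:
        return time_series, transform(time_series_prev)
    t0 = time_series[0][0]  # first timestamp of the series of interest
    combined = transform(time_series_prev + time_series)
    # Split the transformed result back into its context and target parts
    prev_out = [(t, v) for t, v in combined if t < t0]
    cur_out = [(t, v) for t, v in combined if t >= t0]
    return cur_out, prev_out


prev = [(0, 10.0), (1, 12.0)]
cur = [(2, 15.0), (3, 14.0)]
print(transform_time_series(difference, cur))        # ([(3, -1.0)], None)
print(transform_time_series(difference, cur, prev))  # ([(2, 3.0), (3, -1.0)], [(1, 2.0)])
```

Without context, the first point of cur is lost; with context, every point of cur gets a transformed value.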

abstract train(train_data, train_config=None)

Trains the model on the specified time series, optionally with some additional implementation-specific config options train_config.

Parameters
  • train_data (TimeSeries) – a TimeSeries to use as a training set

  • train_config – additional configurations (if needed)

save(dirname, **save_config)
Parameters
  • dirname (str) – directory to save the model & its config

  • save_config – additional configurations (if needed)

classmethod load(dirname, **kwargs)
Parameters
  • dirname (str) – directory to load model (and config) from

  • kwargs – config params to override manually

Returns

ModelBase object loaded from file

to_bytes(**save_config)

Converts the entire model state and configuration to a single byte object.

Returns

bytes object representing the model.

classmethod from_bytes(obj, **kwargs)

Creates a fully specified model from a byte object.

Parameters

obj – byte object to convert into a model

Returns

ModelBase object loaded from obj
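Conceptually, to_bytes and from_bytes bundle the same two pieces that save writes to separate files (the JSON config and the binary state) into a single object. A minimal sketch of that idea with pickle follows; the payload layout (a dict with "config" and "state" keys), the function names, and the assumption that from_bytes kwargs override config entries (as they do for load) are ours, not Merlion's actual byte format.

```python
import json
import pickle
from typing import Any, Dict, Tuple


def to_bytes(config: Dict[str, Any], state: Any) -> bytes:
    """Hypothetical: pack a JSON-serializable config and arbitrary state into one bytes object."""
    payload = {"config": json.dumps(config), "state": state}
    return pickle.dumps(payload)


def from_bytes(obj: bytes, **kwargs) -> Tuple[Dict[str, Any], Any]:
    """Hypothetical inverse of to_bytes; kwargs override config entries."""
    payload = pickle.loads(obj)  # only unpickle bytes from a trusted source
    config = {**json.loads(payload["config"]), **kwargs}
    return config, payload["state"]


blob = to_bytes({"wind_sz": 60}, state=[0.5, 1.5])
config, state = from_bytes(blob, wind_sz=30)
print(type(blob).__name__, config, state)  # bytes {'wind_sz': 30} [0.5, 1.5]
```

A single bytes object is convenient for storing a model in a database or sending it between processes. Because unpickling can execute arbitrary code, such bytes should only be loaded from trusted sources.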

class merlion.models.base.ModelWrapper(config, model=None)

Bases: ModelBase

Abstract class implementing a model that wraps around another internal model.

filename = 'model'
save(dirname, **save_config)
Parameters
  • dirname (str) – directory to save the model & its config

  • save_config – additional configurations (if needed)

classmethod load(dirname, **kwargs)
Parameters
  • dirname (str) – directory to load model (and config) from

  • kwargs – config params to override manually

Returns

ModelBase object loaded from file

to_bytes(**save_config)

Converts the entire model state and configuration to a single byte object.

Returns

bytes object representing the model.

classmethod from_bytes(obj, **kwargs)

Creates a fully specified model from a byte object.

Parameters

obj – byte object to convert into a model

Returns

ModelBase object loaded from obj
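A ModelWrapper delegates the real work to an inner model. The sketch below shows that delegation together with one plausible save layout, in which the inner model is written to a subdirectory named after the wrapper's filename attribute ('model'). That layout is our inference from the filename attribute, not something this page states, and ToyInner and ToyWrapper are hypothetical.

```python
import json
import os
import tempfile


class ToyInner:
    """Hypothetical inner model that saves its own config."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def predict(self, values):
        return [v * self.scale for v in values]

    def save(self, dirname):
        os.makedirs(dirname, exist_ok=True)
        with open(os.path.join(dirname, "config.json"), "w") as f:
            json.dump({"scale": self.scale}, f)

    @classmethod
    def load(cls, dirname):
        with open(os.path.join(dirname, "config.json")) as f:
            return cls(**json.load(f))


class ToyWrapper:
    """Hypothetical wrapper: forwards calls to self.model and nests its files."""

    filename = "model"  # subdirectory holding the wrapped model (assumed layout)

    def __init__(self, model=None):
        self.model = model

    def predict(self, values):
        return self.model.predict(values)  # delegate to the wrapped model

    def save(self, dirname):
        os.makedirs(dirname, exist_ok=True)
        with open(os.path.join(dirname, "config.json"), "w") as f:
            json.dump({"inner_class": type(self.model).__name__}, f)
        self.model.save(os.path.join(dirname, self.filename))

    @classmethod
    def load(cls, dirname):
        # A real implementation would look up the inner class by name; we assume ToyInner
        return cls(ToyInner.load(os.path.join(dirname, cls.filename)))


with tempfile.TemporaryDirectory() as tmp:
    ToyWrapper(ToyInner(scale=2.0)).save(tmp)
    files = sorted(os.listdir(tmp))
    restored = ToyWrapper.load(tmp)

print(files)                         # ['config.json', 'model']
print(restored.predict([1.0, 3.0]))  # [2.0, 6.0]
```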

merlion.models.factory module

Contains the ModelFactory.

class merlion.models.factory.ModelFactory

Bases: object

classmethod get_model_class(name)
Return type

Type[ModelBase]

classmethod create(name, **kwargs)
Return type

ModelBase

classmethod load(name, model_path, **kwargs)
Return type

ModelBase

classmethod load_bytes(obj, **kwargs)
Return type

ModelBase
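The factory methods amount to a lookup from a model name to its class. The sketch below is a hypothetical, minimal registry using the same method names; how Merlion actually resolves names is not described on this page, so the dictionary-based lookup and the toy classes are assumptions.

```python
from typing import Dict


class ToyConfig:
    def __init__(self, wind_sz=30):
        self.wind_sz = wind_sz


class ToyWindStats:
    """Hypothetical model class; config_class mirrors ModelBase.config_class."""

    config_class = ToyConfig

    def __init__(self, config):
        self.config = config


class ToyFactory:
    # Hypothetical registry: model name -> model class
    _registry: Dict[str, type] = {"WindStats": ToyWindStats}

    @classmethod
    def get_model_class(cls, name: str) -> type:
        try:
            return cls._registry[name]
        except KeyError:
            raise ValueError(f"unknown model name {name!r}") from None

    @classmethod
    def create(cls, name: str, **kwargs):
        # Same result as ModelClass(ModelClass.config_class(**kwargs))
        model_class = cls.get_model_class(name)
        return model_class(model_class.config_class(**kwargs))


model = ToyFactory.create("WindStats", wind_sz=60)
print(type(model).__name__, model.config.wind_sz)  # ToyWindStats 60
```

Routing every creation through get_model_class keeps the name-to-class mapping in one place, so create and load can share it.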

merlion.models.defaults module

Default models for anomaly detection & forecasting that balance speed and performance.

class merlion.models.defaults.DefaultModelConfig(granularity=None, **kwargs)

Bases: Config

Parameters

transform – Transformation to pre-process input time series.

to_dict(_skipped_keys=None)
Returns

dict with keyword arguments used to initialize the config class.

class merlion.models.defaults.DefaultDetectorConfig(granularity=None, threshold=None, n_threads=1, **kwargs)

Bases: DetectorConfig, DefaultModelConfig

Config object for default anomaly detection model.

Parameters
  • granularity – the granularity at which the input time series should be sampled, e.g. “5min”, “1h”, “1d”, etc.

  • threshold – Threshold object setting a default anomaly detection threshold in units of z-score.

  • n_threads (int) – the number of parallel threads to use for relevant models
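The granularity parameter asks for the input to be placed on a regular grid such as "5min". As a hedged illustration of what resampling to a granularity involves, the sketch below floors each timestamp to a fixed-width bin and averages each bin. The resample helper is hypothetical; Merlion's own resampling (including its choice of aggregation and how it fills empty bins) may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import List, Tuple


def resample(points: List[Tuple[datetime, float]], granularity: timedelta) -> List[Tuple[datetime, float]]:
    """Hypothetical resampler: floor each timestamp to the grid and average each bucket."""
    origin = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for t, v in points:
        start = origin + ((t - origin) // granularity) * granularity
        buckets[start].append(v)
    return [(t, sum(vs) / len(vs)) for t, vs in sorted(buckets.items())]


raw = [
    (datetime(2021, 1, 1, 0, 1), 1.0),
    (datetime(2021, 1, 1, 0, 3), 3.0),
    (datetime(2021, 1, 1, 0, 7), 5.0),
]
for t, v in resample(raw, timedelta(minutes=5)):
    print(t.strftime("%H:%M"), v)
# 00:00 2.0
# 00:05 5.0
```

Empty bins are simply absent from this sketch's output; a production resampler would typically interpolate or otherwise fill them.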

class merlion.models.defaults.DefaultDetector(config, model=None)

Bases: ModelWrapper, DetectorBase

Default anomaly detection model that balances efficiency with performance.

Parameters

config (Config) – model configuration

config_class

alias of DefaultDetectorConfig

property granularity

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the anomaly detector (unsupervised) and its post-rule (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores

get_anomaly_label(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores, processed by any relevant post-rules (calibration and/or thresholding).

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores, filtered by the model’s post-rule
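The difference between get_anomaly_score and get_anomaly_label is the post-rule. Below is a deliberately simplified, hypothetical post-rule: raw scores are standardized into z-scores using training statistics, and any point whose absolute z-score falls below the threshold is set to 0. It illustrates the idea of a threshold "in units of z-score"; Merlion's actual calibration step is not described on this page and may differ substantially from plain standardization.

```python
import statistics
from typing import List


class ZScorePostRule:
    """Hypothetical post-rule: standardize scores, then zero out those below a z-score threshold."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold
        self.mean = 0.0
        self.std = 1.0

    def train(self, train_scores: List[float]) -> None:
        # "Calibration" here is just standardization with training statistics
        self.mean = statistics.fmean(train_scores)
        self.std = statistics.pstdev(train_scores) or 1.0

    def __call__(self, scores: List[float]) -> List[float]:
        z = [(s - self.mean) / self.std for s in scores]
        # Thresholding: keep a calibrated score only if it is extreme enough
        return [zi if abs(zi) >= self.threshold else 0.0 for zi in z]


train_scores = [1.0, 2.0, 3.0, 2.0, 2.0]
post_rule = ZScorePostRule(threshold=2.0)
post_rule.train(train_scores)
print(post_rule([2.0, 2.5, 5.0]))  # only the last score survives the threshold
```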

class merlion.models.defaults.DefaultForecasterConfig(granularity=None, max_forecast_steps=100, target_seq_index=None, **kwargs)

Bases: ForecasterConfig, DefaultModelConfig

Config object for default forecasting model.

Parameters
  • granularity – the granularity at which the input time series should be sampled, e.g. “5min”, “1h”, “1d”, etc.

  • max_forecast_steps – Maximum number of steps we would like to forecast.

  • target_seq_index – If doing multivariate forecasting, the index of the univariate whose value you wish to forecast.

class merlion.models.defaults.DefaultForecaster(config, model=None)

Bases: ModelWrapper, ForecasterBase

Default forecasting model that balances efficiency with performance.

config_class

alias of DefaultForecasterConfig

property granularity

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Return type

Tuple[TimeSeries, Optional[TimeSeries]]

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding the timestamps we wish to forecast. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follow the training data.

  • return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Union[Tuple[TimeSeries, Optional[TimeSeries]], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value. May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
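To make the two return shapes of forecast concrete: if a model can produce several sampled values for each timestamp, forecast_lb and forecast_ub are the 25th and 75th percentiles of those values. The sketch below computes both return shapes from hypothetical sample paths with the standard library. Using the median as the point forecast and the sample standard deviation as a stand-in for forecast_stderr are our assumptions; this is not how any particular Merlion model derives its uncertainty, and the sample data is made up.

```python
import statistics
from typing import List, Optional, Tuple, Union

Forecast = List[float]


def summarize_samples(
    samples: List[List[float]], return_iqr: bool = False
) -> Union[Tuple[Forecast, Optional[Forecast]], Tuple[Forecast, Forecast, Forecast]]:
    """Hypothetical: samples[k][i] is the k-th sampled value for the i-th timestamp."""
    per_step = list(zip(*samples))  # group the samples by timestamp
    forecast = [statistics.median(vals) for vals in per_step]
    if not return_iqr:
        # Sample standard deviation as a stand-in for the forecast stderr
        stderr = [statistics.stdev(vals) for vals in per_step]
        return forecast, stderr
    # quantiles(n=4) returns the three quartile cut points: [Q1, median, Q3]
    quartiles = [statistics.quantiles(vals, n=4) for vals in per_step]
    lb = [q[0] for q in quartiles]
    ub = [q[2] for q in quartiles]
    return forecast, lb, ub


# Five hypothetical sample paths over a 2-step horizon
samples = [[10.0, 20.0], [11.0, 22.0], [12.0, 24.0], [13.0, 26.0], [14.0, 28.0]]
forecast, stderr = summarize_samples(samples)
forecast, lb, ub = summarize_samples(samples, return_iqr=True)
print(forecast, lb, ub)  # [12.0, 24.0] [10.5, 21.0] [13.5, 27.0]
```

The width of the IQR (ub minus lb) grows with the spread of the samples, which is why the second step, whose samples are twice as spread out, gets a wider interval.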