merlion.models.anomaly.change_point package

Contains all change point detection algorithms. These models implement the anomaly detector interface, but they are specialized for detecting change points in time series.

bocpd – Bayesian online change point detection algorithm.

Submodules

merlion.models.anomaly.change_point.bocpd module

Bayesian online change point detection algorithm.

class merlion.models.anomaly.change_point.bocpd.ChangeKind(value)

Bases: Enum

Enum representing the kinds of change points we would like to detect. Enum values correspond to the Bayesian ConjPrior class used to detect each sort of change point.

Auto = None

Automatically choose the Bayesian conjugate prior we would like to use.

LevelShift = <class 'merlion.utils.conj_priors.MVNormInvWishart'>

Model data points with a normal distribution, to detect level shifts.

TrendChange = <class 'merlion.utils.conj_priors.BayesianMVLinReg'>

Model data points as a linear function of time, to detect trend changes.
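BOCPDConfig accepts change_kind as either a ChangeKind member or a string, and each member's value is a prior class. The sketch below mirrors that design with hypothetical stand-in classes (NormalPrior, LinRegPrior) and a hypothetical helper, resolve_change_kind. The case-insensitive name lookup is an assumption, not necessarily how Merlion parses strings.

```python
from enum import Enum
from typing import Union


class NormalPrior:
    """Hypothetical stand-in for a level-shift prior (MVNormInvWishart in Merlion)."""


class LinRegPrior:
    """Hypothetical stand-in for a trend-change prior (BayesianMVLinReg in Merlion)."""


class ChangeKind(Enum):
    # Each member's value is the prior class used to model one segment.
    Auto = None
    LevelShift = NormalPrior
    TrendChange = LinRegPrior


def resolve_change_kind(kind: Union[str, ChangeKind]) -> ChangeKind:
    """Accept an enum member, or a member name matched case-insensitively (assumption)."""
    if isinstance(kind, ChangeKind):
        return kind
    for member in ChangeKind:
        if member.name.lower() == str(kind).lower():
            return member
    raise ValueError(f"unknown change kind: {kind!r}")


print(resolve_change_kind("levelshift").value.__name__)  # NormalPrior
```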

class merlion.models.anomaly.change_point.bocpd.BOCPDConfig(change_kind=ChangeKind.Auto, cp_prior=0.01, lag=None, min_likelihood=1e-12, max_forecast_steps=None, **kwargs)

Bases: ForecasterConfig, NoCalibrationDetectorConfig

Config class for BOCPD (Bayesian Online Change Point Detection).

Parameters
  • change_kind (Union[str, ChangeKind]) – the kind of change points we would like to detect

  • cp_prior – prior belief probability of how frequently change points occur

  • lag – the maximum amount of delay/lookback (in number of steps) allowed for detecting change points. If lag is None, we will consider the entire history. Note: we do not recommend lag = 0.

  • min_likelihood – we will discard any hypotheses whose probability of being a change point is lower than this threshold. Lower values improve accuracy at the cost of time and space complexity.

  • max_forecast_steps – the maximum number of steps the model is allowed to forecast. Ignored.
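Assuming cp_prior acts as a constant hazard rate, as in Adams & MacKay (2007), segment lengths follow a geometric distribution with mean 1 / cp_prior, so the default cp_prior=0.01 corresponds to a prior expectation of roughly one change point every 100 steps. A minimal sketch of that arithmetic (both function names are hypothetical):

```python
def segment_length_pmf(cp_prior: float, k: int) -> float:
    """P(a segment lasts exactly k steps) under a constant per-step change probability."""
    return (1.0 - cp_prior) ** (k - 1) * cp_prior


def expected_segment_length(cp_prior: float) -> float:
    """Closed-form mean of the geometric distribution implied by a constant hazard."""
    return 1.0 / cp_prior


cp_prior = 0.01
print(expected_segment_length(cp_prior))
# A truncated sum of k * P(k) approaches the closed-form mean.
approx = sum(k * segment_length_pmf(cp_prior, k) for k in range(1, 5000))
print(round(approx, 3))
```

Halving cp_prior doubles the expected segment length, which makes detections rarer under this assumption.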

to_dict(_skipped_keys=None)
Returns

dict with keyword arguments used to initialize the config class.

property change_kind: ChangeKind
Return type

ChangeKind

class merlion.models.anomaly.change_point.bocpd.BOCPD(config=None)

Bases: ForecastingDetectorBase

Bayesian online change point detection algorithm described by Adams & MacKay (2007). At a high level, this algorithm models the observed data using Bayesian conjugate priors. If an observed value is unlikely under the current posterior predictive distribution, it is likely a change point, and we should start modeling the time series from that point forward with a freshly initialized Bayesian conjugate prior.
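The run-length recursion behind this idea can be sketched in pure Python. This is a simplified illustration, not Merlion's implementation: it assumes a univariate Gaussian likelihood with known noise variance and a Normal prior on the segment mean (rather than MVNormInvWishart or BayesianMVLinReg), lets a hazard parameter play the role of cp_prior, and does no pruning or lag handling. bocpd_run_lengths and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_run_lengths(
    data: List[float],
    hazard: float = 0.01,    # plays the role of cp_prior (assumption)
    obs_var: float = 1.0,    # known observation noise variance (assumption)
    prior_mean: float = 0.0,
    prior_var: float = 100.0,
) -> Tuple[List[List[float]], List[int]]:
    """Run-length filtering with a Normal-Normal conjugate model.

    Returns the run-length posterior after each observation and its MAP run length.
    """
    # Each hypothesis r keeps a Gaussian posterior (mean, var) over the segment mean.
    probs = [1.0]
    params = [(prior_mean, prior_var)]
    posteriors, map_run_lengths = [], []
    for x in data:
        # 1. Predictive density of x under each run-length hypothesis.
        preds = [normal_pdf(x, m, v + obs_var) for m, v in params]
        joint = [p * q for p, q in zip(probs, preds)]
        # 2. Either the current run grows by one, or a change point resets it to 0.
        growth = [j * (1.0 - hazard) for j in joint]
        change = sum(joint) * hazard
        new_probs = [change] + growth
        total = sum(new_probs)
        probs = [p / total for p in new_probs]
        # 3. Conjugate update: a fresh prior for r = 0, one more observation otherwise.
        updated = []
        for m, v in params:
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            updated.append((post_var * (m / v + x / obs_var), post_var))
        params = [(prior_mean, prior_var)] + updated
        posteriors.append(probs)
        map_run_lengths.append(max(range(len(probs)), key=probs.__getitem__))
    return posteriors, map_run_lengths


# A level shift from 0 to 5 at index 50 (deterministic "noise" for reproducibility).
series = [0.1 * (-1) ** i for i in range(50)] + [5 + 0.1 * (-1) ** i for i in range(50)]
_, map_r = bocpd_run_lengths(series)
print(map_r[48:53])
```

In this sketch the MAP run length should collapse to a small value right after index 50 and then grow again. Under a constant hazard, the normalized probability of run length 0 always equals the hazard itself, so the informative signal is how posterior mass moves across run lengths rather than that single entry.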

The get_anomaly_score() method returns a z-score corresponding to the probability of each point being a change point. The forecast() method returns the predicted values (and standard error) of the underlying piecewise model on the relevant data.
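The exact mapping from change point probability to z-score is not specified here. One plausible convention, shown below purely as an assumption, treats the probability as two-sided standard normal coverage, so z is the inverse standard normal CDF of (1 + p) / 2 and p = 0.95 maps to about 1.96. prob_to_zscore is a hypothetical name.

```python
from statistics import NormalDist


def prob_to_zscore(p: float, eps: float = 1e-12) -> float:
    """Map a probability in [0, 1] to a non-negative z-score (hypothetical mapping).

    Uses the two-sided convention p = P(|Z| <= z), i.e. z = Phi^-1((1 + p) / 2).
    """
    p = min(max(p, 0.0), 1.0 - eps)  # clip so that p = 1 does not map to infinity
    return NormalDist().inv_cdf((1.0 + p) / 2.0)


for p in (0.0, 0.5, 0.95, 0.999):
    print(p, round(prob_to_zscore(p), 3))
```

The mapping is monotone, so ranking points by z-score is the same as ranking them by change point probability.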

Parameters

config (Optional[BOCPDConfig]) – model configuration

config_class

alias of BOCPDConfig

property last_train_time
Returns

the last time (as a pandas.Timestamp) that the model was trained on

property n_seen
Returns

the number of data points seen so far

property change_kind: ChangeKind
Return type

ChangeKind

Returns

the kind of change points we would like to detect

property cp_prior: float
Return type

float

Returns

prior belief probability of how frequently change points occur

property lag: int
Return type

int

Returns

the maximum amount of delay allowed for detecting change points. A higher lag can increase recall, but it may decrease precision.

property min_likelihood: float
Return type

float

Returns

we will not consider any hypotheses (about whether a particular point is a change point) with likelihood lower than this threshold
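A sketch of the pruning that min_likelihood describes, assuming the run-length posterior is stored as a dict from run length to probability (a hypothetical representation; prune_hypotheses is not a Merlion function). Hypotheses below the threshold are dropped and the rest renormalized, which is why a lower threshold keeps more hypotheses and costs more time and memory.

```python
from typing import Dict


def prune_hypotheses(probs: Dict[int, float], min_likelihood: float) -> Dict[int, float]:
    """Drop run-length hypotheses below min_likelihood, then renormalize the rest."""
    kept = {r: p for r, p in probs.items() if p >= min_likelihood}
    if not kept:  # never discard everything; fall back to the most likely hypothesis
        best = max(probs, key=probs.get)
        kept = {best: probs[best]}
    total = sum(kept.values())
    return {r: p / total for r, p in kept.items()}


posterior = {0: 0.01, 1: 0.9, 2: 0.089, 3: 1e-3, 4: 1e-15}
print(prune_hypotheses(posterior, min_likelihood=1e-12))
```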

train_pre_process(train_data, require_even_sampling, require_univariate)

Applies pre-processing steps common for training most models.

Parameters
  • train_data (TimeSeries) – the original time series of training data

  • require_even_sampling (bool) – whether the model assumes that training data is sampled at a fixed frequency

  • require_univariate (bool) – whether the model only works with univariate time series

Return type

TimeSeries

Returns

the training data, after any necessary pre-processing has been applied

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.
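Because time_stamps may be either an explicit list or a number of steps, an integer has to be expanded into concrete timestamps before forecasting. A minimal sketch, assuming evenly spaced integer (e.g. Unix-second) timestamps and a hypothetical helper resolve_time_stamps:

```python
from typing import List, Union


def resolve_time_stamps(
    time_stamps: Union[int, List[int]], last_time: int, granularity: int
) -> List[int]:
    """Expand an integer horizon into explicit timestamps (hypothetical helper).

    Assumes timestamps `granularity` seconds apart, starting one step after
    `last_time` (e.g. the last training timestamp).
    """
    if isinstance(time_stamps, int):
        return [last_time + granularity * (i + 1) for i in range(time_stamps)]
    return list(time_stamps)


print(resolve_time_stamps(3, last_time=1_000, granularity=60))  # [1060, 1120, 1180]
```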

Return type

Union[Tuple[TimeSeries, TimeSeries], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value. May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
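When a forecast's predictive distribution is Gaussian, its quartiles follow directly from the standard error: forecast_lb is forecast minus about 0.6745 * stderr and forecast_ub is forecast plus the same amount. The sketch below assumes that Gaussian form; for a Normal-inverse-Wishart model the posterior predictive is actually a Student-t, so this is only an approximation there. stderr_to_iqr is a hypothetical helper.

```python
from statistics import NormalDist
from typing import List, Tuple


def stderr_to_iqr(forecast: List[float], stderr: List[float]) -> Tuple[List[float], List[float]]:
    """25th/75th percentile bands, assuming a Gaussian predictive distribution."""
    z = NormalDist().inv_cdf(0.75)  # about 0.6745
    lb = [f - z * s for f, s in zip(forecast, stderr)]
    ub = [f + z * s for f, s in zip(forecast, stderr)]
    return lb, ub


lb, ub = stderr_to_iqr([10.0, 12.0], [1.0, 2.0])
print([round(v, 3) for v in lb], [round(v, 3) for v in ub])
```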

get_figure(*, time_series=None, time_stamps=None, time_series_prev=None, plot_anomaly=True, filter_scores=True, plot_forecast=False, plot_forecast_uncertainty=False, plot_time_series_prev=False)
Parameters
  • time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.

  • time_stamps (Optional[List[int]]) – a list of timestamps we wish to forecast for. Exactly one of time_series or time_stamps should be provided.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • plot_anomaly – whether to plot the model’s predicted anomaly scores.

  • filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.

  • plot_forecast – whether to plot the model’s forecasted values.

  • plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

Return type

Figure

Returns

a Figure of the model’s anomaly score predictions and/or forecast.
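The filter_scores option applies the model's post-rule before plotting. BOCPD's config derives from NoCalibrationDetectorConfig, whose name suggests no calibration step, so a reasonable mental model is a plain threshold. The sketch below assumes a rule that zeroes out scores whose magnitude falls below a threshold; threshold_filter and its default of 3.0 are hypothetical, not Merlion's configured values.

```python
from typing import List


def threshold_filter(scores: List[float], threshold: float = 3.0) -> List[float]:
    """Zero out scores whose magnitude is below `threshold` (hypothetical post-rule)."""
    return [s if abs(s) >= threshold else 0.0 for s in scores]


print(threshold_filter([0.2, 3.5, -4.1, 2.9]))  # [0.0, 3.5, -4.1, 0.0]
```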

update(time_series)

Updates the BOCPD model’s internal state using the time series values provided.

Parameters

time_series (TimeSeries) – time series whose values we are using to update the internal state of the model

Returns

anomaly score associated with each point (based on the probability of it being a change point)

train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)

Trains the underlying forecaster (unsupervised) on the training data. Converts the forecast into anomaly scores, and then trains the post-rule for filtering anomaly scores (supervised, if labels are given) on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.

  • train_config – Additional training configs, if needed. Only required for some models.

  • post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.

Return type

TimeSeries

Returns

A TimeSeries of the model’s anomaly scores on the training data.

get_anomaly_score(time_series, time_series_prev=None)

Returns the model’s predicted sequence of anomaly scores.

Parameters
  • time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.

Return type

TimeSeries

Returns

a univariate TimeSeries of anomaly scores