merlion.models.anomaly.change_point package
Contains all change point detection algorithms. These models implement the anomaly detector interface, but they are specialized for detecting change points in time series.
merlion.models.anomaly.change_point.bocpd module
Bayesian online change point detection algorithm.
class merlion.models.anomaly.change_point.bocpd.ChangeKind(value)
Enum representing the kinds of change points we would like to detect.
ConjPrior
- Auto = None
Automatically choose the Bayesian conjugate prior we would like to use.
- LevelShift = <class 'merlion.utils.conj_priors.MVNormInvWishart'>
Model data points with a normal distribution, to detect level shifts.
- TrendChange = <class 'merlion.utils.conj_priors.BayesianMVLinReg'>
Model data points as a linear function of time, to detect trend changes.
class merlion.models.anomaly.change_point.bocpd.BOCPDConfig(change_kind=ChangeKind.Auto, cp_prior=0.01, lag=None, min_likelihood=1e-12, max_forecast_steps=None, target_seq_index=None, transform=None, enable_calibrator=False, max_score=1000, threshold=None, enable_threshold=True, normalize=None, **kwargs)

Config class for BOCPD (Bayesian Online Change Point Detection).
Base class of the object used to configure an anomaly detection model.
- Parameters
change_kind (Union[str, ChangeKind]) – the kind of change points we would like to detect
cp_prior – prior belief probability of how frequently changepoints occur
lag – the maximum amount of delay/lookback (in number of steps) allowed for detecting change points. If lag is None, we will consider the entire history. Note: we do not recommend lag = 0.
min_likelihood – we will discard any hypotheses whose probability of being a change point is lower than this threshold. Lower values improve accuracy at the cost of time and space complexity.
max_forecast_steps – the maximum number of steps the model is allowed to forecast. Ignored.
target_seq_index – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.
transform – Transformation to pre-process input time series.
enable_calibrator – False because this config assumes calibrated outputs from the model.
max_score – maximum possible uncalibrated anomaly score
threshold – the rule to use for thresholding anomaly scores
enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores
normalize – Pre-trained normalization transformation (optional).
- to_dict(_skipped_keys=None)
- Returns
dict with keyword arguments used to initialize the config class.
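As a rough illustration of the to_dict() contract described above, the sketch below uses a hypothetical stand-in dataclass (BOCPDConfigSketch is not part of Merlion) that copies a few defaults from the BOCPDConfig signature. It only shows the round trip the description implies: the returned dict can be passed back as keyword arguments to rebuild an equal config.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BOCPDConfigSketch:
    """Hypothetical stand-in; defaults copied from the documented signature."""
    change_kind: str = "Auto"
    cp_prior: float = 0.01
    lag: Optional[int] = None
    min_likelihood: float = 1e-12
    max_score: float = 1000

    def to_dict(self):
        # Keyword arguments that would re-create this config.
        return asdict(self)


cfg = BOCPDConfigSketch(cp_prior=0.05, lag=20)
restored = BOCPDConfigSketch(**cfg.to_dict())
```

The real class accepts more arguments (for example transform and threshold), which this sketch leaves out.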
property change_kind: ChangeKind
- Return type
ChangeKind
class merlion.models.anomaly.change_point.bocpd.BOCPD(config=None)
Bayesian online change point detection algorithm described by Adams & MacKay (2007). At a high level, this algorithm models the observed data using Bayesian conjugate priors. If an observed value deviates too much from the current posterior distribution, it is likely a change point, and we should start modeling the time series from that point forwards with a freshly initialized Bayesian conjugate prior.
The get_anomaly_score() method returns a z-score corresponding to the probability of each point being a change point. The forecast() method returns the predicted values (and standard error) of the underlying piecewise model on the relevant data.
- Parameters
config (Optional[BOCPDConfig]) – model configuration
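The high-level idea described for this class can be sketched in a few lines of plain Python. This is a minimal illustration of the run-length recursion from Adams & MacKay (2007), not Merlion's implementation. It assumes univariate data, Gaussian noise with known variance, and a normal prior on the mean, whereas the library uses the conjugate priors listed under ChangeKind. The function names and the prior_mean, prior_var and noise_var parameters are hypothetical. cp_prior is treated here as a constant hazard rate and min_likelihood as a pruning threshold, both of which are assumptions about how those settings are used.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def map_run_lengths(xs, cp_prior=0.01, min_likelihood=1e-12,
                    prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Return the most probable run length after each observation in xs."""
    # Each hypothesis is (run_length, probability, posterior_mean, posterior_var).
    hyps = [(0, 1.0, prior_mean, prior_var)]
    result = []
    for x in xs:
        grown, cp_mass = [], 0.0
        for r, p, m, v in hyps:
            # Posterior predictive density of x under this run's posterior.
            pred = gaussian_pdf(x, m, v + noise_var)
            # Conjugate update of the mean's posterior with the new point.
            new_v = 1.0 / (1.0 / v + 1.0 / noise_var)
            new_m = new_v * (m / v + x / noise_var)
            grown.append((r + 1, p * pred * (1.0 - cp_prior), new_m, new_v))
            cp_mass += p * pred * cp_prior
        # A change point resets the run length and restarts from a fresh prior.
        hyps = [(0, cp_mass, prior_mean, prior_var)] + grown
        total = sum(h[1] for h in hyps)
        hyps = [(r, p / total, m, v) for r, p, m, v in hyps]
        # Discard hypotheses that have become negligibly unlikely.
        hyps = [h for h in hyps if h[1] >= min_likelihood]
        result.append(max(hyps, key=lambda h: h[1])[0])
    return result


# Toy series: a level shift from about 0 to about 10 at index 50.
xs = [(-0.5 if i % 2 == 0 else 0.5) + (0.0 if i < 50 else 10.0) for i in range(100)]
run_lengths = map_run_lengths(xs)
# Index of the most recent change point implied by the final run length.
last_change = len(xs) - run_lengths[-1]
```

On this toy series the most probable run length grows steadily, collapses right after the shift, and then grows again, which is the behaviour the description above refers to as restarting from a freshly initialized prior.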
- property last_train_time
- Returns
the last time (as a pandas.Timestamp) that the model was trained on
- property n_seen
- Returns
the number of data points seen so far
property change_kind: ChangeKind
- Return type
ChangeKind
- Returns
the kind of change points we would like to detect
property cp_prior: float
- Return type
float
- Returns
prior belief probability of how frequently changepoints occur
property lag: int
- Return type
int
- Returns
the maximum amount of delay allowed for detecting change points. A higher lag can increase recall, but it may decrease precision.
property min_likelihood: float
- Return type
float
- Returns
we will not consider any hypotheses (about whether a particular point is a change point) with likelihood lower than this threshold
- train_pre_process(train_data, require_even_sampling, require_univariate)
Applies pre-processing steps common for training most models.
- Parameters
train_data (TimeSeries) – the original time series of training data
require_even_sampling (bool) – whether the model assumes that training data is sampled at a fixed frequency
require_univariate (bool) – whether the model only works with univariate time series
- Return type
TimeSeries
- Returns
the training data, after any necessary pre-processing has been applied
- forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)
Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.
- Parameters
time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.
time_series_prev (Optional[TimeSeries]) – a list of (timestamp, value) pairs immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.
return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.
return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.
- Return type
Union[Tuple[TimeSeries, TimeSeries], Tuple[TimeSeries, TimeSeries, TimeSeries]]
- Returns
(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.
forecast: the forecast for the timestamps given
forecast_stderr: the standard error of each forecast value. May be None.
forecast_lb: 25th percentile of forecast values for each timestamp
forecast_ub: 75th percentile of forecast values for each timestamp
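To relate the two return shapes, the sketch below derives 25th and 75th percentile bounds from a forecast and its standard error. It assumes each forecast value is normally distributed around the point forecast, which the text above does not state; the helper name iqr_bounds is hypothetical, and the library may compute its bounds differently.

```python
from statistics import NormalDist


def iqr_bounds(forecast, stderr):
    """Return (lower, upper) quartile bounds, assuming Gaussian forecast errors."""
    # 75th percentile of the standard normal, roughly 0.6745.
    z75 = NormalDist().inv_cdf(0.75)
    lower = [f - z75 * s for f, s in zip(forecast, stderr)]
    upper = [f + z75 * s for f, s in zip(forecast, stderr)]
    return lower, upper


forecast_lb, forecast_ub = iqr_bounds([10.0, 12.0], [2.0, 0.0])
```

A standard error of zero collapses both bounds onto the point forecast, as in the second value above.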
- get_figure(*, time_series=None, time_stamps=None, time_series_prev=None, plot_anomaly=True, filter_scores=True, plot_forecast=False, plot_forecast_uncertainty=False, plot_time_series_prev=False)
- Parameters
time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.
time_stamps (Optional[List[int]]) – a list of timestamps we wish to forecast for. Exactly one of time_series or time_stamps should be provided.
time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.
plot_anomaly – Whether to plot the model’s predicted anomaly scores.
filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.
plot_forecast – Whether to plot the model’s forecasted values.
plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.
plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.
- Return type
Figure
- Returns
a Figure of the model’s anomaly score predictions and/or forecast.
- update(time_series)
Updates the BOCPD model’s internal state using the time series values provided.
- Parameters
time_series (TimeSeries) – time series whose values we are using to update the internal state of the model
- Returns
anomaly score associated with each point (based on the probability of it being a change point)
- train(train_data, anomaly_labels=None, train_config=None, post_rule_train_config=None)
Trains the underlying forecaster (unsupervised) on the training data. Converts the forecast into anomaly scores, and then trains the post-rule for filtering anomaly scores (supervised, if labels are given) on the input time series.
- Parameters
train_data (TimeSeries) – a TimeSeries of metric values to train the model.
anomaly_labels (Optional[TimeSeries]) – a TimeSeries indicating which timestamps are anomalous. Optional.
train_config – Additional training configs, if needed. Only required for some models.
post_rule_train_config – The config to use for training the model’s post-rule. The model’s default post-rule train config is used if none is supplied here.
- Return type
TimeSeries
- Returns
A TimeSeries of the model’s anomaly scores on the training data.
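The post-rule mentioned above filters raw anomaly scores before they are reported. A minimal sketch of one such rule follows, assuming a plain magnitude threshold that zeroes out every score below it. The function name apply_threshold and the cutoff of 3.0 are illustrative assumptions, not values taken from this documentation.

```python
def apply_threshold(scores, threshold=3.0):
    """Keep scores whose magnitude reaches the threshold; zero out the rest."""
    return [s if abs(s) >= threshold else 0.0 for s in scores]


filtered = apply_threshold([0.5, 3.2, -4.0, 2.9])
```

When labels are supplied during training, a supervised post-rule could instead tune the cutoff against them; this sketch keeps it fixed.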
- get_anomaly_score(time_series, time_series_prev=None)
Returns the model’s predicted sequence of anomaly scores.
- Parameters
time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.
time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.
- Return type
TimeSeries
- Returns
a univariate TimeSeries of anomaly scores
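The class description says these scores are z-scores corresponding to change point probabilities. One plausible mapping, shown here purely as an assumption since the exact formula is not given in this documentation, treats the probability as the two-sided mass of a standard normal and caps the result at max_score. The helper name prob_to_zscore is hypothetical.

```python
from statistics import NormalDist


def prob_to_zscore(prob, max_score=1000.0):
    """Map a probability in [0, 1] to a non-negative two-sided z-score."""
    q = (1.0 + prob) / 2.0
    if q >= 1.0:
        # inv_cdf is undefined at 1, so saturate at the score cap.
        return max_score
    return min(NormalDist().inv_cdf(q), max_score)


scores = [prob_to_zscore(p) for p in (0.0, 0.5, 0.95, 1.0)]
```

Under this mapping a probability of 0.95 gives a score near 1.96, so a cutoff of a few standard deviations keeps only points that are very likely change points.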