anomaly.change_point
Contains all change point detection algorithms. These models implement the anomaly detector interface, but they are specialized for detecting change points in time series.
anomaly.change_point.bocpd
Bayesian online change point detection algorithm.
- class merlion.models.anomaly.change_point.bocpd.ChangeKind(value)
Bases: Enum
Enum representing the kinds of change points we would like to detect. Enum values correspond to the Bayesian ConjPrior class used to detect each sort of change point.
- Auto = None
Automatically choose the Bayesian conjugate prior we would like to use.
- LevelShift = <class 'merlion.utils.conj_priors.MVNormInvWishart'>
Model data points with a normal distribution, to detect level shifts.
- TrendChange = <class 'merlion.utils.conj_priors.BayesianMVLinReg'>
Model data points as a linear function of time, to detect trend changes.
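The pattern described above, an enum whose values are prior classes (or None for automatic selection), can be sketched without Merlion. Everything below is a hypothetical stand-in: `NormalPriorSketch`, `LinearTrendPriorSketch`, `ChangeKindSketch` and `resolve_change_kind` are illustrative names, not Merlion classes, and how Merlion itself resolves `Auto` is not covered here.

```python
from enum import Enum


class NormalPriorSketch:
    """Hypothetical stand-in for a conjugate prior that models level shifts."""


class LinearTrendPriorSketch:
    """Hypothetical stand-in for a conjugate prior that models trend changes."""


class ChangeKindSketch(Enum):
    # Each enum value is the class used to model that kind of change.
    Auto = None
    LevelShift = NormalPriorSketch
    TrendChange = LinearTrendPriorSketch


def resolve_change_kind(kind):
    """Accept either an enum member or its name, and return the enum member."""
    if isinstance(kind, ChangeKindSketch):
        return kind
    return ChangeKindSketch[kind]


kind = resolve_change_kind("LevelShift")
prior = kind.value()  # instantiate the class stored as the enum value
```

Accepting either a string or an enum member mirrors the `Union[str, ChangeKind]` type that the config class documents for `change_kind`.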
- class merlion.models.anomaly.change_point.bocpd.BOCPDConfig(change_kind=ChangeKind.Auto, cp_prior=0.01, lag=None, min_likelihood=1e-16, max_forecast_steps=None, target_seq_index=None, invert_transform=None, transform=None, enable_calibrator=False, max_score=1000, threshold=None, enable_threshold=True, **kwargs)
Bases: ForecasterConfig, NoCalibrationDetectorConfig
Config class for BOCPD (Bayesian Online Change Point Detection). Base class of the object used to configure an anomaly detection model.
- Parameters
change_kind (Union[str, ChangeKind]) – the kind of change points we would like to detect
cp_prior – prior belief probability of how frequently changepoints occur
lag – the maximum amount of delay/lookback (in number of steps) allowed for detecting change points. If lag is None, we will consider the entire history. Note: we do not recommend lag = 0.
min_likelihood – we will discard any hypotheses whose probability of being a change point is lower than this threshold. Lower values improve accuracy at the cost of time and space complexity.
max_forecast_steps – the maximum number of steps the model is allowed to forecast. Ignored.
target_seq_index – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.
invert_transform – Whether to automatically invert the transform before returning a forecast. By default, we will invert the transform for all base forecasters if it supports a proper inversion, but we will not invert it for forecaster-based anomaly detectors or transforms without proper inversions.
transform – Transformation to pre-process input time series.
enable_calibrator – False because this config assumes calibrated outputs from the model.
max_score – maximum possible uncalibrated anomaly score
threshold – the rule to use for thresholding anomaly scores
enable_threshold – whether to enable the thresholding rule when post-processing anomaly scores
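To make the roles of cp_prior and min_likelihood more concrete, here is a minimal, library-independent sketch of a single run-length update in the style of Adams & MacKay (2007). The function `run_length_step` and its arguments are hypothetical and not part of Merlion; Merlion's actual pruning rule and bookkeeping may differ.

```python
def run_length_step(run_probs, pred_probs, cp_prior=0.01, min_likelihood=1e-16):
    """One BOCPD-style update of the run-length posterior (illustrative only).

    run_probs[r]  : posterior probability that the current run has length r
    pred_probs[r] : predictive density of the new observation under run length r
    Returns a dict {run_length: probability} after pruning and renormalizing.
    """
    # Growth: no change point occurred, so every run becomes one step longer.
    joint = {
        r + 1: p * pred_probs[r] * (1.0 - cp_prior) for r, p in run_probs.items()
    }
    # Change point: mass from every run collapses onto run length 0.
    joint[0] = sum(p * pred_probs[r] * cp_prior for r, p in run_probs.items())
    total = sum(joint.values())
    posterior = {r: p / total for r, p in joint.items()}
    # Discard hypotheses below min_likelihood, then renormalize what is left.
    kept = {r: p for r, p in posterior.items() if p >= min_likelihood}
    norm = sum(kept.values())
    return {r: p / norm for r, p in kept.items()}


# Start from a single hypothesis (run length 0) and observe one point.
unpruned = run_length_step({0: 1.0}, {0: 0.5}, cp_prior=0.01)
pruned = run_length_step({0: 1.0}, {0: 0.5}, cp_prior=0.01, min_likelihood=0.05)
```

In this simplified recursion, a larger cp_prior moves more mass onto run length 0 at every step, while a larger min_likelihood keeps fewer hypotheses alive, which is the accuracy versus time and space trade-off noted for that parameter.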
- property change_kind: ChangeKind
- class merlion.models.anomaly.change_point.bocpd.BOCPD(config=None)
Bases: ForecastingDetectorBase
Bayesian online change point detection algorithm described by Adams & MacKay (2007). At a high level, this algorithm models the observed data using Bayesian conjugate priors. If an observed value deviates too much from the current posterior distribution, it is likely a change point, and we should start modeling the time series from that point forwards with a freshly initialized Bayesian conjugate prior.
The get_anomaly_score() method returns a z-score corresponding to the probability of each point being a change point. The forecast() method returns the predicted values (and standard error) of the underlying piecewise model on the relevant data.
- Parameters
config (Optional[BOCPDConfig]) – model configuration
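The high-level idea described for this class (model the data with a conjugate prior, and restart from a fresh prior when a point is too surprising) can be illustrated with a small pure-Python sketch. It is not Merlion's implementation: it assumes a univariate Gaussian with known observation variance and a Normal prior on the mean, whereas this page lists MVNormInvWishart as the prior for level shifts. The names `normal_pdf` and `bocpd_level_shift` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_level_shift(values, cp_prior=0.01, obs_var=1.0,
                      prior_mean=0.0, prior_var=100.0):
    """Return the most probable run length after each observation."""
    # hyps[r] = (probability, posterior mean, posterior variance of the mean)
    hyps = {0: (1.0, prior_mean, prior_var)}
    map_run_lengths = []
    for x in values:
        new_hyps = {}
        change_mass = 0.0
        for r, (p, mu, var) in hyps.items():
            # Predictive density of x under this run's current posterior.
            pred = normal_pdf(x, mu, var + obs_var)
            change_mass += p * pred * cp_prior
            # Conjugate update of the mean given the new observation.
            post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
            post_mu = post_var * (mu / var + x / obs_var)
            new_hyps[r + 1] = (p * pred * (1.0 - cp_prior), post_mu, post_var)
        # A change point restarts modeling from the freshly initialized prior.
        new_hyps[0] = (change_mass, prior_mean, prior_var)
        total = sum(p for p, _, _ in new_hyps.values())
        hyps = {r: (p / total, mu, var) for r, (p, mu, var) in new_hyps.items()}
        map_run_lengths.append(max(hyps, key=lambda r: hyps[r][0]))
    return map_run_lengths


# A level shift from 0 to 10 after 20 points.
series = [0.0] * 20 + [10.0] * 20
run_lengths = bocpd_level_shift(series)
```

On this toy series the most probable run length grows steadily, drops back to a short run at the first shifted point, and then grows again. A production implementation would typically also prune unlikely hypotheses and bound the lookback, which is what min_likelihood and lag control.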
- config_class
alias of BOCPDConfig
- property require_even_sampling: bool
Whether the model assumes that training data is sampled at a fixed frequency
- property require_univariate: bool
All forecasters can work on multivariate data, since they only forecast a single target univariate.
- property last_train_time
- Returns
the last time (as a pandas.Timestamp) that the model was trained on
- property n_seen
- Returns
the number of data points seen so far
- property change_kind: ChangeKind
- Returns
the kind of change points we would like to detect
- property cp_prior: float
- Returns
prior belief probability of how frequently changepoints occur
- property lag: int
- Returns
the maximum amount of delay allowed for detecting change points. A higher lag can increase recall, but it may decrease precision.
- property min_likelihood: float
- Returns
we will not consider any hypotheses (about whether a particular point is a change point) with likelihood lower than this threshold
- train_pre_process(train_data, exog_data=None, return_exog=False)
Applies pre-processing steps common for training most models.
- Parameters
train_data (TimeSeries) – the original time series of training data
- Return type
Union[TimeSeries, Tuple[TimeSeries, Optional[TimeSeries]]]
- Returns
the training data, after any necessary pre-processing has been applied
- update(time_series)
Updates the BOCPD model’s internal state using the time series values provided.
- Parameters
time_series (TimeSeries) – time series whose values we are using to update the internal state of the model
- Returns
anomaly score associated with each point (based on the probability of it being a change point)
- get_anomaly_score(time_series, time_series_prev=None, exog_data=None)
Returns the model’s predicted sequence of anomaly scores.
- Parameters
time_series (TimeSeries) – the TimeSeries we wish to predict anomaly scores for.
time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_series. If given, we use it to initialize the time series anomaly detection model. Otherwise, we assume that time_series immediately follows the training data.
- Return type
TimeSeries
- Returns
a univariate TimeSeries of anomaly scores
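The class description says the anomaly score is a z-score corresponding to a change point probability. One common way to make such a mapping is through the inverse CDF of the standard normal distribution, sketched below. The two-sided convention, the cap at `max_score`, and the function name `prob_to_z_score` are all assumptions made for illustration; this page does not confirm the exact formula Merlion uses.

```python
from statistics import NormalDist


def prob_to_z_score(prob, max_score=1000.0):
    """Map a probability in [0, 1] to a non-negative two-sided z-score.

    Assumed convention: a probability p maps to the z for which a standard
    normal variable falls within [-z, z] with probability p.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    quantile = (1.0 + prob) / 2.0
    if quantile >= 1.0:
        # inv_cdf is undefined at 1, so cap the score instead.
        return max_score
    return min(NormalDist().inv_cdf(quantile), max_score)


z_low = prob_to_z_score(0.0)
z_95 = prob_to_z_score(0.95)
z_cap = prob_to_z_score(1.0)
```

Under this assumed convention, a probability of 0.95 corresponds to a z-score of about 1.96, so familiar z-score thresholds would carry over to change point probabilities.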
- get_figure(*, time_series=None, **kwargs)
- Parameters
time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.
time_stamps – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for. Exactly one of time_series or time_stamps should be provided.
time_series_prev – a time series immediately preceding time_series. If given, we use it to initialize the forecaster’s state. Otherwise, we assume that time_series immediately follows the training data.
exog_data – A time series of exogenous variables. Exogenous variables are known a priori, and they are independent of the variable being forecasted. exog_data must include data for all of time_stamps; if time_series_prev is given, it must include data for all of time_series_prev.time_stamps as well. Optional. Only supported for models which inherit from ForecasterExogBase.
plot_anomaly – Whether to plot the model’s predicted anomaly scores.
filter_scores – whether to filter the anomaly scores by the post-rule before plotting them.
plot_forecast – Whether to plot the model’s forecasted values.
plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.
plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.
- Return type
Figure
- Returns
a Figure of the model’s anomaly score predictions and/or forecast.