Multivariate Time Series Anomaly Detection

Multivariate time series anomaly detection works in largely the same way as univariate time series anomaly detection (covered in the univariate anomaly detection tutorials). To begin, we will load the multivariate MSL dataset for time series anomaly detection.

[1]:
from merlion.utils import TimeSeries
from ts_datasets.anomaly import MSL

time_series, metadata = MSL()[0]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])

print(f"Time series is {train_data.dim}-dimensional")
Time series is 55-dimensional
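The split above relies on a boolean mask (metadata.trainval) that is True for the training timestamps; the ~ operator negates it to select the test portion. The pure-Python sketch below mimics the same idea on a toy series. The rows and mask are hypothetical stand-ins for the MSL data, and `dim` plays the role of `TimeSeries.dim`.

```python
# Toy multivariate series: each row is one timestamp with 3 variables.
# These values and the mask are hypothetical, not taken from MSL.
rows = [
    (0.1, 1.0, 5.0),
    (0.2, 1.1, 5.1),
    (0.3, 0.9, 4.9),
    (9.0, 7.5, 0.0),
    (0.2, 1.0, 5.0),
]
# True marks the training portion, mirroring metadata.trainval.
trainval = [True, True, True, False, False]

# Equivalent of time_series[metadata.trainval] and time_series[~metadata.trainval]
train_rows = [r for r, is_train in zip(rows, trainval) if is_train]
test_rows = [r for r, is_train in zip(rows, trainval) if not is_train]

# Number of variables per timestamp, analogous to TimeSeries.dim
dim = len(rows[0])
print(f"Time series is {dim}-dimensional")  # Time series is 3-dimensional
print(len(train_rows), len(test_rows))      # 3 2
```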

Model Initialization and Training

For the purposes of this tutorial, we will be using 3 models:

  1. DefaultDetector (which automatically detects whether the input time series is univariate or multivariate);

  2. IsolationForest (a classic algorithm); and

  3. A DetectorEnsemble which takes the maximum anomaly score returned by either of the first two models.
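To see why a max ensemble behaves differently from a mean ensemble, here is a minimal sketch on made-up, already-calibrated scores from two hypothetical detectors. It illustrates only the combination step; Merlion's actual combiner also handles aligning and calibrating the member models' outputs, which is not modeled here.

```python
# Hypothetical calibrated anomaly scores from two detectors, one per timestamp.
scores_a = [0.5, 2.4, 0.3, 1.1]
scores_b = [0.7, 0.9, 2.8, 1.3]

def combine_max(*score_lists):
    """At each timestamp, keep the largest score returned by any model."""
    return [max(vals) for vals in zip(*score_lists)]

def combine_mean(*score_lists):
    """At each timestamp, average the scores of all models."""
    return [sum(vals) / len(vals) for vals in zip(*score_lists)]

max_scores = combine_max(scores_a, scores_b)
mean_scores = combine_mean(scores_a, scores_b)

threshold = 2.0
max_flags = [s > threshold for s in max_scores]
mean_flags = [s > threshold for s in mean_scores]
print(max_scores)   # [0.7, 2.4, 2.8, 1.3]
print(max_flags)    # [False, True, True, False]
print(mean_flags)   # [False, False, False, False]
```

Because an anomaly caught by either member survives the max, this combiner tends to raise more alarms than averaging, which typically trades some precision for recall.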

Note that while all multivariate anomaly detection models can be used on univariate time series, some Merlion models (e.g. WindStats, ZMS, StatThreshold) are specific to univariate time series. Either way, the API for multivariate models is identical to that of univariate anomaly detection models.

[2]:
# We initialize models using the model factory in this tutorial
# We manually set the detection threshold to 2 (in standard deviation units) for all models
from merlion.models.factory import ModelFactory
from merlion.post_process.threshold import AggregateAlarms

model1 = ModelFactory.create("DefaultDetector",
                             threshold=AggregateAlarms(alm_threshold=2))

model2 = ModelFactory.create("IsolationForest",
                             threshold=AggregateAlarms(alm_threshold=2))

# Here, we create a _max ensemble_ that takes the maximal anomaly score
# returned by any individual model (rather than the mean).
model3 = ModelFactory.create("DetectorEnsemble", models=[model1, model2],
                             threshold=AggregateAlarms(alm_threshold=2),
                             combiner={"name": "Max"})

for model in [model1, model2, model3]:
    print(f"Training {type(model).__name__}...")
    train_scores = model.train(train_data)
Training DefaultDetector...
 |████████████████████████████████████████| 100.0% Complete, Loss 157274.2031
Training IsolationForest...
Training DetectorEnsemble...
 |████████████████████████████████████████| 100.0% Complete, Loss 156991.0312

Model Inference and Quantitative Evaluation

As with univariate models, we can call get_anomaly_label() to obtain a sequence of post-processed (calibrated and thresholded) anomaly scores on the test data, and then use these to evaluate each model's performance.
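The post-processing step can be pictured as two stages: map raw scores onto a standard-deviation scale, then zero out everything below the threshold of 2 used above. In the sketch below, a plain z-score against hypothetical reference scores stands in for Merlion's calibrator (it is not that algorithm), and the alarm aggregation and suppression that AggregateAlarms also performs are ignored.

```python
import statistics

# Hypothetical raw anomaly scores: a reference set (e.g. from training)
# and a few new scores to be labeled.
train_raw = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.05, 0.95]
test_raw = [1.0, 1.5, 0.9, 1.02]

# Simplified calibration: z-score against the reference scores.
# This is a stand-in for Merlion's calibrator, not its actual algorithm.
mu = statistics.mean(train_raw)
sigma = statistics.pstdev(train_raw)
calibrated = [(s - mu) / sigma for s in test_raw]

# Threshold at 2 "standard deviation units": keep the score where it
# reaches the threshold, otherwise report 0 (no alarm).
alm_threshold = 2.0
labels = [s if s >= alm_threshold else 0.0 for s in calibrated]
print(labels)  # only the second timestamp keeps a nonzero score
```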

[3]:
from merlion.evaluate.anomaly import TSADMetric

for model in [model1, model2, model3]:
    labels = model.get_anomaly_label(test_data)
    precision = TSADMetric.PointAdjustedPrecision.value(ground_truth=test_labels, predict=labels)
    recall = TSADMetric.PointAdjustedRecall.value(ground_truth=test_labels, predict=labels)
    f1 = TSADMetric.PointAdjustedF1.value(ground_truth=test_labels, predict=labels)
    mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=labels)
    print(f"{type(model).__name__}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1:        {f1:.4f}")
    print(f"MTTD:      {mttd}")
    print()
DefaultDetector
Precision: 0.9570
Recall:    0.8571
F1:        0.9043
MTTD:      0 days 01:06:21

IsolationForest
Precision: 0.9638
Recall:    0.8192
F1:        0.8856
MTTD:      0 days 01:40:57

DetectorEnsemble
Precision: 0.9527
Recall:    0.8708
F1:        0.9099
MTTD:      0 days 01:20:02

We can also use a TSADEvaluator to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and then query it for new predictions on the test data once every week (cadence="1w"). However, we only retrain the model every 4 weeks (retrain_freq="4w").

[4]:
from merlion.evaluate.anomaly import TSADEvaluator, TSADEvaluatorConfig
for model in [model1, model2, model3]:
    print(f"{type(model).__name__} Sliding Window Evaluation")
    evaluator = TSADEvaluator(model=model, config=TSADEvaluatorConfig(
        cadence="1w", retrain_freq="4w"))
    train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)
    precision = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,
                                   metric=TSADMetric.PointAdjustedPrecision)
    recall = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,
                                metric=TSADMetric.PointAdjustedRecall)
    f1 = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,
                            metric=TSADMetric.PointAdjustedF1)
    mttd = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,
                              metric=TSADMetric.MeanTimeToDetect)
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1:        {f1:.4f}")
    print(f"MTTD:      {mttd}")
    print()
DefaultDetector Sliding Window Evaluation
 |████████████████████████████████████████| 100.0% Complete, Loss 156283.4219
 |████████████████████████████████████████| 100.0% Complete, Loss 174095.8906
TSADEvaluator: 100%|██████████| 4423680/4423680 [03:36<00:00, 20417.36it/s]
Precision: 0.9599
Recall:    0.8571
F1:        0.9056
MTTD:      0 days 01:10:05

IsolationForest Sliding Window Evaluation
TSADEvaluator: 100%|██████████| 4423680/4423680 [00:30<00:00, 146126.28it/s]
Precision: 0.9666
Recall:    0.8321
F1:        0.8943
MTTD:      0 days 01:40:42

DetectorEnsemble Sliding Window Evaluation
 |████████████████████████████████████████| 100.0% Complete, Loss 157318.1875
 |████████████████████████████████████████| 100.0% Complete, Loss 173354.6250
TSADEvaluator: 100%|██████████| 4423680/4423680 [04:11<00:00, 17618.46it/s]
Precision: 0.9638
Recall:    0.8645
F1:        0.9115
MTTD:      0 days 01:28:00
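The interplay between cadence and retrain_freq can be sketched as a simple schedule. The code below assumes a hypothetical 10-week test period and models only when predictions are requested and when retraining happens; it says nothing about how TSADEvaluator itself is implemented or which data each retraining uses.

```python
from datetime import datetime, timedelta

def evaluation_schedule(test_start, test_end, cadence, retrain_freq):
    """Yield (window_start, window_end, retrain_first) tuples.

    The model is queried on consecutive windows of length `cadence`, and it is
    retrained before a window whenever at least `retrain_freq` has elapsed
    since the last (re)training.
    """
    last_train = test_start
    t = test_start
    while t < test_end:
        retrain = t - last_train >= retrain_freq
        if retrain:
            last_train = t
        yield t, min(t + cadence, test_end), retrain
        t += cadence

start = datetime(2020, 1, 1)  # hypothetical start of the test period
schedule = list(evaluation_schedule(start, start + timedelta(weeks=10),
                                    timedelta(weeks=1), timedelta(weeks=4)))
for s, e, retrain in schedule:
    print(s.date(), "->", e.date(), "retrain first" if retrain else "")
```

Over ten weekly windows, the model is retrained before the windows starting at weeks 4 and 8, so only three models are ever fit. This is why a sliding-window evaluation costs far less than retraining at every step.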