{ "cells": [ { "cell_type": "markdown", "id": "08390475", "metadata": {}, "source": [ "# Multivariate Time Series Anomaly Detection\n", "\n", "Multivariate time series anomaly detection works in largely the same way as univariate time series anomaly detection (covered [here](0_AnomalyIntro.ipynb) and [here](1_AnomalyFeatures.ipynb)). To begin, we will load the multivariate `MSL` dataset for time series anomaly detection." ] }, { "cell_type": "code", "execution_count": 1, "id": "894a554f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time series is 55-dimensional\n" ] } ], "source": [ "from merlion.utils import TimeSeries\n", "from ts_datasets.anomaly import MSL\n", "\n", "time_series, metadata = MSL()[0]\n", "train_data = TimeSeries.from_pd(time_series[metadata.trainval])\n", "test_data = TimeSeries.from_pd(time_series[~metadata.trainval])\n", "test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])\n", "\n", "print(f\"Time series is {train_data.dim}-dimensional\")" ] }, { "cell_type": "markdown", "id": "5b9174b4", "metadata": {}, "source": [ "## Model Initialization and Training\n", "\n", "For the purposes of this tutorial, we will be using 3 models:\n", "\n", "1. `DefaultDetector` (which automatically detects whether the input time series is univariate or multivariate);\n", "2. `IsolationForest` (a classic algorithm); and \n", "3. A `DetectorEnsemble` which takes the maximum anomaly score returned by either model.\n", "\n", "Note that while all multivariate anomaly detection models can be used on univariate time series, some Merlion models (e.g. `WindStats`, `ZMS`, `StatThreshold`) are specific to univariate time series. However, the API is identical to that of univariate anomaly detection models." 
] }, { "cell_type": "code", "execution_count": 2, "id": "cc5c6e4b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training DefaultDetector...\n", " |████████████████████████████████████████| 100.0% Complete, Loss 157274.2031\n", "Training IsolationForest...\n", "Training DetectorEnsemble...\n", " |████████████████████████████████████████| 100.0% Complete, Loss 156991.0312\n" ] } ], "source": [ "# We initialize models using the model factory in this tutorial\n", "# We manually set the detection threshold to 2 (in standard deviation units) for all models\n", "from merlion.models.factory import ModelFactory\n", "from merlion.post_process.threshold import AggregateAlarms\n", "\n", "model1 = ModelFactory.create(\"DefaultDetector\",\n", "                             threshold=AggregateAlarms(alm_threshold=2))\n", "\n", "model2 = ModelFactory.create(\"IsolationForest\",\n", "                             threshold=AggregateAlarms(alm_threshold=2))\n", "\n", "# Here, we create a _max ensemble_ that takes the maximal anomaly score\n", "# returned by any individual model (rather than the mean).\n", "model3 = ModelFactory.create(\"DetectorEnsemble\", models=[model1, model2],\n", "                             threshold=AggregateAlarms(alm_threshold=2),\n", "                             combiner={\"name\": \"Max\"})\n", "\n", "for model in [model1, model2, model3]:\n", "    print(f\"Training {type(model).__name__}...\")\n", "    train_scores = model.train(train_data)" ] }, { "cell_type": "markdown", "id": "15cb56f7", "metadata": {}, "source": [ "## Model Inference and Quantitative Evaluation\n", "\n", "As with univariate models, we may call `get_anomaly_label()` to get a sequence of post-processed (calibrated and thresholded) anomaly scores. We can then use these to evaluate the model's performance."
] }, { "cell_type": "code", "execution_count": 3, "id": "e18c0534", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DefaultDetector\n", "Precision: 0.9570\n", "Recall: 0.8571\n", "F1: 0.9043\n", "MTTD: 0 days 01:06:21\n", "\n", "IsolationForest\n", "Precision: 0.9638\n", "Recall: 0.8192\n", "F1: 0.8856\n", "MTTD: 0 days 01:40:57\n", "\n", "DetectorEnsemble\n", "Precision: 0.9527\n", "Recall: 0.8708\n", "F1: 0.9099\n", "MTTD: 0 days 01:20:02\n", "\n" ] } ], "source": [ "from merlion.evaluate.anomaly import TSADMetric\n", "\n", "for model in [model1, model2, model3]:\n", "    labels = model.get_anomaly_label(test_data)\n", "    precision = TSADMetric.PointAdjustedPrecision.value(ground_truth=test_labels, predict=labels)\n", "    recall = TSADMetric.PointAdjustedRecall.value(ground_truth=test_labels, predict=labels)\n", "    f1 = TSADMetric.PointAdjustedF1.value(ground_truth=test_labels, predict=labels)\n", "    mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=labels)\n", "    print(f\"{type(model).__name__}\")\n", "    print(f\"Precision: {precision:.4f}\")\n", "    print(f\"Recall: {recall:.4f}\")\n", "    print(f\"F1: {f1:.4f}\")\n", "    print(f\"MTTD: {mttd}\")\n", "    print()" ] }, { "cell_type": "markdown", "id": "2972aa3e", "metadata": {}, "source": [ "We can also use a `TSADEvaluator` to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and we obtain its predictions on the test data using a sliding window of 1 week (`cadence=\"1w\"`). However, we only retrain the model every 4 weeks (`retrain_freq=\"4w\"`)."
] }, { "cell_type": "code", "execution_count": 4, "id": "47a4508d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DefaultDetector Sliding Window Evaluation\n", " |████████████████████████████████████████| 100.0% Complete, Loss 156283.4219\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 55%|█████▍ | 2419200/4423680 [00:31<00:27, 72283.89it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " |████------------------------------------| 10.0% Complete, Loss 187671.3750" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r", "TSADEvaluator: 55%|█████▍ | 2419200/4423680 [00:50<00:27, 72283.89it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ " |████████████████████████████████████████| 100.0% Complete, Loss 174095.8906\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 [03:36<00:00, 20417.36it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9599\n", "Recall: 0.8571\n", "F1: 0.9056\n", "MTTD: 0 days 01:10:05\n", "\n", "IsolationForest Sliding Window Evaluation\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 [00:30<00:00, 146126.28it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9666\n", "Recall: 0.8321\n", "F1: 0.8943\n", "MTTD: 0 days 01:40:42\n", "\n", "DetectorEnsemble Sliding Window Evaluation\n", " |████████████████████████████████████████| 100.0% Complete, Loss 157318.1875\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 55%|█████▍ | 2419200/4423680 [00:52<00:33, 60470.46it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ " |████████████████████████████████████████| 100.0% Complete, Loss 173354.6250\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 
[04:11<00:00, 17618.46it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9638\n", "Recall: 0.8645\n", "F1: 0.9115\n", "MTTD: 0 days 01:28:00\n", "\n" ] } ], "source": [ "from merlion.evaluate.anomaly import TSADEvaluator, TSADEvaluatorConfig\n", "for model in [model1, model2, model3]:\n", "    print(f\"{type(model).__name__} Sliding Window Evaluation\")\n", "    evaluator = TSADEvaluator(model=model, config=TSADEvaluatorConfig(\n", "        cadence=\"1w\", retrain_freq=\"4w\"))\n", "    train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)\n", "    precision = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                                   metric=TSADMetric.PointAdjustedPrecision)\n", "    recall = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                                metric=TSADMetric.PointAdjustedRecall)\n", "    f1 = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                            metric=TSADMetric.PointAdjustedF1)\n", "    mttd = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                              metric=TSADMetric.MeanTimeToDetect)\n", "    print(f\"Precision: {precision:.4f}\")\n", "    print(f\"Recall: {recall:.4f}\")\n", "    print(f\"F1: {f1:.4f}\")\n", "    print(f\"MTTD: {mttd}\")\n", "    print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }