{ "cells": [ { "cell_type": "markdown", "id": "08390475", "metadata": {}, "source": [ "# Multivariate Time Series Anomaly Detection\n", "\n", "Multivariate time series anomaly detection works in largely the same way as univariate time series anomaly detection (covered [here](0_AnomalyIntro.ipynb) and [here](1_AnomalyFeatures.ipynb)). To begin, we will load the multivariate `MSL` dataset for time series anomaly detection." ] }, { "cell_type": "code", "execution_count": 1, "id": "894a554f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time series is 55-dimensional\n" ] } ], "source": [ "from merlion.utils import TimeSeries\n", "from ts_datasets.anomaly import MSL\n", "\n", "time_series, metadata = MSL()[0]\n", "train_data = TimeSeries.from_pd(time_series[metadata.trainval])\n", "test_data = TimeSeries.from_pd(time_series[~metadata.trainval])\n", "test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])\n", "\n", "print(f\"Time series is {train_data.dim}-dimensional\")" ] }, { "cell_type": "markdown", "id": "5b9174b4", "metadata": {}, "source": [ "## Model Initialization and Training\n", "\n", "For the purposes of this tutorial, we will be using 3 models:\n", "\n", "1. `DefaultDetector` (which automatically detects whether the input time series is univariate or multivariate);\n", "2. `IsolationForest` (a classic algorithm); and \n", "3. A `DetectorEnsemble` which takes the maximum anomaly score returned by either model.\n", "\n", "Note that while all multivariate anomaly detection models can be used on univariate time series, some Merlion models (e.g. `WindStats`, `ZMS`, `StatThreshold`) are specific to univariate time series. However, the API is identical to that of univariate anomaly detection models." ] }, { "cell_type": "code", "execution_count": 2, "id": "cc5c6e4b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training DefaultDetector...\n", " |████████████████████████████████████████| 100.0% Complete, Loss 0.5998\n", "Training IsolationForest...\n", "Training DetectorEnsemble...\n", " |████████████████████████████████████████| 100.0% Complete, Loss 0.6001\n" ] } ], "source": [ "# We initialize models using the model factory in this tutorial\n", "# We manually set the detection threshold to 2 (in standard deviation units) for all models\n", "from merlion.models.factory import ModelFactory\n", "from merlion.post_process.threshold import AggregateAlarms\n", "\n", "model1 = ModelFactory.create(\"DefaultDetector\",\n", " threshold=AggregateAlarms(alm_threshold=2))\n", "\n", "model2 = ModelFactory.create(\"IsolationForest\",\n", " threshold=AggregateAlarms(alm_threshold=2))\n", "\n", "# Here, we create a _max ensemble_ that takes the maximal anomaly score\n", "# returned by any individual model (rather than the mean).\n", "model3 = ModelFactory.create(\"DetectorEnsemble\", models=[model1, model2],\n", " threshold=AggregateAlarms(alm_threshold=2),\n", " combiner={\"name\": \"Max\"})\n", "\n", "for model in [model1, model2, model3]:\n", " print(f\"Training {type(model).__name__}...\")\n", " train_scores = model.train(train_data)" ] }, { "cell_type": "markdown", "id": "15cb56f7", "metadata": {}, "source": [ "## Model Inference and Quantitative Evaluation\n", "\n", "Like univariate models, we may call `get_anomaly_label()` to get a sequence of post-processed (calibrated and thresholded) training scores. We can then use these to evaluate the model's performance." 
] }, { "cell_type": "code", "execution_count": 3, "id": "e18c0534", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DefaultDetector\n", "Precision: 0.9659\n", "Recall: 0.8312\n", "F1: 0.8935\n", "MTTD: 0 days 01:21:15\n", "\n", "IsolationForest\n", "Precision: 0.9638\n", "Recall: 0.8192\n", "F1: 0.8856\n", "MTTD: 0 days 01:40:57\n", "\n", "DetectorEnsemble\n", "Precision: 0.9620\n", "Recall: 0.8184\n", "F1: 0.8844\n", "MTTD: 0 days 01:34:51\n", "\n" ] } ], "source": [ "from merlion.evaluate.anomaly import TSADMetric\n", "\n", "for model in [model1, model2, model3]:\n", " labels = model.get_anomaly_label(test_data)\n", " precision = TSADMetric.PointAdjustedPrecision.value(ground_truth=test_labels, predict=labels)\n", " recall = TSADMetric.PointAdjustedRecall.value(ground_truth=test_labels, predict=labels)\n", " f1 = TSADMetric.PointAdjustedF1.value(ground_truth=test_labels, predict=labels)\n", " mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=labels)\n", " print(f\"{type(model).__name__}\")\n", " print(f\"Precision: {precision:.4f}\")\n", " print(f\"Recall: {recall:.4f}\")\n", " print(f\"F1: {f1:.4f}\")\n", " print(f\"MTTD: {mttd}\")\n", " print()" ] }, { "cell_type": "markdown", "id": "2972aa3e", "metadata": {}, "source": [ "We can also use a `TSADEvaluator` to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and we obtain its predictions on the training data using a sliding window of 1 week (`cadence=\"1w\"`). However, we only retrain the model every 4 weeks (`retrain_freq=\"4w\"`)." ] }, { "cell_type": "code", "execution_count": 4, "id": "47a4508d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DefaultDetector Sliding Window Evaluation\n", " |████████████████████████████████████████| 100.0% Complete, Loss 0.5998\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 55%|█████▍ | 2419200/4423680 [00:23<00:19, 101454.83it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ " |████████████████████████████████████████| 100.0% Complete, Loss 0.6667\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 [02:09<00:00, 34157.67it/s] \n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9522\n", "Recall: 0.8027\n", "F1: 0.8711\n", "MTTD: 0 days 01:22:46\n", "\n", "IsolationForest Sliding Window Evaluation\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 [00:27<00:00, 160149.84it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9666\n", "Recall: 0.8321\n", "F1: 0.8943\n", "MTTD: 0 days 01:40:42\n", "\n", "DetectorEnsemble Sliding Window Evaluation\n", " |████████████████████████████████████████| 100.0% Complete, Loss 0.6002\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 55%|█████▍ | 2419200/4423680 [00:28<00:24, 83276.94it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ " |████████████████████████████████████████| 100.0% Complete, Loss 0.6532\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "TSADEvaluator: 100%|██████████| 4423680/4423680 [02:37<00:00, 28002.66it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Precision: 0.9453\n", "Recall: 0.8209\n", "F1: 0.8787\n", "MTTD: 0 days 01:32:18\n", "\n" ] } ], "source": [ "from 
"from merlion.evaluate.anomaly import TSADEvaluator, TSADEvaluatorConfig\n", "\n", "for model in [model1, model2, model3]:\n", "    print(f\"{type(model).__name__} Sliding Window Evaluation\")\n", "    evaluator = TSADEvaluator(model=model, config=TSADEvaluatorConfig(\n", "        cadence=\"1w\", retrain_freq=\"4w\"))\n", "    train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)\n", "    precision = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                                   metric=TSADMetric.PointAdjustedPrecision)\n", "    recall = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                                metric=TSADMetric.PointAdjustedRecall)\n", "    f1 = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                            metric=TSADMetric.PointAdjustedF1)\n", "    mttd = evaluator.evaluate(ground_truth=test_labels, predict=test_pred,\n", "                              metric=TSADMetric.MeanTimeToDetect)\n", "    print(f\"Precision: {precision:.4f}\")\n", "    print(f\"Recall: {recall:.4f}\")\n", "    print(f\"F1: {f1:.4f}\")\n", "    print(f\"MTTD: {mttd}\")\n", "    print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }