{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### TimeseriesExplainer for time series anomaly detection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The class `TimeseriesExplainer` is designed for time series data and acts as a factory for the supported time series explainers, such as SHAP and MACE. `TimeseriesExplainer` provides a unified, easy-to-use interface for all the supported explainers. In practice, we recommend using `TimeseriesExplainer` to generate explanations instead of calling a specific explainer in the package `omnixai.explainers.timeseries` directly." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# This default renderer is used for the Sphinx docs only. Please delete this cell when running in IPython.\n", "import plotly.io as pio\n", "pio.renderers.default = \"png\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import pandas as pd\n", "from omnixai.data.timeseries import Timeseries\n", "from omnixai.explainers.timeseries import TimeseriesExplainer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data used here is a synthetic univariate time series. We recommend using `Timeseries` to represent a time series dataset. A `Timeseries` instance holds one univariate or multivariate time series and can be constructed from a pandas DataFrame (the DataFrame index holds the timestamps and each column is a variable)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " values\n", "timestamp \n", "1970-01-01 00:00:00 1.928031\n", "1970-01-01 00:05:00 -1.156620\n", "1970-01-01 00:10:00 -0.390650\n", "1970-01-01 00:15:00 0.400804\n", "1970-01-01 00:20:00 -0.874490\n", "... 
...\n", "1970-02-04 16:55:00 0.362724\n", "1970-02-04 17:00:00 2.657373\n", "1970-02-04 17:05:00 1.472341\n", "1970-02-04 17:10:00 1.033154\n", "1970-02-04 17:15:00 2.950466\n", "\n", "[10000 rows x 1 columns]\n" ] } ], "source": [ "# Load the time series dataset\n", "df = pd.read_csv(os.path.join(\"./data\", \"timeseries.csv\"))\n", "df[\"timestamp\"] = pd.to_datetime(df[\"timestamp\"], unit='s')\n", "df = df.rename(columns={\"horizontal\": \"values\"})\n", "df = df.set_index(\"timestamp\")\n", "df = df.drop(columns=[\"anomaly\"])\n", "print(df)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Split the dataset into training and test sets\n", "train_df = df.iloc[:9150]\n", "test_df = df.iloc[9150:9300]\n", "# A simple threshold (90th percentile of the training values) for detecting anomalous data points\n", "threshold = np.percentile(train_df[\"values\"].values, 90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The detector outputs anomaly scores rather than anomaly labels (0 or 1). Here the score is the fraction of points in the window that exceed `threshold`, and the higher the score, the more anomalous the test instance." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# A simple detector that scores how anomalous a time series window is\n", "def detector(ts: Timeseries):\n", " anomaly_scores = np.sum((ts.values > threshold).astype(int))\n", " return anomaly_scores / ts.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To initialize `TimeseriesExplainer`, we need to set the following parameters:\n", "\n", " - `explainers`: The names of the explainers to apply, e.g., [\"shap\", \"mace\"].\n", " - `data`: The data used to initialize explainers. 
`data` is the training dataset used to train the machine learning model.\n", " - `model`: The ML model to explain, e.g., a black-box anomaly detector.\n", " - `preprocess`: The preprocessing function converting the raw data (a `Timeseries` instance) into the inputs of `model`.\n", " - `postprocess` (optional): The postprocessing function transforming the outputs of `model` into a user-specific form, e.g., the anomaly labels.\n", " - `mode`: The task type, e.g., \"anomaly_detection\" or \"forecasting\".\n", " - `params`: Additional parameters for each explainer, e.g., MACE requires a threshold to determine anomaly labels." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b3f5ef4f83c64473a5fea35babdcb035", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00