SHAP for time series anomaly detection

The SHAP explainer for time series data supports time series anomaly detection and forecasting. If using this explainer, please cite the original work: https://github.com/slundberg/shap.

[1]:
# This default renderer is used for sphinx docs only. Please delete this cell in IPython.
import plotly.io as pio
pio.renderers.default = "png"
[2]:
import os
import numpy as np
import pandas as pd
from omnixai.data.timeseries import Timeseries
from omnixai.explainers.timeseries import ShapTimeseries

The time series data used here is a sythentic univariate time series dataset. We recommend using Timeseries to represent a time series dataset. Timeseries contains one univariate/multivariate time series, which can be constructed from a pandas dataframe (the index in the dataframe indicates the timestamps and the columns are the variables).

[3]:
# Load the time series dataset
df = pd.read_csv(os.path.join("../data", "timeseries.csv"))
df["timestamp"] = pd.to_datetime(df["timestamp"], unit='s')
df = df.rename(columns={"horizontal": "values"})
df = df.set_index("timestamp")
df = df.drop(columns=["anomaly"])
print(df)
                       values
timestamp
1970-01-01 00:00:00  1.928031
1970-01-01 00:05:00 -1.156620
1970-01-01 00:10:00 -0.390650
1970-01-01 00:15:00  0.400804
1970-01-01 00:20:00 -0.874490
...                       ...
1970-02-04 16:55:00  0.362724
1970-02-04 17:00:00  2.657373
1970-02-04 17:05:00  1.472341
1970-02-04 17:10:00  1.033154
1970-02-04 17:15:00  2.950466

[10000 rows x 1 columns]
[4]:
# Split the dataset into training and test splits
train_df = df.iloc[:9150]
test_df = df.iloc[9150:9300]
# A simple threshold for detecting anomaly data points
threshold = np.percentile(train_df["values"].values, 90)

The outputs of the detector are anomaly scores instead of anomaly labels (0 or 1). A data point is more anomalous if it has a higher anomaly score.

[5]:
# A simple detector for determining whether a window of time series is anomalous
def detector(ts: Timeseries):
    anomaly_scores = np.sum((ts.values > threshold).astype(int))
    return anomaly_scores / ts.shape[0]

To initialize a SHAP explainer, we need to set:

  • training_data: The data used to initialize a SHAP explainer. training_data can be the training dataset for training the machine learning model.

  • predict_function: The prediction function corresponding to the model to explain. The input of predict_function should be an Timeseries instance. The outputs of predict_function are anomaly scores (higher scores imply more anomalous) for anomaly detection or predicted values for forecasting.

  • mode: The task type, e.g., “anomaly_detection” or “forecasting”.

[6]:
explainer = ShapTimeseries(
    training_data=Timeseries.from_pd(train_df),
    predict_function=detector,
    mode="anomaly_detection"
)
test_x = Timeseries.from_pd(test_df)

SHAP generates local explanations, e.g. explainer.explain is called given the test instances. ipython_plot plots the generated explanations in IPython. Parameter index indicates which instance in test_x to plot, e.g., index = 0 means plotting the first instance in test_x.

[7]:
explanations = explainer.explain(test_x)
explanations.ipython_plot()
../../_images/tutorials_timeseries_shap_12_1.png