Counterfactual explanation on time series anomaly detection

This tutorial demonstrates MACEExplainer, our optimization-based counterfactual explainer for time series data. For time series, this explainer supports anomaly detection only.

[1]:

# This default renderer is used for sphinx docs only. Please delete this cell in IPython.
import plotly.io as pio
pio.renderers.default = "png"

[2]:

import os
import numpy as np
import pandas as pd
from omnixai.data.timeseries import Timeseries
from omnixai.explainers.timeseries import MACEExplainer

The time series data used here is a synthetic univariate time series dataset. We recommend using Timeseries to represent a time series dataset. A Timeseries object holds one univariate or multivariate time series and can be constructed from a pandas dataframe, where the index holds the timestamps and the columns are the variables.
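
As a small illustration of the dataframe layout described above (timestamps in the index, one column per variable), the sketch below builds a tiny frame with pandas. The four sample values and the name `raw` are made up for illustration. The last line, left as a comment so the block runs without OmniXAI installed, shows the `Timeseries.from_pd` call used later in this tutorial.

```python
import pandas as pd

# Hypothetical sample: four readings taken 300 seconds apart
raw = pd.DataFrame({
    "timestamp": [0, 300, 600, 900],   # seconds since the Unix epoch
    "values": [1.9, -1.2, -0.4, 0.4],  # the single variable of a univariate series
})

# Convert the epoch seconds to datetimes and move them into the index
raw["timestamp"] = pd.to_datetime(raw["timestamp"], unit="s")
df = raw.set_index("timestamp")
print(df)

# With OmniXAI installed, the frame could then be wrapped as:
# ts = Timeseries.from_pd(df)
```

A multivariate series would follow the same layout with additional columns.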

[3]:

# Load the time series dataset
df = pd.read_csv(os.path.join("../data", "timeseries.csv"))
df["timestamp"] = pd.to_datetime(df["timestamp"], unit='s')
df = df.rename(columns={"horizontal": "values"})
df = df.set_index("timestamp")
df = df.drop(columns=["anomaly"])
print(df)

                       values
timestamp
1970-01-01 00:00:00  1.928031
1970-01-01 00:05:00 -1.156620
1970-01-01 00:10:00 -0.390650
1970-01-01 00:15:00  0.400804
1970-01-01 00:20:00 -0.874490
...                       ...
1970-02-04 16:55:00  0.362724
1970-02-04 17:00:00  2.657373
1970-02-04 17:05:00  1.472341
1970-02-04 17:10:00  1.033154
1970-02-04 17:15:00  2.950466

[10000 rows x 1 columns]

[4]:

# Split the dataset into training and test splits
train_df = df.iloc[:9150]
test_df = df.iloc[9150:9300]
# A simple threshold for flagging anomalous data points: the 90th percentile of the training values
threshold = np.percentile(train_df["values"].values, 90)

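The detector outputs an anomaly score rather than an anomaly label (0 or 1); a higher score means the input is more anomalous. The detector defined below scores a whole window of the time series as the fraction of its points that exceed the threshold.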
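
To make the notion of a window-level anomaly score concrete, here is a minimal pure-Python sketch of a "fraction of points above a value threshold" score. The function name `window_score`, the sample windows, and the cutoff of 2.0 are assumptions chosen for illustration; they are not part of OmniXAI.

```python
def window_score(values, value_threshold):
    """Return the fraction of points in the window that exceed value_threshold."""
    if not values:
        raise ValueError("window must contain at least one point")
    exceed = sum(1 for v in values if v > value_threshold)
    return exceed / len(values)


# Two hypothetical windows of four points each
quiet = [0.1, -0.3, 0.2, 0.0]
spiky = [0.1, 3.2, 0.2, 2.9]

print(window_score(quiet, 2.0))  # no point exceeds 2.0
print(window_score(spiky, 2.0))  # two of the four points exceed 2.0
```

The score is continuous in [0, 1], so a separate cutoff on the score is still needed to turn it into an anomaly label.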

[5]:

# A simple detector that scores a window of the time series:
# the fraction of its points that lie above the threshold
def detector(ts: Timeseries):
    anomaly_scores = np.sum((ts.values > threshold).astype(int))
    return anomaly_scores / ts.shape[0]

To initialize a MACEExplainer, we need to set:

training_data: The data used to initialize the explainer. training_data can be the training dataset used to train the machine learning model.
predict_function: The prediction function of the model to explain. Its input should be a Timeseries instance, and its outputs are anomaly scores (a higher score means more anomalous).
mode: The task type, set to "anomaly_detection" in this example.
threshold: The threshold that determines whether an instance is anomalous, e.g., anomaly score > threshold. This is a threshold on the score returned by predict_function, not the value threshold computed in cell [4].
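
The role of the score threshold can be sketched as follows. The example applies the rule "anomaly score > threshold" to a toy window and then to a hand-made candidate in which the offending points are clipped, to show what a counterfactual has to achieve: a modified window that is no longer flagged. All names (`fraction_above`, `is_anomalous`) and values are hypothetical, and clipping is only a stand-in here; it is not the optimization that MACEExplainer performs.

```python
def fraction_above(values, value_threshold):
    """Toy predict function: fraction of points strictly above value_threshold."""
    return sum(1 for v in values if v > value_threshold) / len(values)


def is_anomalous(score, score_threshold):
    """Decision rule from the text: anomalous when the score exceeds the threshold."""
    return score > score_threshold


value_threshold = 2.0    # assumed cutoff on the raw values
score_threshold = 0.001  # same score threshold as in this tutorial

window = [0.1, 3.2, 0.2, 2.9]
original_score = fraction_above(window, value_threshold)

# Hand-made candidate (illustration only): clip every point to the value cutoff
candidate = [min(v, value_threshold) for v in window]
candidate_score = fraction_above(candidate, value_threshold)

print(is_anomalous(original_score, score_threshold))   # window is flagged
print(is_anomalous(candidate_score, score_threshold))  # candidate is not flagged
```

With a score threshold as low as 0.001, a single point above the value cutoff is enough to flag a window of this size, so a candidate must bring every point to or below the cutoff.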

[6]:

explainer = MACEExplainer(
    training_data=Timeseries.from_pd(train_df),
    predict_function=detector,
    mode="anomaly_detection",
    threshold=0.001
)
test_x = Timeseries.from_pd(test_df)

[7]:

explanations = explainer.explain(test_x)
explanations.ipython_plot()

 |███████████████████████████████████████-| 98.0%

[Figure: plot produced by explanations.ipython_plot()]