Morris sensitivity analysis

Morris sensitivity analysis for tabular data, based on SALib. If you use this explainer, please cite the package: https://github.com/SALib/SALib. This explainer supports only continuous-valued features.
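To give a feel for what a Morris analysis measures, here is a minimal standard-library sketch of the elementary-effects idea. It is not the SALib or OmniXAI implementation. The function name `morris_elementary_effects`, its parameters, and the toy model are all hypothetical, and the sketch is simplified by taking only upward steps on the grid. Each trajectory perturbs one input at a time by a fixed step and records the resulting change in the output. The mean absolute effect (`mu_star`) indicates overall importance, and the standard deviation (`sigma`) hints at nonlinearity or interactions.

```python
import random
import statistics


def morris_elementary_effects(f, bounds, num_trajectories=20, num_levels=4, seed=0):
    """Estimate Morris mu, mu_star and sigma for each input of f.

    f      : callable taking a list of floats and returning a float
    bounds : list of (low, high) pairs, one per input
    """
    rng = random.Random(seed)
    k = len(bounds)
    # Commonly used step size on the unit grid: delta = p / (2 * (p - 1))
    delta = num_levels / (2.0 * (num_levels - 1))
    # Start only from the lower half of the grid so that u + delta stays in [0, 1]
    base_levels = [i / (num_levels - 1) for i in range(num_levels // 2)]

    def scale(u):
        # Map a point in the unit hypercube to the real input ranges
        return [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, bounds)]

    effects = [[] for _ in range(k)]
    for _ in range(num_trajectories):
        u = [rng.choice(base_levels) for _ in range(k)]
        y = f(scale(u))
        order = list(range(k))
        rng.shuffle(order)
        # Move one input at a time, continuing from the previous point
        for i in order:
            u_next = list(u)
            u_next[i] += delta
            y_next = f(scale(u_next))
            effects[i].append((y_next - y) / delta)
            u, y = u_next, y_next

    mu = [statistics.mean(e) for e in effects]
    mu_star = [statistics.mean(abs(v) for v in e) for e in effects]
    sigma = [statistics.stdev(e) for e in effects]
    return mu, mu_star, sigma


# Toy model (an assumption for illustration): linear in x0, ignores x1, quadratic in x2
def toy_model(x):
    return 3.0 * x[0] + 0.0 * x[1] + x[2] ** 2


mu, mu_star, sigma = morris_elementary_effects(toy_model, [(0.0, 1.0)] * 3)
for name, m, s in zip(["x0", "x1", "x2"], mu_star, sigma):
    print(f"{name}: mu_star={m:.3f}, sigma={s:.3f}")
```

In this toy case the linear input x0 has a constant effect (so its `sigma` is essentially zero), the unused input x1 has no effect, and the quadratic input x2 has an effect that depends on where the step starts, which shows up as a nonzero `sigma`.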

[1]:
# This default renderer is used for sphinx docs only. Please delete this cell in IPython.
import plotly.io as pio
pio.renderers.default = "png"
[2]:
import numpy as np
import pandas as pd
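import sklearn
import sklearn.ensemble
import sklearn.model_selection
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an earlier version.
from sklearn.datasets import load_boston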

from omnixai.data.tabular import Tabular
from omnixai.preprocessing.base import Identity
from omnixai.preprocessing.tabular import TabularTransform
from omnixai.explainers.tabular import SensitivityAnalysisTabular

We recommend using Tabular to represent a tabular dataset. A Tabular instance can be constructed from a pandas dataframe or a numpy array. To create one from a pandas dataframe, specify the dataframe, the categorical feature names (if any), and the target/label column name (if any).

[3]:
boston = load_boston()
df = pd.DataFrame(
    np.concatenate([boston.data, boston.target.reshape((-1, 1))], axis=1),
    columns=list(boston.feature_names) + ['target'])
# Remove the categorical features (CHAS and RAD), since this explainer
# only supports continuous-valued features
df = df.drop(columns=[boston.feature_names[i] for i in [3, 8]])
tabular_data = Tabular(df, target_column='target')

We train a random forest model for this regression task.

[4]:
transformer = TabularTransform(
    target_transform=Identity()
).fit(tabular_data)
x = transformer.transform(tabular_data)

# The last column of x is the target; the remaining columns are the features
x_train, x_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(x[:, :-1], x[:, -1], train_size=0.80)
print('Training data shape: {}'.format(x_train.shape))
print('Test data shape:     {}'.format(x_test.shape))

rf = sklearn.ensemble.RandomForestRegressor(n_estimators=1000)
rf.fit(x_train, y_train)
print('Random Forest MSError', np.mean((rf.predict(x_test) - y_test) ** 2))
Training data shape: (404, 11)
Test data shape:     (102, 11)
Random Forest MSError 10.215751067843145

To initialize a sensitivity analyzer, we need to set:

  • training_data: The data used to initialize the explainer. This can be the dataset used to train the machine learning model. If that dataset is too large, training_data can be a subset of it obtained with omnixai.sampler.tabular.Sampler.subsample.

  • predict_function: The prediction function corresponding to the model.

[5]:
# Apply the same preprocessing used at training time before calling the model
predict_function = lambda z: rf.predict(transformer.transform(z))
explainer = SensitivityAnalysisTabular(
    training_data=tabular_data,
    predict_function=predict_function,
)

Calling explain on a SensitivityAnalysisTabular instance generates global explanations. ipython_plot displays the generated explanations in IPython.

[6]:
explanations = explainer.explain()
explanations.ipython_plot()
[Figure: output of explanations.ipython_plot()]