{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Morris sensitivity analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Morris sensitivity analysis for tabular data, based on SALib. If you use this explainer, please cite the package: https://github.com/SALib/SALib. This explainer supports only continuous-valued features." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# This default renderer is used for Sphinx docs only. Please delete this cell in IPython.\n", "import plotly.io as pio\n", "pio.renderers.default = \"png\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import sklearn\n", "import sklearn.ensemble\n", "import sklearn.model_selection\n", "# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version.\n", "from sklearn.datasets import load_boston\n", "\n", "from omnixai.data.tabular import Tabular\n", "from omnixai.preprocessing.base import Identity\n", "from omnixai.preprocessing.tabular import TabularTransform\n", "from omnixai.explainers.tabular import SensitivityAnalysisTabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend using `Tabular` to represent a tabular dataset, which can be constructed from a pandas DataFrame or a NumPy array. To create a `Tabular` instance from a pandas DataFrame, specify the DataFrame, the categorical feature names (if any) and the target/label column name (if any)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "boston = load_boston()\n", "df = pd.DataFrame(\n", " np.concatenate([boston.data, boston.target.reshape((-1, 1))], axis=1),\n", " columns=list(boston.feature_names) + ['target'])\n", "# Remove categorical features\n", "df = df.drop(columns=[boston.feature_names[i] for i in [3, 8]])\n", "tabular_data = Tabular(df, target_column='target')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We train a random forest model for this regression task." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training data shape: (404, 11)\n", "Test data shape: (102, 11)\n", "Random Forest MSError 10.215751067843145\n" ] } ], "source": [ "transformer = TabularTransform(\n", " target_transform=Identity()\n", ").fit(tabular_data)\n", "x = transformer.transform(tabular_data)\n", "\n", "x_train, x_test, y_train, y_test = \\\n", " sklearn.model_selection.train_test_split(x[:, :-1], x[:, -1], train_size=0.80)\n", "print('Training data shape: {}'.format(x_train.shape))\n", "print('Test data shape: {}'.format(x_test.shape))\n", "\n", "rf = sklearn.ensemble.RandomForestRegressor(n_estimators=1000)\n", "rf.fit(x_train, y_train)\n", "print('Random Forest MSError', np.mean((rf.predict(x_test) - y_test) ** 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To initialize a sensitivity analyzer, we need to set:\n", " \n", " - `training_data`: The data used to initialize the explainer. `training_data` can be the dataset used to train the machine learning model. If that dataset is too large, `training_data` can be a subset of it obtained by applying `omnixai.sampler.tabular.Sampler.subsample`.\n", " - `predict_function`: The prediction function corresponding to the model." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "predict_function = lambda z: rf.predict(transformer.transform(z))\n", "explainer = SensitivityAnalysisTabular(\n", " training_data=tabular_data,\n", " predict_function=predict_function,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`SensitivityAnalysisTabular` generates global explanations by calling `explain`. `ipython_plot` shows the generated explanations in IPython." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "explanations = explainer.explain()\n", "explanations.ipython_plot()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }