omnixai.explainers.tabular.agnostic package
lime – The LIME explainer for tabular data.
shap – The SHAP explainer for tabular data.
pdp – The partial dependence plots for tabular data.
ale – The accumulated local effects plots for tabular data.
sensitivity – Morris sensitivity analysis for tabular data.
L2X.l2x – The L2X explainer for tabular data.
permutation – The permutation feature importance explanation for tabular data.
bias – The model bias analyzer for tabular data.
gpt – The explainer based on ChatGPT.
omnixai.explainers.tabular.agnostic.lime module
The LIME explainer for tabular data.
- class omnixai.explainers.tabular.agnostic.lime.LimeTabular(training_data, predict_function, mode='classification', ignored_features=None, **kwargs)
Bases: TabularExplainer
The LIME explainer for tabular data. If using this explainer, please cite the original work: https://github.com/marcotcr/lime.
- Parameters
training_data (Tabular) – The data used to train local explainers in LIME. training_data can be the training dataset for training the machine learning model. If the training dataset is large, training_data can be its subset by applying omnixai.sampler.tabular.Sampler.subsample.
predict_function (Callable) – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode (str) – The task type, e.g., classification or regression.
ignored_features (Optional[List]) – The features ignored in computing feature importance scores.
kwargs – Additional parameters to initialize lime_tabular.LimeTabularExplainer, e.g., kernel_width and discretizer. Please refer to the doc of lime_tabular.LimeTabularExplainer.
- explanation_type = 'local'
- alias = ['lime']
- explain(X, y=None, **kwargs)
Generates the feature-importance explanations for the input instances.
- Parameters
X – A batch of input instances. When X is pd.DataFrame or np.ndarray, X will be converted into Tabular automatically.
y – A batch of labels to explain. For regression, y is ignored. For classification, the top predicted label of each instance will be explained when y = None.
kwargs – Additional parameters used in LimeTabularExplainer.explain_instance, e.g., num_features. Please refer to the doc of LimeTabularExplainer.explain_instance.
- Return type
- Returns
The feature-importance explanations for all the input instances.
- save(directory, filename=None, **kwargs)
Saves the initialized explainer.
- Parameters
directory (str) – The folder for the dumped explainer.
filename (Optional[str]) – The filename (the explainer class name if it is None).
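The predict_function contract described for LimeTabular can be illustrated with a minimal sketch. The two toy models below are hypothetical and use plain Python lists; in practice the function would usually wrap a trained model and operate on arrays. The point is only the output shape: one probability per class for classification, one estimated value per instance for regression.

```python
import math

def predict_proba(rows):
    """Classification: return one probability per class for each row."""
    probs = []
    for x1, x2 in rows:
        score = 0.8 * x1 - 0.5 * x2            # hypothetical linear score
        p1 = 1.0 / (1.0 + math.exp(-score))    # probability of class 1
        probs.append([1.0 - p1, p1])           # [P(class 0), P(class 1)]
    return probs

def predict_value(rows):
    """Regression: return one estimated value for each row."""
    return [2.0 * x1 + 3.0 * x2 for x1, x2 in rows]

batch = [(1.0, 2.0), (0.0, 0.0)]
class_probs = predict_proba(batch)   # shape: (num_rows, num_classes)
estimates = predict_value(batch)     # shape: (num_rows,)
```

A function like predict_proba would be paired with mode='classification', and one like predict_value with mode='regression'.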
omnixai.explainers.tabular.agnostic.shap module
The SHAP explainer for tabular data.
- class omnixai.explainers.tabular.agnostic.shap.ShapTabular(training_data, predict_function, mode='classification', ignored_features=None, **kwargs)
Bases: TabularExplainer
The SHAP explainer for tabular data. If using this explainer, please cite the original work: https://github.com/slundberg/shap.
- Parameters
training_data (Tabular) – The data used to initialize a SHAP explainer. training_data can be the training dataset for training the machine learning model. If the training dataset is large, please set parameter nsamples, e.g., nsamples = 100.
predict_function (Callable) – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode (str) – The task type, e.g., classification or regression.
ignored_features (Optional[List]) – The features ignored in computing feature importance scores.
kwargs – Additional parameters to initialize shap.KernelExplainer, e.g., nsamples. Please refer to the doc of shap.KernelExplainer.
- explanation_type = 'local'
- alias = ['shap']
- explain(X, y=None, **kwargs)
Generates the local SHAP explanations for the input instances.
- Parameters
X – A batch of input instances. When X is pd.DataFrame or np.ndarray, X will be converted into Tabular automatically.
y – A batch of labels to explain. For regression, y is ignored. For classification, the top predicted label of each instance will be explained when y = None.
kwargs – Additional parameters for shap.KernelExplainer.shap_values, e.g., nsamples – the number of times to re-evaluate the model when explaining each prediction.
- Return type
- Returns
The feature importance explanations.
- save(directory, filename=None, **kwargs)
Saves the initialized explainer.
- Parameters
directory (str) – The folder for the dumped explainer.
filename (Optional[str]) – The filename (the explainer class name if it is None).
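To make the quantity behind ShapTabular concrete, the sketch below computes exact Shapley values by enumerating every feature subset, with absent features replaced by a single baseline row. This is not how shap.KernelExplainer works internally (it approximates these values by sampling, which is what nsamples controls); the model, baseline, and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` taken from x.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def model(z):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
```

The values sum to model(x) - model(baseline), and the interaction term is split evenly between the two features involved. The enumeration grows exponentially with the number of features, which is why sampling-based approximations are used in practice.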
omnixai.explainers.tabular.agnostic.pdp module
The partial dependence plots for tabular data.
- class omnixai.explainers.tabular.agnostic.pdp.PartialDependenceTabular(training_data, predict_function, mode='classification', **kwargs)
Bases: TabularExplainer
The partial dependence plots for tabular data. For more information, please refer to https://scikit-learn.org/stable/modules/partial_dependence.html.
- Parameters
training_data (Tabular) – The data used to initialize a PDP explainer. training_data can be the training dataset for training the machine learning model. If the training dataset is large, training_data can be its subset by applying omnixai.sampler.tabular.Sampler.subsample.
predict_function – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode – The task type, e.g., classification or regression.
kwargs – Additional parameters, e.g., grid_resolution – the number of candidates for each feature during generating partial dependence plots.
- explanation_type = 'global'
- alias = ['pdp', 'partial_dependence']
- explain(features=None, monte_carlo=False, monte_carlo_steps=10, monte_carlo_frac=0.1, **kwargs)
Generates global PDP explanations.
- Parameters
features (Optional[List]) – The names of the features to be studied.
monte_carlo (bool) – Whether to compute PDP for Monte Carlo samples.
monte_carlo_steps (int) – The number of Monte Carlo sampling steps.
monte_carlo_frac (float) – The fraction of samples randomly selected in each Monte Carlo step.
- Return type
- Returns
The generated PDP explanations.
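The idea behind a partial dependence curve, and one plausible reading of the Monte Carlo options, can be sketched in a few lines. This is an illustration under assumptions, not the omnixai implementation: the helper names, the toy model, and the way subsamples are drawn are all hypothetical.

```python
import random

def partial_dependence(predict, data, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        modified = [row[:feature] + [value] + row[feature + 1:] for row in data]
        preds = predict(modified)
        curve.append(sum(preds) / len(preds))
    return curve

def monte_carlo_pdp(predict, data, feature, grid, steps=10, frac=0.1, seed=0):
    """One PDP curve per random subsample (assumed meaning of the monte_carlo options)."""
    rng = random.Random(seed)
    n = max(1, int(len(data) * frac))
    return [
        partial_dependence(predict, rng.sample(data, n), feature, grid)
        for _ in range(steps)
    ]

def predict(rows):
    # Hypothetical regression model.
    return [3.0 * r[0] + r[1] for r in rows]

rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(50)]
grid = [0.0, 0.5, 1.0]

curve = partial_dependence(predict, data, 0, grid)
mc_curves = monte_carlo_pdp(predict, data, 0, grid, steps=4, frac=0.2)
```

The spread among the subsample curves gives a rough sense of how stable the averaged curve is.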
omnixai.explainers.tabular.agnostic.ale module
The accumulated local effects plots for tabular data.
- class omnixai.explainers.tabular.agnostic.ale.ALE(training_data, predict_function, mode='classification', **kwargs)
Bases: TabularExplainer
The accumulated local effects (ALE) plots for tabular data. For more information, please refer to https://christophm.github.io/interpretable-ml-book/ale.html.
- Parameters
training_data (Tabular) – The data used to initialize the explainer. training_data can be the training dataset for training the machine learning model. If the training dataset is large, training_data can be its subset by applying omnixai.sampler.tabular.Sampler.subsample.
predict_function – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode – The task type, e.g., classification or regression.
kwargs – Additional parameters, e.g., grid_resolution – the number of candidates for each feature.
- explanation_type = 'global'
- alias = ['ale', 'accumulated_local_effects']
- static cmds(mat, k=1)
Classical multidimensional scaling. Please refer to: https://en.wikipedia.org/wiki/Multidimensional_scaling#Classical_multidimensional_scaling
- explain(features=None, monte_carlo=True, monte_carlo_steps=10, monte_carlo_frac=0.1, **kwargs)
Generates accumulated local effects (ALE) plots.
- Parameters
features (Optional[List]) – The names of the features to be studied.
monte_carlo (bool) – Whether to compute ALE plots for Monte Carlo samples.
monte_carlo_steps (int) – The number of Monte Carlo sampling steps.
monte_carlo_frac (float) – The fraction of samples randomly selected in each Monte Carlo step.
- Return type
ALEExplanation
- Returns
The generated ALE explanations.
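For readers unfamiliar with ALE, the following is a simplified one-dimensional sketch. It assumes quantile-based bin edges and a count-weighted centering step. The omnixai implementation may differ in binning, centering, and handling of categorical features, and all names here are hypothetical.

```python
def ale_1d(predict, data, feature, num_bins=4):
    """Simplified 1-D accumulated local effects for a continuous feature."""
    values = sorted(row[feature] for row in data)
    n = len(values)
    # Quantile-based bin edges taken from the observed values.
    edges = sorted({values[(i * (n - 1)) // num_bins] for i in range(num_bins + 1)})
    if len(edges) < 2:
        return edges, [0.0]

    accumulated = [0.0]
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Rows falling in (lo, hi]; the first bin also includes its lower edge.
        members = [
            row for row in data
            if lo < row[feature] <= hi or (lo == edges[0] and row[feature] == lo)
        ]
        at_hi = predict([row[:feature] + [hi] + row[feature + 1:] for row in members])
        at_lo = predict([row[:feature] + [lo] + row[feature + 1:] for row in members])
        # Local effect: mean prediction change across the bin, using only its own rows.
        local = sum(h - l for h, l in zip(at_hi, at_lo)) / len(members)
        accumulated.append(accumulated[-1] + local)
        counts.append(len(members))

    # Center the curve so its count-weighted mean over the bins is zero.
    mean_effect = sum(c * a for c, a in zip(counts, accumulated[1:])) / sum(counts)
    return edges, [a - mean_effect for a in accumulated]

def predict(rows):
    # Hypothetical model with two correlated inputs.
    return [2.0 * r[0] + r[1] for r in rows]

data = [[i / 10, i / 10 + (i % 3) * 0.1] for i in range(20)]
edges, ale = ale_1d(predict, data, feature=0)
```

Because each local effect is computed only from rows that actually fall in the bin, the curve reflects the effect of the feature itself (slope 2 here) even though the second feature is correlated with it.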
omnixai.explainers.tabular.agnostic.sensitivity module
Morris sensitivity analysis for tabular data.
- class omnixai.explainers.tabular.agnostic.sensitivity.SensitivityAnalysisTabular(training_data, predict_function, **kwargs)
Bases: TabularExplainer
Morris sensitivity analysis for tabular data based on SALib. If using this explainer, please cite the package: https://github.com/SALib/SALib. This explainer only supports continuous-valued features.
- Parameters
training_data (Tabular) – The data used to initialize the explainer. training_data can be the training dataset for training the machine learning model. If the training dataset is large, training_data can be its subset by applying omnixai.sampler.tabular.Sampler.subsample.
predict_function (Callable) – The prediction function corresponding to the model to explain. The outputs of the predict_function should be a batch of estimated values; class probabilities are not supported.
- explanation_type = 'global'
- alias = ['sa', 'sensitivity']
- explain(**kwargs)
Generates sensitivity analysis explanations.
- Parameters
kwargs – Additional parameters, e.g., nsamples – the number of samples in Morris sampling.
- Return type
- Returns
The generated global explanations.
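The core of Morris screening is the elementary effect: the change in output when one input moves by a fixed step while the others stay put. The sketch below is a simplified one-at-a-time design with random continuous starting points, not the SALib sampler that this explainer relies on. The function name, bounds, and model are hypothetical.

```python
import random

def morris_mu_star(predict, bounds, num_trajectories=20, delta=0.25, seed=0):
    """Mean absolute elementary effect (mu*) per feature, in unit-cube scale."""
    rng = random.Random(seed)
    d = len(bounds)

    def scale(point):
        # Map a point in the unit cube to the original feature ranges.
        return [lo + p * (hi - lo) for p, (lo, hi) in zip(point, bounds)]

    effects = [[] for _ in range(d)]
    for _ in range(num_trajectories):
        # Start low enough that a +delta step stays inside the unit cube.
        u = [rng.uniform(0.0, 1.0 - delta) for _ in range(d)]
        order = list(range(d))
        rng.shuffle(order)
        y = predict([scale(u)])[0]
        for i in order:
            u[i] += delta                      # move one feature at a time
            y_next = predict([scale(u)])[0]
            effects[i].append((y_next - y) / delta)
            y = y_next
    return [sum(abs(e) for e in es) / len(es) for es in effects]

def predict(rows):
    # Hypothetical regression model; the third feature is unused.
    return [5.0 * r[0] + 0.5 * r[1] for r in rows]

mu_star = morris_mu_star(predict, bounds=[(0.0, 1.0)] * 3)
```

Features with a larger mu* have a larger overall influence on the output; the unused third feature scores zero.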
omnixai.explainers.tabular.agnostic.L2X.l2x module
The L2X explainer for tabular data.
- class omnixai.explainers.tabular.agnostic.L2X.l2x.DefaultSelectionModel(explainer, **kwargs)
Bases: _DefaultModelBase
The default selection model in L2X for tabular data. It is a simple feedforward neural network with three linear layers. The categorical features are mapped to embeddings.
- Parameters
explainer – An L2XTabular explainer.
kwargs – Additional parameters, e.g., hidden_size – the hidden layer size.
- forward(inputs)
- Parameters
inputs – The model inputs.
- training: bool
- class omnixai.explainers.tabular.agnostic.L2X.l2x.DefaultPredictionModel(explainer, **kwargs)
Bases: _DefaultModelBase
The default prediction model in L2X for tabular data. It is a simple feedforward neural network with three linear layers. The categorical features are mapped to embeddings.
- Parameters
explainer – An L2XTabular explainer.
kwargs – Additional parameters, e.g., hidden_size – the hidden layer size.
- forward(inputs, weights)
- Parameters
inputs – The model inputs.
weights – The weights generated via Gumbel-Softmax sampling.
- training: bool
- class omnixai.explainers.tabular.agnostic.L2X.l2x.L2XTabular(training_data, predict_function, mode='classification', tau=0.5, k=8, selection_model=None, prediction_model=None, loss_function=None, optimizer=None, learning_rate=0.001, batch_size=None, num_epochs=10, **kwargs)
Bases: TabularExplainer
The L2X explainer for tabular data. If using this explainer, please cite the original work: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation, Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan, https://arxiv.org/abs/1802.07814.
- Parameters
training_data (Tabular) – The data used to train the explainer. training_data should be the training dataset for training the machine learning model.
predict_function (Callable) – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode (str) – The task type, e.g., classification or regression.
tau (float) – Parameter tau in Gumbel-Softmax.
k (int) – The maximum number of the selected features in L2X.
selection_model – A PyTorch model class for estimating P(S|X) in L2X. If selection_model = None, a default model DefaultSelectionModel will be used.
prediction_model – A PyTorch model class for estimating Q(X_S) in L2X. If prediction_model = None, a default model DefaultPredictionModel will be used.
loss_function (Optional[Callable]) – The loss function for the task, e.g., nn.CrossEntropyLoss() for classification.
optimizer – The optimizer class for training the L2X explainer, e.g., torch.optim.Adam.
learning_rate (float) – The learning rate for training the L2X explainer.
batch_size (Optional[int]) – The batch size for training the L2X explainer. If batch_size is None, batch_size will be picked from [32, 64, 128, 256] based on the sample size.
num_epochs (int) – The number of epochs for training the L2X explainer.
kwargs – Additional parameters, e.g., parameters for selection_model and prediction_model.
- explanation_type = 'local'
- alias = ['l2x', 'L2X']
- explain(X, **kwargs)
Generates the explanations corresponding to the input instances. For classification, it explains the top predicted label for each input instance.
- Parameters
X – A batch of input instances. When X is pd.DataFrame or np.ndarray, X will be converted into Tabular automatically.
kwargs – Not used here.
- Return type
- Returns
The feature-importance explanations for all the input instances.
- save(directory, filename=None, **kwargs)
Saves the initialized explainer.
- Parameters
directory (str) – The folder for the dumped explainer.
filename (Optional[str]) – The filename (the explainer class name if it is None).
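The roles of tau and k in L2XTabular can be illustrated with a standalone Gumbel-Softmax sketch. It follows the relaxation described in the L2X paper (k relaxed one-hot samples combined by an element-wise maximum); whether omnixai combines the samples in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """One relaxed one-hot sample over features; smaller tau gives sharper samples."""
    # Gumbel(0, 1) noise; clamp the uniform draw to avoid log(0).
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(logit + g) / tau for logit, g in zip(logits, gumbels)]
    top = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_feature_weights(logits, tau, k, rng):
    """Element-wise max over k relaxed samples: a soft selection of up to k features."""
    samples = [gumbel_softmax(logits, tau, rng) for _ in range(k)]
    return [max(column) for column in zip(*samples)]

rng = random.Random(0)
logits = [1.0, 0.0, -1.0]   # hypothetical selection-model outputs for three features
probs = gumbel_softmax(logits, tau=0.5, rng=rng)
weights = sample_feature_weights(logits, tau=0.5, k=2, rng=rng)
```

In this formulation the weights play the part of the `weights` argument passed to the prediction model's forward method, masking features softly so that training stays differentiable.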
omnixai.explainers.tabular.agnostic.permutation module
The permutation feature importance explanation for tabular data.
- class omnixai.explainers.tabular.agnostic.permutation.PermutationImportance(training_data, predict_function, mode='classification', **kwargs)
Bases: ExplainerBase, TabularExplainerMixin
The permutation feature importance explanations for tabular data. The permutation feature importance is defined as the decrease in a model score when the values of a single feature are randomly shuffled.
- Parameters
training_data (Tabular) – The training dataset for training the machine learning model.
predict_function – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode – The task type, e.g., classification or regression.
- explanation_type = 'global'
- alias = ['permutation']
- explain(X, y, n_repeats=30, score_func=None)
Generate permutation feature importance scores.
- Parameters
X (Tabular) – Data on which permutation importance will be computed.
y (Union[ndarray, DataFrame]) – Targets or labels.
n_repeats (int) – The number of times a feature is randomly shuffled.
score_func (Optional[Callable]) – The score function measuring the difference between ground-truth targets and predictions, e.g., -sklearn.metrics.log_loss(y_true, y_pred).
- Return type
- Returns
The permutation feature importance explanations.
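The definition above (the drop in a score when one feature's values are shuffled) can be written out directly. This is a minimal sketch on plain lists with a hypothetical model and score function, not the omnixai implementation; as in the score_func example above, the score is negated so that higher is better.

```python
import random

def permutation_importance(predict, X, y, score_func, n_repeats=30, seed=0):
    """Mean drop in score after shuffling each feature column n_repeats times."""
    rng = random.Random(seed)
    baseline = score_func(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                 # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score_func(y, predict(shuffled)))
        importances.append(sum(drops) / n_repeats)
    return importances

def neg_mse(y_true, y_pred):
    # Higher is better, so the mean squared error is negated.
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def predict(rows):
    # Hypothetical model that ignores the second feature.
    return [4.0 * r[0] for r in rows]

X = [[float(i), float((i * 7) % 10)] for i in range(10)]
y = predict(X)
importances = permutation_importance(predict, X, y, neg_mse)
```

Shuffling the feature the model depends on lowers the score, while shuffling the ignored feature leaves it unchanged.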
omnixai.explainers.tabular.agnostic.shap_global module
The SHAP explainer for global feature importance.
- class omnixai.explainers.tabular.agnostic.shap_global.GlobalShapTabular(training_data, predict_function, mode='classification', ignored_features=None, **kwargs)
Bases: TabularExplainer
The SHAP explainer for global feature importance. If using this explainer, please cite the original work: https://github.com/slundberg/shap.
- Parameters
training_data (Tabular) – The data used to initialize a SHAP explainer. training_data can be the training dataset for training the machine learning model. If the training dataset is large, please set parameter nsamples, e.g., nsamples = 100.
predict_function (Callable) – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode (str) – The task type, e.g., classification or regression.
ignored_features (Optional[List]) – The features ignored in computing feature importance scores.
kwargs – Additional parameters to initialize shap.KernelExplainer, e.g., nsamples. Please refer to the doc of shap.KernelExplainer.
- explanation_type = 'global'
- alias = ['shap_global']
- explain(X=None, **kwargs)
Generates the global SHAP explanations.
- Parameters
X (Optional[Tabular]) – The data used to compute global SHAP values, i.e., the mean of the absolute SHAP value for each feature. If X is None, a set of training samples will be used.
kwargs – Additional parameters for shap.KernelExplainer.shap_values, e.g., nsamples – the number of times to re-evaluate the model when explaining each prediction.
- Returns
The global feature importance explanations.
- save(directory, filename=None, **kwargs)
Saves the initialized explainer.
- Parameters
directory (str) – The folder for the dumped explainer.
filename (Optional[str]) – The filename (the explainer class name if it is None).
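The aggregation described for GlobalShapTabular, the mean of the absolute SHAP value for each feature, is small enough to show in full. The local SHAP values below are made-up numbers for illustration, and the function name is hypothetical.

```python
def global_importance(shap_values):
    """Mean absolute SHAP value per feature across instances."""
    num_rows = len(shap_values)
    num_features = len(shap_values[0])
    return [
        sum(abs(row[j]) for row in shap_values) / num_rows
        for j in range(num_features)
    ]

# Rows are instances, columns are features (illustrative values only).
local_shap = [
    [0.5, -0.1],
    [-0.3, 0.2],
    [0.1, -0.3],
]
scores = global_importance(local_shap)
```

Taking absolute values first matters: a feature whose contributions are large but of mixed sign would otherwise average out toward zero.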
omnixai.explainers.tabular.agnostic.bias module
The model bias analyzer for tabular data.
- class omnixai.explainers.tabular.agnostic.bias.BiasAnalyzer(training_data, predict_function, mode='classification', training_targets=None, **kwargs)
Bases: ExplainerBase
The bias analysis for a classification or regression model.
- Parameters
training_data (Tabular) – The data used to initialize the explainer.
predict_function – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
mode – The task type, e.g., classification or regression.
training_targets (Optional[List]) – The training labels/targets. If it is None, the target column in training_data will be used. The values of training_targets can only be integers (e.g., classification labels) or floats (regression targets).
- explanation_type = 'global'
- alias = ['bias']
- explain(feature_column, feature_value_or_threshold, label_value_or_threshold, **kwargs)
Runs bias analysis on the given model and dataset.
- Parameters
feature_column – The feature column to analyze.
feature_value_or_threshold – The feature value for a categorical feature, or the feature value threshold for a continuous-valued feature. It can be either a single value or a list/tuple. When it is a single value: (a) for categorical features, the advantaged group consists of the samples that contain this feature value and the disadvantaged group consists of the other samples; (b) for continuous-valued features, the advantaged group consists of the samples whose values of feature_column <= feature_value_or_threshold and the disadvantaged group consists of the other samples. When it is a list/tuple: (a) for categorical features, the advantaged group consists of the samples that contain the feature values in the first element of the list and the disadvantaged group consists of the samples that contain the feature values in the second element of the list; (b) for continuous-valued features, if feature_value_or_threshold is [a, b], the advantaged group consists of the samples whose values of feature_column <= a and the disadvantaged group consists of the samples whose values of feature_column > b. If feature_value_or_threshold is [a, [b, c]], the disadvantaged group consists of the samples whose values of feature_column are in (b, c].
label_value_or_threshold – The target label for classification or the target threshold for regression. For regression, the task is converted into a binary classification problem when computing bias metrics, i.e., label = 0 if target value <= label_value_or_threshold, and label = 1 if target value > label_value_or_threshold.
- Return type
BiasExplanation
- Returns
The bias analysis results stored in BiasExplanation.
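The grouping rules for continuous-valued features and the regression-to-binary conversion described above can be restated as a short sketch. The helper names are hypothetical and this is not the BiasAnalyzer code; for the [a, [b, c]] form, the sketch assumes the advantaged group is still defined by feature value <= a, which the description implies but does not state.

```python
def split_groups(values, threshold):
    """Return (advantaged, disadvantaged) index lists for a continuous feature."""
    if not isinstance(threshold, (list, tuple)):
        # Single value: advantaged <= threshold, disadvantaged are the rest.
        advantaged = [i for i, v in enumerate(values) if v <= threshold]
        disadvantaged = [i for i, v in enumerate(values) if v > threshold]
        return advantaged, disadvantaged

    a, b = threshold
    advantaged = [i for i, v in enumerate(values) if v <= a]  # assumed for both list forms
    if isinstance(b, (list, tuple)):
        low, high = b                                         # [a, [b, c]]: interval (b, c]
        disadvantaged = [i for i, v in enumerate(values) if low < v <= high]
    else:
        disadvantaged = [i for i, v in enumerate(values) if v > b]  # [a, b]
    return advantaged, disadvantaged

def binarize_targets(targets, threshold):
    """Regression targets to binary labels: 0 if <= threshold, 1 otherwise."""
    return [1 if t > threshold else 0 for t in targets]

ages = [22, 35, 41, 58, 67]
single = split_groups(ages, 40)
pair = split_groups(ages, [30, 50])
interval = split_groups(ages, [30, [40, 60]])
labels = binarize_targets([1.0, 2.5, 4.0], 2.5)
```

With a list threshold, samples that fall in neither group (for example, age 35 in the second call) are left out of the comparison.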
omnixai.explainers.tabular.agnostic.gpt module
The explainer based on ChatGPT.
- class omnixai.explainers.tabular.agnostic.gpt.GPTExplainer(training_data, predict_function, apikey, mode='classification', ignored_features=None, include_counterfactual=True, openai_model='gpt-3.5-turbo', **kwargs)
Bases: ExplainerBase
The explainer based on ChatGPT. The input prompt consists of the feature importance scores and the counterfactual examples (if used). The explanations will be the text generated by ChatGPT.
- Parameters
training_data (Tabular) – The data used to initialize a SHAP explainer. training_data can be the training dataset for training the machine learning model.
predict_function (Callable) – The prediction function corresponding to the model to explain. When the model is for classification, the outputs of the predict_function are the class probabilities. When the model is for regression, the outputs of the predict_function are the estimated values.
apikey (str) – The OpenAI API Key.
mode (str) – The task type, e.g., classification or regression.
ignored_features (Optional[List]) – The features ignored in computing feature importance scores.
include_counterfactual (bool) – Whether to include counterfactual explanations in the results.
openai_model (str) – The model type for chat completion.
kwargs – Additional parameters to initialize shap.KernelExplainer, e.g., nsamples. Please refer to the doc of shap.KernelExplainer.
- explanation_type = 'local'
- alias = ['gpt']