Integrated gradients on the IMDB dataset (TensorFlow)

This is an example of the integrated-gradients method applied to text classification with a TensorFlow model. If using this explainer, please cite the original work: https://github.com/ankurtaly/Integrated-Gradients.

[1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import f1_score
from omnixai.data.text import Text
from omnixai.preprocessing.text import Word2Id
from omnixai.explainers.nlp.specific.ig import IntegratedGradientText

We apply a simple CNN model for this text classification task. Note that the call method takes two inputs: inputs (the token IDs) and masks (the sentence masks). For IntegratedGradientText, the first input of the model must be the token IDs.

[2]:
class TextModel(tf.keras.Model):

    def __init__(self, num_embeddings, num_classes, **kwargs):
        super().__init__()
        self.num_embeddings = num_embeddings
        self.embedding_size = kwargs.get("embedding_size", 50)
        hidden_size = kwargs.get("hidden_size", 100)
        kernel_sizes = kwargs.get("kernel_sizes", [3, 4, 5])

        self.embedding = tf.keras.layers.Embedding(
            num_embeddings,
            self.embedding_size,
            embeddings_initializer=tf.keras.initializers.RandomUniform(minval=-0.1, maxval=0.1),
            name='embedding'
        )
        self.conv_layers = [
            tf.keras.layers.Conv1D(hidden_size, k, activation='relu', padding='same')
            for k in kernel_sizes
        ]
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.output_layer = tf.keras.layers.Dense(num_classes)

    def call(self, inputs, masks, training=False):
        # `inputs` holds the token IDs; `masks` marks the non-padding positions.
        embeddings = self.embedding(inputs)
        x = embeddings * tf.expand_dims(masks, axis=-1)
        # Convolutions with different kernel sizes, max-pooled over the sequence dimension.
        x = [tf.reduce_max(layer(x), axis=1) for layer in self.conv_layers]
        x = self.dropout(tf.concat(x, axis=1)) if training \
            else tf.concat(x, axis=1)
        outputs = self.output_layer(x)
        return outputs
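
As a quick sanity check (not part of the original notebook), the model can be called directly on dummy token IDs and masks to confirm that it returns one logit per class; the vocabulary size and sequence length below are arbitrary:

# Illustrative only: verify the model's input/output shapes
check_model = TextModel(num_embeddings=100, num_classes=2)
dummy_ids = np.random.randint(0, 100, size=(4, 16))    # batch of 4 sequences, 16 tokens each
dummy_masks = np.ones((4, 16), dtype=np.float32)       # all positions are valid tokens
print(check_model(dummy_ids, dummy_masks).shape)       # expected: (4, 2)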

We use a Text object to represent a batch of texts/sentences. The package omnixai.preprocessing.text provides transforms for text data, such as Tfidf and Word2Id.

[3]:
# Load the IMDB dataset and split it into training and test sets
# (replace the path below with your local copy of labeledTrainData.tsv)
train_data = pd.read_csv('/home/ywz/data/imdb/labeledTrainData.tsv', sep='\t')
n = int(0.8 * len(train_data))
x_train = Text(train_data["review"].values[:n])
y_train = train_data["sentiment"].values[:n].astype(int)
x_test = Text(train_data["review"].values[n:])
y_test = train_data["sentiment"].values[n:].astype(int)
class_names = ["negative", "positive"]
# The transform for converting words/tokens to IDs
transform = Word2Id().fit(x_train)
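
For illustration (assuming, as the preprocessing code below implies, that transform.transform returns one sequence of token IDs per sentence and that transform.id_to_word maps IDs back to tokens), the fitted transform can be inspected on a short example:

# Illustrative only: round-trip a short sentence through the Word2Id transform
sample_ids = transform.transform(Text(["this movie was great"]))[0]
print(sample_ids)
print([transform.id_to_word[i] for i in sample_ids])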

The preprocessing function converts a batch of texts into token IDs and masks. The outputs of the preprocessing function must match the inputs of the model.

[4]:
max_length = 256

def preprocess(X: Text):
    samples = transform.transform(X)
    max_len = 0
    for i in range(len(samples)):
        max_len = max(max_len, len(samples[i]))
    max_len = min(max_len, max_length)
    inputs = np.zeros((len(samples), max_len), dtype=int)
    masks = np.zeros((len(samples), max_len), dtype=np.float32)
    for i in range(len(samples)):
        x = samples[i][:max_len]
        inputs[i, :len(x)] = x
        masks[i, :len(x)] = 1
    return inputs, masks
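
As a quick check (illustrative, not part of the original notebook), the outputs of preprocess have one row per sentence, padded to the longest sentence in the batch, with the masks marking the real tokens:

# Illustrative only: shapes depend on how the sentences are tokenized
ids, masks = preprocess(Text(["a great movie", "a terrible movie indeed"]))
print(ids.shape, masks.shape)
print(masks)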

We now train the CNN model and evaluate its performance.

[5]:
learning_rate = 1e-3
batch_size = 128
num_epochs = 10

model = TextModel(
    num_embeddings=transform.vocab_size,
    num_classes=len(class_names)
)
inputs, masks = preprocess(x_train)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_dataset = tf.data.Dataset.from_tensor_slices((inputs, masks, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

for epoch in range(num_epochs):
    for step, (ids, masks, labels) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(ids, masks, training=True)
            loss = loss_fn(labels, logits)
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        if step % 200 == 0:
            print(f"Training loss at epoch {epoch}, step {step}: {float(loss)}")
Training loss at epoch 0, step 0: 0.6866752505302429
Training loss at epoch 1, step 0: 0.4109169542789459
Training loss at epoch 2, step 0: 0.21237820386886597
Training loss at epoch 3, step 0: 0.1540527492761612
Training loss at epoch 4, step 0: 0.08126655220985413
Training loss at epoch 5, step 0: 0.02999718114733696
Training loss at epoch 6, step 0: 0.008433952927589417
Training loss at epoch 7, step 0: 0.009998280555009842
Training loss at epoch 8, step 0: 0.0030068857595324516
Training loss at epoch 9, step 0: 0.001554026734083891
[6]:
inputs, masks = preprocess(x_test)
outputs = model(inputs, masks).numpy()
predictions = np.argmax(outputs, axis=1)
print('Test F1 score: {}'.format(
    f1_score(y_test, predictions, average='binary')))
Test F1 score: 0.8560798903465829

To initialize IntegratedGradientText, we need to set the following parameters:

  • model: The model to explain, whose type is tf.keras.Model or torch.nn.Module.

  • embedding_layer: The embedding layer in the model, which can be tf.keras.layers.Layer or torch.nn.Module.

  • preprocess_function: The pre-processing function that converts the raw input data into the inputs of the model. The first output of preprocess_function should be the token IDs.

  • mode: The task type, e.g., classification or regression.

  • id2token: The mapping from token ids to tokens.

[7]:
explainer = IntegratedGradientText(
    model=model,
    embedding_layer=model.embedding,
    preprocess_function=preprocess,
    id2token=transform.id_to_word
)
x = Text([
    "What a great movie! if you have no taste.",
    "it was a fantastic performance!",
    "best film ever",
    "such a great show!",
    "it was a horrible movie",
    "i've never watched something as bad"
])
explanations = explainer.explain(x)
explanations.ipython_plot(class_names=class_names)
Instance 0: Class positive
what a great movie if you have no taste

Instance 1: Class positive
it was a fantastic performance

Instance 2: Class positive
best film ever

Instance 3: Class positive
such a great show

Instance 4: Class negative
it was a horrible movie

Instance 5: Class negative
i never watched something as bad