Learning to Rank Explanations Demo

In this notebook, we will explore how to explain the scores of a Learning to Rank model using OmniXAI.

Key Takeaways:

- How to install and get started with ml4ir as a script
- How to explain the ranking scores using OmniXAI

The goal of Learning to Rank (LTR) is to come up with a ranking function to generate an optimal ordering of a list of documents. In this notebook, we will learn a simple pointwise ranking function using a listwise loss which will predict the ranking scores for all records of a given query. These scores can then be used at inference to determine the optimal ordering.
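
To make the pointwise-score / listwise-loss distinction concrete, here is a minimal sketch of a softmax cross-entropy listwise loss in TensorFlow. This is an illustration only, not ml4ir's exact implementation: each record gets an independent score, but the loss normalizes over all records of the query.

import tensorflow as tf

def listwise_softmax_cross_entropy(clicks, scores):
    # Softmax turns the per-record scores into a distribution over the query's records
    probs = tf.nn.softmax(scores)
    # Cross-entropy against the click labels couples all records of the query in one loss
    return -tf.reduce_sum(clicks * tf.math.log(probs))

# Toy query with three records, where the second record was clicked
scores = tf.constant([0.2, 1.5, -0.3])
clicks = tf.constant([0.0, 1.0, 0.0])
print(listwise_softmax_cross_entropy(clicks, scores).numpy())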

We explore per-query validity explanations using OmniXAI's ValidityRankingExplainer.

Reference for algorithm: Singh, J., Khosla, M., & Anand, A. (2020). Valid Explanations for Learning to Rank Models. ArXiv, abs/2004.13972.
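
The rough idea behind a valid explanation: find a small set of features that, on their own, reproduce the model's ranking for the query, with agreement measured by a rank correlation such as Kendall's Tau. The brute-force sketch below only illustrates the concept; it is not OmniXAI's implementation (which follows the greedy procedure from the paper), and masking unused features to 0 is an assumption made purely for illustration.

from itertools import combinations
from scipy.stats import kendalltau

def explain_by_validity(query_df, predict_fn, candidate_features, k=3):
    # The full model's scores define the reference ranking for this query
    full_scores = predict_fn(query_df)
    best_tau, best_subset = float("-inf"), None
    for subset in combinations(candidate_features, k):
        masked = query_df.copy()
        for col in candidate_features:
            if col not in subset:
                masked[col] = 0.0  # assumption: 0 as the "masked" default value
        # How well does the ranking from the masked input agree with the full ranking?
        tau, _ = kendalltau(full_scores, predict_fn(masked))
        if tau > best_tau:
            best_tau, best_subset = tau, subset
    return best_subset, best_tau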

Install ml4ir and omnixai:

!pip install ml4ir -q
!pip install omnixai -q

Install visualization libraries:

!pip install --upgrade -q plotly nbformat

Look at the data:

[8]:
import pandas as pd

df_train = pd.read_csv("../ml4ir/applications/ranking/tests/data/csv/train/file_0.csv")
df_train.head(7)
[8]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
0 query_2 MHS7A7RJB1Y4BJT 2 0.473730 0.000000 0.00000 0 2 domain_2 1
1 query_2 MHS7A7RJB1Y4BJT 1 1.063190 0.205381 0.30103 1 2 domain_2 1
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1

Define the FeatureConfig:

YAML File -> configs/activate_2020/feature_config.yaml

Feature             Type      TFRecord Type   Usage
query_text          Text      Context         Character Embeddings -> biLSTM Encoding
domain_name         Text      Context         VocabLookup -> Categorical Embedding
text_match_score    Numeric   Sequence        float
page_views_score    Numeric   Sequence        float
quality_score       Numeric   Sequence        float
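
As a rough illustration of what "Character Embeddings -> biLSTM Encoding" means for query_text, here is a conceptual Keras sketch. It is not ml4ir's actual feature layer, and the vocabulary size and dimensions are made up:

import tensorflow as tf
from tensorflow.keras import layers

MAX_CHARS, CHAR_VOCAB, CHAR_EMB_DIM, ENCODING_UNITS = 20, 64, 16, 32  # hypothetical sizes

char_ids = tf.keras.Input(shape=(MAX_CHARS,), dtype=tf.int32, name="query_text_char_ids")
char_embeddings = layers.Embedding(CHAR_VOCAB, CHAR_EMB_DIM)(char_ids)  # one vector per character
query_encoding = layers.Bidirectional(layers.LSTM(ENCODING_UNITS))(char_embeddings)  # fixed-size encoding of the query text
encoder = tf.keras.Model(char_ids, query_encoding)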

Define the ModelConfig:

[2]:
print(open("configs/activate_2020/model_config.yaml").read())
architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.3
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dense
    name: final_dense
    units: 1
    activation: null
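
For intuition, the layers above correspond roughly to the following Keras stack. This is a sketch of the per-record scoring head only; ml4ir assembles the full model, including all feature transformations, from the two config files:

from tensorflow.keras import Sequential, layers

scoring_head = Sequential([
    layers.Dense(256, activation="relu", name="first_dense"),
    layers.Dropout(0.3, name="first_dropout"),
    layers.Dense(64, activation="relu", name="second_dense"),
    layers.Dense(1, activation=None, name="final_dense"),  # one relevance score per record
])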

Using ml4ir as a script:

!python ../ml4ir/applications/ranking/pipeline.py \
    --data_format csv \
    --data_dir ../ml4ir/applications/ranking/tests/data/csv \
    --feature_config configs/activate_2020/feature_config.yaml \
    --model_config configs/activate_2020/model_config.yaml \
    --execution_mode train_inference_evaluate \
    --loss_key softmax_cross_entropy \
    --num_epochs 3 \
    --models_dir ../models/explain_demo_2022 \
    --logs_dir ../logs/explain_demo_2022 \
    --run_id activate_demo

Now, the model is saved and ready for inference.

[1]:
MODEL_DIR = '../models/explain_demo_2022/activate_demo'
[2]:
import logging
import tensorflow as tf
import os
from ml4ir.base.io.local_io import LocalIO
from ml4ir.base.io.file_io import FileIO
from ml4ir.base.features.feature_config import FeatureConfig, SequenceExampleFeatureConfig
from ml4ir.base.model.relevance_model import RelevanceModel
from ml4ir.base.config.keys import TFRecordTypeKey
[3]:
# Set up file I/O handler
file_io : FileIO = LocalIO()


# Set up logger
logger = logging.getLogger()

tf.get_logger().setLevel("INFO")
tf.autograph.set_verbosity(3)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

feature_config: SequenceExampleFeatureConfig = FeatureConfig.get_instance(
    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
    feature_config_dict=file_io.read_yaml("configs/activate_2020/feature_config.yaml"),
    logger=logger)
print("Training features\n-----------------")
print("\n".join(feature_config.get_train_features(key="name")))
Training features
-----------------
text_match_score
page_views_score
quality_score
query_text
domain_name

Sanity check

[4]:
relevance_model = RelevanceModel(
    feature_config=feature_config,
    tfrecord_type=TFRecordTypeKey.EXAMPLE,
    model_file=os.path.join(MODEL_DIR, 'final/default/'),
    logger=logger,
    output_name="relevance_score",
    file_io=file_io
)

logger.info("Is Keras model? {}".format(isinstance(relevance_model.model, tf.keras.Model)))
logger.info("Is compiled? {}".format(relevance_model.is_compiled))
Retraining is not yet supported. Model is loaded with compile=False
[5]:
from tensorflow.keras import models as kmodels
from tensorflow import data

model = kmodels.load_model(
    os.path.join(MODEL_DIR, 'final/tfrecord/'),
    compile=False)
infer_fn = model.signatures["serving_tfrecord"]
[6]:
from ml4ir.base.data.tfrecord_helper import get_sequence_example_proto

def predict(features_df):
    # Work on a copy so we never mutate (a slice of) the caller's DataFrame
    features_df = features_df.copy()
    features_df["query_text"] = features_df["query_text"].fillna("")
    # Map serving feature names to the names the model was trained with
    features_df = features_df.rename(columns={
        feature["serving_info"]["name"]: feature["name"]
        for feature in feature_config.context_features + feature_config.sequence_features
    })

    # Convert each query's records into a single SequenceExample proto
    protos = features_df.groupby(["query_id", "query_text"]).apply(
        lambda g: get_sequence_example_proto(
            group=g,
            context_features=feature_config.context_features,
            sequence_features=feature_config.sequence_features,
        ))

    # Score each serialized proto with the saved model's serving signature
    ranking_scores = protos.apply(lambda se: infer_fn(
        tf.expand_dims(
            tf.constant(se.SerializeToString()),
            axis=-1))["ranking_score"].numpy()[0])

    # Return the per-record scores, indexed by query_id
    predicted_scores = (ranking_scores.reset_index(name="ranking_score")
                        .set_index("query_id")
                        .squeeze())
    return predicted_scores["ranking_score"]

Let’s look at one of the queries:

[9]:
df_train[df_train["query_id"]=="query_5"]
[9]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1
7 query_5 KNJNWV 2 1.362720 0.042572 0.30103 0 0 domain_0 0

And its corresponding model output scores:

[10]:
predict(df_train[df_train["query_id"]=="query_5"])
[10]:
array([0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529,
       0.1909707 ], dtype=float32)
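
The scores line up with the rows of the query above, one score per record. To turn them into a ranking we simply sort by score; a quick sketch using the predict function defined earlier:

query_df = df_train[df_train["query_id"] == "query_5"].copy()
query_df["predicted_score"] = predict(query_df)
# Higher score = more relevant; compare the model's ordering with the original rank and clicks
query_df.sort_values("predicted_score", ascending=False)[["rank", "clicked", "predicted_score"]]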

Now, let's create a Tabular instance, which is the standard way to represent tabular datasets in OmniXAI:

[11]:
from omnixai.data.tabular import Tabular
training_data = Tabular(
   df_train,
   target_column='clicked',
)
training_data.to_pd()  # The Tabular instance can always be converted back to a pandas DataFrame
[11]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
0 query_2 MHS7A7RJB1Y4BJT 2 0.473730 0.000000 0.00000 0 2 domain_2 1
1 query_2 MHS7A7RJB1Y4BJT 1 1.063190 0.205381 0.30103 1 2 domain_2 1
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
... ... ... ... ... ... ... ... ... ... ...
5671 query_1487 QCZ4XHLN 6 0.227694 0.000000 0.00000 0 2 domain_2 0
5672 query_1487 QCZ4XHLN 2 1.016954 0.000000 0.00000 0 2 domain_2 1
5673 query_1490 WYNFF89 2 0.474600 0.190735 0.00000 0 0 domain_0 0
5674 query_1490 WYNFF89 1 0.620355 0.143310 0.00000 1 0 domain_0 0
5675 query_1490 WYNFF89 3 0.508362 0.190735 0.00000 0 0 domain_0 1

5676 rows × 10 columns

Similarly for the query sample:

[12]:
sample_query = Tabular(
    df_train[df_train["query_id"]=="query_5"],
    target_column='clicked',
)
sample_query.to_pd()
[12]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1
7 query_5 KNJNWV 2 1.362720 0.042572 0.30103 0 0 domain_0 0

Define the features that you wish to analyze. In our case, these are the trainable sequence features; all other columns are ignored.

[18]:
sequence_features = [f['name'] for f in feature_config.sequence_features if f['trainable']]
columns = set(training_data.columns)
ignored_features = columns - set(sequence_features)
[19]:
ignored_features
[19]:
{'clicked',
 'domain_id',
 'domain_name',
 'name_match',
 'query_id',
 'query_text',
 'rank'}

Initialize Explainer:

[20]:
from omnixai.explainers.ranking.agnostic.validity import ValidityRankingExplainer

ranking_explainer = ValidityRankingExplainer(training_data=training_data,
                                             ignored_features=ignored_features,
                                             predict_function=lambda x: predict(x.to_pd()))

Get explanations in one call:

[21]:
explanation = ranking_explainer.explain(sample_query, # The tabular instance to be explained
                                        k=3 # The maximum number of features to consider as explanation
                                       )

The resulting order of feature importance:

[23]:
explanation.get_explanations(0)["top_features"].keys()
[23]:
dict_keys(['quality_score', 'text_match_score', 'page_views_score'])

We can check the validity of our explanation:

[25]:
explanation.get_explanations(0)['validity']['Tau']
[25]:
KendalltauResult(correlation=0.9999999999999999, pvalue=0.002777777777777778)

A Kendall's Tau close to 1.0 indicates that the selected features induce essentially the same ordering as the full model, i.e. the feature importances are a valid explanation for the ranking. We can also plot the features with importance grading:

[27]:
fig = explanation.ipython_figure()
fig.update_layout(autosize=False, width=1800)
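
The validity score above is just Kendall's Tau between the model's ranking and the ranking obtained when only the selected features carry information. As a sanity check, here is a rough sketch of that comparison; masking the non-selected features to 0 is an assumption for illustration, while OmniXAI uses its own defaults internally:

from scipy.stats import kendalltau

query_df = df_train[df_train["query_id"] == "query_5"]
top_features = list(explanation.get_explanations(0)["top_features"].keys())

masked_df = query_df.copy()
for col in set(sequence_features) - set(top_features):
    masked_df[col] = 0.0  # assumption: 0 as the masked default

tau, p_value = kendalltau(predict(query_df), predict(masked_df))
print(tau, p_value)

In this demo the top features happen to cover all three sequence features, so the masked scores match the full model exactly; with a smaller k you would see the Tau drop when an important feature is left out.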