Learning to Rank Explanations Demo

In this notebook, we will explore how to explain the scores of a Learning to Rank model using OmniXAI.

Key Takeaways:

- How to install and get started with ml4ir as a script
- How to explain the ranking scores using OmniXAI

The goal of Learning to Rank (LTR) is to come up with a ranking function to generate an optimal ordering of a list of documents. In this notebook, we will learn a simple pointwise ranking function using a listwise loss which will predict the ranking scores for all records of a given query. These scores can then be used at inference to determine the optimal ordering.
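
To make the pointwise-score / listwise-loss distinction concrete, here is a minimal sketch of a softmax cross-entropy listwise loss in TensorFlow. This is an illustration only, not ml4ir's exact implementation: each record gets an independent score, but the loss normalizes over all records of the query.

import tensorflow as tf

def listwise_softmax_cross_entropy(clicks, scores):
    # Softmax turns the per-record scores into a distribution over the query's records
    probs = tf.nn.softmax(scores)
    # Cross-entropy against the click labels couples all records of the query in one loss
    return -tf.reduce_sum(clicks * tf.math.log(probs))

# Toy query with three records, where the second record was clicked
scores = tf.constant([0.2, 1.5, -0.3])
clicks = tf.constant([0.0, 1.0, 0.0])
print(listwise_softmax_cross_entropy(clicks, scores).numpy())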

We explore per-query validity explanations using OmniXAI's ValidityRankingExplainer.

Reference for algorithm: Singh, J., Khosla, M., & Anand, A. (2020). Valid Explanations for Learning to Rank Models. ArXiv, abs/2004.13972.
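
The rough idea behind a valid explanation: find a small set of features that, on their own, reproduce the model's ranking for the query, with agreement measured by a rank correlation such as Kendall's Tau. The brute-force sketch below only illustrates the concept; it is not OmniXAI's implementation (which follows the greedy procedure from the paper), and masking unused features to 0 is an assumption made purely for illustration.

from itertools import combinations
from scipy.stats import kendalltau

def explain_by_validity(query_df, predict_fn, candidate_features, k=3):
    # The full model's scores define the reference ranking for this query
    full_scores = predict_fn(query_df)
    best_tau, best_subset = float("-inf"), None
    for subset in combinations(candidate_features, k):
        masked = query_df.copy()
        for col in candidate_features:
            if col not in subset:
                masked[col] = 0.0  # assumption: 0 as the "masked" default value
        # How well does the ranking from the masked input agree with the full ranking?
        tau, _ = kendalltau(full_scores, predict_fn(masked))
        if tau > best_tau:
            best_tau, best_subset = tau, subset
    return best_subset, best_tau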

Install ml4ir and omnixai:

!pip install ml4ir -q
!pip install omnixai -q

Install visualization libraries:

!pip install --upgrade -q plotly nbformat

Look at the data:

[8]:
import pandas as pd

df_train = pd.read_csv("../ml4ir/applications/ranking/tests/data/csv/train/file_0.csv")
df_train.head(7)
[8]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
0 query_2 MHS7A7RJB1Y4BJT 2 0.473730 0.000000 0.00000 0 2 domain_2 1
1 query_2 MHS7A7RJB1Y4BJT 1 1.063190 0.205381 0.30103 1 2 domain_2 1
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1

Define the FeatureConfig:

YAML File -> configs/activate_2020/feature_config.yaml

Feature             Type      TFRecord Type   Usage
query_text          Text      Context         Character Embeddings -> biLSTM Encoding
domain_name         Text      Context         VocabLookup -> Categorical Embedding
text_match_score    Numeric   Sequence        float
page_views_score    Numeric   Sequence        float
quality_score       Numeric   Sequence        float
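
As a rough illustration of what "Character Embeddings -> biLSTM Encoding" means for query_text, here is a conceptual Keras sketch. It is not ml4ir's actual feature layer, and the vocabulary size and dimensions are made up:

import tensorflow as tf
from tensorflow.keras import layers

MAX_CHARS, CHAR_VOCAB, CHAR_EMB_DIM, ENCODING_UNITS = 20, 64, 16, 32  # hypothetical sizes

char_ids = tf.keras.Input(shape=(MAX_CHARS,), dtype=tf.int32, name="query_text_char_ids")
char_embeddings = layers.Embedding(CHAR_VOCAB, CHAR_EMB_DIM)(char_ids)  # one vector per character
query_encoding = layers.Bidirectional(layers.LSTM(ENCODING_UNITS))(char_embeddings)  # fixed-size encoding of the query text
encoder = tf.keras.Model(char_ids, query_encoding)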

Define the ModelConfig:

[2]:
print(open("configs/activate_2020/model_config.yaml").read())
architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.3
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dense
    name: final_dense
    units: 1
    activation: null
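
For intuition, the layers above correspond roughly to the following Keras stack. This is a sketch of the per-record scoring head only; ml4ir assembles the full model, including all feature transformations, from the two config files:

from tensorflow.keras import Sequential, layers

scoring_head = Sequential([
    layers.Dense(256, activation="relu", name="first_dense"),
    layers.Dropout(0.3, name="first_dropout"),
    layers.Dense(64, activation="relu", name="second_dense"),
    layers.Dense(1, activation=None, name="final_dense"),  # one relevance score per record
])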

Using ml4ir as a script:

!python ../ml4ir/applications/ranking/pipeline.py \
    --data_format csv \
    --data_dir ../ml4ir/applications/ranking/tests/data/csv \
    --feature_config configs/activate_2020/feature_config.yaml \
    --model_config configs/activate_2020/model_config.yaml \
    --execution_mode train_inference_evaluate \
    --loss_key softmax_cross_entropy \
    --num_epochs 3 \
    --models_dir ../models/explain_demo_2022 \
    --logs_dir ../logs/explain_demo_2022 \
    --run_id activate_demo

Now, the model is saved and ready for inference.

[1]:
MODEL_DIR = '../models/explain_demo_2022/activate_demo'
[2]:
import logging
import tensorflow as tf
import os
from ml4ir.base.io.local_io import LocalIO
from ml4ir.base.io.file_io import FileIO
from ml4ir.base.features.feature_config import FeatureConfig, SequenceExampleFeatureConfig
from ml4ir.base.model.relevance_model import RelevanceModel
from ml4ir.base.config.keys import TFRecordTypeKey
[3]:
# Set up file I/O handler
file_io : FileIO = LocalIO()


# Set up logger
logger = logging.getLogger()

tf.get_logger().setLevel("INFO")
tf.autograph.set_verbosity(3)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

feature_config: SequenceExampleFeatureConfig = FeatureConfig.get_instance(
    tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
    feature_config_dict=file_io.read_yaml("configs/activate_2020/feature_config.yaml"),
    logger=logger)
print("Training features\n-----------------")
print("\n".join(feature_config.get_train_features(key="name")))
Training features
-----------------
text_match_score
page_views_score
quality_score
query_text
domain_name

Sanity check

[4]:
relevance_model = RelevanceModel(
    feature_config=feature_config,
    tfrecord_type=TFRecordTypeKey.EXAMPLE,
    model_file=os.path.join(MODEL_DIR, 'final/default/'),
    logger=logger,
    output_name="relevance_score",
    file_io=file_io
)

logger.info("Is Keras model? {}".format(isinstance(relevance_model.model, tf.keras.Model)))
logger.info("Is compiled? {}".format(relevance_model.is_compiled))
Retraining is not yet supported. Model is loaded with compile=False
[5]:
from tensorflow.keras import models as kmodels
from tensorflow import data

model = kmodels.load_model(
    os.path.join(MODEL_DIR, 'final/tfrecord/'),
    compile=False)
infer_fn = model.signatures["serving_tfrecord"]
[6]:
from ml4ir.base.data.tfrecord_helper import get_sequence_example_proto

def predict(features_df):
    # Work on a copy so we never mutate (a slice of) the caller's DataFrame
    features_df = features_df.copy()
    features_df["query_text"] = features_df["query_text"].fillna("")
    # Map serving feature names to the names the model was trained with
    features_df = features_df.rename(columns={
        feature["serving_info"]["name"]: feature["name"]
        for feature in feature_config.context_features + feature_config.sequence_features
    })

    # Convert each query's records into a single SequenceExample proto
    protos = features_df.groupby(["query_id", "query_text"]).apply(
        lambda g: get_sequence_example_proto(
            group=g,
            context_features=feature_config.context_features,
            sequence_features=feature_config.sequence_features,
        ))

    # Score each serialized proto with the saved model's serving signature
    ranking_scores = protos.apply(lambda se: infer_fn(
        tf.expand_dims(
            tf.constant(se.SerializeToString()),
            axis=-1))["ranking_score"].numpy()[0])

    # Return the per-record scores, indexed by query_id
    predicted_scores = (ranking_scores.reset_index(name="ranking_score")
                        .set_index("query_id")
                        .squeeze())
    return predicted_scores["ranking_score"]

Let’s look at one of the queries:

[9]:
df_train[df_train["query_id"]=="query_5"]
[9]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1
7 query_5 KNJNWV 2 1.362720 0.042572 0.30103 0 0 domain_0 0

And its corresponding model output scores:

[10]:
predict(df_train[df_train["query_id"]=="query_5"])
[10]:
array([0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529,
       0.1909707 ], dtype=float32)
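
The scores line up with the rows of the query above, one score per record. To turn them into a ranking we simply sort by score; a quick sketch using the predict function defined earlier:

query_df = df_train[df_train["query_id"] == "query_5"].copy()
query_df["predicted_score"] = predict(query_df)
# Higher score = more relevant; compare the model's ordering with the original rank and clicks
query_df.sort_values("predicted_score", ascending=False)[["rank", "clicked", "predicted_score"]]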

Now, let's create a Tabular instance, which is the standard way to represent tabular datasets in OmniXAI:

[11]:
from omnixai.data.tabular import Tabular
training_data = Tabular(
   df_train,
   target_column='clicked',
)
training_data.to_pd()  # The Tabular instance can always be converted back to a pandas DataFrame
[11]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
0 query_2 MHS7A7RJB1Y4BJT 2 0.473730 0.000000 0.00000 0 2 domain_2 1
1 query_2 MHS7A7RJB1Y4BJT 1 1.063190 0.205381 0.30103 1 2 domain_2 1
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
... ... ... ... ... ... ... ... ... ... ...
5671 query_1487 QCZ4XHLN 6 0.227694 0.000000 0.00000 0 2 domain_2 0
5672 query_1487 QCZ4XHLN 2 1.016954 0.000000 0.00000 0 2 domain_2 1
5673 query_1490 WYNFF89 2 0.474600 0.190735 0.00000 0 0 domain_0 0
5674 query_1490 WYNFF89 1 0.620355 0.143310 0.00000 1 0 domain_0 0
5675 query_1490 WYNFF89 3 0.508362 0.190735 0.00000 0 0 domain_0 1

5676 rows × 10 columns

Similarly for the query sample:

[12]:
sample_query = Tabular(
    df_train[df_train["query_id"]=="query_5"],
    target_column='clicked',
)
sample_query.to_pd()
[12]:
query_id query_text rank text_match_score page_views_score quality_score clicked domain_id domain_name name_match
2 query_5 KNJNWV 6 1.368108 0.030636 0.00000 0 0 domain_0 0
3 query_5 KNJNWV 3 1.370628 0.041261 0.30103 0 0 domain_0 0
4 query_5 KNJNWV 4 1.366700 0.082535 0.30103 0 0 domain_0 0
5 query_5 KNJNWV 1 1.333836 0.042572 0.30103 1 0 domain_0 0
6 query_5 KNJNWV 5 1.325021 0.046478 0.00000 0 0 domain_0 1
7 query_5 KNJNWV 2 1.362720 0.042572 0.30103 0 0 domain_0 0

Define the features that you wish to analyze. In our case, these are the trainable sequence features; all other columns are ignored.

[18]:
sequence_features = [f['name'] for f in feature_config.sequence_features if f['trainable']]
columns = set(training_data.columns)
ignored_features = columns - set(sequence_features)
[19]:
ignored_features
[19]:
{'clicked',
 'domain_id',
 'domain_name',
 'name_match',
 'query_id',
 'query_text',
 'rank'}

Initialize Explainer:

[20]:
from omnixai.explainers.ranking.agnostic.validity import ValidityRankingExplainer

ranking_explainer = ValidityRankingExplainer(training_data=training_data,
                                             ignored_features=ignored_features,
                                             predict_function=lambda x: predict(x.to_pd()))

Get explanations in one call:

[21]:
explanation = ranking_explainer.explain(sample_query, # The tabular instance to be explained
                                        k=3 # The maximum number of features to consider as explanation
                                       )

The resulting order of feature importance:

[23]:
explanation.get_explanations(0)["top_features"].keys()
[23]:
dict_keys(['quality_score', 'text_match_score', 'page_views_score'])

We can check the validity of our explanation:

[25]:
explanation.get_explanations(0)['validity']['Tau']
[25]:
KendalltauResult(correlation=0.9999999999999999, pvalue=0.002777777777777778)

A Kendall's Tau close to 1.0 indicates that the selected features induce essentially the same ordering as the full model, i.e. the feature importances are a valid explanation for the ranking. We can also plot the features with importance grading:

[27]:
fig = explanation.ipython_figure()
fig.update_layout(autosize=False, width=1800)
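
The validity score above is just Kendall's Tau between the model's ranking and the ranking obtained when only the selected features carry information. As a sanity check, here is a rough sketch of that comparison; masking the non-selected features to 0 is an assumption for illustration, while OmniXAI uses its own defaults internally:

from scipy.stats import kendalltau

query_df = df_train[df_train["query_id"] == "query_5"]
top_features = list(explanation.get_explanations(0)["top_features"].keys())

masked_df = query_df.copy()
for col in set(sequence_features) - set(top_features):
    masked_df[col] = 0.0  # assumption: 0 as the masked default

tau, p_value = kendalltau(predict(query_df), predict(masked_df))
print(tau, p_value)

In this demo the top features happen to cover all three sequence features, so the masked scores match the full model exactly; with a smaller k you would see the Tau drop when an important feature is left out.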