Learning to Rank Explanations Demo
In this notebook, we will explore how to explain the scores of a Learning to Rank model using OmniXAI.
Key Takeaways:
- How to install and get started with ml4ir as a script
- How to explain the ranking scores using OmniXAI
The goal of Learning to Rank (LTR) is to come up with a ranking function to generate an optimal ordering of a list of documents. In this notebook, we will learn a simple pointwise ranking function using a listwise loss which will predict the ranking scores for all records of a given query. These scores can then be used at inference to determine the optimal ordering.
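To make "pointwise scoring with a listwise loss" concrete, here is a minimal sketch of a softmax cross-entropy listwise loss for a single query (illustrative only; the loss actually used below via --loss_key softmax_cross_entropy is implemented inside ml4ir):

import tensorflow as tf

def listwise_softmax_cross_entropy(y_clicked, y_scores):
    # y_clicked: [num_records] float click labels for one query
    # y_scores:  [num_records] pointwise ranking scores for the same records
    # Softmax over the list turns per-record scores into a distribution
    probs = tf.nn.softmax(y_scores)
    # Cross-entropy against the normalized click labels
    labels = y_clicked / tf.reduce_sum(y_clicked)
    return -tf.reduce_sum(labels * tf.math.log(probs + 1e-10))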
We explore per-query valid explanations using OmniXAI's ValidityRankingExplainer.
Reference for the algorithm: Singh, J., Khosla, M., & Anand, A. (2020). Valid Explanations for Learning to Rank Models. arXiv:2004.13972.
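Roughly, the idea is that an explanation is a small subset of features such that ranking a query's records with every other feature masked out still reproduces the model's ordering; agreement between the two orderings is measured with Kendall's Tau. A simplified sketch of that check (the names validity_of_subset, candidate_features and fill_value are illustrative, and OmniXAI's actual implementation differs):

import numpy as np
from scipy.stats import kendalltau

def validity_of_subset(features_df, candidate_features, subset, predict_fn, fill_value=0.0):
    # Simplified sketch of the validity idea, not the paper's or OmniXAI's exact algorithm
    # Score the query's records with the full feature set
    full_scores = np.asarray(predict_fn(features_df))
    # Mask every candidate feature that is not part of the explanation
    masked_df = features_df.copy()
    for col in candidate_features:
        if col not in subset:
            masked_df[col] = fill_value
    masked_scores = np.asarray(predict_fn(masked_df))
    # Kendall's Tau close to 1.0 means the explanation preserves the ranking
    return kendalltau(full_scores, masked_scores)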
Install ml4ir and omnixai:
!pip install ml4ir -q
!pip install omnixai -q
Install the visualization libraries:
!pip install --upgrade -q plotly nbformat
Look at the data:
[8]:
import pandas as pd
df_train = pd.read_csv("../ml4ir/applications/ranking/tests/data/csv/train/file_0.csv")
df_train.head(7)
[8]:
|    | query_id | query_text | rank | text_match_score | page_views_score | quality_score | clicked | domain_id | domain_name | name_match |
|----|----------|------------|------|------------------|------------------|---------------|---------|-----------|-------------|------------|
| 0 | query_2 | MHS7A7RJB1Y4BJT | 2 | 0.473730 | 0.000000 | 0.00000 | 0 | 2 | domain_2 | 1 |
| 1 | query_2 | MHS7A7RJB1Y4BJT | 1 | 1.063190 | 0.205381 | 0.30103 | 1 | 2 | domain_2 | 1 |
| 2 | query_5 | KNJNWV | 6 | 1.368108 | 0.030636 | 0.00000 | 0 | 0 | domain_0 | 0 |
| 3 | query_5 | KNJNWV | 3 | 1.370628 | 0.041261 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 4 | query_5 | KNJNWV | 4 | 1.366700 | 0.082535 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 5 | query_5 | KNJNWV | 1 | 1.333836 | 0.042572 | 0.30103 | 1 | 0 | domain_0 | 0 |
| 6 | query_5 | KNJNWV | 5 | 1.325021 | 0.046478 | 0.00000 | 0 | 0 | domain_0 | 1 |
Define the FeatureConfig:
YAML File -> configs/activate_2020/feature_config.yaml
| Feature | Type | TFRecord Type | Usage |
|---------|------|---------------|-------|
| query_text | Text | Context | Character Embeddings -> biLSTM Encoding |
| domain_name | Text | Context | VocabLookup -> Categorical Embedding |
| text_match_score | Numeric | Sequence | float |
| page_views_score | Numeric | Sequence | float |
| quality_score | Numeric | Sequence | float |
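The Usage column maps each feature to a transform inside the scoring model. As a rough Keras illustration of the two text transforms (hypothetical layer sizes and vocabulary, assuming a recent TensorFlow with layers.StringLookup; not ml4ir's actual feature layers):

import tensorflow as tf
from tensorflow.keras import layers

# query_text: character embeddings fed through a bidirectional LSTM encoder
char_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="query_text_char_ids")
char_emb = layers.Embedding(input_dim=256, output_dim=16)(char_ids)
query_encoding = layers.Bidirectional(layers.LSTM(64))(char_emb)

# domain_name: vocabulary lookup followed by a categorical embedding
domain = tf.keras.Input(shape=(1,), dtype=tf.string, name="domain_name")
domain_id = layers.StringLookup(vocabulary=["domain_0", "domain_1", "domain_2"])(domain)
domain_emb = layers.Flatten()(layers.Embedding(input_dim=4, output_dim=8)(domain_id))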
Define the ModelConfig:
[2]:
print(open("configs/activate_2020/model_config.yaml").read())
architecture_key: dnn
layers:
- type: dense
name: first_dense
units: 256
activation: relu
- type: dropout
name: first_dropout
rate: 0.3
- type: dense
name: second_dense
units: 64
activation: relu
- type: dense
name: final_dense
units: 1
activation: null
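For reference, this model_config describes a small feed-forward scoring head that outputs one score per record; an equivalent stack in plain Keras would look roughly like this (architecture sketch only, not how ml4ir builds the model):

from tensorflow.keras import Sequential, layers

scoring_head = Sequential([
    layers.Dense(256, activation="relu", name="first_dense"),
    layers.Dropout(0.3, name="first_dropout"),
    layers.Dense(64, activation="relu", name="second_dense"),
    layers.Dense(1, activation=None, name="final_dense"),  # one ranking score per record
])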
Using ml4ir as a script:
!python ../ml4ir/applications/ranking/pipeline.py \
    --data_format csv \
    --data_dir ../ml4ir/applications/ranking/tests/data/csv \
    --feature_config configs/activate_2020/feature_config.yaml \
    --model_config configs/activate_2020/model_config.yaml \
    --execution_mode train_inference_evaluate \
    --loss_key softmax_cross_entropy \
    --num_epochs 3 \
    --models_dir ../models/explain_demo_2022 \
    --logs_dir ../logs/explain_demo_2022 \
    --run_id activate_demo
Now the model is trained, saved, and ready for inference.
[1]:
MODEL_DIR = '../models/explain_demo_2022/activate_demo'
[2]:
import logging
import tensorflow as tf
import os
from ml4ir.base.io.local_io import LocalIO
from ml4ir.base.io.file_io import FileIO
from ml4ir.base.features.feature_config import FeatureConfig, SequenceExampleFeatureConfig
from ml4ir.base.model.relevance_model import RelevanceModel
from ml4ir.base.config.keys import TFRecordTypeKey
[3]:
# Set up file I/O handler
file_io : FileIO = LocalIO()
# Set up logger
logger = logging.getLogger()
tf.get_logger().setLevel("INFO")
tf.autograph.set_verbosity(3)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
feature_config: SequenceExampleFeatureConfig = FeatureConfig.get_instance(
tfrecord_type=TFRecordTypeKey.SEQUENCE_EXAMPLE,
feature_config_dict=file_io.read_yaml("configs/activate_2020/feature_config.yaml"),
logger=logger)
print("Training features\n-----------------")
print("\n".join(feature_config.get_train_features(key="name")))
Training features
-----------------
text_match_score
page_views_score
quality_score
query_text
domain_name
Sanity check
[4]:
relevance_model = RelevanceModel(
feature_config=feature_config,
tfrecord_type=TFRecordTypeKey.EXAMPLE,
model_file=os.path.join(MODEL_DIR, 'final/default/'),
logger=logger,
output_name="relevance_score",
file_io=file_io
)
logger.info("Is Keras model? {}".format(isinstance(relevance_model.model, tf.keras.Model)))
logger.info("Is compiled? {}".format(relevance_model.is_compiled))
Retraining is not yet supported. Model is loaded with compile=False
[5]:
from tensorflow.keras import models as kmodels
from tensorflow import data
model = kmodels.load_model(
os.path.join(MODEL_DIR, 'final/tfrecord/'),
compile=False)
infer_fn = model.signatures["serving_tfrecord"]
[6]:
from ml4ir.base.data.tfrecord_helper import get_sequence_example_proto
def predict(features_df):
features_df["query_text"] = features_df["query_text"].fillna("")
features_df = (features_df.copy()
.rename(columns={
feature["serving_info"]["name"]: feature["name"] for feature in
feature_config.context_features + feature_config.sequence_features
}))
#print(features_df)
context_feature_names = [feature["name"] for feature in feature_config.context_features]
protos = features_df.groupby(["query_id","query_text"]).apply(lambda g: get_sequence_example_proto(
group=g,
context_features=feature_config.context_features,
sequence_features=feature_config.sequence_features,
))
# Score the proto with the model
ranking_scores = protos.apply(lambda se: infer_fn(
tf.expand_dims(
tf.constant(se.SerializeToString()),
axis=-1))["ranking_score"].numpy()[0])
# Check parity of scores
predicted_scores = (ranking_scores.reset_index(name="ranking_score")
.set_index("query_id")
.squeeze())
return predicted_scores["ranking_score"]
Let’s look at one of the queries:
[9]:
df_train[df_train["query_id"]=="query_5"]
[9]:
|    | query_id | query_text | rank | text_match_score | page_views_score | quality_score | clicked | domain_id | domain_name | name_match |
|----|----------|------------|------|------------------|------------------|---------------|---------|-----------|-------------|------------|
| 2 | query_5 | KNJNWV | 6 | 1.368108 | 0.030636 | 0.00000 | 0 | 0 | domain_0 | 0 |
| 3 | query_5 | KNJNWV | 3 | 1.370628 | 0.041261 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 4 | query_5 | KNJNWV | 4 | 1.366700 | 0.082535 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 5 | query_5 | KNJNWV | 1 | 1.333836 | 0.042572 | 0.30103 | 1 | 0 | domain_0 | 0 |
| 6 | query_5 | KNJNWV | 5 | 1.325021 | 0.046478 | 0.00000 | 0 | 0 | domain_0 | 1 |
| 7 | query_5 | KNJNWV | 2 | 1.362720 | 0.042572 | 0.30103 | 0 | 0 | domain_0 | 0 |
And its corresponding model output scores:
[10]:
predict(df_train[df_train["query_id"]=="query_5"])
[10]:
array([0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529,
0.1909707 ], dtype=float32)
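The array holds one score per record, in the row order of the query's DataFrame. The predicted ordering is simply the descending sort of these scores; for instance (a quick illustration using the array printed above):

import numpy as np

scores = np.array([0.11998416, 0.19389412, 0.20375773, 0.17943792, 0.11195529, 0.1909707])
predicted_ranks = (-scores).argsort().argsort() + 1
print(predicted_ranks)  # [5 2 1 4 6 3] -> the third record is ranked highest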
Now, let's create a Tabular instance, OmniXAI's standard representation for tabular datasets:
[11]:
from omnixai.data.tabular import Tabular
training_data = Tabular(
df_train,
target_column='clicked',
)
training_data.to_pd() #The tabular instance can always be converted back to pandas DataFrame
[11]:
|    | query_id | query_text | rank | text_match_score | page_views_score | quality_score | clicked | domain_id | domain_name | name_match |
|----|----------|------------|------|------------------|------------------|---------------|---------|-----------|-------------|------------|
| 0 | query_2 | MHS7A7RJB1Y4BJT | 2 | 0.473730 | 0.000000 | 0.00000 | 0 | 2 | domain_2 | 1 |
| 1 | query_2 | MHS7A7RJB1Y4BJT | 1 | 1.063190 | 0.205381 | 0.30103 | 1 | 2 | domain_2 | 1 |
| 2 | query_5 | KNJNWV | 6 | 1.368108 | 0.030636 | 0.00000 | 0 | 0 | domain_0 | 0 |
| 3 | query_5 | KNJNWV | 3 | 1.370628 | 0.041261 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 4 | query_5 | KNJNWV | 4 | 1.366700 | 0.082535 | 0.30103 | 0 | 0 | domain_0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5671 | query_1487 | QCZ4XHLN | 6 | 0.227694 | 0.000000 | 0.00000 | 0 | 2 | domain_2 | 0 |
| 5672 | query_1487 | QCZ4XHLN | 2 | 1.016954 | 0.000000 | 0.00000 | 0 | 2 | domain_2 | 1 |
| 5673 | query_1490 | WYNFF89 | 2 | 0.474600 | 0.190735 | 0.00000 | 0 | 0 | domain_0 | 0 |
| 5674 | query_1490 | WYNFF89 | 1 | 0.620355 | 0.143310 | 0.00000 | 1 | 0 | domain_0 | 0 |
| 5675 | query_1490 | WYNFF89 | 3 | 0.508362 | 0.190735 | 0.00000 | 0 | 0 | domain_0 | 1 |

5676 rows × 10 columns
Similarly for the query sample:
[12]:
sample_query = Tabular(
df_train[df_train["query_id"]=="query_5"],
target_column='clicked',
)
sample_query.to_pd()
[12]:
|    | query_id | query_text | rank | text_match_score | page_views_score | quality_score | clicked | domain_id | domain_name | name_match |
|----|----------|------------|------|------------------|------------------|---------------|---------|-----------|-------------|------------|
| 2 | query_5 | KNJNWV | 6 | 1.368108 | 0.030636 | 0.00000 | 0 | 0 | domain_0 | 0 |
| 3 | query_5 | KNJNWV | 3 | 1.370628 | 0.041261 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 4 | query_5 | KNJNWV | 4 | 1.366700 | 0.082535 | 0.30103 | 0 | 0 | domain_0 | 0 |
| 5 | query_5 | KNJNWV | 1 | 1.333836 | 0.042572 | 0.30103 | 1 | 0 | domain_0 | 0 |
| 6 | query_5 | KNJNWV | 5 | 1.325021 | 0.046478 | 0.00000 | 0 | 0 | domain_0 | 1 |
| 7 | query_5 | KNJNWV | 2 | 1.362720 | 0.042572 | 0.30103 | 0 | 0 | domain_0 | 0 |
Define the features that you wish to analyze. These are sequence features in our case.
[18]:
sequence_features = [f['name'] for f in feature_config.sequence_features if f['trainable']]
columns = set(training_data.columns)
ignored_features = columns - set(sequence_features)
[19]:
ignored_features
[19]:
{'clicked',
'domain_id',
'domain_name',
'name_match',
'query_id',
'query_text',
'rank'}
Initialize Explainer:
[20]:
from omnixai.explainers.ranking.agnostic.validity import ValidityRankingExplainer
ranking_explainer = ValidityRankingExplainer(training_data=training_data,
ignored_features=ignored_features,
predict_function=lambda x: predict(x.to_pd()))
Get explanations in one call:
[21]:
explanation = ranking_explainer.explain(sample_query, # The tabular instance to be explained
k=3 # The maximum number of features to consider as explanation
)
The resulting order of feature importance:
[23]:
explanation.get_explanations(0)["top_features"].keys()
[23]:
dict_keys(['quality_score', 'text_match_score', 'page_views_score'])
We can check the validity of our explanation:
[25]:
explanation.get_explanations(0)['validity']['Tau']
[25]:
KendalltauResult(correlation=0.9999999999999999, pvalue=0.002777777777777778)
A Kendall's Tau of ~1.0 indicates that the ranking induced by the selected features agrees almost perfectly with the model's ranking, i.e. the feature importances are a valid explanation. We can also plot the features with importance grading:
[27]:
fig = explanation.ipython_figure()
fig.update_layout(autosize=False, width=1800)