Introduction

In pairwise ranking, the model learns the relative order of items within the same set of search results. For example, if a search query returns 10 results and only 2 of them receive a conversion event (a click, reply, like, subscribe, comment, etc.), the model learns patterns that rank those two converting items above the rest of the results.

In this tutorial, we'll learn to use XGBoost to build pairwise ranking models. Internally, XGBoost trains its ranking models with a pairwise loss that is weighted by a rank-based metric, as we'll see below.

Pointwise vs Pairwise Loss

To understand how pairwise loss is calculated, let's first look at a simpler loss function: pointwise loss. In a pointwise model, each sample is treated independently of the others. A common pointwise loss is the squared loss:

\[f(y, s) = \sum_{i=1}^{n}(y_{i} - s_{i})^{2}\]

where y<sub>i</sub> is the actual relevance label, s<sub>i</sub> is the predicted score and n is the number of items in the dataset. In the pairwise case, instead of penalising each absolute prediction, the loss function considers the relative order of a pair of items:

\[f(y, s)=\sum_{y_{i}>y_{j}}\log_{2}\left( 1 + e^{-\Theta\left( s_{i}-s_{j} \right)} \right)\]

where y<sub>i</sub>, y<sub>j</sub> are the actual relevance labels, s<sub>i</sub>, s<sub>j</sub> are the predicted scores of the two items, and Θ is a scaling hyper-parameter. As you may have already guessed:

  1. When y<sub>i</sub> > y<sub>j</sub> and s<sub>i</sub> - s<sub>j</sub> > 0, the model incurs a smaller loss.
  2. When y<sub>i</sub> > y<sub>j</sub> and s<sub>i</sub> - s<sub>j</sub> < 0, the model incurs a greater loss.

The above equation is essentially the logistic loss with a slight modification: the pairwise score difference s<sub>i</sub> - s<sub>j</sub> takes the place of the classic linear term ( b<sub>0</sub> + b<sub>1</sub>x + ... ).
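To make the difference concrete, here is a minimal NumPy sketch of the two losses (the labels, scores and Θ = 1 are made-up values purely for illustration):

import numpy as np

y = np.array([1, 0, 1, 0])           # actual relevance labels
s = np.array([0.8, 0.3, 0.2, 0.6])   # predicted scores
theta = 1.0                          # scaling hyper-parameter

# pointwise: squared loss, each item treated independently
pointwise_loss = np.sum((y - s) ** 2)

# pairwise: logistic loss over every pair where y_i > y_j
pairwise_loss = sum(
    np.log2(1 + np.exp(-theta * (s[i] - s[j])))
    for i in range(len(y)) for j in range(len(y))
    if y[i] > y[j]
)

print(pointwise_loss, pairwise_loss)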

What are these pairs though? Let's understand it using an example.

Creating pairs with LambdaMART

LambdaMART is a tree-based version of LambdaRank (it builds an ensemble of weak trees). Essentially, LambdaRank introduces a rank-based metric, ΔNDCG, into the loss function so that the optimisation directly targets the ranking of items. It takes into account the difference in NDCG produced by swapping two items within a query.

With NDCG, the loss function becomes:

\[f(y, s)=\sum_{y_{i}>y_{j}}\log_{2}\left( 1 + e^{-\Theta\left( s_{i}-s_{j} \right)} \right)\cdot\left|\Delta\text{NDCG}_{ij}\right|\]

where ΔNDCG<sub>ij</sub> is calculated by swapping items i and j within a given query.
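As a quick illustration of |ΔNDCG<sub>ij</sub>|, the sketch below computes the change in NDCG caused by swapping two items in a single query (the relevance values are made up, and the dcg helper here uses the common 2^rel - 1 gain with a log2 position discount):

import numpy as np

def dcg(relevances):
    # relevances are listed in ranked order (position 1 first)
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))
    return np.sum((2 ** rel - 1) / discounts)

ranking = [0, 0, 1]                          # labels in their current order
idcg = dcg(sorted(ranking, reverse=True))    # ideal DCG

ndcg_before = dcg(ranking) / idcg

# swap the items at positions 1 and 3
swapped = ranking.copy()
swapped[0], swapped[2] = swapped[2], swapped[0]
ndcg_after = dcg(swapped) / idcg

delta_ndcg = abs(ndcg_after - ndcg_before)   # |ΔNDCG| for this pair
print(ndcg_before, ndcg_after, delta_ndcg)   # 0.5, 1.0, 0.5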

To explain how the algorithm works, I’m going to use a data set which looks as follows:

label  item_name       item_id  position  query_id  feature_0  feature_1  feature_2
0      adidas shoes    323      1         0         0.034      0.120      0.109
0      adidas x        2876     2         0         0.107      0.001      0.612
1      nike pegasus    988      3         0         0.987      0.543      0.766
1      protein shaker  632      1         1         0.102      0.231      0.221

The algorithm works like this:

  1. For each query_id:
  2. Calculate the ideal DCG (obtained by sorting the items by the label column, highest relevance first).
  3. Swap each pair of items in the query and calculate the change in DCG (i.e. the DCG after the swap minus the ideal DCG). This change is negative or positive depending on whether the swap hurts or improves the ranking.
  4. Calculate the average gain in DCG for each item after making all the swaps.
  5. The average gain calculated is used as the label in the model (a rough sketch of these steps follows below).
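Here is a simplified sketch of those steps applied to the example data above. It is a toy illustration of the swap-based gain computation, not XGBoost's actual internal implementation; the column names follow the table:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'label':    [0, 0, 1, 1],
    'query_id': [0, 0, 0, 1],
    'position': [1, 2, 3, 1],
})

def dcg(relevances):                          # same helper as in the earlier sketch
    rel = np.asarray(relevances, dtype=float)
    return np.sum((2 ** rel - 1) / np.log2(np.arange(2, len(rel) + 2)))

for qid, group in df.groupby('query_id'):     # step 1: for each query_id
    labels = group.sort_values('position')['label'].tolist()
    ideal_dcg = dcg(sorted(labels, reverse=True))          # step 2: ideal DCG
    gains = np.zeros(len(labels))
    for i in range(len(labels)):
        for j in range(len(labels)):
            if i == j:
                continue
            swapped = labels.copy()
            swapped[i], swapped[j] = swapped[j], swapped[i]
            gains[i] += dcg(swapped) - ideal_dcg           # step 3: change in DCG
    avg_gains = gains / max(len(labels) - 1, 1)            # step 4: average gain per item
    print(qid, avg_gains)                                  # step 5: used as the learning target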

Fortunately, all this heavy lifting happens internally inside XGBoost. Let's train a model using the data shown above. I've installed xgboost using: pip install xgboost==2.0.1. Recent versions also contain logic to handle position bias, i.e. a click at a lower position carries more signal than one at a top position (users are less likely to even see lower results), so it should be weighted higher.

import pandas as pd
import xgboost as xgb
from sklearn import metrics

features = ['feature_0', 'feature_1', 'feature_2']

ranker = xgb.XGBRanker(
        tree_method="hist",
        objective="rank:ndcg",
        lambdarank_unbiased=True,
        lambdarank_num_pair_per_sample=10,
        eval_metric=["ndcg@1", "ndcg@8"],
        lambdarank_bias_norm=2,
        n_estimators=1000,
        early_stopping_rounds=30)

ranker.fit(
    traindf[features],
    traindf['label'],
    qid=traindf['query_id'],
    eval_set=[(traindf[features], traindf['label']), (testdf[features], testdf['label'])],
    eval_qid=[traindf['query_id'], testdf['query_id']],
    verbose=50)

testdf['pred'] = ranker.predict(testdf[features])

 In the above code:

  1. traindf and testdf are two dataframes containing data as shown above.
  2. objective="rank:ndcg": since NDCG is not differentiable, XGBoost uses LambdaMART to optimise it directly.
  3. lambdarank_unbiased: handles position bias.
  4. lambdarank_num_pair_per_sample: controls how many pairs are created for each sample. For example, if a query has 3 items and we set lambdarank_num_pair_per_sample=2, we end up with 6 pairs for that query. Tuning this parameter affects model performance.
  5. qid: refers to the column which contains the integer-encoded query_id.
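Since the predicted scores are only meaningful relative to the other items in the same query, a typical next step is to sort each query's results by score to obtain the final ranking. A small sketch, reusing testdf and the column names from the code above:

# order items within each query by predicted score, highest first
ranked = testdf.sort_values(['query_id', 'pred'], ascending=[True, False])
top_results = ranked.groupby('query_id').head(10)   # e.g. keep the top 10 per query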

Once we have generated the predictions, we can evaluate the model using the NDCG@k metric. For our example dataset, we'll use k=3:

score = testdf.groupby('query_id').apply(lambda x: metrics.ndcg_score([x['label']], [x['pred']], k=3)).mean()

print(score)
0.413

The NDCG metric is commonly used for ranking tasks because it rewards correct predictions near the top of the ranking more than those further down. For example, an item correctly ranked at position 1 boosts the NDCG score more than an item correctly ranked at position 10, since the metric discounts each item's gain by its position.
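The snippet below illustrates this with sklearn's ndcg_score: the same single relevant item yields a higher NDCG when it is ranked at position 1 than when it lands at position 3 (a 3-item query is used here for brevity):

from sklearn import metrics

labels = [[1, 0, 0]]   # one relevant item in the query

# relevant item scored highest -> ranked at position 1
print(metrics.ndcg_score(labels, [[0.9, 0.2, 0.1]], k=3))   # 1.0

# relevant item scored lowest -> ranked at position 3
print(metrics.ndcg_score(labels, [[0.1, 0.9, 0.8]], k=3))   # 0.5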

Summary

In this article, we discussed the basics of pairwise ranking and LambdaMART. XGBoost provides an out-of-the-box implementation of the LambdaMART algorithm while also handling position bias. You can read more about XGBoost learning to rank in its official documentation.

By Manish Saraswat

Specialized in building scalable microservices and intelligent data products powered by Generative AI, large language models (LLMs), recommendation systems, sponsored ads modeling and search & ranking models.

💡 I’m passionate about demystifying AI and empowering others through knowledge-sharing. Whether you're just starting out or deep in the space, I love helping people discover something new.

📬 Let’s connect on 📇 LinkedIn—I’d love to chat, collaborate, or exchange ideas.
