Learning to Rank for Non Independent and Identically Distributed Datasets

Published: 07 Jun 2024, Last Modified: 07 Jun 2024, ICTIR 2024, CC BY 4.0
Keywords: Learning to Rank, Non-IID, Distributed Search
TL;DR: We explore different methods for merging independently learned LTR models
Abstract: With growing data privacy concerns, federated machine learning algorithms that preserve the confidentiality of sensitive information while enabling collaborative model training across decentralized data sources are attracting increasing interest. In this paper, we address the problem of collaboratively learning effective ranking models from non-independent and identically distributed (non-IID) training data owned by distinct search clients. We assume that the learning agents cannot access each other’s data, and that the models learned from local datasets might be biased or underperforming due to a skewed distribution of certain document features or query topics in the learning-to-rank training data. We therefore aim to instill into each ranking model learned from local data the knowledge captured by the other models, obtaining a more robust ranker that effectively handles documents and queries underrepresented in the local collection. To achieve this, we explore different methods for merging the ranking models, so that each client obtains a model that excels at ranking documents from the local data distribution while also performing well on queries whose documents follow distributions typical of a partner node. In particular, our findings suggest that, by relying on a linear combination of the local models, we can improve the effectiveness of the IR models by up to +17.92% in NDCG@10 (from 0.619 to 0.730) and by up to +19.07% in MAP (from 0.713 to 0.850).
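To illustrate the linear-combination idea mentioned in the abstract, the minimal sketch below blends the per-document scores of two independently trained rankers. This is only an illustrative assumption about how such a combination could be realized, not the authors' exact method: the `merge_scores` function, the `alpha` weight, and the score arrays are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact procedure): score-level
# linear combination of two independently learned LTR models.
import numpy as np

def merge_scores(local_scores: np.ndarray,
                 partner_scores: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Linearly combine per-document scores from two rankers.

    alpha = 1.0 keeps only the local model; alpha = 0.0 keeps only the
    partner model; intermediate values blend both.
    """
    return alpha * local_scores + (1.0 - alpha) * partner_scores

# Example: re-rank the candidate documents of one query.
local = np.array([2.1, 0.4, 1.7])    # scores from the locally trained ranker
partner = np.array([1.2, 1.9, 0.3])  # scores from a partner client's ranker
merged = merge_scores(local, partner, alpha=0.7)
ranking = np.argsort(-merged)        # document indices, best first
print(ranking)
```

In this sketch, tuning `alpha` per client would trade off fidelity to the local data distribution against robustness on queries better covered by a partner's collection.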
Submission Number: 16