Multi-Task Transformer Networks for Search Relevance Prediction and Ranking

22 Jan 2024 · OpenReview Archive Direct Upload
Abstract: Modern search systems rely on high-quality ranking models that order webpages according to the relevance of their content to the text of the query. It is often possible to leverage several datasets with varying data quality, size and target variables, enhancing the overall system with each model trained on its respective dataset. However, training a separate model for each task comes at the cost of high computational demands at inference time. We propose to view the ranking problem with several heterogeneous datasets in a multi-task setting and to train a single BERT model as a way to mitigate this issue. We show that with a combination of multi-task and distillation techniques, it is possible to replace multiple ranking models with a single model of the same size without any drop in quality and with single-task performance gains of 20–40%. In addition, we propose a new task reweighting approach, which is easy to implement and yields consistent gains when compared to baselines. Finally, we demonstrate that the same method can be successfully applied to all 9 GLUE tasks with similar conclusions.
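The abstract describes replacing several per-dataset ranking models with one shared BERT encoder serving multiple tasks. The sketch below illustrates one plausible way to structure such a model: a single shared encoder with a lightweight scoring head per task and a weighted sum of per-task losses. The head layout, task names, loss choice, and weighting scheme here are illustrative assumptions, not the paper's actual architecture or its reweighting method.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskRanker(nn.Module):
    """Shared BERT encoder with one scalar relevance head per task (sketch)."""

    def __init__(self, task_names, model_name="bert-base-uncased"):
        super().__init__()
        # One encoder is shared by every task, so inference cost matches a
        # single-task model of the same size.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Hypothetical per-task heads; the paper does not specify this layout.
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in task_names})

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]            # [CLS] token representation
        return self.heads[task](cls).squeeze(-1)     # relevance score per example

def multitask_loss(model, batches, task_weights):
    """Weighted sum of per-task losses; the weights stand in for whatever
    reweighting scheme is actually used (assumed here, not from the paper)."""
    total = 0.0
    for task, (input_ids, attention_mask, labels) in batches.items():
        scores = model(input_ids, attention_mask, task)
        total = total + task_weights[task] * nn.functional.mse_loss(scores, labels)
    return total
```

Keeping all tasks on one encoder is what lets the approach avoid the per-model inference cost mentioned in the abstract; only the small per-task heads differ between tasks.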