A Self-supervised Joint Training Framework for Document Reranking


08 Mar 2022 (modified: 05 May 2023)
NAACL 2022 Conference Blind Submission
Readers: Everyone
Paper Link: https://openreview.net/forum?id=7WH39SYtvI
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: Pretrained language models such as BERT have been applied successfully to a wide range of natural language processing tasks and have also achieved impressive performance on document reranking. Recent work indicates that further pretraining these models on task-specific datasets before fine-tuning improves reranking performance. However, pretraining tasks such as masked language modeling and next sentence prediction are based on document context; they do not encourage the model to understand the content of the queries in the document reranking task. In this paper, we propose a self-supervised joint training framework (SJTF) built around a self-supervised method called Masked Query Prediction (MQP), which establishes semantic relations between queries and their positive documents. The framework randomly masks one token of the query, encodes the masked query paired with a positive document, and uses a linear layer as a decoder to predict the masked token. MQP is optimized jointly with the supervised ranking objective during fine-tuning, so no separate further-pretraining stage is needed. Extensive experiments on the MS MARCO passage ranking and TREC Robust datasets show that models trained with our framework achieve significant improvements over the original models.
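The abstract describes MQP only at a high level. The following is a minimal, dependency-free Python sketch of one training step, assuming: one query token is masked per example; the pair is fed to a BERT-style cross-encoder as `[CLS] query [SEP] document [SEP]`; the encoder's hidden state at the mask position goes through a linear decoder over the vocabulary; and the MQP cross-entropy is added to the ranking loss. The encoder itself is omitted (`hidden` is a random stand-in vector). The softmax ranking loss, the `mqp_weight` coefficient, and all function names are hypothetical illustrations, not the paper's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# BERT-style special tokens (assumed; the abstract does not name them).
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def mask_query(query: Sequence[str], rng: random.Random) -> Tuple[List[str], int, str]:
    """Replace one randomly chosen query token with [MASK].

    Returns the masked query, the masked position, and the original token
    (the MQP prediction target).
    """
    pos = rng.randrange(len(query))
    masked = list(query)
    target = masked[pos]
    masked[pos] = MASK
    return masked, pos, target


def build_pair(masked_query: Sequence[str], doc: Sequence[str]) -> List[str]:
    """Cross-encoder input: [CLS] query [SEP] document [SEP].

    The masked query token sits at index pos + 1 because of the leading [CLS].
    """
    return [CLS, *masked_query, SEP, *doc, SEP]


def linear_decoder_logits(hidden: Sequence[float],
                          weight: Sequence[Sequence[float]],
                          bias: Sequence[float]) -> List[float]:
    """Linear layer over the vocabulary: logits[v] = weight[v] . hidden + bias[v]."""
    return [sum(h * w for h, w in zip(hidden, row)) + b
            for row, b in zip(weight, bias)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy: logsumexp(logits) - logits[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(rank_loss: float, mqp_loss: float, mqp_weight: float = 1.0) -> float:
    """Ranking loss plus weighted MQP loss (mqp_weight is a hypothetical knob)."""
    return rank_loss + mqp_weight * mqp_loss


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["what", "is", "bert", "a", "language", "model"]
    query = ["what", "is", "bert"]
    positive_doc = ["bert", "is", "a", "language", "model"]

    # MQP is applied only to (query, positive document) pairs.
    masked, pos, target = mask_query(query, rng)
    pair = build_pair(masked, positive_doc)
    assert pair[pos + 1] == MASK

    # Stand-in for the encoder's hidden state at the [MASK] position;
    # a real model would take this from BERT's last layer.
    hidden = [rng.uniform(-1, 1) for _ in range(4)]
    weight = [[rng.uniform(-1, 1) for _ in hidden] for _ in vocab]
    bias = [0.0] * len(vocab)
    mqp = cross_entropy(linear_decoder_logits(hidden, weight, bias), vocab.index(target))

    # Assumed ranking loss: softmax cross-entropy over relevance scores,
    # with the positive document's score at index 0 and negatives after it.
    rank = cross_entropy([2.0, 0.5, -1.0], 0)
    print(pair, target, round(joint_loss(rank, mqp), 4))
```

In a real model, the ranking head and the MQP decoder would share one encoder, which is what makes the training joint: gradients from both losses update the same parameters during fine-tuning, with no extra pretraining pass.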
Copyright Consent Signature (type Name Or NA If Not Transferrable): Xiaozhi Zhu
Copyright Consent Name And Address: School of Computer Science, South China Normal University, China; 55 Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong, China
Presentation Mode: This paper will be presented virtually
