Utilize Pre-Trained PhoBERT to Compute Text Similarity and Rerank Documents for Question-Answering Task

Published: 01 Jan 2023 · Last Modified: 18 May 2025 · ICCAIS 2023 · CC BY-SA 4.0
Abstract: Open-domain Question Answering (QA) is a crucial task in natural language processing. QA systems typically follow two main steps: (i) identifying relevant passages and (ii) generating answer sentences from those passages. Of the two, identifying relevant passages is the more challenging and leaves more room for refinement. In this paper, we introduce two novel strategies to improve the performance of this step: (i) a new method for computing the similarity between questions and text passages, and (ii) the integration of pre-trained and fine-tuned models. Empirical evaluations on the Zalo 2022 dataset demonstrate the efficacy of the proposed methods, yielding a 10% increase in recall over the BM25 method alone and a 6% increase in recall over a fine-tuned cross-encoder model alone.
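The abstract does not spell out how the question-passage similarity is computed, so the sketch below is only a minimal illustration of the general idea: embedding both texts with pre-trained PhoBERT and reranking BM25 candidates by cosine similarity. The checkpoint name (the public vinai/phobert-base model on HuggingFace), the mean-pooling strategy, and the embed/rerank helpers are illustrative assumptions, not the paper's exact pipeline.

```python
# A minimal sketch (not the authors' exact method) of using pre-trained
# PhoBERT embeddings to rerank BM25-retrieved passages for a question.
# Note: in practice PhoBERT expects word-segmented Vietnamese input.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModel.from_pretrained("vinai/phobert-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states into one sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)         # mask out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (1, 768)

def rerank(question: str, passages: list[str]) -> list[tuple[str, float]]:
    """Order BM25 candidate passages by cosine similarity to the question."""
    q = embed(question)
    scored = [(p, torch.cosine_similarity(q, embed(p)).item()) for p in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Per the abstract, the reported gains come from combining such pre-trained similarity scores with a fine-tuned cross-encoder reranker; the exact fusion of the two scores is not specified here.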