Abstract: We introduce Document-Level Dense Passage Retrieval (DL-DPR), the task of retrieving relevant passages from within a single, often complex, document. Prevalent dense retrieval methods are tailored to broad, corpus-level search, and the specificities of single-document retrieval have not been adequately addressed; this gap motivates our work. We propose a novel approach that combines contrastive fine-tuning with dataset augmentation using queries generated by Large Language Models (LLMs), adapting dense retrievers to the unique challenges of DL-DPR. Evaluated on multiple benchmark datasets with metrics such as top-k retrieval accuracy and MRR@10, our approach yields marked performance gains. The results validate our method and underscore the untapped potential of adapting existing dense retrieval techniques to specialized tasks. This study thus serves as both an introduction to and a contribution toward this sub-domain of NLP, promising more precise and efficient information extraction from long, detailed documents.
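The evaluation metrics named in the abstract, top-k retrieval accuracy and MRR@10, can be sketched as follows. This is a minimal illustration of the standard metric definitions, not the authors' evaluation code; the passage identifiers are hypothetical.

```python
def top_k_accuracy(ranked_ids, relevant_id, k):
    """1.0 if the relevant passage appears among the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr_at_10(ranked_ids, relevant_id):
    """Reciprocal rank of the relevant passage, with ranks beyond 10 scored 0."""
    for rank, passage_id in enumerate(ranked_ids[:10], start=1):
        if passage_id == relevant_id:
            return 1.0 / rank
    return 0.0

# Hypothetical ranking where the relevant passage "p3" is ranked second.
ranking = ["p7", "p3", "p1", "p9"]
print(top_k_accuracy(ranking, "p3", k=1))  # 0.0
print(mrr_at_10(ranking, "p3"))            # 0.5
```

In practice these per-query scores are averaged over all evaluation queries to produce the reported benchmark numbers.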
Paper Type: long
Research Area: Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English, Spanish, French, Korean, Arabic, Bengali
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.