$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

ACL ARR 2024 June Submission 703 Authors

12 Jun 2024 (modified: 29 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Recent studies show the growing significance of document retrieval in grounding LLM generation in the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to different parts of a document. To alleviate these prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across multiple levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses the metrics based on these granularities into a unified score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document-level retrieval by 22.6% and 10.4% on nDCG@5 with unsupervised and supervised retrievers, respectively, averaged over queries containing multiple subqueries from four scientific retrieval datasets. Moreover, results on two downstream scientific question-answering tasks highlight the advantage of $\texttt{MixGR}$ in boosting the application of LLMs in the scientific domain.
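The abstract describes fusing per-granularity query-document similarities into a single unified score. Below is a minimal, illustrative sketch of one such fusion scheme (reciprocal rank fusion); the granularity names, function names, and fusion rule are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: fuse similarity scores computed at several query/document
# granularities into one unified ranking score via reciprocal rank fusion (RRF).
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(per_granularity_scores: List[Dict[str, float]], k: int = 60) -> Dict[str, float]:
    """Combine several {doc_id: similarity} dicts into one {doc_id: fused score} dict."""
    fused = defaultdict(float)
    for scores in per_granularity_scores:
        # Rank documents by this granularity's similarity, highest first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] += 1.0 / (k + rank)  # standard RRF contribution
    return dict(fused)


# Example with made-up scores from three hypothetical granularities.
query_vs_doc = {"d1": 0.82, "d2": 0.75, "d3": 0.40}
subquery_vs_doc = {"d1": 0.55, "d2": 0.80, "d3": 0.40}
subquery_vs_proposition = {"d1": 0.70, "d2": 0.60, "d3": 0.90}

unified = rrf_fuse([query_vs_doc, subquery_vs_doc, subquery_vs_proposition])
print(sorted(unified.items(), key=lambda item: item[1], reverse=True))
```

A rank-based fusion such as this needs no training data, which is consistent with the zero-shot setting the abstract emphasizes; the actual metric combination used by $\texttt{MixGR}$ is specified in the paper itself.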
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 703