Diversified Prior Knowledge Enhanced General Language Model for Biomedical Information Retrieval

Yizheng Huang, Jimmy X. Huang

Published: 2023, Last Modified: 22 Feb 2024, Venue: ECAI 2023, Readers: Everyone

Abstract: General language models have shown success in various information retrieval (IR) tasks, but their effectiveness is limited in the biomedical domain because biomedical data is specialized and complex. Training domain-specific models is an alternative, but it is challenging and costly because annotated data is scarce. To address these issues, we propose the Diversified Prior Knowledge Enhanced General Language Model (DPK-GLM) framework, which integrates domain knowledge with general language models to improve performance in biomedical IR. Our two-stage retrieval framework comprises three components: a Knowledge-based Query Expansion method that enriches queries with biomedical knowledge, an Aspect-based Filter that identifies highly relevant documents, and a Diversity-based Score Reweighting method that re-ranks the retrieved documents. Experimental results on public biomedical IR datasets show significant improvement, demonstrating the effectiveness of the proposed methods.
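The three components named in the abstract can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not the DPK-GLM implementation: the abstract does not specify any of the algorithms, so the dictionary lookup used for expansion, the substring test used for aspect filtering, the token-overlap relevance score, and the MMR-style greedy re-ranking are all stand-ins chosen for illustration. Every function name, the `lam` parameter, and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def expand_query(query: str, knowledge: Dict[str, List[str]]) -> str:
    """Append related terms from a toy knowledge dictionary (assumed format)."""
    q = query.lower()
    extra = [syn for term, syns in knowledge.items() if term in q for syn in syns]
    return " ".join([query] + extra)


def relevance(query: str, doc: str) -> float:
    """Fraction of query tokens found in the document (stand-in for a model score)."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q) if q else 0.0


def aspect_filter(docs: List[str], aspects: List[str]) -> List[str]:
    """Keep documents that mention at least one aspect term (assumed criterion)."""
    return [d for d in docs if any(a in d.lower() for a in aspects)]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two documents."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def diversity_rerank(query: str, docs: List[str], lam: float = 0.7) -> List[str]:
    """Greedy MMR-style re-ranking: reward relevance, penalize redundancy."""
    remaining = list(docs)
    selected: List[str] = []
    while remaining:
        def mmr(d: str) -> float:
            redundancy = max((jaccard(d, s) for s in selected), default=0.0)
            return lam * relevance(query, d) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


def retrieve(query: str, docs: List[str],
             knowledge: Dict[str, List[str]], aspects: List[str]) -> List[str]:
    """Expand the query, filter candidates by aspect, then re-rank for diversity."""
    expanded = expand_query(query, knowledge)
    candidates = aspect_filter(docs, aspects)
    return diversity_rerank(expanded, candidates)


if __name__ == "__main__":
    knowledge = {"heart attack": ["myocardial infarction"]}  # hypothetical entry
    docs = [
        "aspirin after myocardial infarction",
        "weather report for tuesday",
        "heart attack treatment guidelines",
    ]
    for doc in retrieve("heart attack treatment", docs, knowledge, ["infarction", "heart"]):
        print(doc)
```

In this sketch the expansion step lets the query match a document that uses only the clinical term, the filter drops the off-topic document before scoring, and `lam` trades relevance against redundancy among the documents already selected.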
