JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

ACL ARR 2024 December Submission 1132 Authors

15 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have shown significant promise in medical knowledge acquisition and question answering. However, they can hallucinate and produce factually incorrect output even with domain-specific pretraining, and prior retrieval-augmented generation (RAG) approaches have had limited success in mitigating these hallucinations. We introduce JMLR (Jointly Trained LLM and Information Retrieval), which integrates the retriever within the LLM architecture during the fine-tuning phase: LLM parameters are updated via cross-entropy loss, while retriever parameters are optimized with a rank loss. This synchronized training enhances JMLR's ability to retrieve clinical guidelines and leverage medical knowledge for reasoning and question answering, while reducing computational demands. On a medical question-answering benchmark, JMLR-13B (70.5%) outperforms the previous state-of-the-art model, Meditron-70B (68.9%), and Llama2-13B with RAG (67.7%). On the USMLE factuality score assessed by GPT-4, JMLR achieves a higher score (0.2463) than Claude 3 Haiku (0.2337), Claude 3 Opus (0.2356), and GPT-3.5 (0.2187), indicating fewer hallucinations. Comprehensive evaluations confirm that JMLR-13B improves reasoning quality and effectively reduces hallucinations. Additionally, JMLR-13B trains significantly faster (148 GPU hours) than Meditron-70B (42,630 GPU hours). This work offers a novel and efficient knowledge-enhancement method for healthcare, highlighting the potential of jointly training retrieval and LLMs for medical question-answering systems.
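The joint objective described in the abstract (cross-entropy loss for the LLM, rank loss for the retriever) can be sketched as follows. This is a minimal illustration only: the function names, the pairwise hinge form of the rank loss, the use of per-document "utility" signals to order documents, and the weighting factor `lam` are all assumptions for exposition, not the paper's exact formulation.

```python
import math

def cross_entropy(logits, target_idx):
    # Standard softmax cross-entropy for one token position:
    # -log softmax(logits)[target_idx], computed stably via log-sum-exp.
    m = max(logits)
    logsumexp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logsumexp - logits[target_idx]

def pairwise_rank_loss(scores, utilities, margin=1.0):
    # Hinge loss over document pairs: whenever document i is more useful
    # than document j (e.g., by how much it helps the LLM answer), the
    # retriever should score i at least `margin` above j.
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if utilities[i] > utilities[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

def joint_loss(llm_logits, target_idx, retriever_scores, doc_utilities, lam=0.5):
    # Combined objective: the LLM term and the retriever term are computed
    # together, so both components can be updated in the same training step.
    ce = cross_entropy(llm_logits, target_idx)
    rank = pairwise_rank_loss(retriever_scores, doc_utilities)
    return ce + lam * rank
```

In a real implementation both terms would be differentiable tensors (e.g., in PyTorch) so that backpropagation updates the LLM and the retriever from their respective loss terms in one synchronized step.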
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Medical QA, Medical LLM
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1132
