Descriptive Adverse Drug Reaction Prediction via Hybrid Retrieval and Low-λ MMR Re-ranking

ACL ARR 2026 January Submission5496 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0

Keywords: Adverse Drug Reaction, Large Language Model, Hybrid Retrieval, Structured Prediction, Unstructured Prediction

Abstract: Adverse drug reaction (ADR) prediction involves two complementary tasks: structured inference, which classifies whether specific ADR outcomes occur, and unstructured inference, which generates narrative descriptions of those outcomes using preferred terminology. Existing methods focus predominantly on structured ADR prediction and rely on supervised learning over hand-engineered features, which limits their ability to generalize, while unstructured ADR narrative prediction remains largely unexplored. In this work, we propose a novel ADR prediction framework that jointly predicts structured and unstructured ADR outcomes. The framework uses a large language model (LLM) for semantic representation and ADR knowledge retrieval in a three-stage pipeline that predicts both kinds of outcomes simultaneously and in a generalizable manner. First, we fine-tune an ADR-specific embedding model on top of a benchmark foundation model to align the embedding space with domain-specific ADR terminology. Second, we construct a novel hybrid retrieval pipeline that combines BM25 lexical matching with dense vector similarity search to achieve high recall. Third, we apply Maximal Marginal Relevance (MMR) re-ranking to balance relevance and diversity among the retrieved candidates. Evaluation on three held-out FDA Adverse Event Reporting System (FAERS) quarterly test sets shows that, compared with baselines, our method doubles top-1 classification accuracy for structured ADR prediction and improves recall for unstructured narrative descriptions by 32%.
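The retrieval and re-ranking stages described above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. The abstract does not say how BM25 and dense scores are fused, so the sketch assumes a weighted sum of min-max normalized scores (`alpha=0.5`). The toy vectors stand in for embeddings from the fine-tuned ADR embedding model, the BM25 parameters `k1=1.5, b=0.75` are common defaults, and `lam=0.3` and `n_candidates` are hypothetical values. Under the standard MMR formulation, λ weights relevance and (1 - λ) penalizes similarity to already-selected items, so a low λ favors diversity.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of the query against each tokenized document."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter(term for d in docs_tokens for term in set(d))  # document frequency
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so lexical and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_scores(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """ASSUMED fusion: alpha * normalized BM25 + (1 - alpha) * normalized cosine."""
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    return [alpha * l + (1 - alpha) * d for l, d in zip(lexical, dense)]


def mmr(relevance, vecs, k, lam=0.3):
    """Greedy MMR: pick argmax of lam * rel - (1 - lam) * max sim to selected."""
    selected, candidates = [], list(range(len(relevance)))

    def mmr_score(i):
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def retrieve_then_rerank(query_tokens, query_vec, docs_tokens, doc_vecs,
                         n_candidates=3, k=2, lam=0.3):
    """Hybrid retrieval of a candidate pool, then MMR re-ranking within it."""
    rel = hybrid_scores(query_tokens, query_vec, docs_tokens, doc_vecs)
    pool = sorted(range(len(rel)), key=lambda i: rel[i], reverse=True)[:n_candidates]
    picks = mmr([rel[i] for i in pool], [doc_vecs[i] for i in pool], k, lam)
    return [pool[p] for p in picks]  # map back to corpus indices


# Hypothetical toy corpus of ADR term strings and stand-in embedding vectors.
DOCS_TOKENS = [["nausea", "vomiting"], ["nausea", "vomiting", "dizziness"],
               ["rash", "nausea"], ["headache"]]
DOC_VECS = [[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
QUERY_TOKENS, QUERY_VEC = ["nausea", "vomiting"], [1.0, 0.2, 0.0]

print(retrieve_then_rerank(QUERY_TOKENS, QUERY_VEC, DOCS_TOKENS, DOC_VECS))  # [0, 2]
```

With `lam=0.3` the re-ranker skips candidate 1, a near-duplicate of candidate 0, in favor of candidate 2; with `lam=1.0` MMR reduces to pure relevance ordering and returns `[0, 1]`. Restricting MMR to a retrieved candidate pool matters at low λ: applied to the whole toy corpus, the diversity term would promote the irrelevant `headache` entry.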

Paper Type: Long

Research Area: Clinical and Biomedical Applications

Research Area Keywords: healthcare applications, adverse drug reaction, LLM/AI agents, prompting

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 5496
