Keywords: Adverse Drug Reaction, Large Language Model, Hybrid Retrieval, Structured Prediction, Unstructured Prediction
Abstract: Adverse drug reaction (ADR) prediction involves two complementary tasks: structured inference, which classifies the occurrence of specific
ADR outcomes, and unstructured inference, which generates narrative descriptions of those outcomes using preferred terminology. Existing
methods focus predominantly on structured ADR prediction and rely on supervised learning over hand-engineered features, which generalizes
poorly, while unstructured ADR narrative prediction remains largely unexplored. In this work, we propose a novel
ADR prediction framework that jointly predicts structured and unstructured ADR outcomes. Our framework leverages a large language model
(LLM) for semantic representation and ADR knowledge retrieval in a three-stage pipeline that predicts both outcome types
in a generalizable manner. First, we fine-tune an ADR-specific embedding model on top of a benchmark
foundation model to align the embedding space with domain-specific ADR terminology. Second, we construct a novel hybrid retrieval
pipeline that integrates BM25 lexical matching with dense vector similarity search to ensure high recall. Third, we apply a Maximal
Marginal Relevance (MMR) re-ranking strategy to balance relevance and diversity. Evaluation on three held-out FDA Adverse
Event Reporting System (FAERS) quarterly test sets demonstrates that our method achieves a two-fold improvement in top-1 classification
accuracy for structured ADR prediction, and a 32% improvement in recall for unstructured narrative descriptions compared with baselines.
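The second and third stages described in the abstract (hybrid BM25 plus dense retrieval, followed by MMR re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the lexical and dense scores are combined, so the min-max normalization, the weighted sum, and the parameters `alpha`, `lam`, `k1`, and `b` are all assumptions made for illustration, as are the function names.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def minmax(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_scores(query_tokens, doc_tokens, query_vec, doc_vecs, alpha=0.5):
    """Assumed fusion rule: weighted sum of normalized BM25 and cosine scores."""
    lex = minmax(bm25_scores(query_tokens, doc_tokens))
    den = minmax([cosine(query_vec, v) for v in doc_vecs])
    return [alpha * l + (1 - alpha) * d for l, d in zip(lex, den)]


def mmr(relevance, doc_vecs, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k items, trading relevance
    against similarity to items that were already selected."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In this sketch, `lam=1.0` reduces MMR to plain relevance ranking, while smaller values penalize candidates that closely resemble items already selected. The dense vectors would come from the fine-tuned embedding model in the first stage, which is outside the scope of the example.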
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: healthcare applications, adverse drug reaction, LLM/AI agents, prompting
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5496