MARVEL: Modular Abstention for Reliable and Versatile Expert LLMs

ICLR 2026 Conference Submission 21834 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: abstention, reliability, trustworthy large language model
Abstract: Effectively calibrating abstention—the capability of models to refuse to answer when inappropriate—remains a significant challenge for large language models (LLMs). Improper abstention calibration typically results in either excessive refusal, reducing the practical utility of the model, or insufficient refusal, which produces unreliable and potentially harmful outputs. Existing methods depend heavily on domain-specific fine-tuning, requiring extensive retraining or carefully crafted, domain-specific datasets for each new scenario, which limits scalability and efficiency. To address this, we introduce MARVEL, a lightweight modular abstention framework motivated by the observation that different tasks naturally require distinct abstention mechanisms and rationales. MARVEL dynamically integrates two distinct expert modules: Task Experts, which are specialized adapters fine-tuned for specific tasks, and Abstention Experts, trained explicitly to identify and articulate various abstention rationales (e.g., unsafe queries, ambiguous requests). Crucially, MARVEL achieves more reliable abstention performance without the need to retrain the original task-specific adapters. Our empirical evaluations cover two broad task categories: query-focused tasks, where abstention depends on query content alone, and model-capability tasks, where abstention is driven by model confidence. Results show that MARVEL consistently improves abstention accuracy and other model reliability metrics, with gains of at least 8.1 points in in-domain and 5.4 points in out-of-domain scenarios over base LLMs. MARVEL surpasses strong baseline approaches such as data merging and weight merging, offering greater flexibility, interpretability, and broader generalization.
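The abstract describes a dispatch pattern in which an abstention-specific module is consulted alongside a frozen task-specific adapter. The sketch below is a minimal, hypothetical illustration of that control flow only; the interfaces `AbstentionDecision`, `answer_with_abstention`, and the toy experts are assumptions for exposition and are not the authors' implementation or API.

```python
# Hypothetical sketch of a MARVEL-style modular abstention dispatch.
# The expert callables stand in for task- and abstention-specific adapters;
# all names and signatures here are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AbstentionDecision:
    should_abstain: bool
    rationale: Optional[str] = None  # e.g., "unsafe query", "ambiguous request"


def answer_with_abstention(
    query: str,
    abstention_expert: Callable[[str], AbstentionDecision],
    task_expert: Callable[[str], str],
) -> str:
    """Consult the abstention expert first; only invoke the (frozen)
    task-specific expert when the query is judged answerable."""
    decision = abstention_expert(query)
    if decision.should_abstain:
        return f"I can't answer this: {decision.rationale or 'abstention triggered'}."
    return task_expert(query)


# Toy stand-ins for the two expert modules.
def toy_abstention_expert(query: str) -> AbstentionDecision:
    if "password" in query.lower():
        return AbstentionDecision(True, "unsafe query")
    return AbstentionDecision(False)


def toy_task_expert(query: str) -> str:
    return f"[task answer for: {query}]"


if __name__ == "__main__":
    print(answer_with_abstention("What is the capital of France?",
                                 toy_abstention_expert, toy_task_expert))
    print(answer_with_abstention("Tell me the admin password.",
                                 toy_abstention_expert, toy_task_expert))
```

In the paper's setting, the experts would be adapter-equipped LLM calls rather than rule-based stubs, and the abstention rationale would be generated by the Abstention Expert itself; the stubs above only show how abstention can be layered on without retraining the task adapter.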
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21834