Metric-Guided Instance Re-weighting for Reliable Explainability

Published: 15 Nov 2025, Last Modified: 13 Nov 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: High-quality rationales from large language models (LLMs) are essential for reliability and interpretability. However, the rationales produced by existing LLMs still fall short in informativeness and faithfulness, which limits their practical use, and reproducing prior methods for enhancing rationale generation remains challenging. To study rationale quality in a model-agnostic manner, we develop a novel framework, the Metric-guided Rationale Enhancement Framework (MREF), which re-weights training instances based on multiple aspects of rationale quality. Specifically, MREF fine-tunes a given LLM on two benchmark multiple-choice question (MCQ) datasets, ECQA and MedMCQA, to generate answers and rationales. During fine-tuning, it uses metrics from ROSCOE to evaluate the produced rationales along five dimensions (faithfulness, informativeness, coherence, repetition, and grammar) and uses these metric scores to guide the re-weighting of training instances, encouraging the LLM to emphasize higher-quality rationales. Comprehensive experiments demonstrate that this metric-guided re-weighting significantly improves rationale quality across all evaluated ROSCOE metrics over baselines without re-weighting, yielding more reliable and understandable outputs. MREF can be seamlessly integrated with existing LLMs for NLP tasks beyond MCQs. Our code and datasets will be made available upon acceptance.
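The re-weighting scheme the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation; the weight formula (mean of the five metric scores, normalized so weights average to one over the batch) and all function names here are illustrative assumptions about how metric scores might modulate a per-instance fine-tuning loss.

```python
# Hypothetical sketch of metric-guided instance re-weighting (illustrative,
# not MREF's actual code). Each training instance receives a weight derived
# from the mean of its five ROSCOE-style rationale-quality scores; the
# fine-tuning loss is then a weighted average of per-instance losses.

def instance_weights(metric_scores):
    """metric_scores: list of dicts mapping metric name -> score in [0, 1].
    Returns one weight per instance, normalized to mean 1 over the batch."""
    raw = [sum(s.values()) / len(s) for s in metric_scores]  # mean quality
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

def reweighted_loss(per_instance_losses, weights):
    """Weighted average of per-instance losses: higher-quality rationales
    contribute more gradient signal during fine-tuning."""
    total = sum(l * w for l, w in zip(per_instance_losses, weights))
    return total / len(per_instance_losses)

# Example batch: one high-quality and one low-quality rationale.
scores = [
    {"faithfulness": 0.9, "informativeness": 0.8, "coherence": 0.9,
     "repetition": 0.9, "grammar": 1.0},
    {"faithfulness": 0.4, "informativeness": 0.3, "coherence": 0.5,
     "repetition": 0.6, "grammar": 0.7},
]
weights = instance_weights(scores)
loss = reweighted_loss([2.0, 2.0], weights)
```

In this example the first instance's mean score (0.9) exceeds the second's (0.5), so it receives a proportionally larger weight; with equal raw losses the weighted average is unchanged, but gradients would be dominated by the higher-quality instance.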