Keywords: Chain of Thought/Reasoning models, Understanding high-level properties of models, Reinforcement learning
TL;DR: We introduce ReLIF, a training framework that makes reasoning models more interpretable, faithful, and reliable
Abstract: Recent advances in long chain-of-thought (CoT) reasoning have largely prioritized answer accuracy and token efficiency, while overlooking aspects critical to user experience. We argue that trustworthy reasoning is essential for usability, and that it should satisfy three key properties: interpretability, faithfulness, and reliability. To this end, we propose $\texttt{ReLIF}$, a training framework that integrates supervised fine-tuning with GRPO to encourage models to: (i) produce structured, tag-based traces with high-level planning that are easier for humans to follow; (ii) explicitly disclose the decisive information guiding each solution, with consistent cross-section references; and (iii) provide self-assessments of both the derivation’s soundness and the confidence of the final answer. We train $\texttt{ReLIF}$ at multiple scales (1.7B/4B/8B) and evaluate across mathematical benchmarks of varying difficulty. Results show that $\texttt{ReLIF}$ generates clearer and better-structured reasoning traces, more faithfully exposes its underlying decision process, and offers informative confidence estimates. These findings highlight an overlooked but important direction: reasoning models should be evaluated not only on accuracy, but also on broader dimensions of trustworthiness that directly shape user experience.
Submission Number: 193