Keywords: reward models, preference modeling, long-horizon evaluation, CBT counseling, benchmark
Abstract: Reward models (RMs) are widely used to align large language models (LLMs), yet their reliability in long-horizon conversational settings remains poorly understood. In cognitive behavioral therapy (CBT)-based counseling, preference judgments depend on session-level coherence, long-term consistency, and therapeutic process fidelity, posing challenges beyond short-context evaluation. We introduce \textbf{PRMB}, a benchmark for evaluating reward models in long-horizon, multi-session CBT-based counseling. PRMB is constructed from a combination of real-world and simulated counseling cases using a progressive summarization framework, and comprises over 15k pairwise and Best-of-N preference instances. Evaluating both discriminative and LLM-as-a-judge reward models, we find that state-of-the-art RMs exhibit low accuracy, session-wise degradation, and systematic over-empathizing biases. Moreover, PRMB rankings positively correlate with downstream Best-of-N inference performance across multiple policy models. PRMB provides a foundation for reward modeling in process-oriented conversational domains.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, corpus creation, evaluation
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 9803