When Tokens Decay and Turns Amplify: A Dual-Granularity Framework for Multi-Turn Preference Optimization

Published: 03 Mar 2026, Last Modified: 21 Mar 2026, SPOT, CC BY 4.0
Keywords: Preference Optimization, Multi-Turn Dialogue, Large Language Models
Abstract: In multi-turn dialogue alignment, tokens and turns contribute heterogeneously to the preference signal. Existing methods apply uniform token weighting or binary turn selection and so overlook this fine-grained structure. We present \textbf{T$^3$PO}, a dual-granularity framework that combines (i) token-level temporal discounting, which prioritizes early high-signal tokens with provable partition function cancellation, and (ii) turn-level self-evaluated weighting via multi-perspective scoring, which eliminates external dependencies. Experiments across multiple benchmarks and model scales show consistent improvements over baselines, and ablations confirm that each mechanism contributes independently.
Submission Number: 93