Turn-level Multiscale Density Ratio Estimation for LLM Agents

ACL ARR 2026 January Submission 2556 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · Everyone · CC BY 4.0
Keywords: Language Modeling, LLM/AI agents, continual learning
Abstract: With the rapid development of large language models (LLMs), LLM-enhanced agent systems show great potential for handling complex tasks, especially those involving multi-step reasoning or interaction with tools. To apply LLM techniques within a well-designed agent paradigm, post-training the LLM on multiple agent scenarios is necessary to achieve better performance. Among the various post-training techniques, alignment methods such as PPO, DPO, DIL, and GRPO have become popular, since many papers show that introducing negative samples to be penalized has a significant positive impact on model performance while keeping training complexity acceptable. However, most alignment methods address simple single-turn tasks, and there remains room for improvement on complex multi-turn tasks. We propose turn-level multiscale Density Ratio Estimation (tlm-DRE), which assigns different weights to the corresponding turns and introduces asymmetric token-level training based on the positive-negative space gaps across the multiple turns of a task. Results on a wide range of agent benchmarks show that the proposed method performs competitively against traditional alignment methods, enabling LLMs to reason robustly over multiple turns under both in-domain and out-of-domain conditions.
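
The abstract does not spell out the objective, but a minimal sketch may help fix ideas. The code below assumes tlm-DRE aggregates per-turn log density ratios against a frozen reference model, weights the turns with a multiscale scheme, and penalizes the rejected trajectory asymmetrically, all wrapped in a DPO-style logistic loss. The names turn_weighted_dpo_loss, turn_weights, and neg_scale are hypothetical illustrations, not the paper's actual formulation.

    # Hypothetical sketch of a turn-weighted, DPO-style density-ratio loss.
    # The multiscale weights `turn_weights` and the asymmetric factor
    # `neg_scale` are illustrative placeholders, not the paper's method.
    import torch
    import torch.nn.functional as F

    def turn_weighted_dpo_loss(
        chosen_logps,        # (B, T) per-turn log-probs under the policy, chosen trajectory
        rejected_logps,      # (B, T) same for the rejected trajectory
        ref_chosen_logps,    # (B, T) per-turn log-probs under the frozen reference model
        ref_rejected_logps,  # (B, T)
        turn_weights,        # (T,) multiscale weights over turns, e.g. later turns weighted higher
        beta: float = 0.1,
        neg_scale: float = 0.5,  # asymmetric down-weighting of the rejected side
    ):
        # Per-turn log density ratios of the policy against the reference model.
        chosen_ratio = chosen_logps - ref_chosen_logps        # (B, T)
        rejected_ratio = rejected_logps - ref_rejected_logps  # (B, T)

        # Aggregate turns with the multiscale weights; the negative
        # trajectory is penalized less aggressively (asymmetric training).
        margin = (turn_weights * (chosen_ratio - neg_scale * rejected_ratio)).sum(dim=-1)

        # Standard Bradley-Terry / DPO-style logistic loss on the margin.
        return -F.logsigmoid(beta * margin).mean()

Setting neg_scale to 1 and turn_weights to all ones would recover a plain trajectory-level DPO loss, so the two knobs isolate exactly the turn-level and asymmetric ingredients the abstract describes.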
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Language Modeling, LLM/AI agents, continual learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 2556