Keywords: Reinforcement Learning, Large Language Models, Policy Optimization, Importance Sampling
TL;DR: ECHO introduces coarse-grained, batch-level clipping of importance sampling ratios that stabilizes RL training for LLMs while achieving stronger final performance on reasoning tasks.
Abstract: Reinforcement learning (RL) for large language models (LLMs) typically employs token-level clipping of importance sampling ratios to ensure training stability. While effective at preventing catastrophic policy shifts, such fine-grained clipping often excessively truncates learning signals, limiting optimization efficiency.
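For reference, the standard token-level clipped surrogate used in PPO/GRPO-style LLM training takes roughly the following form (notation here is ours, not the paper's):

$$
\mathcal{J}(\theta) = \mathbb{E}\!\left[ \frac{1}{|o|}\sum_{t=1}^{|o|} \min\!\Big( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big) \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(o_t \mid q, o_{<t})}{\pi_{\theta_{\mathrm{old}}}(o_t \mid q, o_{<t})}
$$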
To address this limitation, we propose ECHO, a novel RL method that combines batch-level clipping with token-level importance sampling. Specifically, ECHO computes an average importance sampling ratio across the entire batch, clips it once with a single batch-level bound, and uses the resulting ratio to modulate the gradient of each token. This batch-level approach preserves richer global reward information while retaining fine-grained token attribution: gradients capture a more holistic reward structure, sample efficiency improves, and training converges faster and more stably. Our method also offers a new perspective on how to define importance sampling ratios and reward shaping in RL for LLMs.
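As a rough illustration only (the paper's exact loss is not given here), one possible reading of the batch-level clipping idea is sketched below in PyTorch; the function name, clipping range, masking, and normalization choices are all assumptions, not the authors' implementation.

```python
import torch

def echo_surrogate_loss(logp_new, logp_old, advantages, mask, clip_eps=0.2):
    """Illustrative batch-level-clipped surrogate (a sketch, not ECHO's exact loss).

    logp_new, logp_old: (B, T) token log-probs under the current / behaviour policy
    advantages:         (B, T) per-token advantage estimates
    mask:               (B, T) 1 for response tokens, 0 for prompt/padding
    """
    # Per-token importance sampling ratios (zeroed out on non-response tokens).
    ratios = torch.exp(logp_new - logp_old) * mask

    # Single batch-level ratio: average over all valid tokens in the batch.
    batch_ratio = ratios.sum() / mask.sum().clamp_min(1.0)

    # Clip once at the batch level instead of per token.
    clipped_batch_ratio = batch_ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps)

    # Pessimistic (min) PPO-style surrogate, but driven by the shared batch-level
    # ratio; per-token advantages keep fine-grained credit assignment.
    surrogate = torch.min(batch_ratio * advantages, clipped_batch_ratio * advantages)
    return -(surrogate * mask).sum() / mask.sum().clamp_min(1.0)
```

In an actual training loop this sketch would stand in for the per-token clipped surrogate above; how ECHO precisely combines the batch-level bound with token-level ratios is specified in the paper, not here.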
Experimental results on in-domain math and reasoning benchmarks demonstrate that ECHO not only accelerates convergence but also achieves highly competitive performance, highlighting its efficiency and robustness for large-scale LLM alignment.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 10653