Less is More: an Attention-free Sequence Prediction Modeling for Offline Embodied Learning

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Offline Reinforcement Learning, Embodied Learning
Abstract: Offline reinforcement learning (offline RL) is increasingly approached as a sequence modeling task, with methods leveraging advanced architectures such as Transformers to capture trajectory dependencies. Despite significant progress, the mechanisms underlying their effectiveness and limitations remain insufficiently understood. We conduct a thorough entropy-based analysis of the representative Decision Transformer (DT) model and identify inconsistencies in the state-action-reward ($\langle s, a, R \rangle$) distributions that cause attention ``dispersal''. To address this, we propose a hierarchical framework that decomposes sequence modeling into intra-step relational modeling, handled by a Token Merger that fuses each $\langle s, a, R \rangle$ triplet, and inter-step modeling, handled by a Token Mixer across timesteps. We investigate several Token Merger designs and validate their effectiveness across various offline RL methods. Furthermore, our theoretical analysis and experimental results suggest that while Token Mixers are important, lightweight architectures can match or even exceed the performance of more complex ones. We therefore propose a parameter-free Average Pooling Token Mixer, which, combined with a convolutional Token Merger, forms our final model, Decision HiFormer (DHi). Compared with DT on the D4RL benchmark, DHi achieves a \textbf{73.6\%} improvement in inference speed and a \textbf{9.3\%} gain in policy performance. DHi also generalizes well to real-world robotic manipulation tasks, offering both practical benefits and insights into sequence-based policy design for offline RL. Code and models are publicly available at the \href{https://wei-nijuan.github.io/DecisionHiFormer/}{project page}.
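To make the intra-step/inter-step decomposition concrete, below is a minimal PyTorch sketch of what the abstract describes. The module names, the 1D-convolutional fusion of the three intra-step tokens, and the causal cumulative-average mixing are assumptions for illustration; the abstract does not specify these implementation details, so consult the released code for the authors' actual design.

```python
import torch
import torch.nn as nn


class TokenMerger(nn.Module):
    """Intra-step model: fuse each <R, s, a> triplet into one per-timestep token.
    The convolution over the 3 intra-step tokens is an assumed realization of
    the "convolutional Token Merger" named in the abstract."""

    def __init__(self, state_dim: int, act_dim: int, embed_dim: int):
        super().__init__()
        self.embed_R = nn.Linear(1, embed_dim)
        self.embed_s = nn.Linear(state_dim, embed_dim)
        self.embed_a = nn.Linear(act_dim, embed_dim)
        # Kernel of size 3 collapses the (R, s, a) tokens into a single token.
        self.merge = nn.Conv1d(embed_dim, embed_dim, kernel_size=3)

    def forward(self, states, actions, returns_to_go):
        # states: (B, T, state_dim), actions: (B, T, act_dim), returns: (B, T, 1)
        B, T, _ = states.shape
        triplet = torch.stack(
            [self.embed_R(returns_to_go), self.embed_s(states), self.embed_a(actions)],
            dim=2,
        )                                                # (B, T, 3, D)
        triplet = triplet.view(B * T, 3, -1).transpose(1, 2)  # (B*T, D, 3)
        merged = self.merge(triplet).squeeze(-1)              # (B*T, D)
        return merged.view(B, T, -1)                          # (B, T, D)


class AvgPoolTokenMixer(nn.Module):
    """Inter-step model: a parameter-free mixer in which each timestep token is
    the uniform average of its causal history, replacing self-attention."""

    def forward(self, x):
        # x: (B, T, D); cumulative mean over timesteps 0..t for each t.
        csum = torch.cumsum(x, dim=1)
        counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return csum / counts
```

Under this reading, the "attention-free" claim corresponds to the mixer having no learned weights at all: the only trainable parameters live in the per-modality embeddings and the convolutional merger, which is consistent with the reported inference-speed gains over DT.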
Supplementary Material: zip
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 14110