Decision Mixer: Integrating Long-term and Local Dependencies via Dynamic Token Selection for Decision-Making

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: In this paper, we propose Decision Mixer (DM), which addresses the conflict between features of different scales in the modeling process from the perspective of dynamic integration.
Abstract: The Conditional Sequence Modeling (CSM) paradigm, benefiting from the transformer's powerful distribution-modeling capabilities, has demonstrated considerable promise in offline Reinforcement Learning (RL) tasks. Depending on the nature of the task, it is crucial to balance the interplay between inherent local features and long-term dependencies in Markov decision trajectories, so as to avoid both performance degradation and unnecessary computational overhead. In this paper, we propose Decision Mixer (DM), which addresses the conflict between features of different scales from the perspective of dynamic integration. Drawing inspiration from conditional computation, we design a plug-and-play dynamic token selection mechanism that lets the model allocate attention to different features according to task characteristics. Additionally, we employ an auxiliary predictor to alleviate the short-sightedness of the autoregressive sampling process. DM achieves state-of-the-art performance on various standard RL benchmarks while requiring significantly fewer computational resources, offering a viable path toward efficient and scalable RL foundation models. Code is available here.
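To make the dynamic token selection idea concrete, here is a minimal sketch, assuming a Mixture-of-Depths-style top-k router in PyTorch; the class name `DynamicTokenSelect`, the `capacity` parameter, and the sigmoid gating are illustrative assumptions, not details taken from the paper. Tokens the router scores highly pass through self-attention (long-term dependencies); the rest bypass the block unchanged (local processing).

```python
import torch
import torch.nn as nn

class DynamicTokenSelect(nn.Module):
    """Illustrative top-k token router (not the paper's exact design):
    high-scoring tokens go through self-attention, the rest bypass it."""

    def __init__(self, d_model: int, n_heads: int, capacity: float = 0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)  # per-token importance score
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.capacity = capacity             # fraction of tokens attended

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(T * self.capacity))
        scores = self.router(x).squeeze(-1)                        # (B, T)
        idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values   # keep temporal order
        gather = idx.unsqueeze(-1).expand(-1, -1, D)
        selected = x.gather(1, gather)                             # (B, k, D)
        # Causal masking is omitted for brevity; a DT-style block
        # would pass an attn_mask to restrict attention to the past.
        attended, _ = self.attn(selected, selected, selected)
        # Gate by the router score so the router receives gradients.
        gate = torch.sigmoid(scores.gather(1, idx)).unsqueeze(-1)
        return x.scatter(1, gather, selected + gate * attended)

layer = DynamicTokenSelect(d_model=128, n_heads=4, capacity=0.5)
tokens = torch.randn(2, 60, 128)  # e.g. 20 (return, state, action) triples
print(layer(tokens).shape)        # torch.Size([2, 60, 128])
```

Skipped tokens incur no attention compute, which is where this style of conditional computation gets its savings; DM's actual selection criterion and integration scheme may differ.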
Lay Summary: Teaching AI to make good decisions from past experiences is challenging because it must balance immediate details ("what action to take now") with long-term consequences ("how this affects future outcomes"). Existing transformer-based methods struggle to efficiently handle both, leading to performance drops and high computational costs. We propose Decision Mixer (DM), which dynamically selects which parts of a decision sequence need deep analysis versus quick processing. Like a smart filter, DM identifies whether each step in a decision chain requires complex attention (for long-term planning) or simple forwarding (for local actions), dramatically reducing computation. We also add a helper module to prevent short-sighted decisions during step-by-step predictions. DM achieves state-of-the-art results across robotics and game-playing benchmarks while using significantly less computing power. This enables more efficient and scalable AI decision-makers for real-world applications like autonomous systems and industrial control.
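The abstract and lay summary do not spell out the auxiliary predictor, so the sketch below is only a hedged guess at the general idea: a second head trained to predict a quantity several steps ahead (here, the return-to-go `horizon` steps out), so that training does not optimize for the next token alone. `AuxiliaryPredictor`, the return-to-go target, and `horizon` are all hypothetical names for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryPredictor(nn.Module):
    """Hypothetical auxiliary head: predicts a target several steps ahead
    so the main autoregressive objective is not purely next-token."""

    def __init__(self, d_model: int, horizon: int = 4):
        super().__init__()
        self.horizon = horizon
        self.head = nn.Linear(d_model, 1)  # predicts a future return-to-go

    def loss(self, hidden: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, D) transformer features; returns: (B, T) returns-to-go
        t = hidden.size(1) - self.horizon
        pred = self.head(hidden[:, :t]).squeeze(-1)  # predict R_{t+horizon}
        return F.mse_loss(pred, returns[:, self.horizon:])

aux = AuxiliaryPredictor(d_model=128, horizon=4)
feats, rtg = torch.randn(2, 60, 128), torch.randn(2, 60)
print(aux.loss(feats, rtg))  # scalar auxiliary loss added during training
```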
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Offline RL, Conditional Sequence Modeling, Decision Transformer
Submission Number: 2093