State Combinatorial Generalization In Decision Making With Conditional Diffusion Models

Xintong Duan; Yutong He; Fahim Tajwar; Wentse Chen; Russ Salakhutdinov; Jeff Schneider

State Combinatorial Generalization In Decision Making With Conditional Diffusion Models

Xintong Duan, Yutong He, Fahim Tajwar, Wentse Chen, Russ Salakhutdinov, Jeff Schneider

26 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: RL generalization, decision making, combinatorial generalization, diffusion model

TL;DR: zero-shot generalization in decision-making tasks to unseen states that are a different (but known) combination of basic objects encountered during training

Abstract: Many real-world decision-making problems are combinatorial in nature, where states (e.g., surrounding traffic of a self-driving car) can be seen as a combination of basic elements (e.g., pedestrians, trees, and other cars). Due to combinatorial complexity, observing all combinations of basic elements in the training set is infeasible, which leads to an essential yet understudied problem of $\textit{zero-shot generalization to states that are unseen combinations of previously seen elements.}$ In this work, we first formalize this problem and then demonstrate how existing value-based reinforcement learning (RL) algorithms struggle due to unreliable value predictions in unseen states. We argue that this problem cannot be addressed with exploration alone, but requires more expressive and generalizable models. We demonstrate that behavior cloning with a conditioned diffusion model trained on expert trajectory generalizes better to states formed by new combinations of seen elements than traditional RL methods. Through experiments in maze, driving, and multiagent environments, we show that conditioned diffusion models outperform traditional RL techniques and highlight the broad applicability of our problem formulation.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7568

Loading