Shared Dynamic Model-Aligned Hypernetworks for Zero-Shot Generalization in Contextual Reinforcement Learning

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: contextual reinforcement learning, zero-shot generalization, hypernetworks
TL;DR: We introduce DMA*-SH, a hypernetwork-based approach for contextual reinforcement learning that aligns context inference with dynamics models, enabling stable representations and achieving strong zero-shot generalization across diverse environments.
Abstract: Zero-shot generalization in contextual reinforcement learning (RL) remains a core challenge, particularly when explicit context information is unavailable and must be inferred from data. We propose DMA*-SH, a framework based on dynamics model-aligned (DMA) context inference, in which a shared hypernetwork jointly parameterizes the dynamics model, policy, and action-value function. This design enforces consistency between learned context representations and transition dynamics, while normalization and random masking in the context encoder improve stability and robustness. To evaluate our method, we introduce the Actuator Inversion Benchmark (AIB), which distinguishes overlapping from non-overlapping contexts, the latter generated via a discontinuous action sign flip that is provably unsolvable under standard domain randomization. We formalize the strict expressiveness advantage of DMA*-SH over concatenation-based approaches in non-overlapping settings, and show that the shared hypernetwork acts as an implicit regularizer steering RL gradients toward dynamically coherent solutions. Across the AIB benchmark, DMA*-SH delivers strong zero-shot generalization and outperforms both context-aware and context-unaware baselines, with the largest gains in non-overlapping contexts. Our results show that hypernetworks enable effective and scalable context inference.
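The abstract's core design can be sketched in miniature: a context encoder (with the normalization and random masking mentioned above) infers a context vector from transition data, and a single shared hypernetwork maps that vector to the weights of both a policy head and a dynamics-model head, so both heads are conditioned through the same pathway. All dimensions, layer shapes, and function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, CTX_DIM = 4, 2, 3

# Context encoder (assumed here to be a single linear layer over (s, a, s')).
W_enc = rng.normal(size=(CTX_DIM, OBS_DIM + ACT_DIM + OBS_DIM))

def encode_context(transition, mask_prob=0.2, train=True):
    """Infer a context vector z from a (s, a, s') transition, with the
    random input masking and normalization the abstract attributes to
    the context encoder."""
    x = transition.copy()
    if train:
        x = x * (rng.random(x.shape) > mask_prob)  # random masking
    z = W_enc @ x
    return z / (np.linalg.norm(z) + 1e-8)          # normalization

# Shared hypernetwork: one linear map from z to the flattened weights of
# BOTH a policy head and a dynamics-model head.
n_policy = ACT_DIM * OBS_DIM
n_dyn = OBS_DIM * (OBS_DIM + ACT_DIM)
W_hyper = rng.normal(size=(n_policy + n_dyn, CTX_DIM)) * 0.1

def generate_heads(z):
    flat = W_hyper @ z
    W_pi = flat[:n_policy].reshape(ACT_DIM, OBS_DIM)
    W_dyn = flat[n_policy:].reshape(OBS_DIM, OBS_DIM + ACT_DIM)
    return W_pi, W_dyn

s = rng.normal(size=OBS_DIM)
a = rng.normal(size=ACT_DIM)
s_next = rng.normal(size=OBS_DIM)

z = encode_context(np.concatenate([s, a, s_next]))
W_pi, W_dyn = generate_heads(z)
action = W_pi @ s                            # context-conditioned policy output
pred_next = W_dyn @ np.concatenate([s, a])   # context-conditioned dynamics prediction
```

Because a single hypernetwork emits the weights for both heads, gradients from the dynamics loss and the RL loss flow through shared parameters, which is one way to read the abstract's claim that the design acts as an implicit regularizer.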
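The non-overlapping contexts of the Actuator Inversion Benchmark can be illustrated with a toy environment in which a hidden context multiplies the action's effect, flipping its sign for inverted contexts. This is a hypothetical sketch of the idea, not the benchmark's actual implementation: it shows why a single context-unaware policy (as produced by standard domain randomization) cannot act correctly in both contexts, since the same action moves the state in opposite directions.

```python
class ActuatorInversionEnv:
    """Toy 1-D environment in the spirit of the Actuator Inversion
    Benchmark: a hidden context c scales the actuator. c = +1.0 is the
    normal context; c = -1.0 is the sign-flipped (non-overlapping) one."""

    def __init__(self, context):
        self.c = context  # hidden from the agent
        self.s = 0.0

    def step(self, a):
        self.s += self.c * a           # actuator effect depends on context
        reward = -abs(self.s - 1.0)    # goal: drive the state to s = 1
        return self.s, reward

env_plus = ActuatorInversionEnv(+1.0)
env_minus = ActuatorInversionEnv(-1.0)

# The same action moves the state in opposite directions:
s_plus, _ = env_plus.step(0.5)    # → 0.5
s_minus, _ = env_minus.step(0.5)  # → -0.5
```

Any fixed mapping from observations to actions gains reward in one context exactly where it loses it in the other, so the benchmark forces context inference rather than averaging over contexts.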
Primary Area: reinforcement learning
Submission Number: 24430