Teacher-Student Multi-Agent Reinforcement Learning Framework for AutoML Pipeline Construction

ICLR 2026 Conference Submission 22177 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Automated Machine Learning, Decentralized POMDP, Pedagogical Reinforcement Learning, Pipeline Optimization
TL;DR: We propose a teacher–student multi-agent RL framework for AutoML pipeline synthesis, where selective pedagogical interventions accelerate learning and outperform standard search methods with fewer evaluations.
Abstract: We present an asymmetric teacher--student multi-agent reinforcement learning framework for automated machine learning (AutoML) pipeline synthesis. Unlike monolithic search methods (Bayesian optimization, evolutionary algorithms, single-agent RL), our formulation casts guided pipeline construction as a Dec-POMDP with selective interventions: a teacher proposes counterfactual improvements only when the estimated advantage exceeds an adaptive threshold, enabling accelerated early learning and graceful withdrawal. We approximate component-level credit using sparse ablations with historical reuse, improving interpretability and transfer readiness, and warm-start policies across datasets to reduce sample requirements. Empirically, our method matches or surpasses strong baselines (Random / Grid Search, H2O AutoML) while requiring fewer evaluations to reach accuracy targets, and produces emergent curriculum behavior where intervention rates decay from $\sim$40\% to less than 5\%. We emphasize architectural novelty and learning dynamics over exhaustive scaling, arguing that asymmetric pedagogical control provides a principled inductive bias for structured AutoML search.
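The abstract's core mechanism, a teacher that intervenes only when the estimated counterfactual advantage exceeds an adaptive threshold, can be illustrated with a minimal sketch. All names and the linear threshold schedule below are hypothetical illustrations, not the authors' implementation; the paper reports intervention rates decaying from roughly 40% to under 5%, which a rising threshold of this kind would produce.

```python
def teacher_intervenes(advantage_estimate: float, threshold: float) -> bool:
    """Selective intervention rule: the teacher overrides the student's
    proposed pipeline action only when the estimated counterfactual
    advantage of its own proposal exceeds the current threshold."""
    return advantage_estimate > threshold


def adaptive_threshold(step: int, total_steps: int,
                       lo: float = 0.05, hi: float = 0.5) -> float:
    """Hypothetical linear schedule: the threshold rises over training,
    so interventions become rarer as the student improves (graceful
    withdrawal of the teacher)."""
    frac = min(step / total_steps, 1.0)
    return lo + frac * (hi - lo)


# Early in training the bar is low, so moderate advantages trigger help;
# late in training the same advantage no longer does.
early = adaptive_threshold(step=0, total_steps=1000)
late = adaptive_threshold(step=1000, total_steps=1000)
print(teacher_intervenes(0.3, early), teacher_intervenes(0.3, late))
```

Under this sketch the same 0.3 advantage estimate triggers an intervention early (threshold 0.05) but not late (threshold 0.5), matching the decaying intervention rate the abstract describes.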
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 22177