Cultivating Divergent Multi-Objective Expertise in Multiagent Systems via Expert Ensembles

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: ALA 2026, Readers: Everyone, License: CC BY 4.0
Keywords: multi-objective multiagent reinforcement learning, mixture of experts, continuous control
Abstract: Multiagent reinforcement learning (MARL) traditionally focuses on training autonomous agents to coordinate on a single, well-defined goal. However, real-world applications frequently involve multiple, often conflicting, objectives, and balancing them requires covering a wide range of Pareto-optimal trade-offs. Recent multi-objective MARL (MOMARL) frameworks approximate this Pareto front by conditioning a multi-head actor network on a preference weight vector, but this approach has a key limitation. Burdening a monolithic actor network with representing both coordinated joint policies and a continuum of topologically distinct objective trade-offs creates a severe representational challenge, leaving the network susceptible to destructive gradient interference. To mitigate this, we introduce Ensemble of Experts for Multi-Objective Multiagent Reinforcement Learning (E2M2). E2M2 replaces each agent's monolithic head with an ensemble of independent expert networks, governed by a preference-conditioned routing mechanism that outputs blending weights. This architecture allows individual experts to specialise in divergent behaviours, while the router learns to interpolate smoothly between them to produce the agent's action. Preliminary experiments in a continuous multi-objective MAMuJoCo task show that E2M2 discovers a more dominant and diverse Pareto front, achieving on average a 22% improvement in hypervolume over a parameter-matched baseline.
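The routing idea described in the abstract can be illustrated with a minimal sketch for a single agent. This is not the authors' implementation: the class name `BlendedExpertActor`, the linear experts, the softmax router, and the choice to feed the router only the preference vector are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


class BlendedExpertActor:
    """Hypothetical actor: K experts blended by a preference-conditioned router.

    Assumptions (not from the paper): each expert is a single linear layer
    followed by tanh, and the router is a linear map of the preference
    vector followed by a softmax.
    """

    def __init__(self, obs_dim, act_dim, n_objectives, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        # One independent weight matrix per expert: (K, act_dim, obs_dim).
        self.expert_w = rng.normal(scale=0.1, size=(n_experts, act_dim, obs_dim))
        # Router maps a preference vector to one logit per expert.
        self.router_w = rng.normal(scale=0.1, size=(n_experts, n_objectives))

    def blend_weights(self, preference):
        """Return non-negative blending weights that sum to 1."""
        return softmax(self.router_w @ preference)

    def act(self, obs, preference):
        """Blend the experts' actions using the router's weights."""
        expert_actions = np.tanh(self.expert_w @ obs)  # shape (K, act_dim)
        weights = self.blend_weights(preference)       # shape (K,)
        return weights @ expert_actions                # shape (act_dim,)


# Example: 3 experts, 2 objectives, preference favouring the first objective.
actor = BlendedExpertActor(obs_dim=4, act_dim=2, n_objectives=2, n_experts=3)
action = actor.act(np.ones(4), np.array([0.7, 0.3]))
```

Because the blending weights form a convex combination, the blended action stays inside the range spanned by the experts' own actions, so changing the preference vector moves the output smoothly between expert behaviours.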
Journal Edition Interest: Yes
Submission Number: 61