Swarm Behavior Cloning

Published: 01 Jan 2025, Last Modified: 15 May 2025 · ICAART (1) 2025 · CC BY-SA 4.0
Abstract: In sequential decision-making environments, the primary approaches for training agents are Reinforcement Learning (RL) and Imitation Learning (IL). Unlike RL, which relies on modeling a reward function, IL leverages expert demonstrations, where an expert policy πe (e.g., a human) provides the desired behavior. Formally, a dataset D of state-action pairs is provided: D = {(s, a) | a = πe(s)}. A common technique within IL is Behavior Cloning (BC), where a policy π(s) = a is learned through supervised learning on D. Further improvements can be achieved by using an ensemble of N individually trained BC policies, denoted as E = {πi(s)}, 1 ≤ i ≤ N. The ensemble’s action a for a given state s is the aggregated output of the N predicted actions: a = (1/N) ∑i πi(s). This paper addresses the issue of increasing action differences: the observation that discrepancies between the N predicted actions grow in states that are underrepresented in the training data. Large action differences can result in suboptimal aggregated actions.
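The sketch below (not from the paper) illustrates the ensemble aggregation described in the abstract on a toy linear setup: N behavior-cloning policies are fit independently on the expert dataset D, their predictions are averaged as a = (1/N) ∑i πi(s), and the spread of the N predictions is used as a stand-in for the "action difference". The synthetic expert, the linear least-squares policies, and the standard-deviation disagreement measure are illustrative assumptions.

```python
import numpy as np

# Hypothetical expert dataset D = {(s, pi_e(s))}: random states paired with a
# synthetic linear "expert" action, purely for illustration.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))            # states s
actions = states @ rng.normal(size=(4, 2))     # expert actions a = pi_e(s)

N = 5  # ensemble size

# Train N BC policies independently (linear least-squares fits on bootstrap
# resamples, standing in for N separately trained networks).
policies = []
for _ in range(N):
    idx = rng.integers(0, len(states), size=len(states))
    W, *_ = np.linalg.lstsq(states[idx], actions[idx], rcond=None)
    policies.append(W)

def ensemble_action(s):
    """Aggregate the N predictions: a = (1/N) * sum_i pi_i(s)."""
    preds = np.stack([s @ W for W in policies])   # shape (N, action_dim)
    return preds.mean(axis=0), preds.std(axis=0)  # mean action, per-dim spread

s = rng.normal(size=4)
a, action_difference = ensemble_action(s)
print("aggregated action:", a)
print("per-dimension disagreement (action difference):", action_difference)
```

For states far from the training distribution, the individual predictions s @ W diverge, so the printed disagreement grows; this is the effect the paper targets.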