Active Fine-Tuning of Multi-Task Policies

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We actively query demonstrations when fine-tuning a multi-task policy through behavioral cloning.
Abstract: Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide *which tasks should be demonstrated, and how often*. We study this multi-task problem and explore an interactive framework in which the agent *adaptively* selects the tasks to be demonstrated. We propose AMF (Active Multi-task Fine-tuning), an algorithm that maximizes multi-task policy performance under a limited demonstration budget by collecting the demonstrations that yield the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in efficiently fine-tuning neural policies in complex, high-dimensional environments.
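To make the selection rule concrete, here is a minimal Python sketch of an AMF-style loop, not the paper's implementation (see the linked repository for that). It assumes a toy model in which per-task uncertainty about the expert policy is a scalar Gaussian variance; each demonstration goes to the task with the largest expected information gain, and a standard Gaussian posterior update shrinks that task's variance. All names and the update rule are illustrative assumptions.

```python
# Toy sketch of active multi-task fine-tuning via information gain.
# Hypothetical interface; the real AMF algorithm operates on neural
# policies and demonstration data, not scalar variances.
import numpy as np

rng = np.random.default_rng(0)


def information_gain(sigma2, obs_noise=1.0):
    """Entropy reduction from one more demo on a task whose expert-policy
    estimate has variance sigma2, under unit-variance observation noise:
    0.5 * log(1 + sigma2 / obs_noise)."""
    return 0.5 * np.log(1.0 + sigma2 / obs_noise)


def amf_loop(num_tasks=4, budget=20, obs_noise=1.0):
    # Per-task posterior variance over the expert policy (toy model);
    # start tasks at different uncertainty levels to show adaptivity.
    sigma2 = rng.uniform(0.5, 2.0, num_tasks)
    demo_counts = np.zeros(num_tasks, dtype=int)
    for _ in range(budget):
        # Query a demonstration for the most informative task.
        task = int(np.argmax(information_gain(sigma2, obs_noise)))
        # Collecting a demo and fine-tuning is stubbed out here; the
        # Gaussian posterior update stands in for the resulting
        # reduction in uncertainty about the expert policy.
        sigma2[task] = 1.0 / (1.0 / sigma2[task] + 1.0 / obs_noise)
        demo_counts[task] += 1
    return demo_counts


print(amf_loop())  # demos concentrate on initially uncertain tasks
```

Under this toy model, the greedy rule naturally balances the budget: it front-loads demonstrations on high-uncertainty tasks and spreads out once uncertainties equalize, which is the qualitative behavior the abstract describes.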
Lay Summary: Current machine learning methods for robotic control can learn to perform a variety of tasks, especially when additional data demonstrating these tasks is collected. As data collection can be costly, we design an algorithm that decides how much data should be collected for each task to be learned. Intuitively, our algorithm focuses on the tasks that the learning agent can learn the most about. We mathematically prove that this algorithm performs well in a simplified setting, and apply it to several simulated robot arms, which need to learn to move and manipulate objects around them.
Link To Code: https://github.com/marbaga/amf
Primary Area: Reinforcement Learning
Keywords: imitation learning, deep reinforcement learning, multi-task reinforcement learning, active learning, fine-tuning
Submission Number: 10183