Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

ICLR 2026 Conference Submission14421 Authors

18 Sept 2025 (modified: 08 Oct 2025). ICLR 2026 Conference Submission. License: CC BY 4.0

Keywords: pretraining, finetuning, reinforcement learning, posterior sampling

TL;DR: We develop an approach to pretraining policies that makes them more amenable to finetuning.

Abstract: Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset and then finetune it, typically with reinforcement learning (RL), to improve performance on deployment domains. This finetuning step has proved critical in achieving human or super-human performance. Yet while much attention has been given to developing more effective finetuning algorithms, little has been given to ensuring that the pretrained policy is an effective initialization for RL finetuning. In this work we seek to understand how the pretrained policy affects finetuning performance, and how to pretrain policies so that they are effective initializations for finetuning. We first show theoretically that training a policy to clone the demonstrator's posterior distribution given the demonstration dataset, rather than simply the demonstrations themselves, yields a policy that ensures coverage over the demonstrator's actions (a minimal condition for effective finetuning) without hurting the performance of the pretrained policy. Furthermore, we show that standard behavioral cloning (BC) pretraining fails to achieve this coverage without significant tradeoffs in sampling cost. Motivated by this, we then show that posterior behavioral cloning is practically implementable with modern generative policies in robotic control domains, in particular diffusion policies, and that it leads to significantly improved finetuning performance on realistic robotic control benchmarks compared to standard behavioral cloning.
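
The coverage argument in the abstract can be illustrated with a toy tabular analogue. The sketch below is not the paper's method: it assumes a finite state and action space, a symmetric Dirichlet prior over the demonstrator's per-state action distribution, and uses the posterior mean as the cloned policy. The function names, the prior strength `alpha`, and the demonstration counts are hypothetical and chosen only for illustration.

```python
import numpy as np

def mle_bc_policy(counts):
    """Standard tabular BC: empirical action frequencies per state."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # States with no demonstrations fall back to uniform (an arbitrary choice).
    uniform = np.full_like(counts, 1.0 / counts.shape[1])
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), uniform)

def posterior_mean_policy(counts, alpha=1.0):
    """Posterior-mean action distribution under an assumed Dirichlet(alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    post = counts + alpha  # Dirichlet posterior parameters given the counts
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical demonstration counts: rows are states, columns are actions.
counts = np.array([[5, 0, 0],
                   [2, 3, 0]])

bc = mle_bc_policy(counts)
pbc = posterior_mean_policy(counts)

print("BC policy:\n", bc)
print("Posterior-mean policy:\n", pbc)
```

In this toy setting the maximum-likelihood BC policy assigns zero probability to every action absent from the data, so an RL finetuning procedure that samples from it can never try those actions. The posterior-mean policy keeps nonzero mass on all actions while still concentrating on the demonstrated ones.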

Primary Area: reinforcement learning

Submission Number: 14421
