Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation

Published: 20 May 2026, Last Modified: 20 May 2026, ICRA 2026 Workshop SDRL, CC BY 4.0
Keywords: Robot Learning, Differentiable Simulation, Continual Learning, Aerial Robots
TL;DR: We introduce an online, real-world policy learning framework based on residual dynamics learning and differentiable simulation that enables rapid online adaptation of control policies to unknown, varying real-world dynamics.
Abstract: Learning control policies in simulation enables fast, safe, and cost-effective development, but transferring them to the real world remains challenging due to the sim-to-real gap. Existing methods such as domain randomization and Real2Sim2Real improve robustness but either fail under out-of-distribution conditions or require costly retraining. In this work, we instead focus on rapid online adaptation. We propose a framework that unifies residual dynamics learning with real-time policy adaptation in a differentiable simulation. Starting from a simple dynamics model, the system continuously refines the dynamics using real-world data to capture unmodeled effects such as payload changes and wind, and uses the refined dynamics to perform gradient-based, sample-efficient policy updates, which are beyond the reach of classical RL methods such as PPO. Designed for speed, our approach adapts to unseen disturbances within 5 seconds of training. We validate it on agile quadrotor control in simulation and the real world, achieving up to 81% lower hovering error than L1-MPC and 55% lower than DATT, while also demonstrating robustness in vision-based control without explicit state estimation.
Submission Number: 2
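
The loop described in the abstract (fit a residual on top of a simple dynamics model from real data, then update the policy by differentiating through the refined simulator) can be illustrated with a minimal JAX sketch. This is not the authors' implementation: it assumes a 1D point-mass altitude model, a scalar residual acceleration, a linear feedback policy with a bias term, and a constant disturbance of -3 m/s^2 standing in for an unmodeled payload. All names (`nominal_step`, `model_step`, `real_step`, `adapt`, and so on), gains, and learning rates are hypothetical choices for illustration.

```python
import jax
import jax.numpy as jnp

DT, G, HORIZON = 0.02, 9.81, 100   # step [s], gravity [m/s^2], steps per rollout
TRUE_DISTURBANCE = -3.0            # assumed unmodeled acceleration, hidden from the learner

def nominal_step(state, u):
    """Simple point-mass model; state = [height error, vertical velocity]."""
    z, v = state[0], state[1]
    return jnp.array([z + DT * v, v + DT * (u - G)])

def model_step(state, u, residual):
    """Differentiable simulator: nominal model plus learned residual acceleration."""
    return nominal_step(state, u).at[1].add(DT * residual)

def real_step(state, u):
    """Stand-in for the real system (assumption): nominal model plus the disturbance."""
    return nominal_step(state, u).at[1].add(DT * TRUE_DISTURBANCE)

def policy(params, state):
    """Linear feedback around hover thrust; params = [kp, kd, bias]."""
    return G - params[0] * state[0] - params[1] * state[1] + params[2]

@jax.jit
def collect_real(params, state0):
    """Roll the policy out on the 'real' system and record transitions."""
    def body(state, _):
        u = policy(params, state)
        nxt = real_step(state, u)
        return nxt, (state, u, nxt)
    _, data = jax.lax.scan(body, state0, None, length=HORIZON)
    return data

def residual_loss(residual, data):
    """One-step prediction error of the refined model, in acceleration units."""
    states, controls, next_states = data
    pred = jax.vmap(model_step, in_axes=(0, 0, None))(states, controls, residual)
    return jnp.mean(((pred[:, 1] - next_states[:, 1]) / DT) ** 2)

def sim_loss(params, residual, state0):
    """Mean squared height error of a rollout in the refined simulator."""
    def body(state, _):
        nxt = model_step(state, policy(params, state), residual)
        return nxt, nxt[0] ** 2
    _, costs = jax.lax.scan(body, state0, None, length=HORIZON)
    return jnp.mean(costs)

residual_grad = jax.jit(jax.grad(residual_loss))
policy_grad = jax.jit(jax.grad(sim_loss))   # gradient flows through the whole rollout

def hover_error(params, state0):
    """Mean absolute height error on the 'real' system."""
    return jnp.mean(jnp.abs(collect_real(params, state0)[2][:, 0]))

def adapt(outer_iters=10, inner_steps=10, residual_lr=0.25, policy_lr=10.0):
    params = jnp.array([10.0, 5.0, 0.0])
    residual = jnp.array(0.0)
    state0 = jnp.zeros(2)
    err_before = hover_error(params, state0)
    for _ in range(outer_iters):
        data = collect_real(params, state0)           # gather real-world data
        for _ in range(inner_steps):                  # refine the dynamics model
            residual = residual - residual_lr * residual_grad(residual, data)
        for _ in range(inner_steps):                  # first-order policy update in simulation
            params = params - policy_lr * policy_grad(params, residual, state0)
    return err_before, hover_error(params, state0), residual, params

err_before, err_after, residual, params = adapt()
print("hover error before/after:", float(err_before), float(err_after))
print("learned residual:", float(residual))
```

In this toy setting the policy gradient comes from backpropagating through the simulated rollout rather than from sampled returns, which is the property the abstract contrasts with PPO. The paper's quadrotor dynamics, residual model, and policy are more complex than the scalar and linear stand-ins used here.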