In the accompanying video we demonstrate our hardware experiment. We evaluate our method, SimFSVGD-Optimistic, on a dynamic RC car. The agent's goal is to learn a complex parking maneuver that involves loss of traction/drifting. We compare against SimFSVGD [1].

SimFSVGD is a state-of-the-art, sample-efficient model-based RL algorithm that incorporates simulation priors into the training of Bayesian neural network dynamics models. The agent thus starts with an informed, albeit inaccurate, prior and improves its model online. This yields significant gains in sample efficiency, as shown in [1].

Following the experiments from [1], we use a simple bicycle model as the prior in our setup. 
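To make the prior concrete, a minimal sketch of a kinematic bicycle model follows. All function names, parameter values, and the state/action layout here are illustrative assumptions, not the parameterisation used in our experiments.

```python
import math

def bicycle_step(state, action, dt=0.05, wheelbase=0.3):
    """One Euler step of a kinematic bicycle model (an illustrative prior).

    state  = (x, y, heading, speed); action = (steering_angle, accel).
    dt and wheelbase are placeholder values, not the experimental ones.
    """
    x, y, theta, v = state
    delta, a = action
    x += v * math.cos(theta) * dt        # position update along heading
    y += v * math.sin(theta) * dt
    theta += (v / wheelbase) * math.tan(delta) * dt  # yaw rate from steering
    v += a * dt                          # longitudinal acceleration
    return (x, y, theta, v)
```

Such a model captures basic car kinematics but, by construction, cannot represent loss of traction, which is exactly the gap the learned posterior must close.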
As shown in the video, in the sparse-reward setting the SimFSVGD agent, which greedily maximises the reward under its model distribution, fails to explore meaningfully. Because the reward is sparse, the agent observes no reward signal when planning with its model posterior and converges to the suboptimal solution of minimising control costs, i.e., applying no actions.
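The failure mode above can be illustrated with a hypothetical sparse parking reward; the function name, goal tolerance, and action-cost weight are made up for this sketch and are not our actual reward.

```python
import math

def sparse_parking_reward(state, action, goal, tol=0.2, action_cost=0.1):
    """Hypothetical sparse reward: a bonus only inside a small goal region,
    minus a quadratic control penalty. Away from the goal, the only term a
    greedy planner can improve is the action cost, so it drives actions to
    zero instead of exploring."""
    x, y = state[0], state[1]
    at_goal = math.hypot(x - goal[0], y - goal[1]) < tol
    bonus = 1.0 if at_goal else 0.0
    return bonus - action_cost * sum(a * a for a in action)
```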

In contrast, SimFSVGD-Optimistic explores in a much more principled manner, gradually improves its model, and learns to solve the parking task. This illustrates the advantage of principled exploration over greedy/naive exploration.
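The core idea behind optimistic exploration can be sketched generically: in addition to the control input, the planner picks a hallucinated input that steers the predicted next state anywhere within the model's epistemic confidence set. The sketch below is a generic illustration of this idea, not the SimFSVGD-Optimistic implementation; all names and the scalar `beta` are assumptions.

```python
def optimistic_step(mean_fn, std_fn, state, action, eta, beta=1.0):
    """Optimistic dynamics sketch: eta in [-1, 1]^d is a hallucinated input
    the planner optimises jointly with the action, letting it reach any next
    state within beta epistemic standard deviations of the posterior mean.
    Where the model is uncertain, the planner can "imagine" reward there,
    which is what drives directed exploration."""
    mu = mean_fn(state, action)      # posterior mean prediction
    sigma = std_fn(state, action)    # epistemic standard deviation
    return [m + beta * s * e for m, s, e in zip(mu, sigma, eta)]
```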


Note on the codebase: We ran our experiments (both in simulation and on hardware) using the open-source implementations of the baseline algorithms. We will make our code available with the finalised manuscript, or during the rebuttal period if requested.

References

[1] Rothfuss, Jonas, et al. "Bridging the Sim-to-Real Gap with Bayesian Inference." IROS (2024).