Keywords: Offline model-based planning
TL;DR: We propose SHARP-BC, a plug-and-play offline planning method that replaces fixed horizons with uncertainty-weighted aggregation and uses simple BC as an action proposer, enabling strong test-time adaptation without online tuning.
Abstract: Offline reinforcement learning enables policies to be learned from previously collected experiences without requiring online interaction. However, these policies are typically deployed as fixed, zero-shot agents and lack the ability to adapt their behavior at test time. Offline model-based planning offers a promising way to enable flexible test-time adaptation, but its performance is highly sensitive to critical design choices, particularly the planning horizon and the action proposer. In practice, these choices are often tuned through online evaluation, contradicting the premise of offline RL. In this work, we introduce Soft Horizon AggRegation for Planning (SHARP), an offline plug-and-play planning method that eliminates the need for an online-tuned planning horizon. Instead of using a fixed horizon across all states, SHARP performs soft horizon aggregation by dynamically weighting returns according to model uncertainty estimated from an ensemble of dynamics models. We further investigate the role of the action proposer and find that stronger offline policies do not necessarily lead to better planning performance. Instead, a simple behavior cloning (BC) policy is often sufficient as an action proposer while avoiding the effort required for extensive policy extraction. Combining these insights, we propose SHARP-BC, which consistently outperforms existing baselines while reducing reliance on extensive online hyperparameter tuning.
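The soft horizon aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so everything here is an assumption made for illustration: uncertainty is taken to be the ensemble standard deviation of the return estimate at each horizon, weights are a softmax over negative uncertainty, and the names `soft_horizon_value` and `temperature` are hypothetical rather than taken from the paper.

```python
import numpy as np


def soft_horizon_value(returns, temperature=1.0):
    """Aggregate return estimates over horizons, down-weighting uncertain ones.

    returns[m, h] is the return estimated by ensemble member m for a rollout
    truncated at horizon h + 1 (hypothetical layout, not from the paper).
    """
    returns = np.asarray(returns, dtype=float)
    mean_per_horizon = returns.mean(axis=0)

    # Assumed uncertainty proxy: disagreement across ensemble members.
    uncertainty = returns.std(axis=0)

    # Softmax over negative uncertainty, shifted for numerical stability.
    logits = -uncertainty / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()

    return float(weights @ mean_per_horizon), weights


# Two ensemble members that agree at horizons 1 and 2 but diverge at horizon 3.
value, weights = soft_horizon_value([[1.0, 2.0, 0.0],
                                     [1.0, 2.0, 6.0]])
```

In this toy setup the third horizon receives a smaller weight than the first two because the ensemble members disagree there, so no single fixed horizon has to be chosen in advance. A planner in this style would score each candidate action sequence (for example, one sampled from a BC proposer) with such an aggregated value and execute the first action of the best one.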
Submission Number: 117