Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Mitsuhiko Nakamoto; Yuexiang Zhai; Anikait Singh; Yi Ma; Chelsea Finn; Aviral Kumar; Sergey Levine

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

Published: 03 Mar 2023, Last Modified: 06 Jul 2025RRL 2023 SpotlightReaders: Everyone

Keywords: offline RL, online fine-tuning

TL;DR: we propose a method for efficient online RL fine-tuning

Abstract: A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction in the environment. Many existing offline RL methods tend to exhibit poor fine-tuning performance. On the contrary, while naive online RL methods achieve compelling empirical performance, online methods suffer from a large sample complexity without a good policy initialization from the offline data. Our goal in this paper is to devise an approach for learning an effective offline initialization that also unlocks fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL) accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, meaning that the learned value estimation still upper-bounds the ground-truth value of some other reference policy (e.g., the behavior policy). Both theoretically and empirically, we show that imposing these conditions speeds up online fine-tuning, and brings in benefits of the offline data. In practice, Cal-QL can be implemented on top of existing offline RL methods without any extra hyperparameter tuning. Empirically, Cal-QL outperforms state-of-the-art methods on a wide range of fine-tuning tasks from both state and visual observations, across several benchmarks.

Track: Technical Paper

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/cal-ql-calibrated-offline-rl-pre-training-for/code)

2 Replies

Loading