Reliable Active Apprenticeship Learning

Published: 18 Dec 2024, Last Modified: 14 Feb 2025, Venue: ALT 2025, License: CC BY 4.0
Abstract: We introduce a learning problem, which we call reliable active apprenticeship learning, and give a learning algorithm for it with optimal performance guarantees, which we show are sharply characterized by the eluder dimension of a policy class. In this setting, a learning algorithm is tasked with behaving optimally in an unknown environment modeled as a Markov decision process. The correct actions are specified by an unknown optimal policy belonging to a given policy class. The learner does not initially know the optimal policy, but it can query an expert, which returns the optimal action for the current state. A learner is said to be reliable if, whenever it takes an action without querying the expert, that action is guaranteed to be optimal. The goal is to design a reliable learner that does not query the expert too often. We propose a reliable learning algorithm that provably makes the minimal possible number of queries, and we show that this number is precisely characterized by the eluder dimension of the policy class. We further extend the setting to imperfect experts, modeled as an oracle with noisy responses. We study two variants, inspired by noise conditions from classification: Massart noise and Tsybakov noise. In both cases, we propose a reliable learning strategy that achieves a nearly minimal number of queries, and we prove upper and lower bounds on the optimal number of queries in terms of the noise conditions and the eluder dimension of the policy class.
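To make the reliability condition concrete, here is a minimal Python sketch of one natural way to satisfy it in the noiseless case. It is an illustration only and is not claimed to be the algorithm from the paper. It assumes a finite policy class that contains the optimal policy, an expert that always answers correctly, and states that simply arrive as a sequence (the MDP dynamics are not modeled). The names `VersionSpaceLearner`, `act`, and the threshold policies are hypothetical. The learner keeps every policy consistent with past expert answers, acts without querying only when all of them agree, and queries otherwise.

```python
from typing import Callable, Hashable, Iterable, List

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


class VersionSpaceLearner:
    """Hypothetical sketch: tracks policies consistent with all expert answers."""

    def __init__(self, policy_class: Iterable[Policy]) -> None:
        self.version_space: List[Policy] = list(policy_class)
        self.num_queries = 0

    def act(self, state: State, expert: Policy) -> Action:
        # Actions proposed by the policies that are still consistent.
        proposed = {pi(state) for pi in self.version_space}
        if len(proposed) == 1:
            # Every surviving policy agrees. Under the noiseless assumption
            # the optimal policy is among them, so this action is optimal.
            return next(iter(proposed))
        # Disagreement: acting unaided could be wrong, so query the expert
        # and discard every policy that contradicts the answer.
        self.num_queries += 1
        answer = expert(state)
        self.version_space = [pi for pi in self.version_space if pi(state) == answer]
        return answer


# Toy policy class (an assumption for illustration): threshold policies on
# integer states, pi_t(s) = 1 if s >= t else 0, for t = 0..5.
policy_class = [lambda s, t=t: int(s >= t) for t in range(6)]
expert = policy_class[3]  # the unknown optimal policy, here t = 3

learner = VersionSpaceLearner(policy_class)
states = [2, 3, 0, 4, 2, 3]
actions = [learner.act(s, expert) for s in states]
print(actions, learner.num_queries)
```

In this toy run the learner queries only while the surviving policies disagree on the current state. Every action it takes without a query matches the expert, which is the reliability property described in the abstract.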
Submission Number: 114