On the Complexity of Teaching a Family of Linear Behavior Cloning Learners

Published: 25 Sept 2024, Last Modified: 13 Jan 2025
Venue: NeurIPS 2024 (poster)
License: CC BY 4.0
Keywords: Machine Teaching, Behavior Cloning, Reinforcement Learning, Supervised Learning
TL;DR: We study optimal teaching complexity of a family of consistent linear behavior cloning learners.
Abstract: We study optimal teaching for a family of Behavior Cloning (BC) learners that learn using a linear hypothesis class. In this setup, a knowledgeable teacher demonstrates a dataset of state-action tuples and must teach an optimal policy to the entire family of BC learners using the smallest possible dataset. We analyze the linear family and design a novel teaching algorithm, `TIE', that achieves the instance-optimal Teaching Dimension for the entire family. However, we show that the optimal teaching problem is NP-hard for action spaces with $|\mathcal{A}| > 2$, and we provide an efficient approximation algorithm with a $\log(|\mathcal{A}| - 1)$ guarantee on the optimal teaching size. We present empirical results demonstrating the effectiveness of our algorithm and compare it to various baselines across different teaching environments.
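The $\log(|\mathcal{A}| - 1)$ factor in the abstract is the flavor of guarantee given by greedy set cover. As a hedged illustration only (this is the textbook greedy cover, not the paper's actual TIE algorithm; all names here are hypothetical), a minimal sketch:

```python
def greedy_set_cover(universe, subsets):
    """Greedily pick subsets until every element of `universe` is covered.

    The greedy choice (largest marginal coverage) yields the classic
    O(log n)-approximation to the minimum cover; this is illustrative of
    the style of guarantee in the abstract, not the paper's algorithm.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the subset covering the most still-uncovered elements
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("universe cannot be covered by given subsets")
        chosen.append(best)
        uncovered -= best
    return chosen


# Toy example: two greedy picks suffice to cover {1, ..., 5}.
cover = greedy_set_cover({1, 2, 3, 4, 5},
                         [{1, 2, 3}, {2, 4}, {4, 5}, {3, 5}])
```

Here the greedy rule first takes {1, 2, 3} (covers three elements), then {4, 5}, so the returned cover has two sets.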
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 13796