Towards Deployment-Efficient Reinforcement Learning: Lower Bound and OptimalityDownload PDF

Anonymous

Sep 29, 2021 (edited Oct 05, 2021)ICLR 2022 Conference Blind SubmissionReaders: Everyone
  • Keywords: reinforcement learning theory, deployment efficiency, linear MDP
  • Abstract: Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, there lacks a formal theoretical formulation for the problem. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an ''optimization with constraints'' perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give ``Safe DE-RL'' and ``Sample-Efficient DE-RL'' as two examples, which may be worth future investigation.
  • One-sentence Summary: We propose a formal theoretical formulation for depolyment-efficient reinforcement learning; establish lower bounds for deployment complexity and study near-optimal deployment-efficient algorithms in linear MDP setting.
  • Supplementary Material: zip
0 Replies

Loading