Safe Reinforcement Learning with Natural Language Constraints

Tsung-Yen Yang; Michael Hu; Yinlam Chow; Peter Ramadge; Karthik R Narasimhan

28 Sept 2020 (modified: 12 Oct 2025) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: Safe reinforcement learning, Language grounding

Abstract: In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average. Dataset and code to reproduce our experiments are available at https://sites.google.com/view/polco-hazard-world/.
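The two-component design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, dimensions, random weights, and the function names `interpret_constraint` and `policy` are all hypothetical, and a mean of word embeddings stands in for whatever encoder the paper actually uses. The sketch only shows the data flow, where a text constraint becomes a vector that conditions the policy alongside the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = {"<unk>": 0, "avoid": 1, "lava": 2, "water": 3, "more": 4, "than": 5, "twice": 6}
EMBED_DIM, OBS_DIM, N_ACTIONS = 8, 4, 3

# Random stand-ins for parameters that would be learned during training.
word_embeddings = rng.normal(size=(len(VOCAB), EMBED_DIM))
policy_weights = rng.normal(size=(OBS_DIM + EMBED_DIM, N_ACTIONS))


def interpret_constraint(text):
    """Encode a constraint sentence as the mean of its word embeddings."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]
    return word_embeddings[ids].mean(axis=0)


def policy(observation, constraint_vec):
    """Return action probabilities conditioned on observation and constraint."""
    features = np.concatenate([observation, constraint_vec])
    logits = features @ policy_weights
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


constraint_vec = interpret_constraint("Avoid lava more than twice")
action_probs = policy(np.zeros(OBS_DIM), constraint_vec)
```

Because both stages are differentiable functions of their parameters, a design of this shape can in principle be trained end to end, which is the property the abstract highlights; the constrained optimization step itself is not shown here.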

One-sentence Summary: We tackle the problem of learning control policies for tasks when provided with constraints in natural language (e.g., safety or budget constraints).

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/safe-reinforcement-learning-with-natural/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=IUJCTG0i9
