Maximum Reward Formulation In Reinforcement Learning

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Reinforcement Learning, Theoretical Reinforcement Learning, Drug Discovery, Molecule Generation, de novo drug design
Abstract: Reinforcement learning (RL) algorithms typically maximize the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial real-world applications, such as drug discovery, do not fit this framework: the agent only needs to identify states (molecules) that achieve the highest reward within a trajectory, not to optimize the expected cumulative return. In this work, we formulate an objective function that maximizes the expected maximum reward along a trajectory, derive a novel functional form of the Bellman equation, introduce the corresponding Bellman operators, and provide a proof of convergence. Using this formulation, we achieve state-of-the-art results on a molecule-generation task that mimics a real-world drug discovery pipeline.
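
The abstract names the max-reward objective but not the operator itself. As a minimal sketch, assuming the backup replaces the usual sum `r + γ·V(s')` with a max, i.e. `(T Q)(s, a) = E_{s'}[ max(r(s, a), γ·max_{a'} Q(s', a')) ]`, tabular value iteration under that operator could look like the following; the transition model `P`, reward table `R`, and discount `gamma` are illustrative, not from the paper:

```python
# A minimal sketch of value iteration under an assumed max-reward Bellman
# operator: the immediate reward and the discounted best future estimate
# are combined with max instead of the usual sum. This operator form is
# inferred from the abstract, not the paper's verbatim definition.
import numpy as np

def max_reward_value_iteration(P, R, gamma=0.99, tol=1e-8, max_iters=10_000):
    """Fixed-point iteration for the assumed max-reward backup.

    P: (S, A, S) array of transition probabilities P[s, a, s'].
    R: (S, A) array of immediate rewards r(s, a).
    Returns Q of shape (S, A).
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(max_iters):
        V = Q.max(axis=1)                          # max_{a'} Q(s', a')
        # Backup: max of the immediate reward and the discounted best
        # future estimate, broadcast over next states s'.
        target = np.maximum(R[..., None], gamma * V[None, None, :])
        Q_new = (P * target).sum(axis=2)           # expectation over s'
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q
```

Since max with a constant is 1-Lipschitz, this backup is a γ-contraction in the sup norm for γ < 1, which is consistent with the kind of convergence guarantee the abstract advertises.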
One-sentence Summary: Introduces a new functional form of the Bellman equation, provides a convergence proof, and demonstrates state-of-the-art results on the task of molecule generation.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2010.03744/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=0zso3418FB