Hypothesis Driven Coordinate Ascent for Reinforcement Learning

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone
Keywords: Reinforcement Learning, Black-Box Optimization, Hypothesis Testing, Coordinate Ascent, Block Coordinate Ascent, Random Search, MDP
Abstract: This work develops a novel black-box optimization technique for learning robust policies in stochastic environments. By combining coordinate ascent with hypothesis testing, Hypothesis Driven Coordinate Ascent (HDCA) optimizes without computing or estimating gradients. The simplicity of this approach allows it to excel in a distributed setting, and its implementation provides an interesting alternative to many state-of-the-art methods for common reinforcement learning environments. HDCA was evaluated on various problems from the MuJoCo physics simulator and the OpenAI Gym framework, achieving results equivalent or superior to standard RL benchmarks.
One-sentence Summary: This work develops a novel black-box optimization technique that combines coordinate ascent with hypothesis testing to learn robust policies for stochastic environments.
Supplementary Material: zip
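
Illustrative sketch: the abstract describes combining coordinate ascent with hypothesis testing so that no gradients are computed or estimated. The code below is only a minimal toy illustration of that general idea, not the authors' HDCA implementation. Everything specific in it is an assumption made for the example: the function names, the toy noisy objective standing in for an episode return, the fixed step size, the one-sided Welch-style test with a normal approximation, and the significance level.

```python
import math
import random
import statistics


def noisy_return(params, rng):
    """Toy stochastic objective (hypothetical stand-in for an episode return)."""
    target = [1.0, -2.0, 0.5]
    loss = sum((p - t) ** 2 for p, t in zip(params, target))
    return -loss + rng.gauss(0.0, 0.05)


def significantly_better(candidate, incumbent, alpha=0.05):
    """One-sided Welch-style test with a normal approximation (an assumption).

    Returns True when the candidate's mean return exceeds the incumbent's
    by more than sampling noise would plausibly explain.
    """
    mean_diff = statistics.mean(candidate) - statistics.mean(incumbent)
    std_err = math.sqrt(
        statistics.variance(candidate) / len(candidate)
        + statistics.variance(incumbent) / len(incumbent)
    )
    if std_err == 0.0:
        return mean_diff > 0.0
    p_value = 1.0 - statistics.NormalDist().cdf(mean_diff / std_err)
    return p_value < alpha


def coordinate_ascent_with_tests(params, rng, sweeps=40, step=0.25, samples=20):
    """Perturb one coordinate at a time; keep a move only if the test accepts it."""
    params = list(params)
    incumbent = [noisy_return(params, rng) for _ in range(samples)]
    for _ in range(sweeps):
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                candidate = [noisy_return(trial, rng) for _ in range(samples)]
                if significantly_better(candidate, incumbent):
                    params, incumbent = trial, candidate
                    break  # move on to the next coordinate
    return params


rng = random.Random(0)
learned = coordinate_ascent_with_tests([0.0, 0.0, 0.0], rng)
```

In this sketch the hypothesis test acts as the acceptance rule: a perturbed coordinate replaces the current value only when its sampled returns are significantly higher, which is one way to avoid chasing noise in a stochastic environment. Because each candidate is evaluated by independent rollouts, the sampling step could in principle be farmed out to separate workers, which fits the abstract's remark about distributed settings.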