Online Laplacian-Based Representation Learning in Reinforcement Learning

ICLR 2025 Conference Submission 12496 Authors

27 Sept 2024 (modified: 24 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Representation learning, Online Learning, Graph Laplacian
TL;DR: We propose an online method for learning the Laplacian representation in reinforcement learning and show, both theoretically and empirically, that it converges.
Abstract: Representation learning plays a crucial role in reinforcement learning, especially in complex environments with high-dimensional and unstructured states. Effective representations can enhance the efficiency of learning algorithms by improving sample efficiency and generalization across tasks. This paper considers the Laplacian-based framework for representation learning, in which the eigenvectors of the Laplacian matrix of the underlying transition graph are leveraged to encode meaningful features from raw sensory observations of the states. Despite promising algorithmic advances in this framework, it remains an open question whether Laplacian-based representations can be learned online, with theoretical guarantees, alongside policy learning. To answer this question, we study online Laplacian-based representation learning, where the graph-based representation is updated simultaneously with the policy as the latter is improved by the reinforcement learning algorithm. We design an online optimization formulation by introducing the Asymmetric Graph Drawing Objective (AGDO) and provide a theoretical analysis of the convergence of online projected gradient descent on AGDO under mild assumptions. Specifically, we show that if the policy learning algorithm induces a bounded drift on the policy, online projected gradient descent on AGDO exhibits ergodic convergence. Extensive simulation studies empirically validate our convergence guarantees and confirm convergence to the true Laplacian representation. Furthermore, we provide insights into the compatibility of different reinforcement learning algorithms with online representation learning.
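To make the setting concrete, below is a minimal NumPy sketch of one online projected-gradient step in the tabular case. AGDO itself is defined in the paper body, not in this abstract, so the objective here is a hypothetical graph-drawing-style stand-in: a smoothness term over transitions sampled under the current policy, plus an orthogonality penalty with a stop-gradient on one factor. The names `project_rows` and `online_pgd_step`, the feasible set, and all hyperparameters are illustrative assumptions, not the authors' method or API.

```python
import numpy as np

def project_rows(phi, radius=1.0):
    """Project each state's embedding onto an L2 ball of the given radius.
    This is the projection step of projected gradient descent; the paper's
    actual feasible set is not specified in the abstract and may differ."""
    norms = np.linalg.norm(phi, axis=1, keepdims=True)
    return phi * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

def online_pgd_step(phi, transitions, lr=0.1, beta=1.0):
    """One online projected-gradient step on a graph-drawing-style objective.

    phi:         (n_states, d) tabular candidate Laplacian representation.
    transitions: (s, s_next) index pairs sampled under the current policy.

    The smoothness term pulls embeddings of states linked by a transition
    together; the orthogonality penalty, with one factor held constant (a
    "stop-gradient"), keeps the d columns from collapsing onto each other.
    Both terms are illustrative stand-ins for AGDO.
    """
    grad = np.zeros_like(phi)
    for s, s_next in transitions:
        diff = phi[s] - phi[s_next]
        grad[s] += 2.0 * diff        # d/dphi[s]  of ||phi[s] - phi[s']||^2
        grad[s_next] -= 2.0 * diff   # d/dphi[s'] of the same term
    # Gradient of beta * ||stop_grad(phi)^T phi - I||_F^2 taken only through
    # the "live" copy of phi (the asymmetric penalty).
    gram = phi.T @ phi
    grad += 2.0 * beta * phi @ (gram - np.eye(phi.shape[1]))
    return project_rows(phi - lr * grad)

# Toy usage: a 5-state ring, with transitions resampled each step so the
# data distribution can drift as the "policy" changes.
rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 2))
for t in range(2000):
    states = rng.integers(0, 5, size=32)
    batch = [(s, (s + rng.choice([-1, 1])) % 5) for s in states]
    phi = online_pgd_step(phi, batch, lr=0.05)
```

As the policy drifts, the sampled transition distribution drifts with it, which is the bounded-drift regime the abstract's convergence result addresses. Note that a penalty like the one above only pins down the target eigenspace up to rotation; breaking that rotational symmetry so that individual eigenvectors are recovered is plausibly the role of the asymmetry in AGDO, per the objective's name.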
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12496