Generalized Temporal Difference Learning Models for Supervised Learning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, temporal difference learning, supervised learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A generalized temporal difference learning algorithm for supervised learning problems.
Abstract: In conventional statistical learning settings, data points are typically assumed to be independently and identically distributed (i.i.d.) according to some unknown probability distribution. Various supervised learning algorithms, such as generalized linear models, are derived by making different assumptions about the conditional distribution of the response variable given the independent variables. In this paper, we propose an alternative formulation in which the data points in a typical supervised learning dataset are treated as interconnected, and we model the data sampling process as a Markov reward process. Accordingly, we view the original supervised learning problem as a classic on-policy policy evaluation problem in reinforcement learning, and introduce a generalized temporal difference (TD) learning algorithm to solve it. Theoretically, we establish the convergence of our generalized TD algorithm under linear function approximation. We then explore the relationship between TD's solution and the original linear regression solution. This connection suggests that the probability transition matrix does not significantly impact the optimal solution in practice and hence can be easy to design. In our empirical evaluations, we examine critical design choices in our generalized TD algorithm and demonstrate competitive generalization performance across a variety of benchmark datasets, spanning regression, binary classification, and image classification within a deep learning context.
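To make the formulation concrete, the following is a minimal sketch (not the paper's exact algorithm; the transition matrix, reward design, and step sizes here are illustrative assumptions): a regression dataset is treated as the state space of a Markov reward process with a uniform transition matrix, and TD(0) with linear function approximation is run on sampled transitions. With the reward for a transition from (x, y) to (x', y') set to r = y − γy', the true value of state x equals y regardless of the transition structure, so TD's fixed point should coincide with the ordinary regression fit.

```python
import numpy as np

def generalized_td_regression(X, y, gamma=0.9, alpha=0.01, steps=200_000, seed=0):
    """Sketch of TD(0) on a supervised dataset viewed as a Markov reward process.

    Illustrative assumptions: uniform transitions between data points and the
    reward design r = y - gamma * y', under which v(x) = y is the true value.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    i = rng.integers(n)
    for _ in range(steps):
        j = rng.integers(n)                        # uniform transition x_i -> x_j
        r = y[i] - gamma * y[j]                    # reward design: r = y - gamma * y'
        td_error = r + gamma * X[j] @ w - X[i] @ w
        w += alpha * td_error * X[i]               # TD(0) semi-gradient update
        i = j
    return w

# On noiseless linear data, the TD solution should match least squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true
w_td = generalized_td_regression(X, y)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```

The key property used here is that r + γv(x') = (y − γy') + γy' = y, so the reward telescopes and the value target reduces to the supervised label; this is one simple way to see why the choice of transition matrix has little effect on the fixed point.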
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7205