Agentic Subjective Q-Learning Equilibrium

05 May 2026 (modified: 11 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: In many applications, agents (decision makers) take part in systems with very complex dynamics and inevitably respond by making incorrect modeling assumptions. In this context, we define Agentic Subjective Q-Learning Equilibrium, an equilibrium concept in which each agent uses local/partial information in its learning algorithm, as if that partial information constituted an approximate Markov model. A distinguishing feature of this setup is that the exploration policy used for learning shapes the perceived model, so the induced cost depends on the agent policies in two ways: by an equivalence with empirical model learning, an exploration policy generates the sample path that induces a model (which therefore depends on the exploration policy), and this model is then used to obtain an optimal policy (for the learned model) via either reinforcement learning or empirical learning. This leads to the question of the existence of a fixed point of the map coupling learning and exploration. An agentic subjective learning equilibrium policy is thus defined as a policy that is self-confirming: the model induced by the policy admits that same policy as an optimal solution. We establish an existence result for such equilibria, building critically on the continuous dependence of invariant measures on policies under a suitable control topology. We then present an associated learning/convergence theorem to $\epsilon$-equilibria via policy revision dynamics. We show implications for symmetric dynamic games (including mean-field games), weakly acyclic games (including potential games), and generalized weakly acyclic games.
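The fixed-point structure described in the abstract (exploration policy → induced empirical model → optimal policy for that model → repeat until self-confirming) can be illustrated with a minimal sketch. Everything below is a hypothetical toy construction, not the paper's actual algorithm or environment: the hidden-state dynamics, the observation map `s % 2`, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: hidden state in {0,..,3}, two actions; the agent
# observes only s % 2 (partial information), so its learned model is
# "subjective". Dynamics and costs are randomly generated for illustration.
n_hidden, n_obs, n_act = 4, 2, 2
P = rng.dirichlet(np.ones(n_hidden), size=(n_hidden, n_act))  # true kernel
cost = rng.random((n_hidden, n_act))

def rollout(policy, eps=0.1, T=20000):
    """Sample path under an eps-exploring version of `policy`; return the
    empirical transition kernel and cost over observations (the model the
    agent perceives, which depends on the exploration policy)."""
    counts = np.zeros((n_obs, n_act, n_obs))
    csum = np.zeros((n_obs, n_act))
    cnum = np.zeros((n_obs, n_act))
    s = 0
    for _ in range(T):
        o = s % 2
        a = rng.integers(n_act) if rng.random() < eps else policy[o]
        s2 = rng.choice(n_hidden, p=P[s, a])
        counts[o, a, s2 % 2] += 1
        csum[o, a] += cost[s, a]
        cnum[o, a] += 1
        s = s2
    Phat = counts / np.maximum(counts.sum(-1, keepdims=True), 1)
    chat = csum / np.maximum(cnum, 1)
    return Phat, chat

def greedy_policy(Phat, chat, beta=0.9, iters=500):
    """Value iteration on the learned (subjective) model; returns the
    policy that is optimal for that model."""
    Q = np.zeros((n_obs, n_act))
    for _ in range(iters):
        V = Q.min(axis=1)
        Q = chat + beta * Phat @ V  # (n_obs, n_act, n_obs) @ (n_obs,)
    return Q.argmin(axis=1)

# Policy revision dynamics: stop when the policy is self-confirming,
# i.e. optimal for the model that its own exploration induces.
pi = np.zeros(n_obs, dtype=int)
for _ in range(20):
    new_pi = greedy_policy(*rollout(pi))
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
print("subjective equilibrium policy:", pi)
```

Because the empirical model is estimated from a finite sample path, the loop converges to an approximate ($\epsilon$-) equilibrium at best; the paper's existence and convergence results concern the exact fixed point of this learning/exploration map.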
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=hwVIZJpEKb
Changes Since Last Submission: The correct TMLR style file is now used (compared with the previous version).
Assigned Action Editor: ~Bo_Dai1
Submission Number: 8780