Visible to everyone since 22 Feb 2025. License: CC BY 4.0
This paper studies how to optimize an agent's confrontation position in a two-dimensional game system with incomplete information, based on the agent's random movements relative to obstacles. Incomplete-information games contain numerous unknown factors, which add complexity to intelligent control game systems. Handling these unknown factors often requires substantial data and significant computational resources. However, real confrontation systems depend on intelligence gathering, and neither side has complete information about the confrontation situation. To tackle this issue, this paper proposes a Maximum Expected Stochastic Gradient Variational Inference algorithm within the Q-learning framework, which can infer the position coordinates of obstacles in the confrontation plane. Starting from the experimental data of a Q-learning reinforcement learning model, the Maximum Expected Stochastic Gradient Variational Inference algorithm is then used to estimate the position coordinates of the obstacles arranged by player B.