Integrating Visual Attention into Deep Reinforcement Learning for Enhanced Control in Racing Games

Khanh-Linh Vuong, Huy Truong Dinh, Cuong Tuan Nguyen

Published: 01 Jan 2026, Last Modified: 06 Nov 2025, Crossref, CC BY-SA 4.0

Abstract: This study investigates the application of visual attention mechanisms within the Deep Q-Network (DQN) framework to improve control in racing simulations, using the Enduro game environment. Unlike prior studies that applied LSTM-based or transformer-based attention, this work integrates self-attention directly into a CNN-based DQN architecture, aiming to retain the real-time inference capability that racing games require. The model was trained and then tested both in the training environment and in a different version of the environment to assess how well it generalizes. The hyperparameter-tuned model achieved superior results in ALE/Enduro-v5, underscoring the role of optimized attention mechanisms in reinforcement learning. These findings suggest that attention mechanisms could significantly improve decision-making and adaptability in high-speed control tasks, with implications for broader applications such as autonomous driving.
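As a rough illustration of what placing self-attention directly on CNN features can look like, the sketch below applies single-head scaled dot-product self-attention across the spatial positions of a feature map and then feeds the result to a linear Q-value head. It is not the authors' architecture: the abstract does not specify layer sizes, head count, or where the attention block sits, so every shape, the residual connection, the random weights, and all names (`spatial_self_attention`, `w_q`, `w_k`, `w_v`, `w_head`, `N_ACTIONS`) are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over the spatial positions of a CNN feature map.

    feat: array of shape (C, H, W), e.g. the output of a convolutional stack.
    w_q, w_k: projection matrices of shape (C, d_k).
    w_v: projection matrix of shape (C, C), so the residual sum is well defined.
    Returns the attended feature map (C, H, W) and the attention weights (H*W, H*W).
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # one token per spatial position: (N, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot-product scores: (N, N)
    attn = softmax(scores, axis=-1)                # each row is a distribution over positions
    out = tokens + attn @ v                        # residual connection (an assumption)
    return out.T.reshape(c, h, w), attn


# Hypothetical sizes; none of these values come from the paper.
rng = np.random.default_rng(0)
C, H, W, D_K, N_ACTIONS = 64, 7, 7, 16, 9

feat = rng.standard_normal((C, H, W))              # stand-in for CNN features of one frame stack
w_q = rng.standard_normal((C, D_K)) * 0.1
w_k = rng.standard_normal((C, D_K)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

attended, attn = spatial_self_attention(feat, w_q, w_k, w_v)

# Linear Q-value head on the flattened attended features, followed by greedy action selection.
w_head = rng.standard_normal((C * H * W, N_ACTIONS)) * 0.01
q_values = attended.reshape(-1) @ w_head
greedy_action = int(np.argmax(q_values))
```

Because attention here is a single matrix product over a small grid of positions (49 in this sketch), it adds little per-frame cost compared with a recurrent or full transformer encoder, which is in keeping with the real-time goal stated in the abstract.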

External IDs: doi:10.1007/978-3-031-98170-8_20