Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning

TMLR Paper6292 Authors

23 Oct 2025 (modified: 09 Jun 2026), under review for TMLR, CC BY 4.0
Abstract: While deep reinforcement learning agents achieve high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a methodology for analyzing the learning process through quantitative saliency analysis. The approach aggregates saliency at the object and modality levels into hierarchical attention profiles (h-profiles) that quantify how an agent allocates attention; tracked over the course of training, these profiles form attention trajectories. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, the methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, providing empirical links between attention profiles, learning dynamics, and agent behavior. To assess the robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
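The hierarchical aggregation described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the input format (one 2-D saliency map per modality, with proprioceptive vectors treated as 1×N maps, plus per-object pixel masks), the use of absolute saliency mass, normalization into shares, the catch-all `unlabeled` bucket, and the names `hierarchical_profile` and `attention_trajectory` are all assumptions. Modality shares form the top level of the hierarchy and within-modality object shares form the lower level. Averaging a checkpoint's per-state profiles and ordering checkpoints by training step yields one attention trajectory per object.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
SaliencyMap = List[List[float]]
# profile[modality] = (modality share, {object name: share within modality})
Profile = Dict[str, Tuple[float, Dict[str, float]]]

UNLABELED = "unlabeled"  # hypothetical bucket for salient pixels outside every mask


def hierarchical_profile(
    saliency: Dict[str, SaliencyMap],
    masks: Dict[str, Dict[str, Set[Pixel]]],
) -> Profile:
    """Aggregate one state's saliency maps into modality- and object-level shares."""
    mass_per_modality: Dict[str, float] = {}
    mass_per_object: Dict[str, Dict[str, float]] = {}
    for modality, smap in saliency.items():
        modality_masks = masks.get(modality, {})
        # Map each labeled pixel to one object; the first object (in dict order) wins on overlap.
        owner: Dict[Pixel, str] = {}
        for name, pixels in modality_masks.items():
            for pixel in pixels:
                owner.setdefault(pixel, name)
        object_mass = {name: 0.0 for name in modality_masks}
        object_mass[UNLABELED] = 0.0
        total = 0.0
        for r, row in enumerate(smap):
            for c, value in enumerate(row):
                mass = abs(value)  # assumption: use saliency magnitude, ignore sign
                total += mass
                object_mass[owner.get((r, c), UNLABELED)] += mass
        mass_per_modality[modality] = total
        mass_per_object[modality] = object_mass

    grand_total = sum(mass_per_modality.values())
    profile: Profile = {}
    for modality, total in mass_per_modality.items():
        modality_share = total / grand_total if grand_total > 0 else 0.0
        object_shares = {
            name: (mass / total if total > 0 else 0.0)
            for name, mass in mass_per_object[modality].items()
        }
        profile[modality] = (modality_share, object_shares)
    return profile


def attention_trajectory(
    checkpoints: List[List[Profile]], modality: str, obj: str
) -> List[float]:
    """Mean within-modality share of `obj` at each checkpoint, in training order."""
    trajectory = []
    for state_profiles in checkpoints:
        shares = [p[modality][1].get(obj, 0.0) for p in state_profiles]
        trajectory.append(sum(shares) / len(shares) if shares else 0.0)
    return trajectory
```

Normalizing objects within their own modality keeps object shares comparable even when modalities differ greatly in size (an image versus a short joint-state vector), while the top-level modality shares still expose reliance on, for example, a redundant sensory channel.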
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their constructive feedback. In response, we have revised the manuscript to clarify our methodological framework and strengthen our empirical validation. Below is a summary of the key additions, mapped to the specific changes requested by the reviewers:

- **Explicit Methodological Framework & Saliency Compatibility (Addresses Rev. ykXp):** We revised the main text (Section 3.1) to make the framework’s components more explicit: (i) quantification of saliency, (ii) longitudinal analysis, (iii) cross-condition/algorithm comparison, (iv) behavioral grounding, and (v) comparison across saliency methods. We updated Figure 1, the abstract, and the contribution statement to highlight this framework and the cross-method comparison as core contributions.
- **Actor vs. Critic Clarification (Addresses Rev. ykXp):** We clarified the role of saliency methods across architectures, including the distinction between actor-based and value-based networks. Section 3.4 now explains explicitly how the saliency computation is defined for each algorithm and how this affects cross-algorithm comparisons.
- **Interventional Evidence & Algorithm Differences (Addresses Rev. ykXp & JFBG):** We added new analyses (Appendix I.3) studying the effect of changing the replay buffer size of DQN and QR-DQN agents on both the attention profile and resistance to perturbations. This provides interventional evidence supporting the link between attention allocation and behavioral vulnerability, and offers a stronger interpretation of why DQN/QR-DQN develop different object-level attention patterns.
- **Granularity and Labeling Noise (Addresses Rev. JFBG & 4Ze1):** We expanded the discussion of structured vs. unstructured inputs and added analyses showing how changes in object granularity and labeling noise impact the resulting h-profiles (Appendix M.1). These additions delineate the conditions under which the method is most reliable and acknowledge limitations when object labels are noisy.
- **Sanity-Check Baselines (Addresses Rev. 4Ze1):** We added sanity-check analyses studying the effect of simple visual properties like object size and motion magnitude on the h-profile (Appendix M.2). This confirms that attention trajectories capture meaningful feature reliance rather than trivial visual properties.

*Note: The new analyses are currently included in the appendix to maintain the flow of the main text, but we would be happy to move selected results to the main body if the Action Editor or reviewers consider them central. These additions do not alter the main conclusions but make the scope, assumptions, and methodology of the h-profile framework significantly more explicit.*
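The object-size sanity check and the comparison across saliency methods can both be phrased as rank-correlation tests over per-object shares. The sketch below is hypothetical: the summary above does not state which statistic the authors use, so Spearman rank correlation (computed as Pearson correlation on tie-averaged ranks), the helper names `size_baseline_correlation` and `method_agreement`, and the convention of returning 0.0 for constant inputs are assumptions made here. A correlation near +1 between object shares and object areas would suggest that a profile mostly tracks object size; high agreement between profiles from two saliency methods would support robustness. Motion magnitudes can be passed in place of areas.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant input; 0.0 is this sketch's convention
    return cov / (vx * vy) ** 0.5


def size_baseline_correlation(shares: Dict[str, float], areas: Dict[str, float]) -> float:
    """Rank correlation between object attention shares and a trivial size baseline."""
    names = sorted(shares.keys() & areas.keys())
    return spearman([shares[n] for n in names], [areas[n] for n in names])


def method_agreement(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> float:
    """Rank agreement of object shares obtained with two different saliency methods."""
    names = sorted(profile_a.keys() & profile_b.keys())
    return spearman([profile_a[n] for n in names], [profile_b[n] for n in names])
```

Rank correlation is used here because it ignores the scale differences that different saliency methods typically produce and asks only whether they order objects the same way.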
Assigned Action Editor: ~Dennis_Wei1
Submission Number: 6292