Visible to everyone since 04 Oct 2024. License: CC BY 4.0
In this paper, we explore the semantic clustering properties of deep reinforcement learning (DRL) to improve its interpretability and deepen our understanding of its internal semantic organization. In this context, semantic clustering refers to the ability of neural networks to cluster inputs by semantic similarity in their internal feature space. We propose a DRL architecture that incorporates a novel semantic clustering module, which combines feature dimensionality reduction with online clustering. This module integrates seamlessly into the DRL training pipeline, addressing the instability of t-SNE and eliminating the extensive manual annotation required by previous semantic analysis methods. Through experiments, we validate the effectiveness of the proposed module and demonstrate its ability to reveal semantic clustering properties within DRL. Furthermore, we introduce new analytical methods that leverage these properties to provide insights into the hierarchical structure of policies and the semantic organization of the feature space. These methods also help identify potential risks within the model, offering a deeper understanding of its limitations and guiding future improvements.
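
To make the two ingredients named above concrete (feature dimensionality reduction followed by online clustering), the following is a minimal sketch under stated assumptions, not the proposed module itself. A fixed random linear projection stands in for the dimensionality reduction, and online k-means with running-mean centroid updates stands in for the online clustering; the abstract specifies neither choice. All names (`make_projection`, `project`, `OnlineKMeans`) are hypothetical and introduced only for illustration.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Fixed random linear map (assumed stand-in for a learned reduction layer)."""
    rng = random.Random(seed)
    scale = 1.0 / (in_dim ** 0.5)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, feature):
    """Reduce a feature vector to len(matrix) dimensions."""
    return [sum(w * x for w, x in zip(row, feature)) for row in matrix]


class OnlineKMeans:
    """Online k-means (assumed stand-in for the paper's online clustering)."""

    def __init__(self, num_clusters):
        self.num_clusters = num_clusters
        self.centroids = []  # one list of floats per cluster
        self.counts = []     # number of points absorbed by each cluster

    def assign(self, point):
        """Return the index of the nearest centroid (squared Euclidean distance)."""
        dists = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in self.centroids
        ]
        return dists.index(min(dists))

    def update(self, point):
        """Assign one point, move its centroid toward it, return the cluster index."""
        if len(self.centroids) < self.num_clusters:
            # Seed centroids with the first points seen.
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1
        k = self.assign(point)
        self.counts[k] += 1
        lr = 1.0 / self.counts[k]  # running-mean step size
        self.centroids[k] = [
            c + lr * (p - c) for c, p in zip(self.centroids[k], point)
        ]
        return k
```

In a training loop, each high-dimensional feature vector would be passed through `project` and then `OnlineKMeans.update`, so cluster assignments are available during training without a separate offline t-SNE pass. Seeding centroids from the first points seen is a simplification and is sensitive to the order of the data.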