Activation-Guided Regularization: Improving Deep Classifiers using Feature-Space Regularization with Dynamic Prototypes

ICLR 2026 Conference Submission 20490 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Representation Learning, Feature Space Regularization, Loss Functions, Model Robustness, Explainable AI
TL;DR: We propose an activation-guided regularization technique that improves the accuracy of deep classifiers and yields more robust models.
Abstract: The softmax cross-entropy loss, the de facto standard for training deep classifiers, does not explicitly guide the formation of a well-structured internal feature space, which can limit model generalization and robustness. In this paper, we explore how a deep model's internal neuron activation patterns can be leveraged to create a powerful regularization signal. We introduce Activation-Guided Regularization (AGR), a novel training objective that directly addresses this gap. AGR augments standard training with a secondary objective that encourages each sample's feature embedding to be similar to a dynamically generated class prototype. These prototypes, which capture the mean neuron activation pattern of each class, are built from the model's own high-confidence predictions in an efficient, self-regularizing feedback loop that requires no changes to the model architecture. We conduct extensive experiments across diverse computer vision benchmarks, including standard object recognition, fine-grained classification, and medical imaging tasks. Our results demonstrate that AGR consistently and significantly improves classification accuracy over strong baselines across a variety of architectures, from simple CNNs to large transformer-based models. Furthermore, we provide extensive analysis showing that these performance gains stem directly from a more structured learned feature space, characterized by quantitatively improved intra-class compactness and inter-class separability, and by qualitatively clearer cluster separation in t-SNE and UMAP visualizations. Finally, we show that these superior representations, learned by attending to neuron activation patterns, lead to enhanced robustness to data corruptions and improved feature transferability in few-shot learning scenarios.
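To make the abstract's mechanism concrete, here is a minimal PyTorch sketch of one plausible instantiation of the AGR objective. This is a hypothetical reconstruction, not the authors' code: the class name `AGRLoss`, the exponential-moving-average prototype update, the cosine-similarity penalty, and the confidence threshold are all assumptions consistent with, but not specified by, the abstract.

```python
# Hypothetical sketch of the AGR idea: cross-entropy plus a term pulling each
# sample's embedding toward a dynamically updated class prototype built from
# the model's own high-confidence predictions.
import torch
import torch.nn.functional as F

class AGRLoss(torch.nn.Module):
    def __init__(self, num_classes, feat_dim, momentum=0.9,
                 conf_threshold=0.9, lambda_agr=0.1):
        super().__init__()
        # Running per-class prototypes of mean feature activations.
        # (EMA update is an assumption; the paper only says "dynamic".)
        self.register_buffer("prototypes", torch.zeros(num_classes, feat_dim))
        self.momentum = momentum
        self.conf_threshold = conf_threshold
        self.lambda_agr = lambda_agr

    def forward(self, features, logits, targets):
        ce = F.cross_entropy(logits, targets)

        # Refresh prototypes outside the autograd graph, so they act as a
        # feedback signal rather than a second backprop path.
        with torch.no_grad():
            probs = logits.softmax(dim=1)
            conf, preds = probs.max(dim=1)
            # Use only high-confidence predictions to update each class mean.
            for c in preds[conf > self.conf_threshold].unique():
                mask = (preds == c) & (conf > self.conf_threshold)
                class_mean = features[mask].mean(dim=0)
                self.prototypes[c] = (self.momentum * self.prototypes[c]
                                      + (1 - self.momentum) * class_mean)

        # Encourage each embedding to be similar to its class prototype.
        proto = self.prototypes[targets]
        agr = (1 - F.cosine_similarity(features, proto, dim=1)).mean()
        return ce + self.lambda_agr * agr
```

Under these assumptions, the extra cost per step is one masked mean per confidently predicted class and one cosine similarity per sample, and no architectural change is needed, which matches the efficiency claim in the abstract.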
Supplementary Material: zip
Primary Area: optimization
Submission Number: 20490