Keywords: Fine-Grained Visual Recognition, Loss Function, Margin Dynamics
TL;DR: We show that cross-entropy induces a confidence-dependent expanding margin that causes premature gradient saturation in fine-grained recognition, and propose a generalized dual-scale loss to control margin dynamics in a topology-aware manner.
Abstract: We identify that the standard cross-entropy loss exhibits a monotonically expanding intrinsic margin, causing gradient saturation in fine-grained tasks. To address this, we propose the Generalized Dual-Scale Loss, a unified framework controlling margin dynamics via a parameter $\lambda$. Experiments with Vision Transformers reveal that optimal dynamics are topology-dependent: rigid, geometric manifolds require aggressive hard mining ($\lambda > 1$) to resolve structural subtleties, whereas noisy, biological manifolds favor robust constant margins ($\lambda \approx 1$) to prevent overfitting to clutter. Our work advocates for aligning optimization dynamics with the intrinsic noise and granularity of the data.
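The abstract does not give the exact form of the proposed loss, so the following is only a minimal illustration of the gradient-saturation phenomenon it attributes to cross-entropy: in a two-class softmax, the gradient of the loss with respect to the true-class logit has magnitude $1 - p_{\text{true}}$, which shrinks monotonically as the logit margin widens. The function name is a placeholder, not from the paper.

```python
import math

def ce_grad_wrt_true_logit(margin: float) -> float:
    """Gradient magnitude of cross-entropy w.r.t. the true-class logit
    in a 2-class softmax, as a function of the logit margin
    (true-class logit minus the competitor's logit)."""
    p_true = 1.0 / (1.0 + math.exp(-margin))  # softmax probability of the true class
    return 1.0 - p_true                        # |dL/dz_true| = 1 - p_true

# As the margin expands, the gradient signal decays toward zero,
# which is the saturation effect the abstract describes.
grads = [ce_grad_wrt_true_logit(m) for m in (0.0, 2.0, 5.0, 10.0)]
```

At a margin of 10, the gradient magnitude is already below $10^{-4}$, so confidently classified fine-grained pairs contribute almost no learning signal.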
Anonymization: This submission has been anonymized for double-blind review by removing names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 93