Keywords: Auxiliary Supervision, Architectural Inductive Bias, Gradient Analysis, Multi-objective Learning, Loss Landscape, Training Dynamics
Abstract: Is deep learning generalization necessarily rooted in optimizing a single objective? We explore an alternative view: adaptive generalization may emerge from structured interactions among heterogeneous objectives. We propose an Asymmetric Training Paradigm that temporarily introduces non-competitive, per-class supervision (sigmoid losses) into networks optimized with competitive softmax objectives. This is realized through orthogonally initialized auxiliary pathways, modulated by a scalar coefficient $\alpha$ and present only during training. Crucially, strictly controlled experiments rule out parameter count as a confounder, showing that parameter expansion alone yields no gain. Our mechanistic analysis reveals two effects: (1) the proposed topology (but not mere capacity) consistently smooths the initial optimization landscape; (2) final performance exhibits an architecture-dependent pattern we term Architectural Resonance, in which auxiliary signals benefit models only when aligned with their inductive biases. A 6-block Vision Transformer (ViT-6L) exhibits constructive gradient alignment (cosine similarity $+0.19$), yielding absolute accuracy gains of $+9.2\%$ on CIFAR-100. By contrast, a CNN shows destructive gradient conflicts (cosine similarity $-0.26$). We further corroborate this divergence in hybrid architectures (CoAtNet), highlighting a stage-dependent effect: transformer stages benefit from objective heterogeneity, while convolutional stages show limited compatibility. We validate scalability on ImageNet-1k, showing consistent top-1 gains for ViTs (up to $+2.25\%$ on ViT-B/16). Rather than functioning as a universal regularizer, our probe reveals that heterogeneous signals selectively benefit architectures with weak inductive biases (e.g., Vision Transformers), exposing a critical dependence between architectural flexibility and objective compatibility.
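The abstract describes the training setup only at a high level, so the following is a minimal PyTorch-style sketch of how the described components could fit together: a shared backbone with a standard softmax/cross-entropy head, a training-only auxiliary head that is orthogonally initialized and supervised with a per-class sigmoid (BCE) loss weighted by a scalar $\alpha$, and a probe that measures the gradient cosine similarity between the two objectives on shared parameters. All names (AsymmetricClassifier, probe_gradient_alignment), the default $\alpha$, and the dummy backbone are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AsymmetricClassifier(nn.Module):
    """Shared backbone with a competitive softmax head and a training-only auxiliary sigmoid head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, alpha: float = 0.1):
        super().__init__()
        self.backbone = backbone                          # any feature extractor (ViT, CNN, ...)
        self.main_head = nn.Linear(feat_dim, num_classes)
        self.aux_head = nn.Linear(feat_dim, num_classes)  # auxiliary pathway, used only during training
        nn.init.orthogonal_(self.aux_head.weight)         # orthogonal initialization of the auxiliary pathway
        self.alpha = alpha                                # scalar coefficient modulating the auxiliary loss

    def forward(self, x):
        feats = self.backbone(x)
        # At inference time only main_head is used; the auxiliary pathway can be discarded.
        return self.main_head(feats), self.aux_head(feats)

    def loss(self, x, y):
        main_logits, aux_logits = self(x)
        ce = F.cross_entropy(main_logits, y)              # competitive softmax objective
        onehot = F.one_hot(y, main_logits.size(-1)).float()
        bce = F.binary_cross_entropy_with_logits(aux_logits, onehot)  # non-competitive per-class objective
        return ce + self.alpha * bce, ce, bce


def probe_gradient_alignment(model: AsymmetricClassifier, x, y) -> float:
    """Cosine similarity between the two objectives' gradients on shared (backbone) parameters."""
    _, ce, bce = model.loss(x, y)
    shared = [p for p in model.backbone.parameters() if p.requires_grad]
    g_ce = torch.autograd.grad(ce, shared, retain_graph=True, allow_unused=True)
    g_bce = torch.autograd.grad(bce, shared, allow_unused=True)
    flat = lambda grads: torch.cat([g.flatten() for g in grads if g is not None])
    return F.cosine_similarity(flat(g_ce), flat(g_bce), dim=0).item()


if __name__ == "__main__":
    # Hypothetical toy backbone and CIFAR-100-like shapes, purely for illustration.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.GELU())
    model = AsymmetricClassifier(backbone, feat_dim=256, num_classes=100, alpha=0.1)
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
    total, ce, bce = model.loss(x, y)
    total.backward()
    print("gradient cosine similarity:", probe_gradient_alignment(model, x, y))
```

A positive cosine similarity from the probe would correspond to the constructive alignment reported for ViTs, and a negative value to the destructive conflicts reported for CNNs; the sketch leaves out scheduling details (e.g., when the auxiliary pathway is removed), which the paper itself would specify.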
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21729