Keywords: Auxiliary Supervision, Architectural Inductive Bias, Gradient Analysis, Multi-objective Learning, Loss Landscape, Training Dynamics
Abstract: Is deep learning robustness necessarily rooted in optimizing a single objective? We explore an alternative view: adaptive generalization may emerge from structured interactions among heterogeneous objectives during training. We propose an Asymmetric Training Paradigm that temporarily introduces non-competitive, per-class supervision (sigmoid losses) into networks optimized with competitive softmax objectives. This is realized through orthogonally initialized auxiliary pathways, modulated by a scalar coefficient $\alpha$ and present only during training. This controlled form of temporary topological redundancy creates an ideal probe for studying objective interactions. Our mechanistic analysis shows that such redundancy consistently smooths the initial loss landscape, but its performance impact follows a Principle of Architectural Resonance: auxiliary signals benefit models only when aligned with architectural inductive biases. A 6-block Vision Transformer (ViT-6L) exhibits constructive gradient alignment (cosine similarity +0.19), yielding up to 25\% accuracy gains on CIFAR-100 with $20\times$ redundancy; by contrast, a CNN shows destructive conflicts (cosine similarity -0.26), leading to degradation. These findings challenge the view of auxiliary supervision as a universal regularizer. Instead, they reveal robustness as an outcome of structured internal dialogues between objectives, opening a path toward the design of multi-objective training systems tuned to architectural biases.
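The gradient-alignment probe described in the abstract can be illustrated with a minimal sketch. All names below (the toy backbone, head sizes, and the value of $\alpha$) are illustrative assumptions, not the paper's actual configuration: a shared backbone feeds both a competitive softmax head and a temporary, orthogonally initialized sigmoid head, and we measure the cosine similarity between the two objectives' gradients on the shared parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup (dimensions are illustrative): a shared backbone
# with a primary softmax head and a temporary auxiliary per-class sigmoid
# head, used as a probe for gradient alignment between the two objectives.
backbone = nn.Linear(16, 32)
primary_head = nn.Linear(32, 10)      # competitive softmax objective
aux_head = nn.Linear(32, 10)          # non-competitive per-class sigmoid objective
nn.init.orthogonal_(aux_head.weight)  # orthogonally initialized auxiliary pathway

alpha = 0.5  # scalar coefficient modulating the auxiliary loss (assumed value)

x = torch.randn(8, 16)
y = torch.randint(0, 10, (8,))
y_onehot = F.one_hot(y, 10).float()

def flat_grads(loss, params):
    """Concatenate gradients of `loss` w.r.t. the shared `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

feats = backbone(x)
loss_primary = F.cross_entropy(primary_head(feats), y)
loss_aux = alpha * F.binary_cross_entropy_with_logits(aux_head(feats), y_onehot)

shared = list(backbone.parameters())
g_p = flat_grads(loss_primary, shared)
g_a = flat_grads(loss_aux, shared)

# Cosine similarity on the shared backbone: positive values indicate
# constructive alignment (as reported for ViT-6L), negative values
# indicate destructive conflict (as reported for the CNN).
cos = F.cosine_similarity(g_p.unsqueeze(0), g_a.unsqueeze(0)).item()
print(f"gradient cosine similarity: {cos:.3f}")
```

Because the auxiliary head exists only during training, the probe and its loss are simply dropped at inference time, leaving the primary softmax pathway unchanged.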
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21729