FD-Loss: Supervised Feature Decorrelation as a Scale-Invariant Replacement for Random Dropout

Published: 14 Jun 2026, Last Modified: 21 Jun 2026
Venue: ICML 2026 Workshop MusIML Poster
License: CC BY 4.0
Keywords: Neural Network Regularization, Feature Decorrelation, Cross-Correlation Matrix, Dropout Techniques, Representation Learning
Abstract: Standard random dropout regularizes neural networks by stochastically deactivating units, yet it remains blind to representational redundancy: when two neurons converge on identical features, masking one of them generates no corrective gradient toward diversity. We propose Feature Decorrelation Loss (FD-Loss), a supervised regularization objective that explicitly penalizes the off-diagonal entries of the per-feature-normalized cross-correlation matrix of hidden activations. A mandatory per-feature ℓ2 normalization step bounds all correlation values to [−1, +1] and thereby resolves the gradient instability that caused prior covariance penalties (e.g., DeCov) to diverge on unscaled tabular data. Extensive evaluation across 20 datasets spanning tabular, image, and text domains shows that FD-Loss achieves a 65% win rate over dropout, with accuracy improvements of up to +5.35 pp on correlated tabular benchmarks and up to +4.12 pp on complex visual hierarchies, while incurring negligible computational overhead.
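The penalty described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fd_loss`, the batch-mean centering step, the `eps` stabilizer, and the mean-of-squares reduction over off-diagonal entries are all assumptions made for illustration. The abstract specifies only the per-feature ℓ2 normalization and the off-diagonal penalty.

```python
import numpy as np


def fd_loss(h, eps=1e-8):
    """Hypothetical FD-Loss sketch: mean squared off-diagonal correlation.

    h: array of shape (batch, features) holding hidden activations.
    """
    h = np.asarray(h, dtype=np.float64)
    d = h.shape[1]
    if d < 2:
        return 0.0  # a single feature has no off-diagonal entries

    # Assumption: center each feature over the batch before normalizing.
    hc = h - h.mean(axis=0, keepdims=True)

    # Per-feature l2 normalization: every column is scaled to (near) unit norm,
    # so the result does not depend on the original scale of each feature.
    norms = np.linalg.norm(hc, axis=0, keepdims=True)
    hn = hc / (norms + eps)

    # Cross-correlation matrix of shape (features, features).
    # By Cauchy-Schwarz every entry lies in [-1, +1].
    c = hn.T @ hn

    # Penalize only the off-diagonal entries (assumed reduction: mean of squares).
    off_diag = c - np.diag(np.diag(c))
    return float((off_diag ** 2).sum() / (d * (d - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(64, 8))
    redundant = np.concatenate([acts, acts[:, :1]], axis=1)  # duplicate one feature
    print("independent features:", fd_loss(acts))
    print("with a duplicated feature:", fd_loss(redundant))
```

In training, such a term would typically be added to the supervised loss with a weighting coefficient; that coefficient is likewise an assumption here, since the abstract does not state how the two terms are combined.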
Track: Track 2: ML Research by Muslim Authors
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Non Archival Confirmation: I understand that submissions to MusIML are non-archival and can be submitted to other venues.
Submission Number: 14