Keywords: white-box deep neural networks, projection, compression, expansion
TL;DR: A principled attention mechanism with better interpretability that achieves feature compression or expansion based on the geometric insight from the gradient of MCR^2
Abstract: The maximal coding rate reduction ($\text{MCR}^2$) objective has been proposed to learn low-dimensional subspace representations by minimizing the compression term for intra-group compactness and maximizing the expansion term for inter-group separation. Several studies have leveraged $\text{MCR}^2$ to design principled, interpretable deep models by following or approximating its gradient to derive layer structures. However, these approaches remain limited in achieving fully principled and effective compression, and they lack self-adaptive control over the strength of expansion and compression across layers. In this work, we introduce \textbf{ECAttention}, a novel attention mechanism that incorporates principled expansion and compression modules inspired by the \textbf{geometric insight} of $\text{MCR}^2$.
Geometrically, gradient-based updates of $\text{MCR}^2$ move features along directions shaped by the underlying data structure.
Our method efficiently captures this structure using randomization combined with Cholesky decomposition to guide feature updates with \textbf{nearly linear complexity}. By introducing two trainable weights per layer, ECAttention self-adaptively regulates the strengths of compression and expansion. The resulting ECA transformer not only matches or surpasses prior methods but also exhibits greater interpretability, with different heads focusing on distinct image regions and capturing \textbf{fine-grained structures} under simple supervised training.
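To make the "randomization combined with Cholesky decomposition" idea concrete, here is a minimal sketch of one way such a compression-direction computation could look. This is an illustration under assumptions, not the paper's actual ECAttention module: the gradient of the $\text{MCR}^2$ compression term on features $Z \in \mathbb{R}^{d \times n}$ is proportional to $(I + \alpha Z Z^\top)^{-1} Z$ with $\alpha = d/(n\varepsilon^2)$, and the sketch below replaces $Z Z^\top$ with a random low-rank surrogate $Y Y^\top$ ($Y = ZS$ for a Gaussian sketch $S$), then applies the Woodbury identity so only a small $k \times k$ Cholesky solve is needed. The function name, sketch dimension, and seed handling are all hypothetical choices.

```python
import numpy as np

def mcr2_compression_direction(Z, eps=0.5, sketch_dim=32, seed=0):
    """Approximate (I + alpha * Z Z^T)^{-1} Z, the direction the MCR^2
    compression-term gradient moves features along (up to scaling).

    Hypothetical illustration: a random sketch Y = Z S stands in for Z,
    and the Woodbury identity reduces the d x d inverse to a k x k
    Cholesky solve, giving near-linear cost in n and d for small k.
    """
    d, n = Z.shape
    alpha = d / (n * eps ** 2)

    # Randomized sketch of the feature span (k columns instead of n).
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)
    Y = Z @ S  # shape (d, k)

    # Woodbury: (I + a Y Y^T)^{-1} = I - a Y (I_k + a Y^T Y)^{-1} Y^T.
    # Factor the small k x k system with Cholesky and back-substitute.
    G = np.eye(sketch_dim) + alpha * (Y.T @ Y)
    L = np.linalg.cholesky(G)                    # G = L L^T
    W = np.linalg.solve(L, Y.T @ Z)              # forward solve
    X = np.linalg.solve(L.T, W)                  # backward solve: X = G^{-1} Y^T Z

    return Z - alpha * (Y @ X)                   # ~ (I + alpha Y Y^T)^{-1} Z
```

With the sketch in place, each layer only ever factors a `sketch_dim x sketch_dim` matrix, which is the source of the near-linear complexity claim; the trade-off is that `Y Y^T` is a randomized approximation of `Z Z^T`, so the direction is approximate.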
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9965