Keywords: feedforward fully-connected layers, MLPs, random features, general dense layers, linear time complexity layers
TL;DR: This paper introduces EUGens: Efficient, Unified and General Dense Layers, capable of accurately approximating general feedforward fully-connected layers, but with linear rather than quadratic time complexity.
Abstract: Efficient neural networks are essential for scaling machine learning models to real-time applications and resource-constrained environments. Fully-connected feedforward layers (FFLs) introduce computation and parameter-count bottlenecks within neural network architectures. To address this challenge, in this work we propose a new class of dense layers that generalize standard fully-connected feedforward layers: $\textbf{E}$fficient, $\textbf{U}$nified and $\textbf{Gen}$eral dense layers (EUGens). EUGens leverage random features to approximate standard FFLs and go beyond them by incorporating a direct dependence on the input norms in their computations. The proposed layers unify existing efficient FFL extensions and improve efficiency by reducing inference complexity from quadratic to linear time. They also lead to $\textbf{the first}$ unbiased algorithms approximating FFLs with arbitrary polynomial activation functions. Furthermore, EUGens reduce the parameter count and computational overhead while preserving the expressive power and adaptability of FFLs. We also present a layer-wise knowledge transfer technique that bypasses backpropagation, enabling efficient adaptation of EUGens to pre-trained models. Empirically, we observe that integrating EUGens into Transformers and MLPs yields substantial improvements in inference speed (up to $\textbf{27}$\%) and memory efficiency (up to $\textbf{30}$\%) across a range of tasks, including image classification, language model pre-training, and 3D scene reconstruction. Overall, our results highlight the potential of EUGens for the scalable deployment of large-scale neural networks in real-world scenarios.
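The abstract does not spell out the EUGen construction, but the claim of unbiased random-feature approximation for polynomial activations can be illustrated with a generic, well-known estimator: for $g \sim \mathcal{N}(0, I)$, $\mathbb{E}[(g^\top x)(g^\top w)] = x^\top w$, so the product of $p$ independent such factors is an unbiased estimate of $(x^\top w)^p$, i.e. of a single neuron with activation $\phi(t) = t^p$. The sketch below is a minimal Monte Carlo demonstration of this idea, not the paper's actual layer; all variable names and constants are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, m = 8, 3, 200_000  # input dim, polynomial degree, number of random-feature samples

# A single input vector and a single neuron's weight vector (scaled for stability).
x = rng.normal(size=d) / np.sqrt(d)
w = rng.normal(size=d) / np.sqrt(d)

# Exact neuron output with polynomial activation phi(t) = t**p.
exact = np.dot(w, x) ** p

# Unbiased estimate: for each sample, draw p independent Gaussian vectors g_1..g_p
# and form the product of (g_j @ x)(g_j @ w); its expectation is (x @ w)**p.
G = rng.normal(size=(m, p, d))          # m samples, p independent projections each
factors = (G @ x) * (G @ w)             # shape (m, p)
estimate = np.prod(factors, axis=1).mean()
```

With enough samples, `estimate` concentrates around `exact`; the point is that the estimator touches `x` and `w` only through inner products with random vectors, which is the structural property that random-feature layers exploit to trade exact quadratic-cost computation for cheaper unbiased approximation.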
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 18645