Matrix Mixture of Experts is the Best Fast Feed-Forward

Published: 09 Mar 2025 · Last Modified: 10 Mar 2025 · MathAI 2025 Oral · License: CC BY 4.0
Keywords: Machine learning, Mixture-of-Experts
TL;DR: We speed up the FFF architecture by about 4x, but MoE is still somewhat better.
Abstract: We dissect the recently introduced Fast Feed-Forward (FFF) neural network architecture and propose a matrix formulation of FFF that provides a unified perspective on FFF and Mixture-of-Experts (MoE) architectures. This formulation achieves, on average, a nearly 4x speedup in FFF inference on GPUs compared to the original formulation for tree depths of up to 8.
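The abstract page does not include code, so the sketch below is only an illustration of what a matrix-form FFF with hard (top-1) routing might look like: all internal routing nodes are scored in a single matmul, a leaf index is derived per token, and the selected leaf experts are applied as a batched MoE-style computation. The function name `fff_matrix_inference` and the tensors `node_w`, `leaf_w1`, and `leaf_w2` are hypothetical and not taken from the paper.

```python
import torch

def fff_matrix_inference(x, node_w, leaf_w1, leaf_w2, depth):
    """Hypothetical matrix-form FFF inference sketch (not the paper's code).

    x:       (batch, d_model)            input activations
    node_w:  (2**depth - 1, d_model)     routing vectors for all internal tree nodes
    leaf_w1: (2**depth, d_model, d_ff)   first-layer weights of each leaf expert
    leaf_w2: (2**depth, d_ff, d_model)   second-layer weights of each leaf expert
    """
    # Score every internal node at once with one matmul
    # instead of walking the binary tree node by node.
    scores = x @ node_w.t()                                 # (batch, 2**depth - 1)
    go_right = (scores > 0).long()

    # Follow heap-style tree arithmetic to pick one leaf per token.
    node = torch.zeros(x.shape[0], dtype=torch.long, device=x.device)
    for _ in range(depth):
        step = go_right.gather(1, node.unsqueeze(1)).squeeze(1)
        node = 2 * node + 1 + step
    leaf = node - (2 ** depth - 1)                          # leaf index in [0, 2**depth)

    # Gather the selected experts' weights and apply them batched,
    # which is exactly a hard-routed (top-1) MoE feed-forward layer.
    w1 = leaf_w1[leaf]                                      # (batch, d_model, d_ff)
    w2 = leaf_w2[leaf]                                      # (batch, d_ff, d_model)
    h = torch.relu(torch.bmm(x.unsqueeze(1), w1))           # (batch, 1, d_ff)
    return torch.bmm(h, w2).squeeze(1)                      # (batch, d_model)
```

Viewed this way, FFF is a top-1 MoE whose router is a fixed binary tree of linear decisions, which is the unified perspective the abstract refers to.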
Submission Number: 49