Keywords: Machine learning, Mixture-of-Experts
TL;DR: We speed up the FFF architecture by roughly 4x, but MoE still performs somewhat better
Abstract: We dissect the recently introduced Fast Feed-Forward (FFF) neural network architecture and propose a matrix formulation of FFF that allows a unified perspective on FFF and Mixture of Experts (MoE) architectures. This formulation achieves, on average, a nearly 4x speedup in FFF inference on GPUs over the original formulation for tree depths of up to 8.
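The paper's exact matrix formulation is not reproduced here; as a rough illustration of the relationship the abstract describes, the sketch below contrasts a tree-walking FFF forward pass with an equivalent view as a hard, one-hot mixture over leaf experts, i.e. an MoE with argmax-style routing whose expert and routing evaluations are batched into dense matrix operations. All names (fff_tree, fff_matrix, W_node, W1, W2), shapes, and the single-input setting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of FFF inference in two views:
# (1) walking a binary routing tree, (2) a matrix/MoE-style hard mixture.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hidden, d_out, depth = 16, 8, 16, 3
n_nodes = 2 ** depth - 1      # internal routing neurons of the binary tree
n_leaves = 2 ** depth         # leaf experts

# Routing neurons: one linear unit per internal tree node (heap-ordered).
W_node = rng.standard_normal((n_nodes, d_in))
# Leaf experts: tiny two-layer ReLU FFNs, one per leaf.
W1 = rng.standard_normal((n_leaves, d_hidden, d_in))
W2 = rng.standard_normal((n_leaves, d_out, d_hidden))

def fff_tree(x):
    """Original-style formulation: walk the tree, then apply one leaf FFN."""
    node = 0
    for _ in range(depth):
        go_right = (W_node[node] @ x) > 0
        node = 2 * node + (2 if go_right else 1)   # heap-style child index
    leaf = node - n_nodes
    return W2[leaf] @ np.maximum(W1[leaf] @ x, 0)

def fff_matrix(x):
    """Matrix/MoE-style view: a hard one-hot gate over all leaf experts."""
    decisions = (W_node @ x) > 0           # all routing decisions in one matvec
    node = 0
    for _ in range(depth):                 # read off the selected leaf
        node = 2 * node + (2 if decisions[node] else 1)
    gate = np.zeros(n_leaves)
    gate[node - n_nodes] = 1.0             # hard one-hot gate (MoE with argmax routing)
    hidden = np.maximum(np.einsum('lhi,i->lh', W1, x), 0)   # all experts, batched
    expert_out = np.einsum('loh,lh->lo', W2, hidden)
    return gate @ expert_out               # gate-weighted combination of experts

x = rng.standard_normal(d_in)
assert np.allclose(fff_tree(x), fff_matrix(x))
```

Replacing the hard one-hot gate with a soft (e.g. softmax) gate over the leaves recovers a standard MoE layer, which is the unified perspective the abstract refers to.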
Submission Number: 49