Keywords: Machine learning, Mixture-of-Experts
TL;DR: We speed up the FFF architecture by roughly 4x, but MoE still performs somewhat better
Abstract: We dissect the recently introduced Fast Feed-Forward (FFF) neural network architecture and propose a matrix formulation of FFF that allows a unified perspective on FFF and Mixture of Experts (MoE) architectures. This formulation achieves, on average, a nearly 4x speedup in FFF inference on GPUs over the original formulation for tree depths of up to 8.
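The paper's exact matrix formulation is not reproduced here; as a rough illustration of the relationship the abstract describes, the sketch below contrasts a tree-walking FFF forward pass with an equivalent view as a hard, one-hot mixture over leaf experts, i.e. an MoE with argmax-style routing whose expert and routing evaluations are batched into dense matrix operations. All names (fff_tree, fff_matrix, W_node, W1, W2), shapes, and the single-input setting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of FFF inference in two views:
# (1) walking a binary routing tree, (2) a matrix/MoE-style hard mixture.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hidden, d_out, depth = 16, 8, 16, 3
n_nodes = 2 ** depth - 1      # internal routing neurons of the binary tree
n_leaves = 2 ** depth         # leaf experts

# Routing neurons: one linear unit per internal tree node (heap-ordered).
W_node = rng.standard_normal((n_nodes, d_in))
# Leaf experts: tiny two-layer ReLU FFNs, one per leaf.
W1 = rng.standard_normal((n_leaves, d_hidden, d_in))
W2 = rng.standard_normal((n_leaves, d_out, d_hidden))

def fff_tree(x):
    """Original-style formulation: walk the tree, then apply one leaf FFN."""
    node = 0
    for _ in range(depth):
        go_right = (W_node[node] @ x) > 0
        node = 2 * node + (2 if go_right else 1)   # heap-style child index
    leaf = node - n_nodes
    return W2[leaf] @ np.maximum(W1[leaf] @ x, 0)

def fff_matrix(x):
    """Matrix/MoE-style view: a hard one-hot gate over all leaf experts."""
    decisions = (W_node @ x) > 0           # all routing decisions in one matvec
    node = 0
    for _ in range(depth):                 # read off the selected leaf
        node = 2 * node + (2 if decisions[node] else 1)
    gate = np.zeros(n_leaves)
    gate[node - n_nodes] = 1.0             # hard one-hot gate (MoE with argmax routing)
    hidden = np.maximum(np.einsum('lhi,i->lh', W1, x), 0)   # all experts, batched
    expert_out = np.einsum('loh,lh->lo', W2, hidden)
    return gate @ expert_out               # gate-weighted combination of experts

x = rng.standard_normal(d_in)
assert np.allclose(fff_tree(x), fff_matrix(x))
```

Replacing the hard one-hot gate with a soft (e.g. softmax) gate over the leaves recovers a standard MoE layer, which is the unified perspective the abstract refers to.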
Submission Number: 49