Unified Scaling Laws for Routed Language Models

2022 (modified: 24 Apr 2023), ICML 2022, Readers: Everyone
Abstract: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that condi...