Keywords: scaling laws, general intelligence, emergence, circuits
TL;DR: In a toy model of an AI system, general-purpose capabilities emerge abruptly, then decline in relative importance, when learning from diverse, power-law-distributed tasks.
Abstract: Deep neural networks trained on vast datasets achieve strong performance on diverse tasks. These models exhibit empirical neural scaling laws, under which prediction error steadily decreases with larger model scale. The cause of this improvement is unclear, as strong general performance could result either from acquiring general-purpose capabilities or from accumulating specialized knowledge across many domains. To address this question theoretically, we study model scaling laws for a capacity-constrained predictor that optimally instantiates task-specific or general-purpose latent circuits. For a data distribution consisting of power-law-distributed tasks, each represented by a low-dimensional data manifold, general capabilities emerge abruptly at a threshold model scale and decline in relative importance thereafter. Data diversity and model expressivity increase general capabilities in distinct ways.
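The abstract's allocation picture can be illustrated with a minimal sketch: a capacity-constrained predictor choosing between task-specific circuits (cheap, each covering one task) and a single general-purpose circuit (expensive, partially covering all tasks), with tasks drawn from a power-law distribution. All specifics here are hypothetical, not from the paper: the costs `c_spec` and `c_gen`, the general-circuit quality `q`, and the Zipf exponent `alpha` are illustrative choices.

```python
import numpy as np

def coverage(n_capacity, n_tasks=1000, alpha=1.5, c_spec=1.0, c_gen=50.0, q=0.5):
    """Return (best expected coverage, whether the general circuit is used).

    Tasks have Zipf-like frequencies p_i ~ i^(-alpha). A specialist circuit
    costs c_spec and fully covers one task; the general circuit costs c_gen
    and covers every task at fractional quality q. We compare the best
    allocation with and without the general circuit.
    """
    p = np.arange(1, n_tasks + 1, dtype=float) ** -alpha
    p /= p.sum()  # normalized power-law task frequencies

    best, best_gen = -1.0, False
    for use_gen in (False, True):
        remaining = n_capacity - (c_gen if use_gen else 0.0)
        if remaining < 0:
            continue  # cannot afford the general circuit
        k = int(remaining // c_spec)          # number of specialist circuits
        cov = p[:k].sum()                     # specialists cover top-k tasks
        if use_gen:
            cov += q * (1.0 - cov)            # general circuit covers the rest partially
        if cov > best:
            best, best_gen = cov, use_gen
    return best, best_gen
```

In this toy, sweeping `n_capacity` shows the abrupt transition the abstract describes: below a threshold capacity the general circuit is never worth its cost, and above it the optimal allocation switches to including it, after which additional capacity mostly buys more specialists.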
Serve As Reviewer: ~Aryeh_Brill1
Confirmation: I confirm that I and my co-authors have read the policies and are releasing our work under a CC-BY 4.0 license.
Submission Number: 10