AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
Keywords: Sustainable AI, Resource-efficient training, Parameter-efficient fine-tuning, Gradient-guided optimization, Large language models
TL;DR: We challenge AI scaling fundamentalism, showing that when gradient influence is heavy-tailed, updating only high-influence parameters outperforms full-parameter tuning on a performance-per-resource basis. Our gradient blueprints democratize efficient model adaptation, prioritizing capability-per-resource over scale.
Abstract: This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and widening resource inequality. We argue that LLM development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle. Our analysis shows that in transformer-based models, where a small fraction of parameters exert outsized influence (following heavy-tailed distributions), three critical insights emerge: (1) updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis; (2) simple gradient norms provide computationally efficient proxies for identifying these high-influence components; and (3) coordinated parameter and data selection yields multiplicative efficiency gains, potentially reducing resource requirements by orders of magnitude. Building on these theoretical foundations, we propose a two-stage paradigm—marginal-return pretraining for foundation developers and influence-guided adaptation for downstream users—bridged by gradient blueprints, metadata describing which parameters matter most for various tasks. This capability-per-resource perspective transforms what were once considered pragmatic hardware workarounds into theoretically optimal strategies, democratizing access to cutting-edge AI capabilities while significantly reducing environmental impact. By embedding resource consciousness into how we develop, adapt, and evaluate models, we can reshape AI progress toward a more sustainable and equitable future.
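To make insights (1) and (2) concrete, the sketch below scores each parameter tensor by its gradient norm on a single probe batch and freezes everything outside a small update budget. This is a minimal PyTorch illustration under stated assumptions: the model, probe batch, and loss are user-supplied; the 10% budget and the size-normalized scoring rule are illustrative choices, not settings prescribed by the paper.

```python
import torch
import torch.nn as nn

def select_high_influence(model: nn.Module, probe_batch, loss_fn, keep_frac=0.1):
    """Score each parameter tensor by its gradient norm on one probe batch,
    then freeze everything outside the top `keep_frac` of the parameter budget."""
    params = dict(model.named_parameters())
    model.zero_grad()
    inputs, targets = probe_batch
    loss_fn(model(inputs), targets).backward()  # one backward pass yields the proxy

    # Size-normalized gradient norm as a cheap stand-in for parameter influence.
    scores = {name: p.grad.norm().item() / p.numel() ** 0.5
              for name, p in params.items() if p.grad is not None}

    # Greedily keep the highest-scoring tensors until the budget is spent.
    budget = keep_frac * sum(p.numel() for p in params.values())
    kept, used = set(), 0
    for name in sorted(scores, key=scores.get, reverse=True):
        if used + params[name].numel() <= budget:
            kept.add(name)
            used += params[name].numel()

    # Only the kept tensors will receive optimizer updates during adaptation.
    for name, p in params.items():
        p.requires_grad_(name in kept)
    model.zero_grad()
    return kept
```

A caller would run this once before building the optimizer (e.g., `kept = select_high_influence(model, (x, y), nn.functional.cross_entropy)`) so that only trainable tensors are optimized; under the heavy-tailed assumption, the frozen tensors contribute little marginal capability relative to their cost.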
Lay Summary: Recent AI progress has largely come from making models ever bigger and training them longer. That approach delivers strong results, but it also burns huge amounts of compute and energy and puts cutting‑edge systems out of reach for most researchers. Our paper argues for a different north star: measure and optimize capability per resource—how much a model improves for each unit of compute, memory, or energy used—rather than raw capability alone.
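As a toy rendering of that yardstick (every name and threshold below is invented for illustration), capability per resource is simply a ratio, and a marginal-return rule stops training once recent checkpoints fall below a floor:

```python
def capability_per_resource(delta_capability: float, resource_spent: float) -> float:
    """The proposed yardstick: improvement gained per unit of compute,
    memory, or energy spent (units are whatever you choose to track)."""
    return delta_capability / resource_spent

def should_stop(ratios: list[float], floor: float = 1e-4, window: int = 3) -> bool:
    """Marginal-return rule: stop pretraining once the last `window`
    checkpoints each improved less than `floor` per unit resource."""
    return len(ratios) >= window and all(r < floor for r in ratios[-window:])

# Toy history of per-checkpoint ratios: returns diminish, so training stops.
history = [capability_per_resource(d, c) for d, c in
           [(2.0, 100), (0.5, 100), (0.008, 100), (0.006, 100), (0.005, 100)]]
print(should_stop(history))  # True
```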
We show, with theory and simple rules of thumb, that much of a large model’s learning comes from a relatively small set of its internal parts. If you focus your updates on those high‑impact parts—and train mainly on the most informative examples—you can keep most of the gains while spending a fraction of the resources. Crucially, you don’t need heavy math to find those parts: straightforward signals available during training are good enough.
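One hypothetical reading of "straightforward signals" in code: rank training examples by their current loss, a quantity already computed at every step, and keep only the hardest slice. The function name and the 25% keep rate are illustrative assumptions, not the paper's algorithm.

```python
import torch

@torch.no_grad()
def pick_informative(model, examples, loss_fn, keep_frac=0.25):
    """Rank examples by current loss, a cheap training-time signal that
    stands in for gradient influence, and keep only the hardest slice."""
    scored = []
    for idx, (x, y) in enumerate(examples):
        scored.append((loss_fn(model(x), y).item(), idx))
    scored.sort(reverse=True)                      # most informative first
    n_keep = max(1, int(keep_frac * len(scored)))
    return [idx for _, idx in scored[:n_keep]]     # indices to train on
```

Pairing this data filter with the parameter selection sketched above is what the abstract calls coordinated parameter and data selection: the savings multiply because both the update set and the training set shrink.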
Building on this, we propose a two‑stage practice. For large labs that pretrain foundation models, track “improvement per unit resource” and stop when extra compute yields too little benefit. For everyone else, adapt the released model by updating only the components that matter for your task. To connect these stages, we introduce gradient blueprints: small files released with a model that act like a map of which components have the biggest impact for different skills (for example, translation, math, or biomedical language). These maps help smaller teams fine‑tune efficiently without sacrificing quality.
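To picture what a gradient blueprint might look like, here is one hypothetical encoding: a small JSON file shipped alongside a model that maps task tags to the parameter tensors with the highest measured influence. The schema, file name, tensor names, and task tags are all invented for illustration; the paper does not fix a format.

```python
import json

# Hypothetical blueprint schema (invented for illustration): each task tag
# maps to the parameter tensors with the highest measured influence.
blueprint = {
    "model": "example-7b",
    "tasks": {
        "translation": ["layers.30.mlp.up_proj", "layers.31.attn.q_proj"],
        "biomedical": ["layers.12.mlp.down_proj", "embed_tokens"],
    },
}

with open("gradient_blueprint.json", "w") as f:
    json.dump(blueprint, f, indent=2)

def tensors_to_update(path: str, task: str) -> set:
    """A downstream user reads the blueprint and unfreezes only these tensors."""
    with open(path) as f:
        return set(json.load(f)["tasks"][task])

print(tensors_to_update("gradient_blueprint.json", "biomedical"))
```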
This capability‑per‑resource mindset turns today’s practical tricks into a principled, shareable strategy. It can cut energy use and costs, broaden access to strong AI, and still deliver high performance. Our paper provides both the theoretical basis and a concrete path to make these efficiency gains routine.
Submission Number: 491