Keywords: Large Language Models, Model Scaling, Parameter Reuse
TL;DR: A 4th dimension for scaling model size that significantly improves reasoning while keeping knowledge capacity constant.
Abstract: Scaling the size of large language models typically involves 3 dimensions: depth, width, and the number of parameters. In this work, we explore a 4th dimension: virtual logical depth (VLD), which increases the effective algorithmic depth without changing the overall parameter count by reusing parameters within the model. While parameter reuse is not new, its role in scaling dynamics has remained underexplored. Unlike currently trending test-time methods, which mainly scale token-wise, VLD alters the internal computation graph and can be applied during training, inference, or both. We carefully design controlled experiments and obtain the following key insights on VLD scaling: 1. Knowledge capacity vs. parameters. At a fixed parameter count, VLD leaves knowledge capacity nearly unchanged (with only minor variance), while across models knowledge capacity scales with the number of parameters. 2. Reasoning vs. reuse. Properly implemented VLD substantially improves reasoning ability without increasing the parameter count, decoupling reasoning from sheer model size. This offers a new axis for scaling beyond the token-wise test-time scaling used by most current reasoning models. 3. Robustness and generality. The trend of improved reasoning persists across architectures and configurations (e.g., different reuse schedules and step counts), indicating that VLD captures a general scaling behavior. These findings not only provide useful insights into future model scaling strategies, but also raise a deeper question: does superintelligence necessarily require ever-larger models, or could some capability be traded for by reusing parameters and increasing virtual logical depth? We believe there are many unknown dynamics within model scaling that remain to be explored. Code is available at https://anonymous.4open.science/r/virtual_logical_depth-8024/.
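To make the parameter-reuse idea concrete, below is a minimal sketch of increasing virtual depth by looping a shared transformer block. This is an illustrative assumption in the style of looped/weight-tied transformers, not the paper's actual architecture; the real reuse schedules, step counts, and training details are in the linked repository. All names here (SharedBlockStack, n_loops) are hypothetical.

```python
# Hypothetical sketch of "virtual logical depth" via parameter reuse:
# one transformer block is applied n_loops times, so the effective
# (virtual) depth grows while the parameter count stays fixed.
import torch
import torch.nn as nn


class SharedBlockStack(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_loops: int):
        super().__init__()
        # A single block whose parameters are shared across all loops.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reapplying the same weights deepens the computation graph
        # without adding any parameters.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x


x = torch.randn(2, 16, 64)  # (batch, seq_len, d_model)
model = SharedBlockStack(d_model=64, n_heads=4, n_loops=6)
print(model(x).shape)  # torch.Size([2, 16, 64])
# Parameter count is independent of n_loops:
print(sum(p.numel() for p in model.parameters()))
```

Under this reading, the claimed decoupling follows directly: knowledge capacity tracks the (fixed) parameter count, while reasoning benefits from the extra sequential computation that the loop provides.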
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22144