Keywords: Weight space learning, large language models, model lineage
TL;DR: We propose a spectral signature for tracing the lineage of LLMs and clustering them.
Abstract: The rapidly growing repository of publicly available large language models (LLMs) presents significant challenges for systematic management and quantification at scale, including model lineage tracing, licensing, and evaluation. Task-specific benchmarks are insufficient for this setting because LLMs differ widely in architecture, scale, and training procedure. To address this challenge, we introduce a spectral shape-based metric for managing and quantifying LLMs, grounded in Heavy-Tailed Self-Regularization theory. Our approach uses the shape of the empirical spectral density of the model weights as a compact spectral signature of each model. This signature captures intrinsic properties of pretrained models and remains robust during post-training, making it suitable for model-level analysis. In addition, the proposed metric is data-free, computationally efficient, and scale-invariant, enabling large-scale analysis in practice. We show that our spectral signature supports both model lineage tracing and unsupervised clustering. Overall, the proposed spectral signature provides a meaningful proxy for broad performance trends across LLMs, enabling efficient organization, comparison, and analysis of large model collections.
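As a rough illustration of what a data-free, scale-invariant spectral shape statistic can look like, the sketch below computes the empirical spectral density of each weight matrix and summarizes its tail with a Hill estimate of the power-law exponent. The abstract does not say which shape statistic the paper uses, so the Hill estimator, the choice of tail size `k`, and the function names `esd`, `hill_alpha`, and `spectral_signature` are all assumptions made for this example, not the authors' method.

```python
import numpy as np


def esd(weight):
    """Return the sorted eigenvalues of W^T W / n for a 2-D weight matrix.

    These eigenvalues are the samples of the empirical spectral density.
    """
    w = np.asarray(weight, dtype=np.float64)
    n = max(w.shape)
    singular_values = np.linalg.svd(w, compute_uv=False)
    return np.sort(singular_values ** 2 / n)


def hill_alpha(eigs, k=None):
    """Hill estimate of the power-law tail exponent of the eigenvalues.

    Assumption: the top-k eigenvalues are treated as the tail, with k
    defaulting to half of the positive eigenvalues. The estimate depends
    only on eigenvalue ratios, so rescaling the weights leaves it unchanged.
    """
    eigs = np.sort(np.asarray(eigs, dtype=np.float64))
    eigs = eigs[eigs > 0]
    if len(eigs) < 3:
        raise ValueError("need at least 3 positive eigenvalues")
    if k is None:
        k = len(eigs) // 2
    k = min(k, len(eigs) - 1)
    x_min = eigs[-k - 1]          # threshold just below the tail
    tail = eigs[-k:]
    return 1.0 + k / np.sum(np.log(tail / x_min))


def spectral_signature(weights):
    """Hypothetical signature: one tail exponent per weight matrix."""
    return np.array([hill_alpha(esd(w)) for w in weights])
```

Under these assumptions, a model is reduced to a short vector with one entry per weight matrix, and vectors from different models can be compared or clustered without running any data through the models.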
Paper Type: Long (8 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 62