TL;DR: We derive a PAC-Bayesian generalization bound for LLM fine-tuning dynamics and propose LensLLM, an NTK-based framework that enables accurate, efficient performance prediction across diverse tasks.
Abstract: The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite recent advances in LLM selection, a fundamental research question remains largely underexplored: *how can we model the dynamic behaviors of LLMs during fine-tuning, thereby enhancing our understanding of their generalization performance across diverse downstream tasks?* In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. In particular, we first derive a *PAC-Bayesian Generalization Bound* that unveils the fine-tuning dynamics of LLMs and then introduce *LensLLM*, a Neural Tangent Kernel (NTK)-based Rectified Scaling Model that enables accurate performance prediction across diverse tasks while maintaining computational efficiency. Extensive empirical results on three large-scale benchmarks demonstrate that our model achieves up to 91.1% accuracy in LLM selection while reducing computational cost by up to 88.5%, outperforming five state-of-the-art methods. We open-source our proposed *LensLLM* model and the corresponding results at [LensLLM.io](https://github.com/Susan571/LENSLLM.git).
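To make the selection recipe concrete, the sketch below illustrates the general idea of fitting a rectified scaling curve to a few cheap, small-scale fine-tuning runs and extrapolating to a larger data budget to rank candidate models. This is a minimal illustration, not the authors' implementation: the functional form `B / (D_l + D)**beta + E`, the parameter names, and the data points are assumptions chosen for clarity, and LensLLM's NTK-based variant may differ in its exact parameterization.

```python
# Minimal sketch (not the authors' code): fit a generic rectified scaling
# law to small-scale fine-tuning losses, then extrapolate to a large data
# budget. All constants and data points below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def rectified_scaling_law(D, B, D_l, beta, E):
    """Predicted test loss after fine-tuning on D examples (assumed form)."""
    return B / (D_l + D) ** beta + E

# Hypothetical measurements: (fine-tuning set size, observed test loss)
sizes = np.array([200, 500, 1000, 2000, 5000], dtype=float)
losses = np.array([2.31, 2.05, 1.84, 1.66, 1.48])

# Fit the four scaling parameters from the cheap small-scale runs.
params, _ = curve_fit(
    rectified_scaling_law, sizes, losses,
    p0=[10.0, 100.0, 0.3, 1.0], maxfev=20000,
)

# Extrapolate to the full-data regime; repeating this per candidate LLM
# lets one rank models without fully fine-tuning any of them.
predicted = rectified_scaling_law(100_000, *params)
print(f"Predicted loss at 100k examples: {predicted:.3f}")
```

In practice, the per-model fitted curves are compared at the target data budget, so the expensive full fine-tuning is only run for the top-ranked candidate.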
Lay Summary: Today, many powerful AI language models are freely available to the public. These models can answer questions, summarize texts, and translate languages—but not all models perform equally well on every task. Choosing the right model for the job is tricky, especially since testing all of them thoroughly can be extremely time-consuming and expensive.
Our work introduces a new tool, called LensLLM, that helps researchers and engineers quickly and accurately choose the best model for their needs—without having to test each one exhaustively. We found that language models go through two stages as they learn: a slow early stage and a faster, more predictable stage once they have seen enough training data. Using this insight, we built a mathematical model that predicts how well a language model will perform on a task, using far less computation.
Our tool was tested on several large benchmarks and outperformed five leading methods. Not only does LensLLM make smarter choices, but it also cuts down the computational cost by up to 88%. This can make developing AI tools faster, cheaper, and more accessible to everyone.
Link To Code: https://github.com/Susan571/LENSLLM
Primary Area: Deep Learning->Large Language Models
Keywords: LLM Selection, PAC-Bayesian Theory, Generalization Bound, Scaling Laws
Submission Number: 13652