Abstract: Comparing whether two large language models (LMs) make similar predictions, such as the perplexity they assign, across a massive input space is crucial for real-world applications. Traditional analyses average benchmark scores over fixed datasets, masking per-input differences. We propose Model-diff, a framework that estimates the distribution of prediction differences between two LMs across a large, meaningful input space, defined as the set of token sequences assigned low negative log-likelihood (NLL). Model-diff leverages sampling-based histogram statistics to quantify output differences efficiently, without exhaustive enumeration. Experiments reveal, for the first time, quantitative divergences between LMs in their low-NLL regions, providing a scalable tool for model comparison and diagnostic analysis.
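To make the idea concrete, the sketch below illustrates one way the abstract's procedure could look in code: sample token sequences that one model assigns low NLL (here, by sampling from that model itself), score each sequence under both models, and histogram the per-sequence NLL differences. This is a minimal illustration, not the paper's implementation; the model choices, sampling strategy, sample count, and sequence length are all assumptions for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's code): estimate the distribution of
# per-input NLL differences between two LMs via sampling and a histogram.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def sequence_nll(model, input_ids):
    """Total negative log-likelihood the model assigns to a token sequence."""
    with torch.no_grad():
        out = model(input_ids, labels=input_ids)
    # HF returns the mean cross-entropy over predicted tokens; rescale to a sum.
    return out.loss.item() * (input_ids.shape[1] - 1)


def nll_difference_samples(model_a, model_b, tokenizer, n_samples=200, max_len=64):
    """Sample sequences from model_a (a proxy for its low-NLL region) and
    record NLL(model_a) - NLL(model_b) for each sampled sequence."""
    diffs = []
    bos = torch.tensor([[tokenizer.bos_token_id]], device=DEVICE)
    for _ in range(n_samples):
        seq = model_a.generate(
            bos, do_sample=True, max_length=max_len,
            pad_token_id=tokenizer.eos_token_id,
        )
        diffs.append(sequence_nll(model_a, seq) - sequence_nll(model_b, seq))
    return diffs


if __name__ == "__main__":
    # Illustrative model pair; any two causal LMs with a shared tokenizer would do.
    tok = AutoTokenizer.from_pretrained("gpt2")
    lm_a = AutoModelForCausalLM.from_pretrained("gpt2").to(DEVICE).eval()
    lm_b = AutoModelForCausalLM.from_pretrained("gpt2-medium").to(DEVICE).eval()

    diffs = nll_difference_samples(lm_a, lm_b, tok)
    # The histogram of NLL differences approximates the prediction-difference
    # distribution over the sampled (low-NLL) region of input space.
    hist = torch.histogram(torch.tensor(diffs), bins=20)
    print(hist.hist, hist.bin_edges)
```

Note that sampling directly from one model is only a stand-in for exploring the low-NLL region; the paper's sampling-based histogram statistics may define and traverse that region differently.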
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matt_Kusner1
Submission Number: 6327