Abstract: Comparing whether two large language models (LMs) make similar predictions, such as the perplexity they assign, across a massive input space is crucial for real-world applications. Traditional analyses average benchmark scores over fixed datasets, masking per-input differences. We propose Model-diff, a framework that estimates the distribution of prediction differences between two LMs across a large, meaningful input space, defined as the set of token sequences assigned low negative log-likelihood (NLL). Model-diff leverages sampling-based histogram statistics to quantify output differences efficiently, without exhaustive enumeration. Experiments reveal, for the first time, quantitative divergences between LMs in their low-NLL regions, providing a scalable tool for model comparison and diagnostic analysis.
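To make the idea concrete, the sketch below illustrates one way the abstract's procedure could look in code: sample token sequences that one model assigns low NLL (here, by sampling from that model itself), score each sequence under both models, and histogram the per-sequence NLL differences. This is a minimal illustration, not the paper's implementation; the model choices, sampling strategy, sample count, and sequence length are all assumptions for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's code): estimate the distribution of
# per-input NLL differences between two LMs via sampling and a histogram.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def sequence_nll(model, input_ids):
    """Total negative log-likelihood the model assigns to a token sequence."""
    with torch.no_grad():
        out = model(input_ids, labels=input_ids)
    # HF returns the mean cross-entropy over predicted tokens; rescale to a sum.
    return out.loss.item() * (input_ids.shape[1] - 1)


def nll_difference_samples(model_a, model_b, tokenizer, n_samples=200, max_len=64):
    """Sample sequences from model_a (a proxy for its low-NLL region) and
    record NLL(model_a) - NLL(model_b) for each sampled sequence."""
    diffs = []
    bos = torch.tensor([[tokenizer.bos_token_id]], device=DEVICE)
    for _ in range(n_samples):
        seq = model_a.generate(
            bos, do_sample=True, max_length=max_len,
            pad_token_id=tokenizer.eos_token_id,
        )
        diffs.append(sequence_nll(model_a, seq) - sequence_nll(model_b, seq))
    return diffs


if __name__ == "__main__":
    # Illustrative model pair; any two causal LMs with a shared tokenizer would do.
    tok = AutoTokenizer.from_pretrained("gpt2")
    lm_a = AutoModelForCausalLM.from_pretrained("gpt2").to(DEVICE).eval()
    lm_b = AutoModelForCausalLM.from_pretrained("gpt2-medium").to(DEVICE).eval()

    diffs = nll_difference_samples(lm_a, lm_b, tok)
    # The histogram of NLL differences approximates the prediction-difference
    # distribution over the sampled (low-NLL) region of input space.
    hist = torch.histogram(torch.tensor(diffs), bins=20)
    print(hist.hist, hist.bin_edges)
```

Note that sampling directly from one model is only a stand-in for exploring the low-NLL region; the paper's sampling-based histogram statistics may define and traverse that region differently.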
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matt_Kusner1
Submission Number: 6327