Non-Asymptotic Convergence Bounds for Cross-Entropy Estimation between Neural Auto-Regressive Language Models: Theoretical Analysis
Abstract: Cross-entropy (CE) is a central metric for evaluating the performance and other characteristics of Neural Auto-Regressive Language Models (NARLMs). Despite its importance, the convergence properties of its estimation remain relatively underexplored from a theoretical perspective, primarily due to the complex structure of modern language model architectures. This article investigates this issue by providing a formal theoretical analysis of the convergence properties of CE estimation between different families of NARLMs. When the test distribution is modeled by an LSTM/GRU, we show that CE estimation exhibits a non-vacuous convergence rate, which depends linearly on the norm of the output matrix of the test model and logarithmically on the alphabet size. Additionally, we provide a variance-based convergence bound applicable to large families of NARLMs, including decoder-only Transformer-based models and LSTMs/GRUs.
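For orientation, a minimal sketch of the estimation setting, assuming the standard Monte Carlo estimator of CE from held-out samples (the paper's exact estimator and constants are not reproduced in this abstract):

\[
  \widehat{\mathrm{CE}}_n(p \,\|\, q) \;=\; -\frac{1}{n}\sum_{i=1}^{n} \log q\bigl(x^{(i)}\bigr),
  \qquad x^{(1)},\dots,x^{(n)} \stackrel{\text{i.i.d.}}{\sim} p,
\]

where $p$ denotes the test (data-generating) NARLM and $q$ the evaluated model, each factorized auto-regressively over the alphabet; a non-asymptotic bound then controls the deviation $\lvert \widehat{\mathrm{CE}}_n(p \,\|\, q) - \mathrm{CE}(p \,\|\, q) \rvert$ with high probability as a function of $n$ and, per the abstract, of the output-matrix norm and the alphabet size.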
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: evaluation; statistical testing for evaluation
Contribution Types: Theory
Languages Studied: English
Submission Number: 4592