Binary Hypothesis Testing for Softmax Models and Leverage Score Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We analyze the binary hypothesis testing problem for softmax and leverage score models, which are relevant to large language models and attention mechanisms.
Abstract: Softmax distributions are widely used in machine learning, including Large Language Models (LLMs), where the attention unit relies on the softmax distribution. We abstract the attention unit as the softmax model: given a vector input, the model produces an output drawn from the softmax distribution determined by that input. We consider the fundamental problem of binary hypothesis testing in the setting of softmax models. That is, given an unknown softmax model known to be one of two given softmax models, how many queries are needed to determine which one is the true model? We show that the sample complexity is asymptotically $O(\epsilon^{-2})$, where $\epsilon$ is a certain distance between the parameters of the two models. Furthermore, we draw an analogy between the softmax model and the leverage score model, an important tool for algorithm design in linear algebra and graph theory. At a high level, the leverage score model is likewise a model that, given a vector input, produces an output drawn from a distribution that depends on the input. We obtain similar results for the binary hypothesis testing problem for leverage score models.
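To make the setup concrete, here is a minimal sketch (not from the paper) of a softmax model and a log-likelihood-ratio test that distinguishes two candidate models from queries. The parameter matrices `A0`, `A1`, the Gaussian query distribution, and the constant in the $\epsilon^{-2}$ query budget are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def query_softmax_model(A, x, rng):
    """Softmax model: given input x, return an index drawn from softmax(A x)."""
    p = softmax(A @ x)
    return rng.choice(len(p), p=p)

def likelihood_ratio_test(A0, A1, true_A, n_queries, dim, rng):
    """Query the unknown model n_queries times and keep the hypothesis
    with the larger accumulated log-likelihood."""
    llr = 0.0
    for _ in range(n_queries):
        x = rng.standard_normal(dim)     # illustrative query distribution
        y = query_softmax_model(true_A, x, rng)
        llr += np.log(softmax(A0 @ x)[y]) - np.log(softmax(A1 @ x)[y])
    return 0 if llr >= 0 else 1          # decide which model generated the data

rng = np.random.default_rng(0)
d, k, eps = 8, 5, 0.2
A0 = rng.standard_normal((k, d))
A1 = A0 + eps * rng.standard_normal((k, d))   # models roughly eps apart in parameters
n = int(10 / eps**2)                          # query budget scaling as eps^{-2}
print(likelihood_ratio_test(A0, A1, true_A=A0, n_queries=n, dim=d, rng=rng))  # expect 0
```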
Lay Summary: Imagine you’re trying to figure out if a coin is fair. You flip it many times and count how often it lands heads. If it’s close to 50/50, you might say, "Seems fair." But if it lands heads 90% of the time, something feels off. This is the core idea behind hypothesis testing, a method for making decisions under uncertainty. We begin with a default assumption (called the null hypothesis), like "The coin is fair," and then collect data to see whether that assumption holds. If the evidence strongly contradicts it, we reject the null and accept an alternative hypothesis, like "The coin is biased." Our theoretical work explores hypothesis testing in the context of two fundamental mathematical tools: the softmax distribution and the leverage score distribution. These tools are central to modern AI systems, scientific computing frameworks, and operations research methods, shaping technologies we rely on every day. Our results provide insights into decision-making under uncertainty, with potential applications such as determining whether two neural networks behave similarly, among many others.
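As a toy illustration of the coin-flip test described above (not part of the paper), the sketch below flips a coin n times and rejects the "fair coin" null hypothesis when the observed heads fraction is too far from 1/2, using a standard normal approximation; the sample sizes and 5% threshold are illustrative choices.

```python
import numpy as np

def test_fairness(flips):
    n = len(flips)
    heads = flips.sum()
    # Under the null (p = 1/2), heads is approximately Normal(n/2, n/4).
    z = (heads - n / 2) / np.sqrt(n / 4)
    # Two-sided test at the 5% level: reject if |z| exceeds ~1.96.
    return "biased" if abs(z) > 1.96 else "seems fair"

rng = np.random.default_rng(1)
fair_coin = rng.random(200) < 0.5     # 200 flips of a fair coin
biased_coin = rng.random(200) < 0.9   # 200 flips of a coin landing heads 90% of the time
print(test_fairness(fair_coin))       # likely "seems fair"
print(test_fairness(biased_coin))     # likely "biased"
```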
Primary Area: Deep Learning->Large Language Models
Keywords: Binary hypothesis testing, softmax distributions, large language models, attention
Submission Number: 8881