Keywords: statistical dependence, transformers, hypernetwork, mutual information
TL;DR: A pretrained attention-based model for quantifying statistical dependence that is accurate, fast, and differentiable
Abstract: Measuring statistical dependency between high-dimensional random variables is
fundamental to data science and machine learning. Neural mutual information
(MI) estimators offer a promising avenue, but they typically require costly test-
time iterative optimization for each new dataset, making them impractical for
real-time applications. We present *FlashMI*, a pretrained, foundation model-like
architecture that eliminates this bottleneck by directly inferring MI in a single
forward pass. Pretrained on large-scale synthetic data covering diverse distributions
and dependency structures, *FlashMI* learns to identify distributional patterns and
predict MI directly from the input dataset. Comprehensive experiments demonstrate
that *FlashMI* matches state-of-the-art neural estimators in accuracy while achieving
a 100× speedup, can seamlessly handle varying dimensions and sample sizes through
a single unified model, and generalizes zero-shot to real-world tasks, including
CLIP embedding analysis and motion trajectory modeling. By reformulating
MI estimation from an optimization problem to a direct inference task, *FlashMI*
establishes a practical foundation for real-time dependency analysis.
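As background on the quantity being estimated (this sketch is not the FlashMI method, which is a pretrained transformer performing direct inference): for a bivariate Gaussian with correlation ρ, mutual information has the closed form I(X; Y) = -½ log(1 − ρ²). A minimal, assumption-laden comparison of a naive plug-in histogram estimator against this analytic value illustrates the target quantity:

```python
import numpy as np

def gaussian_mi(rho):
    """Analytic MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)

def histogram_mi(x, y, bins=30):
    """Naive plug-in estimator: discretize samples into a joint histogram
    and compute MI of the empirical joint vs. product of marginals.
    Crude and biased, but shows what neural estimators approximate."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = pxy > 0                        # avoid log(0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(0)
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T

true_mi = gaussian_mi(rho)     # analytic ground truth, ~0.34 nats
est_mi = histogram_mi(x, y)    # plug-in estimate from samples
```

Neural estimators (e.g., variational bounds optimized per dataset) replace the histogram with a learned critic trained at test time; FlashMI instead amortizes that optimization into pretraining, so a new dataset needs only one forward pass.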
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2321