Neural Mutual Information Estimation in Real Time via Pre-trained Hypernetworks

05 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: statistical dependence, transformers, hypernetwork, mutual information
TL;DR: A pre-trained attention-based model for quantifying statistical dependence that is accurate, fast, and differentiable
Abstract: Measuring statistical dependence between high-dimensional random variables is fundamental to data science and machine learning. Neural mutual information (MI) estimators offer a promising avenue, but they typically require costly test-time iterative optimization for each new dataset, making them impractical for real-time applications. We present *FlashMI*, a pretrained, foundation-model-like architecture that eliminates this bottleneck by directly inferring MI in a single forward pass. Pretrained on large-scale synthetic data covering diverse distributions and dependency structures, *FlashMI* learns to identify distributional patterns and predict MI directly from the input dataset. Comprehensive experiments demonstrate that *FlashMI* matches state-of-the-art neural estimators in accuracy while achieving a 100× speedup, seamlessly handles varying dimensions and sample sizes with a single unified model, and generalizes zero-shot to real-world tasks, including CLIP embedding analysis and motion trajectory modeling. By reformulating MI estimation from an optimization problem into a direct inference task, *FlashMI* establishes a practical foundation for real-time dependency analysis.
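To make the amortized-inference idea concrete, here is a minimal sketch of the paradigm the abstract describes: MI is read off from a single forward pass over the dataset rather than fitted by per-dataset optimization. The function below is *not* the FlashMI model; as a hypothetical stand-in for the learned network, it uses the Gaussian closed form $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$ applied to the empirical correlation, which is exact only for bivariate Gaussian data.

```python
import numpy as np

def amortized_mi_sketch(xy):
    """Illustrative stand-in for a pretrained amortized MI estimator.

    One 'forward pass' over the dataset, no test-time optimization.
    Here the model is the Gaussian closed form -0.5*log(1 - rho^2)
    evaluated at the empirical correlation; a FlashMI-style model
    would replace this with a learned attention network that maps
    the raw samples to an MI prediction.
    """
    x, y = xy[:, 0], xy[:, 1]
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

# Synthetic check on correlated Gaussian samples (rho = 0.8),
# where the ground-truth MI is -0.5*log(1 - 0.8^2) ≈ 0.51 nats.
rng = np.random.default_rng(0)
rho_true = 0.8
cov = [[1.0, rho_true], [rho_true, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

est = amortized_mi_sketch(xy)
true_mi = -0.5 * np.log(1.0 - rho_true ** 2)
```

The contrast with MINE-style estimators is that the latter would train a critic network on `xy` for hundreds of gradient steps before producing a number, whereas the amortized estimator spends all of its optimization at pretraining time.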
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2321