Learning Gaussian Mixture Models via Transformer Measure Flows

Published: 10 Jun 2025, Last Modified: 15 Jul 2025 · MOSS@ICML2025 · CC BY 4.0
Keywords: Transformers, GMM, Measure-to-measure flow map
TL;DR: We propose a transformer model for clustering Gaussian Mixture data by interpreting transformers as measure-to-measure flows. Instead of estimating cluster parameters directly, we minimize Wasserstein distance to recover cluster distributions.
Abstract: We introduce a transformer architecture for approximating Gaussian Mixture Models (GMMs) through a measure-to-measure flow interpretation. Rather than estimating explicit cluster parameters, our model predicts the underlying cluster probability distribution by minimizing the Wasserstein distance to the true measure. A key innovation is the flow-speed hyperparameter, which adjusts clustering intensity by varying the transformer step size and thereby indirectly controls model depth according to the desired output complexity. Experimental results show performance comparable to or exceeding classical algorithms such as K-means, while the synthetic setup provides a lightweight, interpretable sandbox for investigating the foundations of transformer flows without the computational overhead of language-based benchmarks.
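The flow interpretation described in the abstract can be sketched in a few lines: each transformer layer moves every point a step of size h (the flow-speed hyperparameter) toward an attention-weighted barycenter, so iterating the layer transports the empirical measure of the data toward a clustered measure. The sketch below is a minimal NumPy illustration of this idea under simplifying assumptions (identity query/key/value maps, a single inverse-temperature parameter `beta`); the names `attention_step` and `flow` are illustrative and are not taken from the submission's code.

```python
import numpy as np

def attention_step(X, h, beta=5.0):
    """One self-attention update, read as an Euler step of the flow.

    Identity query/key/value maps for simplicity; `beta` acts as an
    inverse temperature. (Illustrative sketch, not the authors' code.)
    """
    logits = beta * (X @ X.T)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)            # row-stochastic attention
    # Move each point a step of size h toward its attention-weighted
    # barycenter: h is the "flow speed" trading per-layer movement
    # against effective depth.
    return X + h * (W @ X - X)

def flow(X, h=0.5, steps=40):
    """Iterate the attention step; the particle cloud approximates the
    pushforward of the empirical measure under the transformer flow."""
    for _ in range(steps):
        X = attention_step(X, h)
    return X
```

On data drawn from a two-component Gaussian mixture, iterating this map collapses each component toward a point mass, so the output measure concentrates on the cluster locations rather than on explicit parameter estimates, matching the measure-level view of clustering described above.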
Code: zip
Submission Number: 89