Abstract: This paper investigates whether modeling image and text data as probability measures and applying optimal transport (OT)-based dimensionality reduction techniques leads to improved performance in downstream machine learning tasks. We compare OT-based neighbor embedding methods to their Euclidean counterparts on both classification and clustering tasks using benchmark datasets: MNIST, Fashion MNIST, COIL-20, Yale Face, and 20 Newsgroups. Our methodology consists of computing distance matrices under the Wasserstein or Euclidean metric, applying dimensionality reduction techniques such as MDS, Isomap, t-SNE, and Laplacian eigenmaps, and evaluating performance with standard classifiers and clustering algorithms. Experimental results show that OT-based embeddings often yield better performance, although performance varies on texture-rich datasets such as Fashion MNIST. For all experiments, we perform statistical hypothesis tests to support the findings.
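The pipeline the abstract describes (images as probability measures, pairwise Wasserstein distances, neighbor embedding on the precomputed distance matrix) can be sketched in a few lines. Below is a minimal illustration assuming the POT library for exact optimal transport and scikit-learn for the embedding; the paper's actual cost matrix, solver, and hyperparameters are not specified here, so every concrete choice (the 8x8 digits stand-in, squared-Euclidean ground cost, t-SNE settings) is an assumption for demonstration only.

```python
import numpy as np
import ot  # POT: Python Optimal Transport
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small stand-in for MNIST: 8x8 digit images from scikit-learn.
X, y = load_digits(return_X_y=True)
X = X[:100]  # keep the demo cheap: O(n^2) exact OT problems below

# Treat each image as a probability measure on the pixel grid.
hists = X / X.sum(axis=1, keepdims=True)

# Ground cost: squared Euclidean distance between pixel coordinates.
side = 8
coords = np.array([(i, j) for i in range(side) for j in range(side)],
                  dtype=float)
M = ot.dist(coords, coords, metric="sqeuclidean")
M /= M.max()  # normalize for numerical stability

# Pairwise Wasserstein distance matrix via exact OT (ot.emd2).
n = len(hists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ot.emd2(hists[i], hists[j], M)

# Neighbor embedding on the precomputed OT distance matrix; the same
# matrix could instead feed MDS, Isomap, or Laplacian eigenmaps.
emb = TSNE(n_components=2, metric="precomputed", init="random",
           perplexity=15, random_state=0).fit_transform(D)
print(emb.shape)  # (100, 2) embedding for downstream classification/clustering
```

Swapping `D` for a Euclidean distance matrix over the raw pixel vectors reproduces the baseline the paper compares against.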
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Makoto_Yamada3
Submission Number: 4719