Keywords: manifold learning, dimensionality reduction, large-scale data visualization, embeddings, representation learning
TL;DR: We propose DAE, a method that aligns random walks to create low-dimensional embeddings. DAE preserves both local and global structure and recovers meaningful patterns in benchmarks and single-cell RNA-seq.
Abstract: This paper introduces DAE, which formulates dimensionality reduction as aligning diffusion processes between high- and low-dimensional spaces. By minimizing the Path-KL divergence—which uniquely captures both transition probabilities and waiting times of continuous-time random walks—we prove formal bounds on generator and semigroup closeness, guaranteeing structure preservation across scales.
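For context, the abstract's "transition probabilities and waiting times" phrasing matches the standard path-space KL divergence between two continuous-time Markov jump processes (a consequence of Girsanov's theorem for jump processes). The paper's Path-KL presumably instantiates a quantity of this form between the high- and low-dimensional diffusions; with jump rates \(q\) and \(\tilde q\) and stationary distribution \(\pi\), the KL rate reads:

\[
\mathrm{KL}\big(\mathbb{P}\,\|\,\tilde{\mathbb{P}}\big)
= \sum_{x} \pi(x) \sum_{y \neq x}
\Big[\, q(x,y)\,\log\frac{q(x,y)}{\tilde q(x,y)} \;-\; q(x,y) \;+\; \tilde q(x,y) \,\Big],
\]

where the logarithmic term compares where the walk jumps (transition probabilities) and the linear terms compare how fast it jumps (total rates, i.e. expected waiting times) — consistent with the abstract's claim that Path-KL captures both.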
Our optimization algorithm decomposes this objective into attraction-repulsion terms with an unbiased gradient estimator, enabling efficient parallel implementation. Experiments on single-cell RNA-seq datasets show that DAE consistently preserves both local neighborhoods and global structure, while our CUDA implementation scales to millions of cells with competitive runtime.
The Path-KL framework provides theoretical guarantees that complement existing diffusion-based methods. DAE will be made available with CPU and GPU implementations.
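To make the attraction-repulsion decomposition concrete, here is a minimal sketch of one stochastic step in the style the abstract describes. This is not the paper's implementation: the edge sampling, the heavy-tailed repulsion weight, and the `sgd_step` function are illustrative assumptions; the key idea shown is that sampling a few random non-neighbor pairs gives an unbiased (up to a constant factor) estimate of the full repulsion gradient, so each step is cheap and parallelizable.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(Y, edges, lr=0.1, n_neg=5):
    """One stochastic attraction-repulsion step (illustrative sketch).

    Attraction pulls embedded neighbors together along a sampled graph
    edge; repulsion pushes the point away from uniformly sampled random
    points, an unbiased estimate of the full repulsion sum.
    """
    n = len(Y)
    i, j = edges[rng.integers(len(edges))]   # sample one neighbor edge
    d = Y[i] - Y[j]
    Y[i] -= lr * d                           # attraction: move i toward j
    Y[j] += lr * d                           # ... and j toward i
    for _ in range(n_neg):                   # a few negative samples
        k = int(rng.integers(n))
        if k == i:
            continue
        d = Y[i] - Y[k]
        w = 1.0 / (1.0 + d @ d)              # heavy-tailed repulsion weight (assumed)
        Y[i] += lr * w * d                   # repulsion: push i away from k
    return Y
```

Because each step touches only one edge and a handful of negatives, many such steps can run concurrently over disjoint points, which is what makes a CUDA implementation of this family of objectives scale to millions of cells.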
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 23692