Keywords: Perceiver, diffusion model, autoencoder, self-supervised learning
TL;DR: We propose a diffusion autoencoder with a Perceiver architecture to handle long, irregular, and multimodal sequences.
Abstract: Self-supervised learning has become a central strategy for representation learning, but the majority of successful architectures assume regularly sampled inputs such as images, audio, and video. In many scientific domains, such as astrophysics, data arrive as long, irregular, and multimodal sequences that existing methods do not handle natively. We introduce the Diffusion Autoencoder with Perceivers (daep), a diffusion autoencoder architecture designed for such settings. Our method tokenizes heterogeneous measurements, compresses them with a Perceiver encoder, and reconstructs them with a Perceiver-IO diffusion decoder, enabling scalable learning without assuming uniform sampling. For a fair comparison, we also adapt masked autoencoders (MAE) to Perceivers, establishing a strong baseline in the same architectural family. Across spectral, photometric, and multimodal astronomical datasets, daep achieves lower reconstruction error and produces smoother, more discriminative latent spaces than VAE and Perceiver-MAE baselines, particularly when preserving high-frequency structure is critical. Our results suggest that daep provides a general framework for learning robust representations from irregular multimodal data, with potential applications well beyond astronomy.
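The scalability claim rests on the Perceiver's latent cross-attention: a fixed-size latent array queries an arbitrarily long, irregularly sampled token set, so cost grows with L*N rather than N^2. A minimal NumPy sketch of that mechanism (all shapes and weight names here are illustrative, not the paper's actual implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, tokens, Wq, Wk, Wv):
    """Fixed-size latent array queries a variable-length token set.

    latents: (L, d) learned latent array; tokens: (N, d) tokenized
    measurements. Returns (L, d): cost is O(L*N), independent of N^2.
    """
    q = latents @ Wq          # queries come from the latents
    k = tokens @ Wk           # keys/values come from the (irregular) inputs
    v = tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (L, N) attention map
    return attn @ v

rng = np.random.default_rng(0)
d = 16
tokens = rng.normal(size=(100, d))   # e.g., 100 irregularly timed measurements
latents = rng.normal(size=(8, d))    # fixed latent bottleneck, L = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
z = cross_attend(latents, tokens, Wq, Wk, Wv)
print(z.shape)  # (8, 16): compressed summary, regardless of input length
```

Because the output shape depends only on the latent size, the same encoder accepts sequences of any length or sampling pattern; a Perceiver-IO-style decoder then reverses the trick, using output query positions to attend back into the latents.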
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 14631