Evaluating Self-Supervised Objectives for Spectroscopic Satellite Embeddings

Manuel Ignacio Pérez-Carrasco; Core Francisco Park; Qindan Zhu; Rocco Di Tella; Zolal Ayazpour; Gonzalo González Abad; Cecilia Garraffo

Published: 01 Mar 2026, Last Modified: 05 Apr 2026 | ML4RS @ ICLR 2026 (Main) | CC BY 4.0

TL;DR: Masked autoencoders produce the worst reconstructions but learn the best atmospheric representations. Reconstruction fidelity is a poor proxy for representation utility in spectroscopic retrieval tasks.

Abstract: Deep learning enables efficient compression of hyperspectral satellite observations, but the choice of self-supervised objective significantly impacts what information is preserved. We compare three paradigms on data from NASA's Tropospheric Emissions: Monitoring of Pollution (TEMPO) instrument: variational autoencoders (VAE), autoregressive generation (GIVT), and masked autoencoders (MAE). Our experiments reveal a trade-off between reconstruction fidelity and atmospheric information preservation. VAE and GIVT achieve near-perfect reconstruction (MSE $\sim$ 0.0002) but encode atmospheric products less effectively. MAE produces substantially worse reconstructions (MSE $\sim$ 0.02–0.33) yet consistently outperforms them when retrieving NO$_2$, O$_3$, HCHO, and cloud fraction, with improvements of up to 69\% for NO$_2$ retrieval over VAE at 64$\times$ compression (R$^2$ = 0.49 vs. 0.29). Aggregated across products, MAE improves over VAE by 17\% (MLP probes) to 32\% (linear probes). This trade-off diminishes at aggressive compression but grows as representational capacity increases. Our experiments provide empirical evidence that reconstruction quality is a poor proxy for representation utility in spectroscopic retrieval tasks, with direct implications for pretraining objective selection in climate foundation model design.
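
As a reading aid, the 69\% figure appears to be a relative gain in R$^2$ rather than an absolute difference; the abstract does not spell out the formula, so this is an assumption, though it is consistent with the quoted values. The minimal Python sketch below uses two illustrative helpers, `r2_score` and `relative_improvement`, which are not taken from the paper, to show the standard coefficient of determination and the relative-gain arithmetic.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(r2_new, r2_baseline):
    """Relative gain of r2_new over r2_baseline, in percent (assumed definition)."""
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# R^2 values quoted in the abstract for NO2 retrieval at 64x compression.
mae_r2, vae_r2 = 0.49, 0.29
gain = relative_improvement(mae_r2, vae_r2)
print(f"MAE over VAE: {gain:.0f}%")  # prints "MAE over VAE: 69%"
```

Under this reading, the absolute gap is 0.20 in R$^2$, and dividing by the VAE baseline of 0.29 gives roughly 69\%.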

Submission Number: 44
