Reconstruction Is Not Enough: Evaluating Self-Supervised Objectives for Spectroscopic Satellite Embeddings
TL;DR: Masked autoencoders produce the worst reconstructions but learn the best atmospheric representations. Reconstruction fidelity is a misleading proxy for scientific utility.
Abstract: Deep learning enables efficient compression of hyperspectral satellite observations, but the choice of self-supervised objective significantly impacts what information is preserved. We compare three paradigms on data from NASA's Tropospheric Emissions: Monitoring of Pollution (TEMPO) instrument: variational autoencoders (VAE), autoregressive generation (GIVT), and masked autoencoders (MAE). Our experiments reveal a fundamental trade-off between reconstruction fidelity and atmospheric information preservation. VAE and GIVT achieve near-perfect reconstruction (MSE $\sim$ 0.0002) but encode atmospheric products less effectively. MAE produces substantially worse reconstructions (MSE $\sim$ 0.02–0.33) yet retrieves NO$_2$, O$_3$, HCHO, and cloud fraction far more accurately, with improvements of up to 69\% over VAE for NO$_2$ retrieval at 64× compression (R$^2$ = 0.49 vs. 0.29). Aggregated across products, MAE improves over VAE by 17\% (MLP probes) to 32\% (linear probes).
This trade-off diminishes at aggressive compression but grows as representational capacity increases. Our findings challenge the assumption that reconstruction quality indicates representation utility for scientific applications, with direct implications for climate foundation model design.
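As a rough illustration of how probe-based numbers like those above are typically obtained, the following sketch fits a least-squares linear probe on frozen embeddings, scores it with R$^2$, and computes a relative improvement between two R$^2$ values. It is a minimal sketch under stated assumptions, not the evaluation code used in this work: the function names (`r2_score`, `linear_probe_r2`, `relative_improvement`) are hypothetical, and reading the reported 69\% as a relative gain in R$^2$ is an assumption that happens to match the NO$_2$ figures (0.49 vs. 0.29).

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def linear_probe_r2(z_train, y_train, z_test, y_test):
    """Fit an ordinary least-squares probe on frozen embeddings z
    and return R^2 on held-out data. (Hypothetical helper.)"""
    # Append a constant column so the probe can learn a bias term.
    a_train = np.hstack([z_train, np.ones((len(z_train), 1))])
    a_test = np.hstack([z_test, np.ones((len(z_test), 1))])
    weights, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return r2_score(y_test, a_test @ weights)


def relative_improvement(r2_new, r2_base):
    """Relative gain of r2_new over r2_base (assumed definition)."""
    return (r2_new - r2_base) / r2_base


# NO2 figures quoted in the abstract: MAE R^2 = 0.49, VAE R^2 = 0.29.
print(f"{relative_improvement(0.49, 0.29):.0%}")
```

Because the probe only sees the frozen embedding, its R$^2$ measures how linearly accessible a given atmospheric product is in the representation, independently of how well the encoder reconstructs the input spectrum.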
Submission Number: 44