Keywords: Self-Supervised Learning, JEPA, Domain Alignment, Multimodality, Embedded Perception
TL;DR: We propose a self-supervised JEPA-based approach that aligns RGB and IR modalities in a shared semantic space, eliminating the need for costly manual annotations in multispectral image fusion.
Abstract: RGB and IR image fusion requires precise alignment and annotated datasets. To eliminate this need for manual labeling, we propose a self-supervised approach using the Joint-Embedding Predictive Architecture (JEPA). By predicting IR latent features from masked RGB context, our model projects both modalities into a shared semantic space. Preliminary results show this alignment provides a solid foundation for embedded perception without any human intervention.
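The cross-modal prediction idea in the abstract can be sketched as a minimal JEPA-style module: a context encoder sees only the unmasked RGB pixels, a stop-gradient target encoder embeds the IR image, and a predictor regresses the IR latent from the RGB context. All module names, sizes, and the pixel-level masking scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalJEPA(nn.Module):
    """Minimal sketch: predict IR latent features from masked RGB context.

    Architecture details (encoder depth, embedding dim, masking granularity)
    are assumptions for illustration only.
    """

    def __init__(self, dim: int = 64):
        super().__init__()
        # Context encoder for the (masked) 3-channel RGB input.
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Target encoder for the 1-channel IR input (no gradient: JEPA
        # targets typically come from a stop-gradient / EMA branch).
        self.ir_encoder = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Predictor maps the RGB context embedding into the IR latent space.
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim),
        )

    def forward(self, rgb, ir, mask):
        # mask: 1 = visible pixel, 0 = hidden; only visible context is encoded.
        ctx = self.rgb_encoder(rgb * mask)
        pred = self.predictor(ctx)
        with torch.no_grad():  # stop-gradient target, as in JEPA-style training
            tgt = self.ir_encoder(ir)
        # Latent-space regression loss: no pixel reconstruction, no labels.
        return F.mse_loss(pred, tgt)


# Usage: one self-supervised step on a random RGB/IR pair.
rgb = torch.randn(2, 3, 32, 32)
ir = torch.randn(2, 1, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.5).float()  # random 50% pixel mask
loss = CrossModalJEPA()(rgb, ir, mask)
```

In a full training loop this loss would be backpropagated through the predictor and RGB encoder only, driving both modalities toward a shared semantic space without any manual annotation.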
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 20