ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification

TMLR Paper6600 Authors

21 Nov 2025 (modified: 02 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: Remote sensing satellites such as Sentinel-2 provide high-resolution, multi-spectral imagery that enables dense, large-scale land cover classification. However, most deep learning models used in this domain—typically CNN-based architectures—are limited in their ability to process high-dimensional spectral data and scale with increasing dataset sizes. Moreover, while transformer architectures have recently been introduced for remote sensing tasks, their performance on large, densely labeled multi-spectral datasets remains underexplored. In this paper, we present ChromaFormer, a scalable family of multi-spectral transformer models designed for large-scale land cover classification. We introduce a novel Spectral Dependency Module (SDM) that explicitly learns inter-band relationships through attention across spectral channels, enabling efficient spectral-spatial feature fusion. Our models are evaluated on the Biological Valuation Map (BVM) of Flanders, a large, densely labeled dataset spanning over 13,500 km² and 14 classes. ChromaFormer models achieve substantial accuracy gains over baselines: while a 23M-parameter UNet++ achieves less than 70% accuracy, a 655M-parameter ChromaFormer attains over 96% accuracy. We also analyze performance scaling trends and demonstrate generalization to standard benchmarks. Our results highlight the effectiveness of combining scalable transformer architectures with explicit spectral modeling for next-generation remote sensing tasks.
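The abstract describes a Spectral Dependency Module that applies attention across spectral channels so that each band's representation is fused with information from the others. The paper's exact formulation is not given here, so the following is only a minimal NumPy sketch of that general idea: each spectral band is treated as a token with a small feature embedding, and standard scaled dot-product attention mixes information across bands. All names (`spectral_attention`, the weight matrices, the band/embedding sizes) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def spectral_attention(x, wq, wk, wv):
    """Scaled dot-product attention across spectral bands (illustrative sketch).

    x  : (bands, d) array -- one d-dim embedding per spectral band
    wq, wk, wv : (d, d) projection matrices for queries, keys, values
    Returns a (bands, d) array where each band's features are a weighted
    mixture over all bands, i.e. learned inter-band dependencies.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (bands, bands) band-to-band affinities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability before softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # rows sum to 1
    return attn @ v

# Toy example: 12 Sentinel-2-like bands, 16-dim embeddings (sizes are arbitrary)
rng = np.random.default_rng(0)
bands, d = 12, 16
x = rng.normal(size=(bands, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = spectral_attention(x, wq, wk, wv)
print(out.shape)  # (12, 16)
```

In a full model, the per-band embeddings would come from learned projections of image patches, and this spectral mixing would be interleaved with spatial attention; the sketch only shows the cross-band attention step itself.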
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Arto_Klami1
Submission Number: 6600