Keywords: Coreset selection, Representation geometry, Foundation models, Post-training quantization, Remote sensing classification
Abstract: We reveal a fundamental yet overlooked coupling in foundation model deployment: data selection and quantization cannot be optimized independently. Through comprehensive experiments on remote sensing classification under extreme constraints (5\% labeled data, INT8/binary quantization), we demonstrate that standard coreset selection strategies, while effective at full precision, suffer catastrophic accuracy collapse once models are quantized, with binary networks degrading to near-chance performance. This failure occurs because conventional methods prioritize decision uncertainty while ignoring representation geometry, which quantization fundamentally distorts. We introduce Entropy-Based Density-Weighted Coresets (EntropyBDWC), a geometry-aware selection strategy that explicitly preserves local embedding structure under discretization. Evaluated across three datasets, four architectures, and multiple precision regimes, EntropyBDWC consistently outperforms entropy-based and random sampling under INT8 quantization and substantially stabilizes binary networks. Critically, we show that performing selection in frozen foundation model embeddings (DINO) amplifies this robustness, establishing a new role for foundation models as data selectors rather than trainable backbones. Our work establishes that quantization-aware data curation is not optional but essential, with implications extending beyond remote sensing to any resource-constrained deployment of foundation models.
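The abstract does not give EntropyBDWC's algorithmic details, but the described combination of decision uncertainty with local embedding density can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name, the entropy-times-density scoring rule, and the k-nearest-neighbour density estimate are hypothetical, not the authors' actual method.

```python
import numpy as np

def entropy_density_select(embeddings, probs, k_select, k_nn=10):
    """Hypothetical sketch of density-weighted entropy coreset selection.

    embeddings: (N, D) frozen foundation-model features (e.g. DINO)
    probs:      (N, C) softmax class probabilities from a probe model
    Returns indices of the k_select highest-scoring samples.
    """
    eps = 1e-12

    # Predictive entropy: per-sample decision uncertainty.
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)

    # Local density in embedding space: inverse mean distance to the
    # k_nn nearest neighbours (a proxy for representation geometry).
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    knn_dists = np.sort(dists, axis=1)[:, :k_nn]
    density = 1.0 / (knn_dists.mean(axis=1) + eps)

    # Score uncertain samples more when they lie in dense regions,
    # so the coreset tracks the local embedding structure.
    score = entropy * density
    return np.argsort(-score)[:k_select]

# Toy usage with random features and probabilities.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
logits = rng.normal(size=(100, 5))
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
idx = entropy_density_select(X, P, k_select=5)
print(idx.shape)  # (5,)
```

The pairwise-distance matrix here is O(N^2) and only suitable for small pools; a real implementation would use an approximate nearest-neighbour index for large datasets.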
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 159