JAFAR: Jack up Any Feature at Any Resolution

Paul Couairon; Loick Chambon; Louis Serrano; Jean-Emmanuel HAUGEARD; Matthieu Cord; Nicolas THOME

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 poster. CC BY 4.0

Keywords: Feature Upsampling, Dense Vision Tasks

TL;DR: We propose a novel architecture and training objective specifically designed to upsample features from foundation vision encoders at any resolution.

Abstract: Foundation Vision Encoders have become indispensable across a wide range of dense vision tasks. However, their operation at low spatial feature resolutions necessitates subsequent feature decompression to enable full-resolution processing. To address this limitation, we introduce JAFAR, a lightweight and flexible feature upsampler designed to enhance the spatial resolution of visual features from any Foundation Vision Encoder to any target resolution. JAFAR features an attention-based upsampling module that aligns the spatial representations of high-resolution queries with semantically enriched low-resolution keys via Spatial Feature Transform modulation. Despite the absence of high-resolution feature ground truth, we find that learning at low upsampling ratios and resolutions generalizes surprisingly well to much higher scales. Extensive experiments demonstrate that JAFAR recovers intricate pixel-level details and consistently outperforms existing feature upsampling techniques across a diverse set of dense downstream applications.
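
The attention-based upsampling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`sft_modulate`, `attention_upsample`), the tensor shapes, and the random `gamma`/`beta` modulation parameters are all assumptions made for illustration. The sketch only shows the general idea that high-resolution queries attend over low-resolution keys (modulated by a per-location scale and shift, in the spirit of Spatial Feature Transform), and that the resulting attention weights mix the low-resolution encoder features into a high-resolution map.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sft_modulate(keys, gamma, beta):
    # Spatial Feature Transform style modulation: per-location scale and shift.
    # In a real model gamma and beta would be predicted by a small network;
    # here they are simply passed in (assumption).
    return gamma * keys + beta

def attention_upsample(queries_hr, keys_lr, values_lr):
    # queries_hr: (H, W, d) high-resolution queries
    # keys_lr:    (h, w, d) low-resolution keys
    # values_lr:  (h, w, C) low-resolution encoder features
    # returns:    (H, W, C) upsampled features
    H, W, d = queries_hr.shape
    h, w, C = values_lr.shape
    q = queries_hr.reshape(H * W, d)
    k = keys_lr.reshape(h * w, d)
    v = values_lr.reshape(h * w, C)
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (H*W, h*w), rows sum to 1
    return (attn @ v).reshape(H, W, C)

# Hypothetical sizes: a 4x4 feature map upsampled to 16x16.
rng = np.random.default_rng(0)
h, w, C, d = 4, 4, 8, 16
H, W = 16, 16
feats_lr = rng.normal(size=(h, w, C))    # stand-in for encoder features
keys_lr = rng.normal(size=(h, w, d))     # stand-in for low-resolution keys
gamma = rng.normal(size=(h, w, d))       # placeholder modulation scale
beta = rng.normal(size=(h, w, d))        # placeholder modulation shift
queries_hr = rng.normal(size=(H, W, d))  # stand-in for high-resolution queries

feats_hr = attention_upsample(queries_hr, sft_modulate(keys_lr, gamma, beta), feats_lr)
print(feats_hr.shape)  # (16, 16, 8)
```

Because each output location is a softmax-weighted average of the low-resolution features, nothing in this sketch ties the output size to the input size: changing `H` and `W` changes only the number of queries, which is consistent with the abstract's claim of upsampling to any target resolution.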

Supplementary Material: zip

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 7317
