Instruct2See: Learning to Remove Any Obstructions Across Distributions

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Images are often obstructed by various obstacles due to capture limitations, hindering the observation of objects of interest. Most existing methods address occlusions from specific elements such as fences or raindrops, but the wide range of real-world obstructions makes comprehensive data collection impractical and limits their applicability. To overcome these challenges, we propose Instruct2See, a novel zero-shot framework capable of handling both seen and unseen obstacles. The core idea of our approach is to unify obstruction removal by treating it as a soft-hard mask restoration problem, where any obstruction can be represented using multi-modal prompts, such as visual semantics and textual instructions, processed through a cross-attention unit to enhance contextual understanding and improve mode control. Additionally, a tunable mask adapter allows for dynamic soft masking, enabling real-time adjustment of inaccurate masks. Extensive experiments on both in-distribution and out-of-distribution obstacles show that Instruct2See consistently achieves strong performance and generalization in obstruction removal, regardless of whether the obstacles were present during the training phase. Code and dataset are available at https://jhscut.github.io/Instruct2See.
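The soft-hard masking idea in the abstract can be illustrated with a minimal sketch: a hard obstruction-confidence map is relaxed into a soft mask via a tunable sigmoid (a hypothetical stand-in for the paper's mask adapter, with assumed parameter names `tau` and `sharpness`), which then blends original and restored content per pixel. This is not the authors' implementation, only a toy illustration of the blending principle.

```python
import math

def soft_mask(confidence, tau=0.5, sharpness=10.0):
    """Relax an obstruction-confidence map into a soft mask with a
    tunable sigmoid; as sharpness grows, this approaches a hard 0/1 mask.
    (Hypothetical stand-in for the paper's tunable mask adapter.)"""
    return [[1.0 / (1.0 + math.exp(-sharpness * (c - tau))) for c in row]
            for row in confidence]

def blend(original, restored, mask):
    """Composite the two images: keep original pixels where mask ~ 0,
    use restored (inpainted) content where mask ~ 1."""
    return [[m * r + (1.0 - m) * o
             for o, r, m in zip(orow, rrow, mrow)]
            for orow, rrow, mrow in zip(original, restored, mask)]

# Toy 1x3 grayscale strip: only the middle pixel is confidently obstructed.
conf = [[0.1, 0.9, 0.2]]
mask = soft_mask(conf)
out = blend([[0.5, 0.5, 0.5]], [[0.0, 0.0, 0.0]], mask)
```

Raising `sharpness` makes the transition between kept and restored regions abrupt (a hard mask); lowering it gives smoother seams, which is the kind of real-time adjustment the abstract attributes to the mask adapter when the initial mask is inaccurate.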
Lay Summary: Images are often obstructed by various obstacles due to capture limitations, but existing methods cannot handle all types of obstruction. We create Instruct2See, which uses multi-modal prompts to automatically remove unwanted obstructions. Unlike traditional methods, it removes any obstruction without requiring extra training for specific types, making image restoration universal and effortless.
Primary Area: Applications->Computer Vision
Keywords: computer vision, image restoration
Submission Number: 2731