Dr. RAW: Towards General High-Level Vision from RAW with Efficient Task Conditioning

Wenjun Huang; Ziteng Cui; Yinqiang Zheng; Yirui He; Tatsuya Harada; Mohsen Imani

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, License: CC BY 4.0

Keywords: Computational Photography, Image Signal Processor (ISP)

Abstract: We introduce Dr. RAW, a unified and tuning-efficient framework for high-level computer vision tasks that operate directly on camera RAW data. Unlike previous approaches that optimize image signal processing (ISP) pipelines and fully fine-tune networks for each task, Dr. RAW achieves state-of-the-art performance with minimal parameter updates. At the input stage, we apply lightweight pre-processing modules (sensor and illumination mapping, followed by re-mosaicing) to mitigate data inconsistencies stemming from sensor variation and lighting. At the network level, we introduce task-specific adaptation through two modules: Sensor Prior Prompts (SPP) and Low-Rank Adaptation (LoRA). SPP injects sensor-aware conditioning into the network via learnable prompts derived from imaging priors, while LoRA enables efficient task-specific tuning by updating only low-rank matrices in key backbone layers. Despite minimal tuning, our method delivers superior results across four RAW-based tasks (object detection, semantic segmentation, instance segmentation, and pose estimation) on nine datasets encompassing low-light and over-exposed conditions. By harnessing the intrinsic physical cues of RAW data alongside parameter-efficient techniques, our method advances RAW-based vision systems, achieving both high accuracy and computational economy. We will release our source code.
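The abstract states that LoRA tunes only low-rank matrices in key backbone layers while the pretrained weights stay fixed. The following is a minimal NumPy sketch of the generic LoRA formulation, not the Dr. RAW implementation: the class name `LoRALinear`, the rank, the `alpha` scaling, and the zero initialization of `B` are illustrative assumptions, and training code is omitted.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA sketch).

    Hypothetical illustration: rank, alpha scaling, and initialization are
    assumptions, not values taken from Dr. RAW.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen pretrained weight, shape (out_features, in_features)
        out_features, in_features = weight.shape
        # A gets a small random init; B starts at zero so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T  # low-rank path: in -> rank -> out
        return base + self.scale * update

    def merged_weight(self):
        # After tuning, the low-rank update can be folded into one matrix,
        # so inference costs the same as the original layer.
        return self.weight + self.scale * (self.B @ self.A)

    def trainable_parameters(self):
        # Only A and B would be updated; self.weight stays frozen.
        return self.A.size + self.B.size


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(256, 256))
    layer = LoRALinear(W, rank=4)
    x = rng.normal(size=(2, 256))
    print(np.allclose(layer(x), x @ W.T))  # prints: True (B is zero at init)
    print(layer.trainable_parameters(), W.size)  # prints: 2048 65536
```

With rank 4 on a 256×256 layer, only 2,048 of the 65,536 weights would be trainable in this sketch, which illustrates why tuning low-rank matrices alone keeps per-task parameter updates small.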

Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)

Submission Number: 1235
