Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He

Published: 2024, Last Modified: 19 May 2025, CVPR 2024, CC BY-SA 4.0

Abstract: Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module to master two types of SD prompts: textual for holistic representation and visual for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across many tasks. Post multitask pre-training, MPerceiver attains a generalized representation in low-level vision, exhibiting remarkable zero-shot and few-shot capabilities in unseen tasks. Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.
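
The abstract says that both prompts are adjusted dynamically by degradation predictions, but it does not spell out the mechanism. As a rough illustration only, one way such an adjustment could work is to turn the predicted degradation scores into weights and blend a pool of per-degradation prompt vectors. Everything in the sketch below is an assumption made for illustration: the names `softmax` and `mix_prompts`, the softmax weighting, the pool layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(degradation_logits, prompt_pool):
    """Blend per-degradation prompt vectors using predicted weights.

    Hypothetical helper: the weighting scheme is an assumption,
    not the method described in the paper.
    """
    if len(degradation_logits) != len(prompt_pool):
        raise ValueError("need one prompt vector per degradation type")
    weights = softmax(degradation_logits)
    dim = len(prompt_pool[0])
    return [
        sum(w * prompt[i] for w, prompt in zip(weights, prompt_pool))
        for i in range(dim)
    ]


# Toy stand-ins: scores for three degradation types (for example rain,
# noise, blur) and a pool with one 4-dimensional prompt vector per type.
degradation_logits = [2.0, 0.5, -1.0]
prompt_pool = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

adaptive_prompt = mix_prompts(degradation_logits, prompt_pool)
print(adaptive_prompt)
```

In this toy setup the degradation with the highest score contributes most to the blended prompt, which is the sense in which the prompt adapts to an unknown input. In a real system the scores would come from an image encoder and the pool would be learned.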