Keywords: Low-level vision, Large Language Model, Self-supervised Learning
TL;DR: An LLM can operate on low-level visual features without any multimodal data or alignment
Abstract: The success of large language models (LLMs) has fostered a new research trend of multi-modality large language models (MLLMs), which is changing the paradigm across many areas of computer vision. Although MLLMs have shown promising results on numerous vision-language tasks such as VQA and text-to-image generation, no work has yet demonstrated how low-level vision tasks can benefit from MLLMs. We find that most current MLLMs are blind to low-level features due to the design of their vision modules, and are thus inherently incapable of solving low-level vision tasks. In this work, we propose **LM4LV**, a framework that enables a FROZEN LLM to solve a range of low-level vision tasks without any multi-modal data or prior. This showcases the LLM's strong potential in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope that this work can inspire new perspectives on LLMs and a deeper understanding of their mechanisms.
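To make the TL;DR concrete, here is a minimal, self-contained sketch of the kind of setup the abstract describes: a frozen language-model backbone operating on low-level visual tokens, with only two linear adapters trained. All concrete choices below (the patch encoder/decoder stand-ins, the small transformer standing in for the LLM, `VIS_DIM`, `LLM_DIM`, patch and image sizes) are illustrative assumptions, not details taken from the paper or its released code.

```python
# NOTE: every module below is an illustrative stand-in; the paper's actual
# vision modules and LLM are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

VIS_DIM, LLM_DIM, PATCH, IMG = 192, 512, 16, 64  # assumed sizes

class PatchEncoder(nn.Module):
    """Stand-in for a frozen low-level vision encoder (image -> visual tokens)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, VIS_DIM, PATCH, stride=PATCH)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)   # (B, N, VIS_DIM)

class PatchDecoder(nn.Module):
    """Stand-in for a frozen decoder (visual tokens -> image)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(VIS_DIM, 3 * PATCH * PATCH)

    def forward(self, tokens):
        patches = self.proj(tokens).transpose(1, 2)       # (B, 3*P*P, N)
        return F.fold(patches, output_size=(IMG, IMG),
                      kernel_size=PATCH, stride=PATCH)

class FrozenLLMForLowLevelVision(nn.Module):
    """Frozen LLM + frozen vision codec; only two linear adapters are trained."""
    def __init__(self, llm, encoder, decoder):
        super().__init__()
        self.llm, self.encoder, self.decoder = llm, encoder, decoder
        for module in (self.llm, self.encoder, self.decoder):
            module.requires_grad_(False)                  # keep them frozen
        self.to_llm = nn.Linear(VIS_DIM, LLM_DIM)         # trainable adapter in
        self.from_llm = nn.Linear(LLM_DIM, VIS_DIM)       # trainable adapter out

    def forward(self, degraded):
        tokens = self.encoder(degraded)                   # low-level visual tokens
        hidden = self.llm(self.to_llm(tokens))            # frozen backbone processes them
        return self.decoder(self.from_llm(hidden))        # reconstruct a clean image

# A tiny transformer stands in for the frozen, text-pretrained LLM.
llm_stand_in = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(LLM_DIM, nhead=8, batch_first=True), num_layers=2)
model = FrozenLLMForLowLevelVision(llm_stand_in, PatchEncoder(), PatchDecoder())

restored = model(torch.randn(2, 3, IMG, IMG))             # (2, 3, 64, 64)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the two adapter layers' weights and biases
```

The point of freezing both the backbone and the vision modules in such a sketch is that only the two adapters carry gradients, which is what would let one test the claim that the language model itself, without multi-modal training or alignment, contributes to low-level restoration.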
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8519