Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Published: 18 Jun 2025, Last Modified: 18 Jun 2025, Accepted by TMLR, License: CC BY 4.0
Abstract: Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, again confusing researchers. In this survey, we first present a generalized OOD detection v2, encapsulating the evolution of these fields in the VLM era. Our framework reveals that, with some field inactivity and integration, the demanding challenges have become OOD detection and AD. Then, we highlight the significant shift in the definition, problem settings, and benchmarks; we thus feature a comprehensive review of the methodology for OOD detection and related tasks to clarify their relationship to OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, such as GPT-4V. We conclude with open challenges and future directions. The resource is available at https://github.com/AtsuMiyai/Awesome-OOD-VLM.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

We have incorporated the editor’s minor revision requests into the final version. Our response to each request is as follows:

  • MR1: In Section 2.2, we have added a definition of “top venues” and clarified the methodology used to assess research activity, as well as the rationale behind the time period considered. In short, we define the research activity of a field objectively by the number of papers published through a rigorous peer-review process during the period starting with the introduction of VLMs (e.g., CLIP) and continuing to the present.

  • MR2: As the reviewer noted, the original phrasing, "Sensory AD has become a highly active and noteworthy field in the VLM era," may give the misleading impression that the field was not active until VLMs emerged. To address this concern, we revised the statement to: "Sensory AD has consistently been an active research field even after the emergence of VLMs."

Assigned Action Editor: Chuan-Sheng Foo
Submission Number: 4404