Guide Your Anomaly with Language

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: anomaly detection, vision-language model, language guidance, out-of-distribution detection
Abstract: Anomaly detection is the task of identifying data that differ from what is considered normal. Recent advances in deep learning have improved anomaly detection performance, and such models are used in many applications. However, it can be difficult to build a model that reflects the desired notion of normality due to various issues, including lack of data and nuisance factors. Prior studies have tried to inject the desired knowledge into the model in various ways, but these approaches have limitations, such as requiring deep-learning expertise from the user. In this work, we propose a method to guide the normality boundary in image anomaly detection using natural language. Leveraging the robust generalization capabilities of a vision-language model, we present Language-Assisted Feature Transformation (LAFT). LAFT transforms image features to suit the task through natural language, using the shared image-text embedding space of CLIP. We extensively analyze the effectiveness of the concept on a toy dataset and show that it works effectively on real-world datasets.
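The abstract does not specify how LAFT transforms features, so the following is only a hypothetical sketch of one way a language-guided transformation in a shared CLIP-style embedding space could work: build concept directions from differences of paired text embeddings and project image features onto the subspace they span, so that only language-selected variation survives. The function name `laft_transform` and the use of random vectors in place of real CLIP embeddings are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def laft_transform(image_feats, text_pairs):
    """Hypothetical sketch of a language-guided feature transformation:
    project image features onto the subspace spanned by concept axes
    derived from paired text embeddings (an assumption, not the
    paper's verified algorithm)."""
    # Concept axes: differences of paired text embeddings, e.g. the
    # embeddings of "a photo of a normal bolt" vs "a photo of a
    # defective bolt" would define one axis.
    axes = np.stack([a - b for a, b in text_pairs])
    # Orthonormal basis for the concept subspace via QR decomposition.
    q, _ = np.linalg.qr(axes.T)
    # Project each feature vector onto the concept subspace.
    return image_feats @ q @ q.T

# Stand-in embeddings; a real pipeline would use CLIP's image and
# text encoders to produce these vectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))       # 4 image features, dim 512
pairs = [(rng.normal(size=512), rng.normal(size=512))]
out = laft_transform(feats, pairs)
print(out.shape)  # (4, 512)
```

Because the transformation is an orthogonal projection, applying it twice gives the same result as applying it once; the transformed features could then be fed to any off-the-shelf anomaly detector.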
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4911