Keywords: OOD detection, Benchmarking
TL;DR: Investigating out-of-distribution (OOD) detection methods in controlled settings by isolating and altering individual attributes, such as color and class, to simulate various distribution shifts using two proposed synthetic datasets, SHAPES and CHARS.
Abstract: Out-of-distribution (OOD) detection is crucial for ensuring the reliability of machine learning models, especially in visual tasks. Most existing benchmarks focus on isolating distribution shifts and creating varying levels of detection difficulty, often relying on manual curation or classifier-based scoring with human annotations. Additionally, large-scale benchmarks are typically derivatives of ImageNet-21k classes or combinations of ImageNet with other datasets. However, no existing work offers a setup where only one attribute, such as color or class, changes in a controlled manner while all other attributes of the object remain constant. This limits our ability to precisely study the impact of individual attributes on OOD detection performance. We address this by proposing two novel synthetic datasets, SHAPES and CHARS, designed to explore OOD detection under controlled and fine-grained distribution shifts. SHAPES consists of 2D and 3D geometric shapes with variations in color, size, position, and rotation, while CHARS consists of alphanumeric characters with similar variations. Each dataset presents three scenarios: (1) known classes with unseen attributes, (2) unseen classes with known attributes, and (3) entirely novel classes and attributes. We train 10 architectures and assess 13 OOD detection methods across the three scenarios, concentrating on the impact of attribute shifts on OOD scores, with additional analysis of how image corruption influences those scores. By systematically examining how specific attribute shifts and noisy test samples affect OOD scores, we aim to bring greater transparency to where these methods succeed or fail, helping to identify their limitations under varied conditions.
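To illustrate the kind of controlled attribute shift the abstract describes, below is a minimal sketch of how a SHAPES-like sample with fully specified attributes might be generated, with a single attribute (color) swapped to a held-out value for scenario (1). All class names, color lists, and parameter ranges here are hypothetical assumptions for illustration, not the submission's actual generation pipeline.

```python
# Minimal sketch of controlled attribute sampling for a SHAPES-like dataset.
# Classes, colors, and ranges below are illustrative assumptions only.
import random
from PIL import Image, ImageDraw  # requires Pillow >= 8.0 for regular_polygon

IND_CLASSES = ["circle", "square", "triangle"]   # assumed in-distribution classes
IND_COLORS = ["red", "green", "blue"]            # assumed in-distribution colors
OOD_COLORS = ["yellow", "purple"]                # held-out colors for scenario (1)

def render_shape(shape, color, size, position, rotation, canvas=64):
    """Draw one shape with every attribute explicitly specified."""
    img = Image.new("RGB", (canvas, canvas), "white")
    draw = ImageDraw.Draw(img)
    x, y = position
    if shape == "circle":
        draw.ellipse([x - size, y - size, x + size, y + size], fill=color)
    else:
        n_sides = 4 if shape == "square" else 3
        draw.regular_polygon((x, y, size), n_sides=n_sides,
                             rotation=rotation, fill=color)
    return img

def sample(scenario):
    """Scenario 1: known class with an unseen color; other attributes
    are drawn from the same ranges as in-distribution data."""
    shape = random.choice(IND_CLASSES)
    color = random.choice(OOD_COLORS if scenario == 1 else IND_COLORS)
    size = random.randint(10, 20)
    position = (random.randint(20, 44), random.randint(20, 44))
    rotation = random.uniform(0, 360)
    return render_shape(shape, color, size, position, rotation)

sample(scenario=1).save("ood_color_shift.png")
```

Because only the color distribution changes between the in-distribution and OOD samplers, any change in an OOD detector's score can be attributed to that single attribute, which is the isolation property the benchmark is built around.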
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10285