TL;DR: This paper proposes a novel framework called Boundary-based Out-Of-Distribution data generation (BOOD), which synthesizes high-quality OOD features and generates human-compatible outlier images using diffusion models.
Abstract: Harnessing the power of diffusion models to synthesize auxiliary training data based on latent space features has proven effective in enhancing out-of-distribution (OOD) detection performance. However, extracting effective features outside the in-distribution (ID) boundary in latent space remains challenging due to the difficulty of identifying decision boundaries between classes. This paper proposes a novel framework called Boundary-based Out-Of-Distribution data generation (BOOD), which synthesizes high-quality OOD features and generates human-compatible outlier images using diffusion models. BOOD first learns a text-conditioned latent feature space from the ID dataset, selects the ID features closest to the decision boundary, and perturbs them to cross the decision boundary to form OOD features. These synthetic OOD features are then decoded into images in pixel space by a diffusion model. Compared to previous works, BOOD provides a more training-efficient strategy for synthesizing informative OOD features, facilitating clearer distinctions between ID and OOD data. Extensive experimental results on common benchmarks demonstrate that BOOD significantly surpasses the state-of-the-art method, achieving a 29.64\% decrease in average FPR95 (40.31\% vs. 10.67\%) and a 7.27\% improvement in average AUROC (90.15\% vs. 97.42\%) on the CIFAR-100 dataset.
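The abstract outlines a pipeline of boundary-feature selection, perturbation across the decision boundary, and diffusion decoding. The sketch below illustrates how the first two steps could look in PyTorch, based only on the abstract's description: the `classifier` head, the margin-based selection rule, and the step-size and iteration settings are hypothetical placeholders, not the authors' released implementation.

```python
# Minimal sketch of BOOD-style boundary crossing, inferred from the abstract.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def select_boundary_features(features, classifier, k):
    """Pick the k ID features whose top-two class logits are closest,
    i.e., the features nearest a decision boundary (assumed criterion)."""
    logits = classifier(features)
    top2 = logits.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]      # small margin => near a boundary
    idx = margin.argsort()[:k]
    return features[idx]

def perturb_across_boundary(feats, classifier, step=0.1, max_iters=50):
    """Push each feature toward its runner-up class until the predicted
    label flips, i.e., the feature crosses the decision boundary."""
    feats = feats.clone().detach().requires_grad_(True)
    with torch.no_grad():
        orig = classifier(feats).argmax(1)
    for _ in range(max_iters):
        logits = classifier(feats)
        runner_up = logits.topk(2, dim=1).indices[:, 1]
        loss = F.cross_entropy(logits, runner_up)   # lower loss => runner-up wins
        grad, = torch.autograd.grad(loss, feats)
        with torch.no_grad():
            feats -= step * grad / (grad.norm(dim=1, keepdim=True) + 1e-8)
            if (classifier(feats).argmax(1) != orig).all():
                break                                # all features have crossed
    return feats.detach()
```

In the full method, the perturbed features would then be decoded into pixel-space images by the diffusion model and used as auxiliary outliers when training the OOD detector.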
Lay Summary: In real-world scenarios, machine learning models often encounter unfamiliar inputs — things they weren’t trained to recognize — which can lead to incorrect or untrustworthy predictions. Identifying these unfamiliar inputs, known as Out-Of-Distribution (OOD) data, is essential for building reliable AI systems.
Our work introduces a new method called BOOD that uses advanced image generation tools, known as diffusion models, to create realistic examples of unfamiliar data. These examples are carefully designed to lie just outside the boundary of the original training data, helping the system learn to tell known and unknown inputs apart more effectively.
By generating helpful OOD examples and training with them, our approach significantly improves the model’s ability to detect unusual or harmful inputs, while still maintaining strong performance on familiar tasks. Tests on widely used datasets show that BOOD outperforms existing methods by a large margin.
Primary Area: General Machine Learning->Everything Else
Keywords: OOD detection, Diffusion models, Training data generation
Submission Number: 6633