Abstract: Deep learning models for visual recognition often exhibit systematic errors on underrepresented semantic subpopulations. While existing debugging frameworks can identify these failure slices, repairing them remains difficult. Current approaches typically rely on manually designed prompts to generate synthetic images, which introduces distribution shift and semantic errors and can create new bugs. To address these issues, we introduce SafeFix, a framework for distribution-consistent model repair via controlled generation. SafeFix uses a diffusion model to synthesize semantically faithful images that modify only the specific failure attributes while preserving the underlying data distribution. To ensure the reliability of the repair data, a verification mechanism based on a large vision-language model (LVLM) enforces semantic consistency and label preservation. Retraining models on the verified synthetic data significantly reduces errors on rare cases and improves overall performance. Our experiments show that SafeFix achieves superior robustness, maintaining high precision in attribute editing without introducing new bugs.
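The abstract describes a generate-verify-retrain pipeline. Below is a minimal sketch of that loop under stated assumptions; `edit_attribute` and `lvlm_verify` are hypothetical placeholders standing in for the paper's diffusion-based attribute editor and LVLM verifier, and are stubbed here so the sketch runs. This is an illustration of the described architecture, not the SafeFix implementation.

```python
"""Hypothetical sketch of the generate-verify-retrain loop from the abstract.
`edit_attribute` and `lvlm_verify` are placeholders, not the SafeFix API."""
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    image: object  # e.g., a PIL image in a real pipeline
    label: str


def edit_attribute(image: object, attribute: str) -> object:
    # Placeholder for a controlled diffusion edit that changes only the
    # named failure attribute while preserving the rest of the image.
    return image


def lvlm_verify(image: object, expected_label: str, edited_attribute: str) -> bool:
    # Placeholder for an LVLM check that the edit kept the class label
    # and stayed semantically consistent with the source distribution.
    return True


def repair_failure_slice(failures: List[Sample], attribute: str,
                         n_variants: int = 4) -> List[Sample]:
    """Return verified, attribute-edited variants of failure-slice samples."""
    repaired = []
    for sample in failures:
        for _ in range(n_variants):
            candidate = edit_attribute(sample.image, attribute)
            # Keep only edits the LVLM accepts as label-preserving.
            if lvlm_verify(candidate, sample.label, attribute):
                repaired.append(Sample(candidate, sample.label))
    return repaired
```

Per the abstract, the verified samples are then mixed into the training set and the model is retrained, which reduces errors on the rare subpopulation without introducing new bugs.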
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~David_Fouhey2
Submission Number: 7509