Abstract: In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is at https://asam2024.github.io/.
External IDs: dblp:conf/cvpr/0115XT24