Keywords: CAPTCHAs, Adversarial examples, Vision models, Robust models
Abstract: Modern CAPTCHAs often rely on vision tasks that are supposedly hard for computers but easy for humans. Although image recognition models pose a significant threat to such CAPTCHAs, they can be fooled by hiding seemingly random adversarial noise in images. However, these methods are model-specific and thus cannot help CAPTCHAs fool all models.
We show in this work that by allowing for more significant changes to the images, while preserving the semantic information and keeping them solvable by humans, we can fool many state-of-the-art models. Specifically, we demonstrate that by adding masks of various intensities, Top-1 accuracy (Acc@1) drops by more than 50 percentage points for all models, and supposedly robust models such as vision transformers see an Acc@1 drop of 80 percentage points.
These masks can therefore effectively fool modern image classifiers, showing that machines have not yet caught up with humans.
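As a minimal illustrative sketch (not the paper's method), the Acc@1 drop reported above can be measured by overlaying a mask on each image and comparing a pretrained classifier's Top-1 predictions against the labels. The snippet below assumes torchvision's pretrained-weights API (version 0.13 or later); `apply_mask` is a hypothetical placeholder for the paper's masks.

```python
# Illustrative sketch: Top-1 accuracy (Acc@1) of a pretrained classifier
# on images with a hypothetical mask overlay.
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize for this model


def apply_mask(img_tensor, intensity=0.5):
    """Hypothetical mask: darken the image under a random binary pattern.

    This is a stand-in assumption, not the masking scheme from the paper.
    """
    mask = (torch.rand_like(img_tensor[:1]) > 0.5).float()  # 1 x H x W pattern
    return img_tensor * (1.0 - intensity * mask)


@torch.no_grad()
def top1_accuracy(pil_images, labels, intensity=0.0):
    """Fraction of images whose Top-1 prediction matches its label."""
    batch = torch.stack(
        [apply_mask(preprocess(img), intensity) for img in pil_images]
    )
    preds = model(batch).argmax(dim=1)
    return (preds == torch.tensor(labels)).float().mean().item()
```

Comparing `top1_accuracy(imgs, labels, intensity=0.0)` against a nonzero intensity would give the kind of accuracy drop the abstract refers to, for this one model.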
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13823