Towards Irreversible Attack: Fooling Scene Text Recognition via Multi-Population Coevolution Search

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: adversarial attack, scene text recognition, evolution algorithm, computer vision
TL;DR: We propose a pixel-level black-box attack that fools STR models into predicting more incorrect characters, using a novel multi-population coevolution search algorithm.
Abstract: Recent work has shown that scene text recognition (STR) models are vulnerable to adversarial examples. Unlike non-sequential vision tasks, the output sequence of STR models carries rich information. However, existing adversarial attacks against STR models can only corrupt a few characters in the predicted text. These attack results still retain partial information about the original prediction and can easily be corrected by an external dictionary or a language model. We therefore propose the Multi-Population Coevolution Search (MPCS) method to attack every character in the image. We first decompose the global optimization objective into sub-objectives, which resolves the attack-pixel concentration problem of previous attack methods. Although this distributed optimization paradigm introduces a new joint perturbation shift problem, we propose a novel coevolution energy function to solve it. Experiments on recent STR models show the superiority of our method. The code is available at \url{https://github.com/Lee-Jingyu/MPCS}.
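To make the abstract's idea concrete, the following is a minimal, purely illustrative sketch of a multi-population coevolution loop: one population per character position, each selected by a per-character sub-objective plus a coupling ("energy") term over the joint perturbation. All function names, the toy objective, and the hyperparameters are assumptions for illustration, not the authors' actual implementation or model interface.

```python
import random

def sub_objective(pert, target):
    # Toy per-character sub-objective: higher when the candidate perturbation
    # is close to a target vector (stand-in for "character i is misrecognized").
    return -sum((p - t) ** 2 for p, t in zip(pert, target))

def coevolution_energy(perts):
    # Toy coupling term penalizing disagreement between neighboring populations'
    # perturbations (stand-in for controlling the joint perturbation shift).
    return sum(
        sum((a - b) ** 2 for a, b in zip(perts[i], perts[i + 1]))
        for i in range(len(perts) - 1)
    )

def mpcs_sketch(targets, pop_size=20, dims=4, iters=200, seed=0):
    rng = random.Random(seed)
    # One population of candidate perturbations per character position.
    pops = [
        [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(pop_size)]
        for _ in targets
    ]
    best = [pop[0] for pop in pops]
    for _ in range(iters):
        for i, pop in enumerate(pops):
            def fitness(cand):
                # Evaluate the candidate jointly with the other populations'
                # current best members (coevolution).
                joint = best[:i] + [cand] + best[i + 1:]
                return sub_objective(cand, targets[i]) - 0.1 * coevolution_energy(joint)
            pop.sort(key=fitness, reverse=True)
            best[i] = pop[0]
            # Refresh the weaker half by mutating survivors from the top half.
            half = pop_size // 2
            pop[half:] = [
                [g + rng.gauss(0.0, 0.1) for g in rng.choice(pop[:half])]
                for _ in range(pop_size - half)
            ]
    return best
```

In the real attack the sub-objective would query the black-box STR model's per-character prediction instead of a distance to a fixed target; the sketch only shows how per-character selection and the shared energy term interact.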
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 27154