Keywords: safety, societal implications, multimodality
TL;DR: Towards Safer Language Models on Visually Perturbed Texts
Abstract: Visual text perturbation, in which characters are replaced with visually similar Unicode alternatives that humans can easily recognize but text-only filters fail to detect, is increasingly used to bypass content moderation systems. While existing research has examined the generation and classification of such evasion techniques, the critical task of restoration remains underexplored. To address this challenge, we present GlyphDecode, a novel framework designed to restore visually perturbed text to its original form. Our framework consists of two key components: (1) GlyphPerturber, which generates visually perturbed text images for training, and (2) GlyphRestorer, which learns to recover the original text through a multimodal transformer architecture. GlyphRestorer is a lightweight and fast module that can be applied in a plug-and-play manner with off-the-shelf LLMs and multimodal LLMs to enhance harmful content detection. To evaluate restoration efficacy in real-world scenarios, we introduce GlyphSynth, a publicly available dataset containing realistic examples of content moderation evasion from diverse sources, including DEA (Drug Enforcement Administration) reports and social media platforms. Experimental results demonstrate that our approach significantly outperforms baselines in text restoration, enabling multimodal language models to better detect harmful content disguised through visual manipulations. Our work bridges an important gap in content moderation systems by addressing not only the detection but also the recovery of manipulated text, contributing to more effective safeguards against increasingly sophisticated evasion tactics.
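To make the evasion tactic concrete, the following is a minimal sketch of homoglyph-based visual perturbation and its naive inverse. This is an illustrative toy, not the paper's GlyphPerturber or GlyphRestorer; the character table is an assumption, and real restoration is the learning problem the paper addresses.

```python
# Illustrative homoglyph perturbation: swap Latin letters for visually
# similar Cyrillic code points. Humans read the result normally, but a
# text-only keyword filter no longer matches at the byte level.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
    "c": "\u0441",  # Cyrillic small es
}

def perturb(text: str) -> str:
    """Replace each mapped character with its homoglyph."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def restore(text: str) -> str:
    """Naive inverse lookup; works only because this toy mapping is known.
    In the wild, the mapping is unknown and many-to-many, which is why
    restoration needs a learned (e.g. multimodal) model."""
    inverse = {v: k for k, v in HOMOGLYPHS.items()}
    return "".join(inverse.get(ch, ch) for ch in text)

disguised = perturb("cocaine")
print(disguised)                          # renders like "cocaine" to a human
print(disguised == "cocaine")             # False: string match fails
print(restore(disguised) == "cocaine")    # True
```

A keyword filter comparing `disguised == "cocaine"` fails even though the two strings are visually near-identical, which is exactly the gap restoration closes.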
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Flagged For Ethics Review: true
Ethics Comments: I think the authors may have built a model that is perfect for solving CAPTCHAs. I don't think this should qualify as a barrier to acceptance – this is a good paper! – but I think it should be mentioned in the Ethics statement.
Submission Number: 761