TL;DR: Using persistent homology, we introduce two novel topological-contrastive losses that detect the topological disruptions adversarial attacks induce in the image-text embeddings of multimodal models, and leverage these signatures for improved adversarial detection.
Abstract: Multimodal machine learning systems, particularly those aligning text and image data such as CLIP and BLIP models, have become increasingly prevalent, yet remain susceptible to adversarial attacks. While substantial research has addressed adversarial robustness in unimodal contexts, defense strategies for multimodal systems remain underexplored. This work investigates the topological signatures that arise between image and text embeddings and shows how adversarial attacks disrupt their alignment, introducing distinctive signatures. We leverage persistent homology and introduce two novel Topological-Contrastive losses, based on Total Persistence and Multi-scale kernel methods, to analyze the topological signatures introduced by adversarial perturbations. Across a wide range of attacks on image-text alignment, we observe that the proposed topological losses change monotonically as more adversarial samples are introduced into the data. By designing an algorithm that back-propagates these signatures to input samples, we integrate them into Maximum Mean Discrepancy tests, creating a novel class of tests that leverage topological signatures for better adversarial detection.
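As an illustrative sketch only (not the paper's implementation): for 0-dimensional homology of a Vietoris-Rips filtration on an embedding point cloud, every bar is born at scale 0 and dies when its connected component merges, so the Total Persistence statistic reduces to the sum of minimum-spanning-tree edge lengths. The function name `total_persistence_h0` is our own, assumed for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def total_persistence_h0(points: np.ndarray) -> float:
    """H0 total persistence of a Vietoris-Rips filtration on a point cloud.

    Each 0-dimensional bar is born at 0 and dies when its component merges
    into another, so total persistence equals the sum of MST edge lengths.
    """
    dist = squareform(pdist(points))   # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist)  # sparse matrix holding the MST edges
    return float(mst.sum())

# Toy check: points at 0, 1, 3 on a line -> MST edges of length 1 and 2.
pts = np.array([[0.0], [1.0], [3.0]])
print(total_persistence_h0(pts))  # 3.0
```

Comparing this statistic between a clean batch of embeddings and a suspect batch is the kind of monotone signal the abstract describes; higher-dimensional homology requires a full persistence library rather than this MST shortcut.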
Lay Summary: Multimodal machine learning systems—such as those that combine text and images (like CLIP or BLIP)—are becoming popular but remain vulnerable to deceptive (adversarial) attacks. While many studies have addressed attacks on single-mode systems (only images or only text), defending multimodal systems is less understood. This paper explores how adversarial attacks alter the relationships between image and text representations, leaving unique patterns called "topological signatures." Using a mathematical technique called persistent homology, we introduce new methods, based on our proposed Topological-Contrastive losses, that measure these distinctive patterns. We found that adversarial attacks consistently cause predictable changes in these topological patterns. Additionally, by tracking these signatures back to the original input data, we developed a new approach to detect adversarial samples at the batch level.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Social Aspects->Security
Keywords: Multimodal Adversarial Detection, Persistent homology, Text-image alignment, Maximum Mean Discrepancy Test
Submission Number: 5025