Keywords: AI safety; AI robustness
TL;DR: We analyze backdoor attacks from a functional perspective.
Abstract: Backdoor attacks embed hidden behaviors into neural networks, causing misclassification when specific triggers are present. While backdoor methods differ in trigger design and poisoning strategy, they often share a common goal: mapping any triggered input to a fixed target label. This paper investigates whether such attacks lead to similar functional behavior. We introduce a framework for analyzing backdoored models from a functional perspective, using metrics over both hard and soft predictions. Our study examines two aspects: (1) the consistency of each attack’s learned function across training runs, and (2) the functional similarity across different attack strategies. Results show that some attacks (e.g., FTrojanNN, SSBA) yield stable, convergent behavior, while others (e.g., WaNet, Input-Aware) are highly variable. Cross-attack comparisons reveal functional clusters, particularly among clean-label methods, while visible or training-controlled attacks deviate more sharply. These findings suggest that even with similar objectives, backdoor methods shape model functions in distinct ways, motivating function-level analysis as a tool for understanding and defending against neural backdoors.
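As a rough illustration of what such a function-level comparison could involve, the sketch below compares two models' outputs on the same inputs using a hard-prediction metric (label agreement) and a soft-prediction metric (Jensen-Shannon divergence over softmax outputs). The specific metric choices, array shapes, and function names here are illustrative assumptions, not necessarily the exact metrics used in the paper.

```python
# Illustrative sketch (assumed metrics, not the paper's exact framework):
# compare two models' behavior on the same inputs via hard- and
# soft-prediction similarity.
import numpy as np


def hard_agreement(probs_a: np.ndarray, probs_b: np.ndarray) -> float:
    """Fraction of inputs on which the two models predict the same label."""
    return float(np.mean(probs_a.argmax(axis=1) == probs_b.argmax(axis=1)))


def mean_js_divergence(probs_a: np.ndarray, probs_b: np.ndarray,
                       eps: float = 1e-12) -> float:
    """Average Jensen-Shannon divergence between the models' softmax outputs."""
    p = np.clip(probs_a, eps, 1.0)
    q = np.clip(probs_b, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=1)
    kl_qm = np.sum(q * np.log(q / m), axis=1)
    return float(np.mean(0.5 * (kl_pm + kl_qm)))


if __name__ == "__main__":
    # Hypothetical softmax outputs of shape (n_samples, n_classes), standing in
    # for two independently trained (or differently attacked) models evaluated
    # on the same triggered test set.
    rng = np.random.default_rng(0)
    probs_a = rng.dirichlet(np.ones(10), size=256)
    probs_b = rng.dirichlet(np.ones(10), size=256)
    print("hard-label agreement:", hard_agreement(probs_a, probs_b))
    print("mean JS divergence:  ", mean_js_divergence(probs_a, probs_b))
```

Higher agreement and lower divergence on triggered inputs would indicate that two attacks (or two runs of the same attack) induce functionally similar backdoor behavior.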
Submission Number: 36