Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Abstract: To insert a Trojan into a Deep Neural Network (DNN), existing attacks assume that the attacker can access the victim’s training facilities. However, a realistic threat model was recently developed that leverages memory faults to inject Trojans at the inference stage. In this work, we develop a novel Trojan attack by adopting a unique memory fault injection technique that can inject bit-flips into
the page table of the main memory. In the main memory, each weight block consists of a group of weights located at a specific address of a DRAM row. A bit-flip in
the page frame number replaces a target weight block of
a DNN model with another replacement weight block. To
develop a successful Trojan attack leveraging this unique
fault model, the attacker must solve three key challenges:
i) how to identify a minimum set of target weight blocks
to be modified? ii) how to identify the corresponding optimal replacement weight block? iii) how to optimize the
trigger to maximize the attacker’s objective given a target and replacement weight block set? We address them
by proposing a novel Deep-TROJ attack algorithm that can
identify a minimum set of vulnerable target and corresponding replacement weight blocks while optimizing the trigger at the same time. We evaluate the performance of the proposed Deep-TROJ on the CIFAR-10, CIFAR-100, and ImageNet datasets for fifteen different DNN architectures, including vision transformers. The proposed Deep-TROJ is the most successful attack to date that does not require access to
training facilities while successfully bypassing the existing
defenses. Our code is available at https://github.com/MLSecurity-Research-LAB/Deep-TROJ.
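
The fault model described in the abstract, in which a bit-flip in a page frame number redirects a weight block to a different physical location, can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the memory layout, the block size, and the names `flip_bit` and `read_weights` are assumptions chosen only for illustration, and real page tables and DRAM pages are far larger.

```python
import numpy as np

BLOCK_SIZE = 4   # weights per block (illustrative; a real page holds far more)
NUM_FRAMES = 8   # physical frames in the toy memory

def flip_bit(value: int, bit: int) -> int:
    """Return `value` with the given bit position inverted."""
    return value ^ (1 << bit)

def read_weights(memory: np.ndarray, page_table: list) -> np.ndarray:
    """Assemble the model's weights by following each page-table entry."""
    return np.concatenate([memory[pfn] for pfn in page_table])

# Toy physical memory: each row is one frame holding one weight block.
rng = np.random.default_rng(0)
memory = rng.standard_normal((NUM_FRAMES, BLOCK_SIZE)).astype(np.float32)
memory_before = memory.copy()

# Hypothetical page table: virtual page i maps to frame page_table[i].
page_table = [2, 5, 1]
clean = read_weights(memory, page_table)

# Assumed attack: flip bit 1 of the frame number of virtual page 1.
# 5 (0b101) becomes 7 (0b111), so the page now resolves to frame 7.
target_page, bit = 1, 1
attacked_table = list(page_table)
attacked_table[target_page] = flip_bit(page_table[target_page], bit)
attacked = read_weights(memory, attacked_table)

print("page table before:", page_table)
print("page table after: ", attacked_table)
print("block changed:", not np.array_equal(clean[4:8], attacked[4:8]))
```

Note that no weight value in memory is modified in this sketch. Only the address translation changes, so the whole target block is swapped for whatever block already resides in the frame that the flipped number points to.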