Reproducibility Study on Adversarial Attacks against Robust Transformer Trackers — Supplementary Material

This document provides additional results and analysis for our study in the main paper.

Table of Contents

  1. Bounding box vs. Binary mask (experiment 1, section 4.1)
  2. Perturbation level shifts: White-box attacks (experiment 2, section 4.2)
  3. Perturbation level shifts: Black-box attack (experiment 3, section 4.3)

1. Bounding box vs. Binary mask

In the first experiment, we applied the adversarial attacks against TransT-SEG and MixFormerM and recorded videos of each tracker's output before the attack (green mask/bounding box) and after the attack (red mask/bounding box).

The white-box attacks are more effective against the TransT tracker, whether the evaluation is based on the bounding box or the binary mask.
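To make the two evaluation targets concrete, the following minimal sketch compares a clean and an attacked prediction both as binary masks and as the bounding boxes derived from them. The helper names (`mask_to_bbox`, `mask_iou`), the nested-list mask representation, and the (x, y, w, h) box format are assumptions made for illustration; they are not taken from the trackers' code or from the evaluation toolkit used in the study.

```python
def mask_to_bbox(mask):
    """Return the tight axis-aligned box (x, y, w, h) around nonzero pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x0, x1 = cols[0], cols[-1]
    y0, y1 = rows[0], rows[-1]
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


def mask_iou(mask_a, mask_b):
    """Pixel-wise intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    return inter / union if union else 0.0


# Hypothetical 4x4 predictions: the attacked mask is shifted one pixel to the right.
clean = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
attacked = [[0, 0, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 0, 0]]

print(mask_to_bbox(clean), mask_to_bbox(attacked))
print(mask_iou(clean, attacked))
```

A mask-based score penalizes every misclassified pixel, whereas a box-based score only sees the extent of the predicted region, so the same attack can produce different degradation under the two measures.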

Black-box attacks against TransT-SEG

White-box attacks against TransT-SEG

Black-box attacks against MixFormerM

2. Perturbation level shifts: White-box attacks

In this section, we applied the adversarial attacks against TransT and created a series of videos showing the perturbed search regions and the perturbation maps at different perturbation levels for the white-box approaches, SPARK and RTAA. The search regions after the attack may show different areas of the same frame, depending on the effect of each attack and the resulting bounding box degradation.

Any perturbed region with an SSIM lower than 50% is considered a super-perturbed region. At lower perturbation levels, the perceptibility of the generated perturbations is greater, while at higher levels the number of super-perturbed frames increases.
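The super-perturbed criterion can be sketched as a simple threshold test. The code below uses a simplified single-window (global) SSIM over a flat list of grayscale pixel values with the standard constants K1 = 0.01 and K2 = 0.03. This is an assumption for illustration only: the SSIM implementation used in the study is not specified here, and windowed implementations will generally give different values. The function names and the 0.5 threshold argument (standing in for the 50% stated above) are likewise hypothetical.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally long sequences of pixel intensities."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def is_super_perturbed(clean, perturbed, threshold=0.5):
    """Flag a search region whose SSIM against the clean region falls below the threshold."""
    return global_ssim(clean, perturbed) < threshold


# Hypothetical pixel rows: an unchanged region and a fully inverted one.
region = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
print(is_super_perturbed(region, region))
print(is_super_perturbed(region, inverted))
```

Counting the frames for which `is_super_perturbed` returns true at each perturbation level would give the per-level tally of super-perturbed frames described above.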

Perturbed search regions and Perturbation maps: ε = 2.55

Perturbed search regions and Perturbation maps: ε = 5.1

Perturbed search regions and Perturbation maps: ε = 10.2

Perturbed search regions and Perturbation maps: ε = 20.4

Perturbed search regions and Perturbation maps: ε = 40.8
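The five perturbation levels listed above correspond to 1%, 2%, 4%, 8%, and 16% of the 8-bit intensity range (255), so each level doubles the previous one. The sketch below makes that arithmetic explicit and shows one way a perturbation map and a budget check could be computed. It assumes that ε acts as a per-pixel (L∞) bound and that the perturbation map is the pixel-wise difference between the perturbed and clean search regions; neither assumption is confirmed by this document, and the helper names are hypothetical.

```python
EPSILONS = [2.55, 5.1, 10.2, 20.4, 40.8]  # levels shown in this section

# Each level expressed as a fraction of the 8-bit range.
fractions = [round(eps / 255, 4) for eps in EPSILONS]
print(fractions)


def perturbation_map(clean, perturbed):
    """Pixel-wise difference between the perturbed and clean regions (assumed definition)."""
    return [p - c for c, p in zip(clean, perturbed)]


def within_budget(clean, perturbed, epsilon, tol=1e-9):
    """Check that no pixel changed by more than epsilon (assumed L-infinity bound)."""
    largest = max((abs(d) for d in perturbation_map(clean, perturbed)), default=0.0)
    return largest <= epsilon + tol


# Hypothetical two-pixel regions.
print(within_budget([100, 100], [102, 98], 2.55))
print(within_budget([100, 100], [110, 100], 5.1))
```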

3. Perturbation level shifts: Black-box attack

We created video sequences using the original tracking sequences as a base. These videos were generated by attacking the ROMTrack tracker with the IoU method at different perturbation levels.
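For reference, the overlap measure that gives the IoU method its name can be sketched as follows for two axis-aligned boxes. The (x, y, w, h) box format and the function name `box_iou` are assumptions for illustration; how the attack combines these scores or queries the tracker is not described in this document and is not reproduced here.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predictions on a clean frame and on a perturbed frame.
clean_box = (0, 0, 10, 10)
perturbed_box = (5, 0, 10, 10)
print(box_iou(clean_box, perturbed_box))
```

A lower IoU between the prediction on the clean frame and the prediction on the perturbed frame indicates that the perturbation has moved the tracker further from its original output.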

Perturbed Frame: ζ = 8k

Perturbed Frame: ζ = 10k

Perturbed Frame: ζ = 12k

Perturbation Map: ζ = 8k

Perturbation Map: ζ = 10k

Perturbation Map: ζ = 12k