Abstract: Adversarial robustness evaluates the worst-case
performance of a machine learning model to ensure its
safety and reliability. An example is a case where the
user input contains a minimal change, e.g. a synonym,
that causes a previously correct model to return a wrong
answer. In this setting, this study is the first to
investigate the robustness of visually grounded dialog
models against textual attacks.
We first aim to understand how multimodal input components contribute to model robustness.
Our results show that models which encode dialog history are more robust
because the history provides redundant information. This contrasts with prior
work, which finds that dialog history is negligible for model performance on
this task. We also evaluate how to generate adversarial test examples that
successfully fool the model but remain undetected by the user or software
designer. Our analysis shows that both the textual and the visual context are
important for generating plausible attacks.