Abstract: Deep learning models can classify complex inputs with high accuracy. However, because these models automatically select the important input features from the training data, there is no guarantee that the right input information drives the inference process. Several techniques guide the training process so that the model focuses on the features relevant to the problem. These methods usually minimize the input gradients along the non-important feature dimensions, forcing the model to use the signal features and thus be right for the right reasons. However, some tasks contain bias within their signal features, so a model that learns to focus on them inherits that bias. In addition, these strategies expose the important features to attacks, because the input gradients of the important features have a high norm. In this work, we propose a new loss function that jointly teaches the model to be right for the right reasons and to be adversarially robust. We evaluate the proposed approach on two categories of problems: texture-based and structure-based. The proposed method achieves state-of-the-art results on the structure-based problems and competitive results on the texture-based ones.
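To make the "right for the right reasons" idea concrete, the sketch below shows a minimal, hypothetical version of an input-gradient penalty for a logistic model, where the gradient of the prediction with respect to the input has a closed form. All names (`rrr_loss`, `mask`, `lam`) are illustrative assumptions, not the paper's actual loss; the paper's proposed loss additionally handles adversarial robustness, which is omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rrr_loss(w, x, y, mask, lam=1.0):
    """Cross-entropy plus a penalty on input gradients over
    non-important dimensions (mask == 1 marks dimensions the
    model should NOT rely on). For a logistic model
    p = sigmoid(w . x), the input gradient is dp/dx = p*(1-p)*w,
    so the penalty is computable in closed form."""
    z = np.dot(w, x)
    p = sigmoid(z)
    ce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    input_grad = p * (1.0 - p) * w           # dp/dx for this model
    penalty = np.sum((mask * input_grad) ** 2)
    return ce + lam * penalty
```

With an all-zero mask the loss reduces to plain cross-entropy; masking a dimension the model actually uses adds a positive penalty, pushing training toward weights that ignore that dimension.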