Abstract: Classifiers based on Deep Neural Networks (DNNs) are vulnerable to adversarial attacks. To protect DNNs from adversarial attacks, recent research suggests using a diverse ensemble of models and combining their outputs. While this provides defence against adversarial inputs, the total number of parameters in the ensemble is significantly higher than that of a single network. In this work, we propose a new method for designing and training model ensembles that provides equivalent protection with far fewer parameters. The main idea is to partition the classes into groups and, using a carefully designed loss function, train a shared part of every neural network in the ensemble to separate the classes while clustering the groups together. The rest of each network is trained to classify a smaller set of training classes containing one class from each group. These class subsets are varied across models to achieve diversity. Evaluations on the MNIST and CIFAR10 data sets confirm the effectiveness of our approach when compared with other existing approaches.
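To make the ensemble layout concrete, the sketch below illustrates one plausible reading of the abstract in PyTorch: a shared trunk feeds several heads, each head classifies a reduced label set with one class per group, and an auxiliary loss pulls features of same-group classes toward their group centroid. The group partition, layer sizes, and the exact form of the clustering loss are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of the shared-trunk ensemble described in the abstract.
# The partition GROUPS, the network sizes, and group_clustering_loss are
# assumptions for illustration only.
import torch
import torch.nn as nn

NUM_CLASSES = 10                                     # e.g. MNIST / CIFAR10
GROUPS = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]    # assumed class partition

class SharedTrunk(nn.Module):
    """Feature extractor shared by every ensemble member."""
    def __init__(self, in_dim=784, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class MemberHead(nn.Module):
    """Member-specific part: classifies one class drawn from each group."""
    def __init__(self, feat_dim, class_subset):
        super().__init__()
        self.class_subset = class_subset             # e.g. [0, 2, 4, 6, 8]
        self.fc = nn.Linear(feat_dim, len(class_subset))

    def forward(self, feats):
        return self.fc(feats)

def group_clustering_loss(feats, labels, groups):
    """Assumed auxiliary loss: pull features of classes in the same group
    toward the group centroid, so the trunk clusters groups together."""
    group_of = {c: g for g, cs in enumerate(groups) for c in cs}
    gids = torch.tensor([group_of[int(y)] for y in labels])
    loss = feats.new_zeros(())
    for g in range(len(groups)):
        mask = gids == g
        if mask.sum() > 1:                           # need >=2 samples per group
            centroid = feats[mask].mean(dim=0)
            loss = loss + ((feats[mask] - centroid) ** 2).sum(dim=1).mean()
    return loss / len(groups)

# Usage: two members share one trunk; varying each head's class subset
# (one class per group) is what gives the ensemble its diversity.
trunk = SharedTrunk()
heads = [MemberHead(64, [g[i] for g in GROUPS]) for i in range(2)]
x, y = torch.randn(8, 784), torch.randint(0, NUM_CLASSES, (8,))
feats = trunk(x)
aux = group_clustering_loss(feats, y, GROUPS)        # added to each head's CE loss
```

Since the trunk is shared, only the small heads are duplicated across members, which is why the parameter count stays close to that of a single network.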