Abstract: Machine learning has grown rapidly in recent decades, and ensemble learning, a powerful technique within the field, has advanced alongside it. As adoption widens, however, security concerns have emerged, with backdoor attacks (a.k.a. Trojan attacks) a prominent example. Although backdoor attacks have been studied in various domains, including neural networks, transfer learning, and federated learning, research on backdoor attacks targeting ensemble learning remains scarce, despite ensembles' heightened vulnerability to them. In this paper, we propose the first detection method designed to counter backdoor attacks in ensemble learning. We focus on the modification attack, a potent and easily mounted technique in which an adversary injects a trigger into the training dataset. To detect such attacks, we construct a carefully designed test ensemble and analyze the magnitudes of feature vectors to determine whether the input models are benign. Our approach overcomes the limitations of existing defenses against backdoor attacks in ensemble learning, such as their reliance on the clean datasets used to train the input models and their impractical costs. We demonstrate that our scheme achieves these objectives simultaneously and remains robust against advanced attacks.
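The abstract only outlines the detection idea; the concrete procedure appears later in the paper. Purely as a minimal sketch of feature-magnitude screening of the kind described, the following assumes PyTorch ensemble members that expose their penultimate-layer features, and every name here (the `features()` interface, the probe inputs, the `z_thresh` cutoff) is a hypothetical stand-in, not the authors' actual method:

```python
import torch

def feature_norms(model, probe_inputs):
    """Mean L2 norm of the model's feature vectors on a small probe set.

    Assumes the model exposes a `features()` method returning the
    penultimate-layer activations; this interface is an assumption.
    """
    model.eval()
    with torch.no_grad():
        feats = model.features(probe_inputs)      # (N, d) feature vectors
        return feats.norm(dim=1).mean().item()    # average magnitude

def flag_suspicious(models, probe_inputs, z_thresh=2.0):
    """Flag ensemble members whose feature magnitudes are outliers.

    The premise (motivated by the abstract) is that backdoored members
    produce abnormal feature magnitudes; z_thresh is an illustrative
    cutoff, not a value from the paper.
    """
    norms = torch.tensor([feature_norms(m, probe_inputs) for m in models])
    z = (norms - norms.mean()) / (norms.std() + 1e-8)  # standardized scores
    return [i for i, s in enumerate(z) if s.abs() > z_thresh]
```

A defender would run this over the candidate models with a small trigger-free probe set; the test-ensemble construction the paper actually uses is more involved than this outlier test.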