RobustEncoder: Leveraging K-Means clustering to defend NLP models against backdoor attacks

Published: 01 Jan 2024 · Last Modified: 12 May 2025 · BCCA 2024 · CC BY-SA 4.0
Abstract: As machine learning (ML) systems become increasingly integrated into real-world applications for sensitive tasks, ensuring the security and privacy of these models becomes paramount. Deep Neural Networks (DNNs), in particular, are susceptible to backdoor attacks, where adversaries manipulate training data by inserting specially crafted samples. While the NLP community has extensively studied backdoor attacks, there remains a gap in effective defense mechanisms. To address this, we propose RobustEncoder, a novel approach leveraging K-Means clustering to detect and mitigate backdoor attacks in text-based models. Our method demonstrates significant efficacy in identifying and neutralizing backdoor triggers, as evidenced by extensive empirical evaluations. Additionally, we propose potential applications of blockchain technology to further enhance the security and integrity of the defense mechanisms in future implementations.
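The core idea described in the abstract — using K-Means clustering to separate trigger-bearing samples from clean ones — can be illustrated with a minimal sketch. The synthetic embeddings, the choice of k=2, and the smaller-cluster heuristic below are illustrative assumptions, not the paper's actual pipeline or data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical setup: 200 clean sentence embeddings and 20 poisoned ones.
# Poisoned embeddings are simulated as a tight, offset cluster, reflecting
# the assumption that a shared trigger dominates their representations.
clean = rng.normal(0.0, 1.0, size=(200, 32))
poisoned = rng.normal(5.0, 0.1, size=(20, 32))
embeddings = np.vstack([clean, poisoned])

# Cluster the embeddings; with k=2, trigger-bearing samples should
# separate from the clean distribution.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# Flag the smaller cluster as suspicious, since poisoned data is
# typically a small minority of the training set.
counts = np.bincount(labels)
suspect_label = int(np.argmin(counts))
suspect_idx = np.flatnonzero(labels == suspect_label)
print(len(suspect_idx))
```

In this toy setting, the flagged indices coincide with the injected poisoned samples; a real defense would operate on embeddings from the trained encoder rather than synthetic vectors.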