Abstract: The rise of code-generation large language models (LLMs) has revolutionized software development by significantly enhancing productivity. However, their reliance on extensive datasets collected from open-source repositories exposes them to backdoor attacks, wherein malicious actors inject poisoned data to manipulate the generated code. These attacks pose serious security risks by embedding vulnerable code snippets into software applications. Existing research primarily focuses on designing stealthy backdoor attacks, leaving a gap in effective defenses.
In this paper, we investigate trigger inversion as a defense mechanism for safeguarding code-generation LLMs. Trigger inversion aims to identify the adversary-defined input patterns (triggers) that activate malicious behavior in backdoored models. We study the effectiveness of two representative adversarial optimization-based inversion algorithms originally developed for general LLMs. Our experiments show that these methods can successfully recover triggers from backdoored code LLMs under specific settings. However, we also observe that inversion effectiveness is highly sensitive to factors such as suffix length and initialization, and that a lower optimization loss does not always correlate with successful trigger recovery. These findings highlight the limitations of existing approaches and underscore the urgent need for more robust and generalizable trigger inversion techniques tailored specifically to the code domain.
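For readers unfamiliar with adversarial optimization-based trigger inversion, the sketch below illustrates the general idea with a greedy coordinate-gradient (GCG-style) search: a short token suffix appended to a benign prompt is optimized so that the model's loss on a suspected malicious completion decreases. This is not the paper's exact algorithm; the model name, prompt, target string, suffix length, and all hyperparameters are illustrative placeholders.

```python
# Minimal GCG-style trigger-inversion sketch (illustrative only).
# Assumes a Hugging Face causal code LLM; all strings and settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"   # placeholder for a backdoored code LLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
embed = model.get_input_embeddings()          # (vocab_size, hidden_dim)

prompt = "def load_config(path):"             # benign context
target = "\n    os.system(user_input)"        # suspected malicious payload
prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
target_ids = tok(target, return_tensors="pt").input_ids[0]

suffix_len = 8                                # length of the candidate trigger
suffix_ids = torch.full((suffix_len,), tok.encode("!")[0])  # naive initialization

def target_loss(cand_ids):
    """Cross-entropy of the target completion given prompt + candidate trigger."""
    with torch.no_grad():
        ids = torch.cat([prompt_ids, cand_ids, target_ids]).unsqueeze(0)
        labels = ids.clone()
        labels[:, : prompt_ids.numel() + cand_ids.numel()] = -100  # score target only
        return model(ids, labels=labels).loss

for step in range(50):
    # One-hot relaxation of the suffix so we can differentiate w.r.t. token choices.
    one_hot = torch.nn.functional.one_hot(suffix_ids, embed.num_embeddings).float()
    one_hot.requires_grad_(True)
    suffix_emb = one_hot @ embed.weight
    full_emb = torch.cat([embed(prompt_ids), suffix_emb, embed(target_ids)]).unsqueeze(0)
    labels = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0)
    labels = labels.clone()
    labels[:, : prompt_ids.numel() + suffix_len] = -100
    loss = model(inputs_embeds=full_emb, labels=labels).loss
    loss.backward()

    # Greedy coordinate step: try top-k token substitutions at one random position.
    pos = torch.randint(suffix_len, (1,)).item()
    candidates = (-one_hot.grad[pos]).topk(16).indices
    best, best_loss = suffix_ids, target_loss(suffix_ids)
    for c in candidates:
        trial = suffix_ids.clone()
        trial[pos] = c
        l = target_loss(trial)
        if l < best_loss:
            best, best_loss = trial, l
    suffix_ids = best
    print(f"step {step}: loss {best_loss.item():.4f}  trigger {tok.decode(suffix_ids)!r}")
```

As the abstract notes, the outcome of such a search can depend heavily on the suffix length and the initialization of `suffix_ids`, and a low final loss does not guarantee that the recovered suffix matches the true trigger.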
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: security and privacy; safety and alignment; robustness; code models
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study
Languages Studied: Python, English
Submission Number: 7580