The Blind Spot of LLM Security: Time-Sensitive Backdoors Activated by Inherent Features

The Blind Spot of LLM Security: Time-Sensitive Backdoors Activated by Inherent Features

ICLR 2026 Conference Submission153 Authors

01 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Language Models; Pre-trained models; Backdoor Attacks; Future Timestamps; System Prompts; AI Security

TL;DR: This paper introduces TempBackdoor, a novel backdoor attack framework that leverages future timestamps in system prompts as stealthy triggers to poison Large Language Models.

Abstract: With the widespread adoption of Large Language Models (LLMs), backdoor attacks against pre-trained LLMs have become a notable security issue. Without control over end-user inputs, the trigger conditions of existing attacks are difficult to satisfy. To address this limitation, we introduce TempBackdoor, a novel time-sensitive backdoor attack framework. TempBackdoor exploits timestamp features embedded in system prompts as its dynamic trigger, enabling precise, long-term dormant attacks without requiring control over end-user inputs. To implement this complex attack, we develop an efficient, automated pipeline comprising Homo-Poison, an automated data-poisoning method based on homogeneous models, and a hybrid training strategy that combines supervised fine-tuning (SFT) with n-token reinforcement learning (n-token RL). The n-token RL variant is specifically designed for precise poisoning tasks and is instrumental for the efficient and accurate implantation of time-based backdoors. Our experiments show that TempBackdoor achieves over 96% attack success rate (ASR) and less than 2% false positive rate (FPR) in three scenarios on the Qwen/Qwen2.5-7B-Instruct model and successfully bypasses seven mainstream defenses. Critically, this work not only demonstrates the viability of leveraging a model's endogenous features as an attack vector (as opposed to external injections) but also uncovers a key blind spot in current backdoor threat models for evaluating such advanced threats.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 153

Loading