CCFC: Core & Core–Full–Core Dual-Track Defense for LLM Jailbreak Protection

CCFC: Core & Core–Full–Core Dual-Track Defense for LLM Jailbreak Protection

ACL ARR 2026 January Submission6088 Authors

05 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: AI Safety

Abstract: Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core & Core–Full–Core), a dual-track, prompt-level defense framework designed to mitigate LLMs' vulnerabilities from prompt injection and structure-aware jailbreak attacks. CCFC operates by first isolating the semantic core of a user query via few-shot prompting, and then evaluating the query using two complementary tracks: a core-only track to ignore adversarial distractions (e.g., toxic suffixes or prefix injections), and a core-full-core (CFC) track to disrupt the structural patterns exploited by gradient-based or edit-based attacks. The final response is selected based on a safety consistency check across both tracks, ensuring robustness without compromising on response quality. We demonstrate that on both open-source and closed-source large language models, CCFC consistently drives attack success rates of diverse, strong jailbreak techniques (e.g., DeepInception, GCG) down to nearly zero, with only a modest runtime overhead and no sacrifice of fidelity on benign queries. Our method consistently outperforms state-of-the-art prompt-level defenses, offering a practical and effective solution for safer LLM deployment.

Paper Type: Long

Research Area: Safety and Alignment in LLMs

Research Area Keywords: Jailbreaking

Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency

Languages Studied: English

Submission Number: 6088

Loading