DP-FDF: A Dual-path Fuzzy Decision Framework For Intent Judgments In Large Language Models

ACL ARR 2026 January Submission160 Authors

22 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Large Language Models, Jailbreak Attacks, Intent Determination, Fuzzy Mathematics
Abstract: As large language models (LLMs) are adopted across diverse application scenarios, accurately identifying malicious intent in user inputs (such as boundary probing, disguised requests, direct attacks, and prompt injection) has become critical to their security. Current mainstream LLMs have limited ability to recognize semantically ambiguous inputs such as disguised or euphemistic expressions. We propose a dual-path fuzzy decision framework (DP-FDF) designed to improve LLM intent recognition in ambiguous semantic contexts. The framework integrates fuzzy mathematics into LLM security defense: it combines a fuzzy feature similarity path with a Max–Min fuzzy inference path to score each input statement across multiple dimensions. The final judgment is obtained by a weighted fusion of the two path scores followed by a refined two-stage decision strategy. In experiments on multiple mainstream LLMs, DP-FDF reduces the average attack success rate (ASR) from 76.58% in the unprotected state to 12.80%, indicating that the framework is both effective and applicable across models.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Large Language Models, Jailbreak Attacks
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 160
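
Illustrative sketch (not from the paper): the abstract describes the pipeline only at a high level, so the following Python fragment is a minimal sketch of how a dual-path fuzzy score could be computed, under stated assumptions. The min/max similarity ratio, the relation matrix, the fusion weight `w_sim`, the thresholds `low` and `high`, and the second-stage rule are all hypothetical choices made for illustration. Only the max-min composition itself is the standard fuzzy operation. None of the names or values are taken from DP-FDF.

```python
from typing import List, Sequence, Tuple


def similarity_score(mu: Sequence[float], proto: Sequence[float]) -> float:
    """Fuzzy similarity between two membership vectors (assumed measure):
    sum of element-wise minima divided by sum of element-wise maxima."""
    num = sum(min(a, b) for a, b in zip(mu, proto))
    den = sum(max(a, b) for a, b in zip(mu, proto))
    return num / den if den > 0 else 0.0


def max_min_compose(mu: Sequence[float],
                    relation: Sequence[Sequence[float]]) -> List[float]:
    """Standard max-min composition: b_j = max_i min(mu_i, R[i][j])."""
    n_out = len(relation[0])
    return [max(min(m, row[j]) for m, row in zip(mu, relation))
            for j in range(n_out)]


def judge(mu: Sequence[float],
          proto: Sequence[float],
          relation: Sequence[Sequence[float]],
          w_sim: float = 0.5,   # hypothetical fusion weight
          low: float = 0.3,     # hypothetical "clearly benign" threshold
          high: float = 0.7     # hypothetical "clearly malicious" threshold
          ) -> Tuple[str, float]:
    """Fuse the two path scores, then apply an assumed two-stage decision."""
    s_sim = similarity_score(mu, proto)
    s_inf = max_min_compose(mu, relation)[0]  # column 0 = "malicious" (assumed)
    fused = w_sim * s_sim + (1.0 - w_sim) * s_inf
    # Stage 1: decide directly when the fused score is unambiguous.
    if fused >= high:
        return "block", fused
    if fused <= low:
        return "allow", fused
    # Stage 2 (assumed rule): in the ambiguous band, block only if either
    # path alone is strongly confident; otherwise escalate for review.
    return ("block" if max(s_sim, s_inf) >= high else "review"), fused


# Hypothetical membership degrees for three features of one input,
# a hypothetical malicious prototype, and a hypothetical 3x2 relation
# matrix mapping features to the classes (malicious, benign).
proto = [0.9, 0.7, 0.2]
relation = [[0.9, 0.1], [0.5, 0.4], [0.2, 0.8]]
label, score = judge([0.8, 0.6, 0.1], proto, relation)
print(label, round(score, 3))
```

In this sketch the two paths see the same membership vector but use it differently: the similarity path compares it with a prototype, while the inference path propagates it through a fuzzy relation. Keeping a middle band between `low` and `high` is one way to give ambiguous inputs a second look instead of forcing a binary decision on a single threshold.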