JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

Published: 2025, Last Modified: 12 Jan 2026, Venue: USENIX Security Symposium 2025, License: CC BY-SA 4.0