A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks.

Blake Bullwinkel, Mark Russinovich, Ahmed Salem, Santiago Zanella-Béguelin, Daniel Jones, Giorgio Severi, Eugenia Kim, Keegan Hines, Amanda J. Minnich, Yonatan Zunger, Ram Shankar Siva Kumar

26 Jan 2026 · CoRR 2025 · Everyone · CC BY-SA 4.0