Who is In Charge? Dissecting Role Conflicts in LLM Instruction Following

Published: 22 Sept 2025, Last Modified: 03 Jan 2026
Venue: WiML @ NeurIPS 2025
License: CC BY 4.0
Keywords: Probing, Steering, AI Safety, instruction hierarchies, role conflicts
Submission Number: 106