Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
Published: 01 Jan 2024, Last Modified: 13 May 2025
ICLR 2024
CC BY-SA 4.0