OpenReview.net
Victoria Krakovna
Research scientist, Google DeepMind
Joined: September 2016
Names
Victoria Krakovna (Preferred)
Viktoriya Krakovna
Emails
****@fas.harvard.edu (Confirmed)
****@gmail.com (Confirmed)
****@google.com (Confirmed)
****@deepmind.com (Confirmed)
Personal Links
Homepage
Google Scholar
DBLP
LinkedIn
Semantic Scholar
Career & Education History
Research scientist, Google DeepMind (deepmind.google), 2016 – Present
PhD student, Harvard University (harvard.edu), 2011 – 2016
Advisors, Relations & Conflicts
Coauthor: Martin Ciesielski-Listwan (2026 – 2026)
Coauthor: Alyssia Jovellanos (2025 – 2026)
Coauthor: Raymond Douglas (2023 – 2024)
Coauthor: Jacek Karwowski (2023 – 2024)
Coauthor: Evan Ryan Gunter (2023 – 2024)
Coauthor: Chan Bae (2023 – 2024)
Coauthor: Matthew Aitchison (2022 – 2024)
Coauthor: Ramana Kumar (2019 – 2024)
Expertise
AI safety, AI alignment, specification gaming, reward design, goal misgeneralization, dangerous capability evaluations, power-seeking, side effects, deceptive alignment, scheming (Present)
Publications
AutoHoney: Automating, Deploying, and Evaluating Scheming Honeypots Across Production Codebases
Martin Ciesielski-Listwan, Alyssia Jovellanos, Victoria Krakovna
ICML 2026 AIWILD
Persuasion Attacks Can Decrease Effectiveness of CoT Monitoring
Jennifer Za, Julija Bainiaksina, Nikita Ostrovsky, Tanush Chopra, Victoria Krakovna
ICLR 2026 AIWILD
Evaluating AI Agent Persuasion of Safety Monitors
Jennifer Za, Julija Bainiaksina, Nikita Ostrovsky, Tanush Chopra, Victoria Krakovna
WiML @ NeurIPS 2025
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca D. Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo et al. (21 additional authors not shown)
CoRR 2025
Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors
Jennifer Za, Julija Bainiaksina, Tanush Chopra, Nikita Ostrovsky, Victoria Krakovna
ICML 2025 R2-FM Workshop Poster
Evaluating Frontier Models for Stealth and Situational Awareness
Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah
Submitted to NeurIPS 2025
Evaluating Frontier Models for Stealth and Situational Awareness
Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah
CoRR 2025
An Approach to Technical AGI Safety and Security
Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, Jonah Brown-Cohen, Lewis Ho, Neel Nanda, Raluca Ada Popa, Rishub Jain, Rory Greig, Samuel Albanie, Scott Emmons, Sebastian Farquhar, Sébastien Krier, Senthooran Rajamanoharan, Sophie Bridgers, Tobi Ijitoye, Tom Everitt et al. (10 additional authors not shown)
CoRR 2025
Limitations of Agents Simulated by Predictive Models
Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria Krakovna
LLMAgents @ ICLR 2024 Poster
The Ethics of Advanced AI Assistants
Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomasev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia et al. (37 additional authors not shown)
CoRR 2024
Total publications: 28
Co-Authors
A. Stevie Bergman
Alan Cooney
Albert Webson
Aleksander Madry
Alex Ingerman
Alex Irpan
Alexander Matt Turner
Alexander Reese
Alexandre Kaskasoli
Alison Lentz
Allan Dafoe
Alyssia Jovellanos
Amanda McCroskery
Anca D. Dragan
Andis Draguns
Andrew Barakat
Andrew Lefrancq
Andrew Trask
Anian Ruoss
Anna Wang
Arianna Manzini
Arthur Conmy
Benjamin Lange
Beth Goldberg
Blaise Agüera y Arcas
Total co-authors: 153