OpenReview.net
Tony T. Wang
PhD student, Massachusetts Institute of Technology
Joined May 2021
Names
Tony T. Wang (Preferred), Tony Tong Wang, Tony T Wang
Emails
****@gmail.com (Confirmed), ****@mit.edu (Confirmed)
Personal Links
Homepage
Google Scholar
Career & Education History
PhD student, Massachusetts Institute of Technology (mit.edu), 2021 – Present
MS student, Massachusetts Institute of Technology (mit.edu), 2020 – 2021
Undergrad student, Massachusetts Institute of Technology (mit.edu), 2016 – 2020
Advisors, Relations & Conflicts
No relations added
Expertise
AI safety, 2021 – Present
Adversarial robustness, 2019 – Present
Theory of deep learning, 2020 – 2022
Publications
Scalable Energy-Based Models via Adversarial Training: Unifying Discrimination and Generation
Xuwang Yin, Claire Zhang, Julie Steele, Nir N Shavit, Tony T. Wang
ICLR 2026 Poster
Readers: Everyone
Learning to Interpret Weight Differences in Language Models
Avichal Goel, Yoon Kim, Nir N Shavit, Tony T. Wang
ICLR 2026 Poster
Readers: Everyone
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
ICLR 2025 Poster
Readers: Everyone
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Zane Durante, Cristobal Eyzaguirre, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
Red Teaming GenAI Workshop @ NeurIPS'24 Oral
Readers: Everyone
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Zane Durante, Cristobal Eyzaguirre, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
SoLaR Spotlight
Readers: Everyone
Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers
Tony Tong Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir N Shavit, Ethan Perez
SoLaR Poster
Readers: Everyone
Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers
Tony Tong Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir N Shavit, Ethan Perez
AdvML-Frontiers 2024
Readers: Everyone
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPS 2024 Workshop RBFM Oral
Readers: Everyone
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
AdvML-Frontiers 2024
Readers: Everyone
Can Go AIs be adversarially robust?
Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave
NextGenAISafety 2024 Poster
Readers: Everyone
View all 17 publications
Co-Authors
Adam Gleave
Alexander Wei
Anand Siththaranjan
Anca Dragan
Andi Peng
Avichal Goel
Brando Miranda
Charbel-Raphael Segerie
Claire Zhang
Claudia Shi
Cristobal Eyzaguirre
Dan Valentine
Danny Halawi
David Krueger
David Lindner
Dmitrii Krasheninnikov
Dorsa Sadigh
Dylan Hadfield-Menell
Erdem Biyik
Eric J Michaud
Eric Wallace
Ethan Perez
Euan McLean
Fazl Barez
Henry Sleight
View all 73 co-authors