OpenReview.net
Wai Man Si
PhD student, CISPA, Saarland University, Saarland Informatics Campus
Joined
June 2023
Names
Wai Man Si (Preferred), Wai man Si
Emails
****@cispa.de (Confirmed)
Personal Links
Google Scholar
DBLP
Semantic Scholar
ACL Anthology
Career & Education History
PhD student
CISPA, Saarland University, Saarland Informatics Campus (cispa.saarland)
2021 – 2026
MS student
Georgia Institute of Technology (gatech.edu)
2019 – 2021
Undergrad student
Georgia Institute of Technology (gatech.edu)
2015 – 2018
Advisors, Relations & Conflicts
PhD Advisor: Michael Backes (Present)
PhD Advisor: Yang Zhang (Present)
Expertise
No areas of expertise listed
Publications
Excessive Reasoning Attack on Reasoning LLMs
Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang
Submitted to ICLR 2026
Readers: Everyone
Boosting Safety Alignment in LLMs with Response Shortcuts
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
Submitted to ICLR 2026
Readers: Everyone
Excessive Reasoning Attack on Reasoning LLMs
Wai Man Si, Mingjie Li, Michael Backes, Yang Zhang
CoRR 2025
Readers: Everyone
Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
NeurIPS 2025 poster
Readers: Everyone
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
ICLR 2025
Readers: Everyone
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
ICLR 2025 Poster
Readers: Everyone
ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
Wai Man Si, Michael Backes, Yang Zhang
CoRR 2024
Readers: Everyone
Two-in-One: A Model Hijacking Attack Against Text Generation Models
Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem
CoRR 2023
Readers: Everyone
Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing
Wai Man Si, Michael Backes, Yang Zhang
CoRR 2023
Readers: Everyone
Co-Authors
Ahmed Salem
Boyang Zhang
Emiliano De Cristofaro
Gianluca Stringhini
Jeremy Blackburn
Mark O. Riedl
Mark Riedl
Michael Backes
Mingjie Li
Prithviraj Ammanabrolu
Savvas Zannettou
Xinyue Shen
Yang Zhang
Yisen Wang
Yun Shen
Zeyang Sha
Zeyuan Chen