ICML 2025 Workshop R2-FM Submissions
Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns
Fred Heiding, Simon Lermen, Andrew Kao, Bruce Schneier, Arun Vishwanath
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Position: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models
Muxing Li, Zesheng Ye, Yixuan Li, Andy Song, Guangquan Zhang, Feng Liu
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs
Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Christopher A. Choquette-Choo, Adam Dziedzic
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Improving Commonsense Reasoning and Reliability in LLMs Through Cognitive-Inspired Prompting Frameworks
Tanvi Ganapathy, Ishita Mathur, Anna Szczuka
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Uncertainty Quantification for MLLMs
Gregory Kang Ruey Lau, Hieu Dao, Nicole Kan Hui Lin, Bryan Kian Hsiang Low
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Position: Reasoning LLMs are Wandering Solution Explorers
Jiahao Lu, Ziwei Xu, Mohan Kankanhalli
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
Yuhao Wang, Wenjie Qu, Yanze Jiang, Lichen Liu, Yue Liu, Shengfang Zhai, Yinpeng Dong, Jiaheng Zhang
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy
Jiacheng Zhang, Benjamin I. P. Rubinstein, Jingfeng Zhang, Feng Liu
Published: 01 Jul 2025, Last Modified: 06 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Accountability Attribution: Tracing Model Behavior to Training Processes
Shichang Zhang, Hongzhe Du, Karim Saraipour, Jiaqi W. Ma, Himabindu Lakkaraju
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Data Shifts Hurt CoT: A Theoretical Study
Lang Yin, Debangshu Banerjee, Gagandeep Singh
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
Jiali Cheng, Hadi Amiri
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Towards Secure Model Sharing with Approximate Fingerprints
Anshul Nasery, Sewoong Oh
Published: 01 Jul 2025, Last Modified: 06 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Shreedhar Jangam, Jayanth Srinivasa, Gaowen Liu, Dawn Song, Xin Eric Wang
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation
Running Yang, Wenlong Deng, Minghui Chen, Yuyin Zhou, Xiaoxiao Li
Published: 01 Jul 2025, Last Modified: 08 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs
Jonas B Raedler, Weiyue Li, Alyssa Mia Taliotis, Manasvi Goyal, Siddharth Swaroop, Weiwei Pan
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making
Bharadwaj Ravichandran, David Joy, Paul Elliott, Brian H Hu, Jadie Adams, Christopher Funk, Emily Veenhuis, Anthony Hoogs, Arslan Basharat
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Circuit Discovery Helps To Detect LLM Jailbreaking
Paria Mehrbod, Boris Knyazev, Eugene Belilovsky, Guy Wolf, geraldin nanfack
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Predicting the Performance of Black-box Language Models with Follow-up Queries
Dylan Sam, Marc Anton Finzi, J Zico Kolter
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Foundational Models Must Be Designed To Yield Safer Loss Landscapes That Resist Harmful Fine-Tuning
Karan Uppal, Pavan Kalyan Tankala
Published: 01 Jul 2025, Last Modified: 05 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Yichi Zhang, Zihao Zeng, Dongbai Li, Yao Huang, Zhijie Deng, Yinpeng Dong
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu, Alexander Robey, Changliu Liu
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
In-Context Watermarks for Large Language Models
Yepeng Liu, Xuandong Zhao, Christopher Kruegel, Dawn Song, Yuheng Bu
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone