ICML 2025 Workshop R2-FM Submissions
Finetuning-Activated Backdoors in LLMs
Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Defending Against Prompt Injection with a Few DefensiveTokens
Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Lifelong Safety Alignment for Language Models
Haoyu Wang, Zeyu Qin, Yifei Zhao, Chao Du, Min Lin, Xueqian Wang, Tianyu Pang
Published: 01 Jul 2025, Last Modified: 05 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Robust and Interpretable Relational Reasoning with Large Language Models and Symbolic Solvers
Ge Zhang, Mohammad Ali Alomrani, Hongjian Gu, Jiaming Zhou, Yaochen Hu, Bin Wang, Qun Liu, Mark Coates, Yingxue Zhang, Jianye HAO
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
LoRA Merging with SVD: Understanding Interference and Preserving Performance
Dennis Tang, Prateek Yadav, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Thought calibration: Efficient and confident test-time scaling
Menghua Wu, Cai Zhou, Stephen Bates, Tommi Jaakkola
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Lookahead Bias in Pretrained Language Models
Suproteem K Sarkar, Keyon Vafa
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien, David Majercak, Xavier Fernandes, Richard G. Edgar, Blake Bullwinkel, Jingya Chen, Harsha Nori, Dean Carignan, Eric Horvitz, Forough Poursabzi-Sangdeh
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images
Aditya Kumar, Tom Blanchard, Adam Dziedzic, Franziska Boenisch
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda TAO, Jiasheng Tang, Huan Wang
Published: 01 Jul 2025, Last Modified: 05 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization
Joschka Braun, Carsten Eickhoff, Seyed Ali Bahrainian
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Dynamic Risk Assessments for Offensive Cybersecurity Agents
Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Capability-Based Scaling Laws for LLM Red-Teaming
Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Consistency in Language Models: Current Landscape, Challenges, and Future Directions
Jekaterina Novikova, Carol Myrick Anderson, Borhane Blili-Hamelin, Domenic Rosati, Subhabrata Majumdar
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
The Geometry of Forgetting: Analyzing Machine Unlearning through Local Learning Coefficients
Aashiq Muhamed, Virginia Smith
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona T. Diab, Virginia Smith
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
Jianyu Wang, Zhiqiang Hu, Lidong Bing
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
State Space Models: A Naturally Robust Alternative to Transformers in Computer Vision
Chengbin Du, Yanxi Li, Chang Xu
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone