ICML 2025 Workshop R2-FM Submissions
A Statistical Physics of Language Model Reasoning
Jack David Carson
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
BiasGUARRD: Enhancing Fairness and Reliability in LLM Conflict Resolution Through Agentic Debiasing
Erica Wang, Shrujana S Kunnam, Sreeyutha Ratala
Published: 01 Jul 2025, Last Modified: 10 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors
Jennifer Za, Julija Bainiaksina, Tanush Chopra, Nikita Ostrovsky, Victoria Krakovna
Published: 01 Jul 2025, Last Modified: 10 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Model Organisms for Emergent Misalignment
Edward Turner, Anna Soligo, Mia Taylor, Senthooran Rajamanoharan, Neel Nanda
Published: 01 Jul 2025, Last Modified: 10 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Adversarial Manipulation of Reasoning Models using Internal Representations
Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
GPT, But Backwards: Exactly Inverting Language Model Outputs
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas Carvalho Cordeiro
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
On the Scoring Functions for RAG-based Conformal Factuality
Yi Chen, Caitlyn Heqi Yin, Sukrut Madhav Chikodikar, Ramya Korlakai Vinayak
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
Miles Turpin, Andy Arditi, Marvin Li, Joe Benton, Julian Michael
Published: 01 Jul 2025, Last Modified: 10 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
The Geometries of Truth Are Orthogonal Across Tasks
Waïss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Béthune, Michal Klein, Pierre Ablin, Marco Cuturi
Published: 01 Jul 2025, Last Modified: 04 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
On Learning Verifiers for Chain-of-Thought Reasoning
Maria Florina Balcan, Avrim Blum, Zhiyuan Li, Dravyansh Sharma
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Lovis Heindrich, Philip Torr, Fazl Barez, Veronika Thost
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Advancing LLM Safe Alignment with Safety Representation Ranking
Tianqi Du, Zeming Wei, Quan Chen, Chenheng Zhang, Yisen Wang
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
Sima Noorani, Shayan Kiyani, George J. Pappas, Hamed Hassani
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
When Meaning Doesn’t Matter: Exposing Guard Model Fragility via Paraphrasing
Cristina Pinneri, Christos Louizos
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models
Yingsha Xie, Rui Min, Zeyu Qin, Fei Ma, Li Shen, Fei Yu, Xiaochun Cao
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Learning on LLM Output Signatures for Gray-Box Behavior Analysis
Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, Haggai Maron
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Visual Language Models as Zero-Shot Deepfake Detectors
Viacheslav Pirogov
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification
Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu
Published: 01 Jul 2025, Last Modified: 06 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri, Samuel J. Bell
Published: 01 Jul 2025, Last Modified: 09 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Semi-Nonnegative GPT: Towards Monosemantic representations
Junyi Li, Jinqi Liu, Qi Zhang, Yisen Wang
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study
Kai Ye, Tianyi Chen, Zhen Wang
Published: 01 Jul 2025, Last Modified: 05 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han
Published: 01 Jul 2025, Last Modified: 10 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Conformal Risk Minimization with Variance Reduction
Sima Noorani, Orlando Romero, Nicolo Dal Fabbro, Hamed Hassani, George J. Pappas
Published: 01 Jul 2025, Last Modified: 01 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Distilling Safe LLM Systems via Soft Prompts
Motasem Alfarra, Dana Kianfar, Cristina Pinneri, Christos Louizos
Published: 01 Jul 2025, Last Modified: 07 Jul 2025
ICML 2025 R2-FM Workshop Poster
Readers: Everyone