ICML 2025 Workshop MoFA Submissions
Mimicking Human Intuition: Cognitive Belief-Driven Reinforcement Learning
Xingrui Gu, Guanren Qiao, Chuyi Jiang
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
ReDit: Reward Dithering for Improved LLM Policy Optimization
Chenxing Wei, Jiarui Yu, Ying Tiffany He, Hande Dong, Yao Shu, Fei Yu
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Unanchoring the Mind: DAE-Guided Counterfactual Reasoning for Rare Disease Diagnosis
Yuting Yan, Yinghao Fu, Wendi Ren, Shuang Li
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, Chengchun Shi
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization
Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Vertical Moral Growth: A Novel Developmental Framework for Human Feedback Quality in AI Alignment
Taichiro Endo
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Alignment as Distribution Learning: Your Preference Model is Explicitly a Language Model
Jihun Yun, Juno Kim, Jongho Park, Junhyuck Kim, Jongha Jon Ryu, Jaewoong Cho, Kwang-Sung Jun
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
BiasLab: Toward Explainable Political Bias Detection with Dual-Axis Human Annotations and Rationale Indicators
KMA SOLAIMAN
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Dynamic Guardian Models: Realtime Content Moderation With User-Defined Policies
Monte Hoover, Vatsal Baherwani, Neel Jain, Khalid Saifullah, Joseph James Vincent, Chirag Jain, Melissa Kazemi Rad, C. Bayan Bruss, Ashwinee Panda, Tom Goldstein
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data Elicits LLM Personalization to Real Users
Anikait Singh, Sheryl Hsu, Kyle Hsu, Eric Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Lily H Zhang, Smitha Milli, Karen Long Jusko, Jonathan Smith, Brandon Amos, Wassim Bouaziz, Jack Kussman, Manon Revel, Lisa Titus, Bhaktipriya Radharapu, Jane Yu, Vidya Sarma, Kristopher Rose, Maximilian Nickel
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Oral
Readers:
Everyone
Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
Emilio Barkett, Olivia Long, Madhavendra Thakur
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes
Kasia Kobalczyk, Claudio Fanconi, Hao Sun, Mihaela van der Schaar
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Oral
Readers:
Everyone
EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments
Sara Fish, Julia Shephard, Minkai Li, Ran I Shorrer, Yannai A. Gonczarowski
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Language Model Personalization via Reward Factorization
Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone
Empirical Studies on the Limitations of Direct Preference Optimization, and a Possible Quick Fix
Jiarui Yao, Yong Lin, Tong Zhang
Published: 10 Jun 2025, Last Modified: 30 Jun 2025
MoFA Poster
Readers:
Everyone