More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron Jiaxun Li, Satyapriya Krishna, Himabindu Lakkaraju
Published: 01 Jan 2025, Last Modified: 16 May 2025
ICLR 2025
CC BY-SA 4.0