ICLR 2025 Workshop BuildingTrust Submissions
Unnatural Languages Are Not Bugs but Features for LLMs
Keyu Duan, Yiran Zhao, Zhili Feng, Jinjie Ni, Tianyu Pang, Qian Liu, Tianle Cai, Longxu Dou, Kenji Kawaguchi, Anirudh Goyal, J Zico Kolter, Michael Qizhe Shieh
Published: 05 Mar 2025, Last Modified: 15 Apr 2025
BuildingTrust
Readers:
Everyone
LLMs Lost in Translation: M-ALERT Uncovers Cross-Linguistic Safety Gaps
Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
Is this a real image?
ICLR 2025 Workshop BuildingTrust Submission40 Authors
08 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
ToolScan: A Benchmark For Characterizing Errors In Tool-Use LLMs
Shirley Kokane, Ming Zhu, Tulika Manoj Awalgaonkar, Jianguo Zhang, Akshara Prabhakar, Thai Quoc Hoang, Zuxin Liu, Rithesh R N, Liangwei Yang, Weiran Yao, Juntao Tan, Zhiwei Liu, Huan Wang, Juan Carlos Niebles, Shelby Heinecke, Caiming Xiong, Silvio Savarese
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
In-Context Meta Learning Induces Multi-Phase Circuit Emergence
Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo
Published: 05 Mar 2025, Last Modified: 24 Mar 2025
BuildingTrust
Budget-Constrained Learning to Defer for Autoregressive Models
ICLR 2025 Workshop BuildingTrust Submission37 Authors
07 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Building Bridges, Not Walls: Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Order Independence With Finetuning
ICLR 2025 Workshop BuildingTrust Submission35 Authors
07 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs
ICLR 2025 Workshop BuildingTrust Submission34 Authors
07 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Private Retrieval Augmented Generation with Random Projection
Dixi Yao, Tian Li
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer, Vítor Albiero, Joanna Bitton, Hailey Nguyen, Cristian Canton Ferrer, Ivan Evtimov, Aaron Grattafiori
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality
Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Bo Jiang, Ruoyu Sun, Zhuotao Liu, Shiyu Liang
Published: 05 Mar 2025, Last Modified: 24 Mar 2025
BuildingTrust
Privately Learning from Graphs with Applications in Fine-tuning Large Pretrained Models
Haoteng Yin, Rongzhe Wei, Eli Chien, Pan Li
Published: 05 Mar 2025, Last Modified: 15 Apr 2025
BuildingTrust
Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights
ICLR 2025 Workshop BuildingTrust Submission29 Authors
06 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning
ICLR 2025 Workshop BuildingTrust Submission28 Authors
06 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Evaluating AI Safety in Polish: An Automated Red-Teaming Approach
ICLR 2025 Workshop BuildingTrust Submission27 Authors
06 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xu Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma
Published: 05 Mar 2025, Last Modified: 30 Mar 2025
BuildingTrust
SPEX: Scaling Feature Interaction Explanations for LLMs
Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Bin Yu, Kannan Ramchandran
Published: 05 Mar 2025, Last Modified: 02 Apr 2025
BuildingTrust
Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction
Harit Vishwakarma, Thomas Cook, Alan Mishler, Niccolo Dalmasso, Natraj Raman, Sumitra Ganesh
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Sewoong Oh
Published: 05 Mar 2025, Last Modified: 12 Apr 2025
BuildingTrust
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran
Published: 05 Mar 2025, Last Modified: 05 Apr 2025
BuildingTrust
What is the chance of being so unfair?
ICLR 2025 Workshop BuildingTrust Submission19 Authors
05 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Towards Effective Discrimination Testing for Generative AI
Thomas P Zollo, Nikita Rajaneesh, Richard Zemel, Talia B. Gillis, Emily Black
Published: 05 Mar 2025, Last Modified: 24 Mar 2025
BuildingTrust
The Differences Between Direct Alignment Algorithms are a Blur
Alexey Gorbatovski, Boris Shaposhnikov, Viacheslav Sinii, Alexey Malakhov, Daniil Gavrilov
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust