ICLR 2025 Workshop BuildingTrust Submissions
UNLEARNING GEO-CULTURAL STEREOTYPES IN MULTILINGUAL LLMS
Alireza Dehghanpour Farashah, Aditi Khandelwal, Negar Rostamzadeh, Golnoosh Farnadi
Published: 05 Mar 2025, Last Modified: 17 Apr 2025
BuildingTrust
Readers: Everyone
Disentangling Sequence Memorization and General Capability in Large Language Models
Gaurav Rohit Ghosal, Pratyush Maini, Aditi Raghunathan
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
Unveiling Control Vectors in Language Models with Sparse Autoencoders
ICLR 2025 Workshop BuildingTrust Submission95 Authors
11 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
A Missing Testbed for LLM Pre-Training Membership Inference Attacks
Mingjian Jiang, Ken Ziyu Liu, Sanmi Koyejo
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered
Ishwara Vasista, Imran Mirza, Cole Huang, Rohan Rajasekhara Patil, Aslihan Akalin, Kevin Zhu, Sean O'Brien
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild
Zhiying Zhu, Yiming Yang, Zhiqing Sun
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
Veracity: An Online, Open-Source Fact-Checking Solution
ICLR 2025 Workshop BuildingTrust Submission91 Authors
11 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Detecting Unreliable Responses in Generative Vision-Language Models via Visual Uncertainty
ICLR 2025 Workshop BuildingTrust Submission90 Authors
11 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
A Generative Approach to LLM Harmfulness Detection with Red Flag Tokens
Sophie Xhonneux, David Dobre, Mehrnaz Mofakhami, Leo Schwinn, Gauthier Gidel
Published: 05 Mar 2025, Last Modified: 25 Apr 2025
BuildingTrust
Readers: Everyone
AI Companions Are Not The Solution To Loneliness: Design Choices And Their Drawbacks
Jonas B Raedler, Siddharth Swaroop, Weiwei Pan
Published: 05 Mar 2025, Last Modified: 04 Apr 2025
BuildingTrust
Readers: Everyone
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
Advik Raj Basani, Xiao Zhang
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
Readers: Everyone
Rethinking LLM Bias Probing Using Lessons from the Social Sciences
Kirsten Morehouse, Siddharth Swaroop, Weiwei Pan
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
Readers: Everyone
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
DocImpact: Quantifying Document Impact in RAG-LLMs
ICLR 2025 Workshop BuildingTrust Submission84 Authors
11 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Fast Proxies for LLM Robustness Evaluation
Tim Beyer, Jan Schuchardt, Leo Schwinn, Stephan Günnemann
Published: 05 Mar 2025, Last Modified: 07 Apr 2025
BuildingTrust
Readers: Everyone
BaxBench: Can LLMs Generate Correct and Secure Backends?
Mark Vero, Niels Mündler, Victor Chibotaru, Veselin Raychev, Maximilian Baader, Nikola Jovanović, Jingxuan He, Martin Vechev
Published: 05 Mar 2025, Last Modified: 12 Apr 2025
BuildingTrust
Readers: Everyone
Towards Neural No-Resource Language Translation: A Comparative Evaluation of Approaches
ICLR 2025 Workshop BuildingTrust Submission80 Authors
10 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Opportunities and Challenges of Frontier Data Governance With Synthetic Data
ICLR 2025 Workshop BuildingTrust Submission79 Authors
10 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
Saniya Karwa, Navpreet Singh
Published: 05 Mar 2025, Last Modified: 06 Mar 2025
BuildingTrust
Readers: Everyone
Monitoring LLM Agents for Sequentially Contextual Harm
Chen Yueh-Han, Nitish Joshi, Yulin Chen, He He, Rico Angell
Published: 05 Mar 2025, Last Modified: 30 Mar 2025
BuildingTrust
Readers: Everyone
ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Soroush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H. Lampert
Published: 05 Mar 2025, Last Modified: 31 Mar 2025
BuildingTrust
Readers: Everyone
Justified Trust in AI Fairness Assessment using Existing Metadata Entities
Alpay Sabuncuoglu, Carsten Maple
Published: 05 Mar 2025, Last Modified: 31 Mar 2025
BuildingTrust
Readers: Everyone
Truthfulness in LLMs: A Layer-wise Comparative Analysis of Representation Engineering and Contrast-Consistent Search
ICLR 2025 Workshop BuildingTrust Submission73 Authors
10 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps
ICLR 2025 Workshop BuildingTrust Submission72 Authors
10 Feb 2025 (modified: 06 Mar 2025)
Submitted to BuildingTrust
Readers: Everyone
Do Multilingual LLMs Think In English?
Lisa Schut, Yarin Gal, Sebastian Farquhar
Published: 05 Mar 2025, Last Modified: 14 Apr 2025
BuildingTrust
Readers: Everyone