NeurIPS 2025 Workshop Reliable ML Submissions

Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting
Nikoo Naghavian, Mostafa Tavassolipour
- Published: 29 Sept 2025, Last Modified: 14 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
Antonio Barbalau, Cristian Daniel Paduraru, Teodor Poncu, Alexandru Tifrea, Elena Burceanu
- Published: 29 Sept 2025, Last Modified: 24 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
WASP: A Weight-Space Approach to Detecting Learned Spuriousness
Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu
- Published: 29 Sept 2025, Last Modified: 24 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Data-Efficient and Robust Coreset Selection via Sparse Adversarial Perturbations
Tushar Shinde, Manasa Madabhushi
- Published: 29 Sept 2025, Last Modified: 24 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Towards Trustworthy Amortized Bayesian Model Comparison
Šimon Kucharský, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian Bürkner
- Published: 29 Sept 2025, Last Modified: 16 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Quantifying CBRN Risk in Frontier Models
Divyanshu Kumar, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
- Published: 29 Sept 2025, Last Modified: 14 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Influence Functions for Preference Dataset Pruning
Daniel Fein, Gabriela Aránguiz Dias
- Published: 29 Sept 2025, Last Modified: 29 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
Kimia Hamidieh, Veronika Thost, Walter Gerych, Mikhail Yurochkin, Marzyeh Ghassemi
- Published: 29 Sept 2025, Last Modified: 29 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Uncertainty-Aware LLMs Fail to Flag Misleading Contexts
Tianyi Zhou, Johanne Medina, Sanjay Chawla
- Published: 29 Sept 2025, Last Modified: 21 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Spectral Regularization as a Safety-Critical Inductive Bias
Shivam Dubey
- Published: 29 Sept 2025, Last Modified: 24 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Persistent and Stealthy Backdoor Attacks in Federated Learning via Layerwise Model Poisoning
Nader Bouacida, Jayneel Vora, Prasant Mohapatra
- Published: 29 Sept 2025, Last Modified: 22 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Sandbagging in a Simple Survival Bandit Problem
Joel Dyer, Daniel Jarne Ornia, Nicholas George Bishop, Anisoara Calinescu, Michael J. Wooldridge
- Published: 29 Sept 2025, Last Modified: 22 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Near-Optimal Reinforcement Learning for Linear Distributionally Robust Markov Decision Processes
Zhishuai Liu, Weixin Wang, Pan Xu
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding
Marlies Hafer, Alexander Marx
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
Wenquan Lu, Jiaqi Zhang, Hugues Van Assel, Randall Balestriero
- Published: 29 Sept 2025, Last Modified: 23 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
From Many Voices to One: A Statistically Principled Aggregation of LLM Judges
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Automated Generation of Multilingual Jailbreak Prompts
Jonathan Ding, Will Cai, Khanak Jain, Dhruv Nair, Aditya Naha, Kevin Zhu, Vasu Sharma
- Published: 29 Sept 2025, Last Modified: 22 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy
Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawe, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Trust, But Attribute: Tracing Impact of Data on Trustworthiness in Supervised LLM Fine-Tuning
Kumar Shubham, Nishant Sharma, Karn Tiwari, Prathosh AP
- Published: 29 Sept 2025, Last Modified: 21 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
A Multi-Method Interpretability Framework for Probing Cognitive Processing in Deep Neural Networks across Vision and Biomedical Domains
Harshini Suresha, Kavitha S H
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
- Published: 29 Sept 2025, Last Modified: 22 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Yavuz Faruk Bakman, Zhiqi Huang, Chenyang Zhu, Anoop Kumar, Alfy Samuel, Daben Liu
- Published: 29 Sept 2025, Last Modified: 23 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Reliable Compositional Editing with Overlap-Aware Attention in Diffusion Models
Salamata Konate, Hassan Hamidi, Elham Dolatabadi, Frank Rudzicz, Laleh Seyyed-Kalantari
- Published: 29 Sept 2025, Last Modified: 23 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong, Aditi Raghunathan
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Wenxuan Zhou, Muhao Chen
- Published: 29 Sept 2025, Last Modified: 12 Oct 2025
- NeurIPS 2025 - Reliable ML Workshop