NeurIPS 2025 Workshop Reliable ML Submissions
Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting
Nikoo Naghavian, Mostafa Tavassolipour
Published: 29 Sept 2025, Last Modified: 14 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
Antonio Barbalau, Cristian Daniel Paduraru, Teodor Poncu, Alexandru Tifrea, Elena Burceanu
Published: 29 Sept 2025, Last Modified: 24 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
WASP: A Weight-Space Approach to Detecting Learned Spuriousness
Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu
Published: 29 Sept 2025, Last Modified: 24 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Data-Efficient and Robust Coreset Selection via Sparse Adversarial Perturbations
Tushar Shinde, Manasa Madabhushi
Published: 29 Sept 2025, Last Modified: 24 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Towards Trustworthy Amortized Bayesian Model Comparison
Šimon Kucharský, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian Bürkner
Published: 29 Sept 2025, Last Modified: 16 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Quantifying CBRN Risk in Frontier Models
Divyanshu Kumar, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
Published: 29 Sept 2025, Last Modified: 14 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Influence Functions for Preference Dataset Pruning
Daniel Fein, Gabriela Aránguiz Dias
Published: 29 Sept 2025, Last Modified: 29 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
Kimia Hamidieh, Veronika Thost, Walter Gerych, Mikhail Yurochkin, Marzyeh Ghassemi
Published: 29 Sept 2025, Last Modified: 29 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Uncertainty-Aware LLMs Fail to Flag Misleading Contexts
Tianyi Zhou, Johanne Medina, Sanjay Chawla
Published: 29 Sept 2025, Last Modified: 21 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Spectral Regularization as a Safety-Critical Inductive Bias
Shivam Dubey
Published: 29 Sept 2025, Last Modified: 24 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Persistent and Stealthy Backdoor Attacks in Federated Learning via Layerwise Model Poisoning
Nader Bouacida, Jayneel Vora, Prasant Mohapatra
Published: 29 Sept 2025, Last Modified: 22 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Sandbagging in a Simple Survival Bandit Problem
Joel Dyer, Daniel Jarne Ornia, Nicholas George Bishop, Anisoara Calinescu, Michael J. Wooldridge
Published: 29 Sept 2025, Last Modified: 22 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Near-Optimal Reinforcement Learning for Linear Distributionally Robust Markov Decision Processes
Zhishuai Liu, Weixin Wang, Pan Xu
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding
Marlies Hafer, Alexander Marx
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
Wenquan Lu, Jiaqi Zhang, Hugues Van Assel, Randall Balestriero
Published: 29 Sept 2025, Last Modified: 23 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
From Many Voices to One: A Statistically Principled Aggregation of LLM Judges
Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Automated Generation of Multilingual Jailbreak Prompts
Jonathan Ding, Will Cai, Khanak Jain, Dhruv Nair, Aditya Naha, Kevin Zhu, Vasu Sharma
Published: 29 Sept 2025, Last Modified: 22 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy
Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawe, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Trust, But Attribute: Tracing Impact of Data on Trustworthiness in Supervised LLM Fine-Tuning
Kumar Shubham, Nishant Sharma, Karn Tiwari, Prathosh AP
Published: 29 Sept 2025, Last Modified: 21 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
A Multi-Method Interpretability Framework for Probing Cognitive Processing in Deep Neural Networks across Vision and Biomedical Domains
Harshini Suresha, Kavitha S H
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
Published: 29 Sept 2025, Last Modified: 22 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Yavuz Faruk Bakman, Zhiqi Huang, Chenyang Zhu, Anoop Kumar, Alfy Samuel, Daben Liu
Published: 29 Sept 2025, Last Modified: 23 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Reliable Compositional Editing with Overlap-Aware Attention in Diffusion Models
Salamata Konate, Hassan Hamidi, Elham Dolatabadi, Frank Rudzicz, Laleh Seyyed-Kalantari
Published: 29 Sept 2025, Last Modified: 23 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong, Aditi Raghunathan
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Wenxuan Zhou, Muhao Chen
Published: 29 Sept 2025, Last Modified: 12 Oct 2025
NeurIPS 2025 - Reliable ML Workshop
Readers: Everyone