NeurIPS 2024 Workshop SafeGenAi Submissions
Differentially Private Sequential Data Synthesis with Structured State Space Models and Diffusion Models
Tomoya Matsumoto, Takayuki Miura, Toshiki Shibahara, Masanobu Kii, Kazuki Iwahana, Osamu Saisho, Shingo Okamura
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Andrew Jesson, Nicolas Beltran-Velez, David Blei
Published: 12 Oct 2024, Last Modified: 26 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Designing Physical-World Universal Attacks on Vision Transformers
Mingzhen Shao
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Marcin Choromanski, Arijit Sehanobish, Kumar Avinava Dubey, Snigdha Chaturvedi
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
GRE Score: Generative Risk Evaluation for Large Language Models
ZAITANG LI, Mohamed MOUHAJIR, Pin-Yu Chen, Tsung-Yi Ho
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Weak-to-Strong Confidence Prediction
Yukai Yang, Tracy Yixin Zhu, Marco Morucci, Tim G. J. Rudner
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
ZAITANG LI, Pin-Yu Chen, Tsung-Yi Ho
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Self-Preference Bias in LLM-as-a-Judge
Koki Wataoka, Tsubasa Takahashi, Ryokan Ri
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
The Empirical Impact of Data Sanitization on Language Models
Anwesan Pal, Radhika Bhargava, Kyle Hinsz, Jacques Esterhuizen, Sudipta Bhattacharya
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
Spencer Whitehead, Jacob Phillips, Sean M. Hendryx
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Julian Collado, Kevin Stangl
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Safe and Sound: Evaluating Language Models for Bias Mitigation and Understanding
Shaina Raza, Oluwanifemi Bamgbose, Shardul Ghuge, Deval Pandya
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel F. Brown, Basil Labib, Codruta Lugoj, Sai Sasank Y
Published: 12 Oct 2024, Last Modified: 20 Nov 2024 | SafeGenAi Poster | Readers: Everyone
A Three-Branch Checks-and-Balances Framework for Context-Aware Ethical Alignment of Large Language Models
Edward Y Chang
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Towards Resource Efficient and Interpretable Bias Mitigation in Natural Language Generation
Schrasing Tong, Eliott Zemour, Rawisara Lohanimit, Lalana Kagal
Published: 12 Oct 2024, Last Modified: 19 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Concept Unlearning for Large Language Models
Tomoya Yamashita, Takayuki Miura, Yuuki Yamanaka, Toshiki Shibahara, Masanori Yamada
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman
Published: 12 Oct 2024, Last Modified: 23 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi, Xuefeng Du, Yixuan Li
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li, Harkanwar Singh, Aditya Grover
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
DiffTextPure: Defending Large Language Models with Diffusion Purifiers
Huanran Chen, Ziruo Wang, Yihan Yang, Shuo Zhang, Zeming Wei, Fusheng Jin, Yinpeng Dong
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Inference, Fast and Slow: Reinterpreting VAEs for OOD Detection
Sicong Huang, Jiawei He, Kry Yik-Chau Lui
Published: 12 Oct 2024, Last Modified: 14 Nov 2024 | SafeGenAi Poster | Readers: Everyone
Page 1 of 7