NeurIPS 2024 Workshop SafeGenAi Submissions
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots
Published: 12 Oct 2024, Last Modified: 26 Nov 2024
SafeGenAi Poster
Readers: Everyone
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
Saeid Asgari, Aliasghar Khani, Amir Hosein Khasahmadi
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?
Saeid Asgari, Joseph George Lambourne, Alana Mongkhounsavath
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection
Shantanu Thorat, Tianbao Yang
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Detecting Origin Attribution for Text-to-Image Diffusion Models in RGB and Beyond
Katherine Xu, Lingzhi Zhang, Jianbo Shi
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Oral
Readers: Everyone
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
Joshua Kazdan, Hao Sun, Jiaqi Han, Felix Petersen, Frederick Vu, Stefano Ermon
Published: 12 Oct 2024, Last Modified: 20 Nov 2024
SafeGenAi Poster
Readers: Everyone
SolidMark: Evaluating Image Memorization in Generative Models
Nicky Kriplani, Minh Pham, Gowthami Somepalli, Chinmay Hegde, Niv Cohen
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Identifying and Addressing Delusions for Target-Directed Decision Making
Harry Zhao, Tristan Sylvain, Doina Precup, Yoshua Bengio
Published: 12 Oct 2024, Last Modified: 19 Nov 2024
SafeGenAi Poster
Readers: Everyone
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference
Roman Levin, Valeriia Cherepanova, Abhimanyu Hans, Avi Schwarzschild, Tom Goldstein
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Improving LLM Group Fairness on Tabular Data via In-Context Learning
Valeriia Cherepanova, Chia-Jung Lee, Nil-Jana Akpinar, Riccardo Fogliato, Martin Andres Bertran, Michael Kearns, James Zou
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Towards a Theory of AI Personhood
Francis Rhys Ward
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Imitation guided Automated Red Teaming
Desik Rengarajan, Sajad Mousavi, Ashwin Ramesh Babu, Vineet Gundecha, Avisek Naug, Sahand Ghorbanpour, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users
Elinor Poole-Dayan, Deb Roy, Jad Kabbara
Published: 20 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
Sahil Verma, Royi Rassin, Arnav Mohanty Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries
Adam X. Yang, Chen Chen, Konstantinos Pitas
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
What do we learn from inverting CLIP models?
Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Investigation of VLM Failure
Lukas Klein, Kenza Amara, Carsten T. Lüth, Hendrik Strobelt, Mennatallah El-Assady, Paul F Jaeger
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
Self-Supervised Bisimulation Action Chunk Representation for Efficient RL
Lei Shi, Jianye HAO, Hongyao Tang, Zibin Dong, YAN ZHENG
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone
CoS: Enhancing Personalization and Mitigating Bias with Context Steering
Sashrika Pandey, Jerry Zhi-Yang He, Mariah L Schrum, Anca Dragan
Published: 12 Oct 2024, Last Modified: 14 Nov 2024
SafeGenAi Poster
Readers: Everyone