OpenReview.net

NeurIPS 2024 Workshop SafeGenAi Submissions

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots
- Published: 12 Oct 2024, Last Modified: 26 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
Saeid Asgari, Aliasghar Khani, Amir Hosein Khasahmadi
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?
Saeid Asgari, Joseph George Lambourne, Alana Mongkhounsavath
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection
Shantanu Thorat, Tianbao Yang
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Detecting Origin Attribution for Text-to-Image Diffusion Models in RGB and Beyond
Katherine Xu, Lingzhi Zhang, Jianbo Shi
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Oral
- Readers: Everyone
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
Joshua Kazdan, Hao Sun, Jiaqi Han, Felix Petersen, Frederick Vu, Stefano Ermon
- Published: 12 Oct 2024, Last Modified: 20 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
SolidMark: Evaluating Image Memorization in Generative Models
Nicky Kriplani, Minh Pham, Gowthami Somepalli, Chinmay Hegde, Niv Cohen
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Identifying and Addressing Delusions for Target-Directed Decision Making
Harry Zhao, Tristan Sylvain, Doina Precup, Yoshua Bengio
- Published: 12 Oct 2024, Last Modified: 19 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Has My System Prompt Been Used? Large Language Model Prompt Membership Inference
Roman Levin, Valeriia Cherepanova, Abhimanyu Hans, Avi Schwarzschild, Tom Goldstein
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Improving LLM Group Fairness on Tabular Data via In-Context Learning
Valeriia Cherepanova, Chia-Jung Lee, Nil-Jana Akpinar, Riccardo Fogliato, Martin Andres Bertran, Michael Kearns, James Zou
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Towards a Theory of AI Personhood
Francis Rhys Ward
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Imitation guided Automated Red Teaming
Desik Rengarajan, Sajad Mousavi, Ashwin Ramesh Babu, Vineet Gundecha, Avisek Naug, Sahand Ghorbanpour, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users
Elinor Poole-Dayan, Deb Roy, Jad Kabbara
- Published: 20 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
Sahil Verma, Royi Rassin, Arnav Mohanty Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries
Adam X. Yang, Chen Chen, Konstantinos Pitas
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
What do we learn from inverting CLIP models?
Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Investigation of VLM Failure
Lukas Klein, Kenza Amara, Carsten T. Lüth, Hendrik Strobelt, Mennatallah El-Assady, Paul F Jaeger
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
Self-Supervised Bisimulation Action Chunk Representation for Efficient RL
Lei Shi, Jianye HAO, Hongyao Tang, Zibin Dong, YAN ZHENG
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
CoS: Enhancing Personalization and Mitigating Bias with Context Steering
Sashrika Pandey, Jerry Zhi-Yang He, Mariah L Schrum, Anca Dragan
- Published: 12 Oct 2024, Last Modified: 14 Nov 2024
- SafeGenAi Poster
- Readers: Everyone
