NeurIPS 2024 Workshop Red Teaming GenAI Submissions
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Alex Beutel, Kai Yuanqing Xiao, Johannes Heidecke, Lilian Weng
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Algorithmic Oversight for Deceptive Reasoning
Ege Onur Taga, Mingchen Li, Yongqi Chen, Samet Oymak
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Steganography in Large Language Models: Investigating Emergence and Mitigations
Yohan Mathew, Robert McCarthy, Ollie Matthews, Joan Velja, Nandi Schoots, Dylan Cope
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
iART - Imitation guided Automated Red Teaming
Sajad Mousavi, Desik Rengarajan, Ashwin Ramesh Babu, Vineet Gundecha, Avisek Naug, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Antonio Guillen, Paolo Faraboschi, Soumyendu Sarkar
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Oral
Readers: Everyone
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo, Yongjin Yang, Hwaran Lee
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Interactive Semantic Interventions for VLMs: Breaking VLMs with Human Ingenuity
Lukas Klein, Kenza Amara, Carsten T. Lüth, Hendrik Strobelt, Mennatallah El-Assady, Paul F Jaeger
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
Large Language Model Detoxification: Data and Metric Solutions
SungJoo Byun, Hyopil Shin
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow E. Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Oral
Readers: Everyone
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
Hongfu Liu, Yuxi Xie, Ye Wang, Michael Shieh
Published: 09 Oct 2024, Last Modified: 03 Jan 2025
Red Teaming GenAI Workshop @ NeurIPS'24 Poster
Readers: Everyone