ICML 2024 Workshop NextGenAISafety Submissions
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum S Anderson, Yaron Singer, Amin Karbasi
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Chained Tuning Leads to Biased Forgetting
Megan Ung, Alicia Yi Sun, Samuel Bell, Levent Sagun, Adina Williams
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Safer Reinforcement Learning by Going Off-policy: a Benchmark
Igor Kuznetsov
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Generated Audio Detectors are Not Robust in Real-World Conditions
Soumya Shaw, Ben Nassi, Lea Schönherr
Published: 28 Jun 2024, Last Modified: 26 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Bias Transmission in Large Language Models: Evidence from Gender-Occupation Bias in GPT-4
Kirsten Morehouse, Weiwei Pan, Juan Manuel Contreras, Mahzarin R. Banaji
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Enhancing Concept-based Learning with Logic
Deepika Vemuri, Gautham Bellamkonda, Vineeth N. Balasubramanian
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Medical Unlearnable Examples: Securing Medical Data from Unauthorized Training via Sparsity-Aware Local Masking
Weixiang Sun, Yixin Liu, Zhiling Yan, Kaidi Xu, Lichao Sun
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Can Language Models Safeguard Themselves, Instantly and For Free?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Robustness Analysis of AI Models in Critical Energy Systems
Pantelis Dogoulis, Matthieu Jimenez, Maxime Cordy, Salah Ghamizi, Yves Le Traon
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Is My Data Safe? Predicting Instance-Level Membership Inference Success for White-box and Black-box Attacks
Tobias Leemann, Bardh Prenkaj, Gjergji Kasneci
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations
Zilin Ma, Susannah Cheng Su, Nathan Zhao, Linn Bieske, Blake Bullwinkel, Yanyi Zhang, Jinglun Gao, Gekai Liao, Siyao Li, Ziqing Luo, Boxiang Wang, Zihan Wen, Yanrui Yang, Claude Bruderlein, Weiwei Pan
Published: 28 Jun 2024, Last Modified: 29 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
On the Calibration of Conditional-Value-at-Risk
Rajeev Verma, Volker Fischer, Eric Nalisnick
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
Blazej Manczak, Eric Lin, Eliott Zemour, Vaikkunth Mugunthan
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Decomposed evaluations of geographic disparities in text-to-image models
Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppan, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Uncovering a Culture of AI Grassroots Experimentation by Boston City Employees: Safety Risks and Mitigation
Jude Ha, Audrey Xing-Yun Chang
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
FairPFN: Transformers Can do Counterfactual Fairness
Jake Robertson, Noah Hollmann, Noor Awad, Frank Hutter
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang, Ruiqi Zhong, Jiaxin Wen, Jacob Steinhardt
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Marginal Fairness Sliced Wasserstein Barycenter
Khai Nguyen, Hai Nguyen, Nhat Ho
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
On the Robustness of Neural Networks Quantization against Data Poisoning Attacks
Yiwei Lu, Yihan Wang, Guojun Zhang, Yaoliang Yu
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Models That Prove Their Own Correctness
Noga Amit, Shafi Goldwasser, Orr Paradise, Guy N. Rothblum
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Exploring Scaling Trends in LLM Robustness
Nikolaus H. R. Howe, Michał Zając, Ian R. McKenzie, Oskar John Hollinsworth, Pierre-Luc Bacon, Adam Gleave
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina MC Höhne, Alexander Warnecke, Lukas Pirch, Klaus Robert Muller, Konrad Rieck, Kirill Bykov
Published: 28 Jun 2024, Last Modified: 25 Jul 2024
NextGenAISafety 2024 Poster
Readers: Everyone