Keywords: AI safety, large language model, LLM agent, backdoor, jailbreak
TL;DR: We propose a competition on the safety of LLMs and LLM-powered agents, with three tracks: jailbreaking attacks, backdoor trigger recovery for LLMs, and backdoor trigger recovery for LLM agents.
Abstract: Ensuring safety is a pivotal objective in the development of large language models (LLMs) and LLM-powered agents. The Competition for LLM and Agent Safety (CLAS) aims to advance the understanding of vulnerabilities in LLMs and LLM-powered agents and to encourage methods for improving their safety. The competition features three main tracks linked by the methodology of prompt injection, with tasks designed to amplify societal impact by involving practical adversarial objectives across different domains. In the Jailbreaking Attack track, participants are challenged to elicit harmful outputs from LLMs with safety guardrails via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with hundreds of domain-specific backdoors and are asked to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering focuses on eliciting specific backdoor targets in the form of malicious agent actions. As the first competition addressing the safety of both LLMs and LLM agents, CLAS 2024 aims to foster collaboration among various communities and to promote research and tools for enhancing the safety of LLMs and real-world AI systems.
Competition Timeline:
* June 18: The competition website goes live.
* July 3: Registration opens.
* July 15: The development phase begins. Development models and data are released.
* October 12: Final submission deadline for the development phase.
* October 13: The test phase begins. Test-phase models and data are released.
* October 18: Final submission deadline for the test phase.
* October 21: Top-ranking teams are contacted and asked for their code, models, and method details.
* October 30: Winning teams are announced for all tracks.
List of authors and affiliations as a string:
Zhen Xiang (UIUC), Yi Zeng (VT), Mintong Kang (UIUC), Chejian Xu (UIUC), Jiawei Zhang (UIUC), Zhuowen Yuan (UIUC), Zhaorun Chen (UChicago), Chulin Xie (UIUC), Fengqing Jiang (UW), Minzhou Pan (NEU), Junyuan Hong (UT Austin), Ruoxi Jia (VT), Radha Poovendran (UW), Bo Li (UChicago & UIUC)
Website: https://www.llmagentsafetycomp24.com/
Primary Contact Email: clas2024-organizers@googlegroups.com
Participant Contact Email: clas2024-updates@googlegroups.com
Workshop Format: Hybrid (Vancouver + some online speakers)
Preferred Timezone: Central Time (CT)
Logo Image: png
Submission Number: 23