Abstract: Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's overnight popularity, and are integrated into products used by millions of people every day, such as search engines and productivity suites. Yet the societal impact of LLMs, encompassing both benefits and harms, is not well understood. Inspired by cybersecurity practices, red-teaming is emerging as a technique for uncovering model vulnerabilities. Despite increasing attention from industry, academia, and government centered around red-teaming LLMs, such efforts are still limited in the diversity of their focus, approaches, and participants. Importantly, given that LLMs are becoming ubiquitous, it is imperative that red-teaming efforts are scaled out to include large segments of the research community, practitioners, and the people who are directly affected by the deployment of these systems. The goal of this tutorial is twofold. First, we introduce the topic of LLM red-teaming by reviewing the state of the art in red-teaming practices, from participatory events to automatic AI-focused approaches, exposing gaps in both the techniques and the coverage of targeted harms. Second, we plan to engage the audience in a hands-on, interactive exercise in LLM red-teaming to showcase the ease (or difficulty) of exposing model vulnerabilities, contingent on both the targeted harm and the model's capabilities. We believe that the KDD community of researchers and practitioners is in a unique position to address the existing gaps in red-teaming approaches, given its longstanding research and practice in extracting knowledge from data.