Prompt Evolution Through Examples for Large Language Models - A Case Study in Game Comment Toxicity Classification

Pittawat Taveekitworachai; Febri Abdullah; Mustafa Can Gursesli; Antonio Lanatà; Andrea Guazzini; Ruck Thawonmas

Published: 01 Jan 2024, Last Modified: 13 May 2025 | MetroInd4.0 & IoT 2024 | CC BY-SA 4.0

Abstract: This paper presents a novel approach to automatic prompt optimization (APO) that uses a large language model (LLM) as the optimizer, named Prompt Evolution Through Examples (PETE). The approach draws inspiration from evolutionary computation for its prompt evolution stages. We aim to aid the development of prompts for systems that classify toxic content, including moderator-assist tools for game communities. While traditional approaches are useful for developing these tools, they have various shortcomings that LLMs can potentially mitigate. LLMs accept prompts as inputs that condition the generated outputs. However, designing a prompt with the best performance on this task usually requires fine-grained adjustments, which should be automated through an APO process rather than made manually, since manual tuning is often time-consuming. In this study, ChatGPT and GPT-4 are each used as both task performer and prompt optimizer to allow comparisons across models. The results indicate that PETE improves performance on the target task by up to 56.14% relative to the initial prompt, compared with only up to 49.15% for standard mutation-based evolution. The optimized prompts are provided for future use in other game community moderation tools. We also recommend that future studies explore more cost-effective approaches to LLM-based evaluation to enhance the benefits of APO.
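The abstract does not spell out PETE's evolution stages, so the following is only a minimal sketch of the general idea of example-guided prompt evolution, not the authors' algorithm. Everything in it is a hypothetical stand-in: `classify` replaces the LLM task performer with a toy keyword matcher, `propose_from_examples` replaces the LLM optimizer with a rule that reads misclassified examples, and `SEED_PROMPT` and `DATA` are invented for illustration.

```python
from typing import List, Tuple

Example = Tuple[str, bool]  # (comment, is_toxic)

# Hypothetical seed prompt and toy data, invented for this sketch.
SEED_PROMPT = "Label the comment as toxic or non-toxic. Keywords:"
DATA: List[Example] = [
    ("you are an idiot", True),
    ("what a noob team", True),
    ("uninstall the game idiot", True),
    ("what a good game", False),
    ("nice shot team", False),
    ("you are great", False),
]


def classify(prompt: str, comment: str) -> bool:
    """Toy stand-in for the LLM task performer: a comment is flagged as
    toxic if it contains any keyword listed after 'Keywords:' in the prompt."""
    keywords = [k.strip() for k in prompt.split("Keywords:", 1)[1].split(",")]
    words = comment.lower().split()
    return any(k in words for k in keywords if k)


def accuracy(prompt: str, data: List[Example]) -> float:
    """Fraction of examples the prompt classifies correctly."""
    return sum(classify(prompt, c) == y for c, y in data) / len(data)


def propose_from_examples(prompt: str, failures: List[Example]) -> List[str]:
    """Toy stand-in for the LLM optimizer: for every missed toxic comment,
    propose child prompts that each add one word from it as a keyword."""
    children = []
    for comment, is_toxic in failures:
        if is_toxic:
            for word in comment.lower().split():
                children.append(prompt + word + ",")
    return children


def evolve(prompt: str, data: List[Example], generations: int = 5):
    """Greedy evolution: keep the best child while it improves accuracy."""
    best, best_score = prompt, accuracy(prompt, data)
    for _ in range(generations):
        failures = [(c, y) for c, y in data if classify(best, c) != y]
        candidates = propose_from_examples(best, failures)
        if not candidates:
            break
        top_score, top = max(
            ((accuracy(p, data), p) for p in candidates), key=lambda s: s[0]
        )
        if top_score <= best_score:
            break  # no child beats the current prompt
        best, best_score = top, top_score
    return best, best_score


if __name__ == "__main__":
    print(evolve(SEED_PROMPT, DATA))
```

In a real setting, both stubs would be LLM calls and each fitness evaluation would cost one model query per example and candidate, which is the evaluation cost the abstract flags as a direction for future work.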
