Enhancing the Reliability of LLM-based Systems for Survey Generation through Distributional Drift Detection

Published: 29 Jun 2024, Last Modified: 08 Jul 2024 · KiL 2024 Oral · CC BY 4.0
Keywords: LLMs, Survey Generation, Reliability, Distribution Drifts
TL;DR: We propose a comprehensive evaluation framework to enhance the reliability of Large Language Model (LLM)-based systems for survey generation tasks.
Abstract: Evaluating Large Language Model (LLM)-based systems is a recurrent challenge in modern machine learning research and development. It is crucial to ensure that changes made in production environments do not negatively impact the user experience, and careful evaluation techniques are especially important when updated models or prompts create disparities within the system. Since releasing a feature in 2023 that helps our customers create surveys from textual prompts, we have iteratively improved several parts of the system, such as the prompts, the underlying LLMs, and the system's internal logic. To measure the impact of these changes, we propose a comprehensive framework for assessing surveys generated by LLMs, focusing on data drift analyses based on survey metadata features. This approach lets us effectively identify and address potential areas of concern related to model performance, enhancing the reliability and usability of LLM-based systems for survey generation tasks.
Submission Number: 4
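Since the abstract centers on detecting distributional drift over survey metadata features, here is a minimal sketch of what such a check could look like. The paper does not specify its method; this sketch assumes a two-sample Kolmogorov-Smirnov test per feature, and the function name, feature names, and significance threshold are illustrative assumptions, not from the paper.

```python
# Illustrative sketch: per-feature drift detection between survey metadata
# collected from a baseline system version and a candidate version.
# Feature names ("num_questions") and alpha are hypothetical choices.
import numpy as np
from scipy.stats import ks_2samp


def detect_metadata_drift(baseline: dict, candidate: dict, alpha: float = 0.01) -> dict:
    """Run a two-sample Kolmogorov-Smirnov test for each shared feature.

    baseline / candidate map feature names to 1-D arrays of per-survey
    values (e.g. number of questions per generated survey) gathered from
    the old and new system versions, respectively.
    """
    report = {}
    for feature in baseline.keys() & candidate.keys():
        result = ks_2samp(baseline[feature], candidate[feature])
        report[feature] = {
            "ks_statistic": result.statistic,
            "p_value": result.pvalue,
            # Reject the "same distribution" hypothesis at level alpha.
            "drift_detected": result.pvalue < alpha,
        }
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated metadata: the candidate version tends to produce
    # longer surveys, so drift should be flagged.
    baseline = {"num_questions": rng.poisson(8, 500)}
    candidate = {"num_questions": rng.poisson(10, 500)}
    for feature, result in detect_metadata_drift(baseline, candidate).items():
        print(feature, result)
```

A per-feature test like this keeps the drift report interpretable: when a candidate prompt or model update is flagged, the report points directly at which metadata property (survey length, question types, and so on) shifted, which matches the abstract's goal of identifying specific areas of concern after each system change.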